Step 1 — Pick a dataset + run a first exploration
What I did
- Chose a real dataset that ships with scikit-learn (so the project is easy to reproduce).
- Checked basic shape, missing values, and target balance.
- Generated a few charts to understand distributions and correlations.
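
The checks above can be sketched in a few lines. This is only an illustration: the post does not name the dataset, so the breast cancer dataset from scikit-learn is assumed here as a stand-in, and the variable names are my own.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: the breast cancer dataset stands in for the dataset used in this step.
data = load_breast_cancer(as_frame=True)
df = data.frame  # feature columns plus a "target" column

# Basic shape: (rows, columns)
print(df.shape)

# Missing values: total count of NaN cells across the whole frame
n_missing = int(df.isna().sum().sum())
print(f"missing values: {n_missing}")

# Target balance: share of rows in each class
balance = df["target"].value_counts(normalize=True)
print(balance)
```

If the classes turn out to be heavily imbalanced, that is worth knowing before choosing an evaluation metric for the baseline models.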
What I learned
- Even before modeling, distributions and correlations with the target often give a good first guess at which features will matter.
- Correlation heatmaps get hard to read, and easy to misread, when they include too many features, so I limited mine to the top features only.
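
One possible way to keep the heatmap to "top features only" is sketched below. The selection rule (largest absolute Pearson correlation with the target), the cutoff of 8 features, the helper names `top_correlated_features` and `plot_heatmap`, and the breast cancer dataset are all assumptions for illustration, not necessarily what was used for the charts in this step.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def top_correlated_features(df: pd.DataFrame, target: str = "target", k: int = 8) -> list:
    """Return the k features with the largest absolute Pearson correlation to the target."""
    corr_with_target = df.corr()[target].drop(target).abs()
    return corr_with_target.sort_values(ascending=False).head(k).index.tolist()


def plot_heatmap(df: pd.DataFrame, columns: list, path: str = "corr_heatmap.png") -> None:
    """Save a correlation heatmap restricted to the given columns."""
    # Imported here so the selection helper above works without matplotlib installed.
    import matplotlib.pyplot as plt

    corr = df[columns].corr()
    fig, ax = plt.subplots(figsize=(7, 6))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(columns)))
    ax.set_xticklabels(columns, rotation=90)
    ax.set_yticks(range(len(columns)))
    ax.set_yticklabels(columns)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Assumption: the breast cancer dataset stands in for the dataset used in this step.
df = load_breast_cancer(as_frame=True).frame
top = top_correlated_features(df, k=8)
print(top)
```

Calling `plot_heatmap(df, top + ["target"])` then writes an image with 9 rows and columns instead of 31, which is small enough to read the labels. Note that Pearson correlation only captures linear relationships, so a feature left out by this rule can still be useful to a model.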
Next
In Step 2, I’ll build two baseline classifiers (Logistic Regression and Decision Tree) and compare their results.