Project 2 — Baseline Models
Goal: build simple baseline classifiers and learn evaluation fundamentals. I compare Logistic Regression and a Decision Tree on the same dataset I used in Project 1.
Dataset: Breast Cancer Wisconsin (Diagnostic), as bundled with scikit-learn (569 samples, 30 features).
Setup
Train size: 455 samples
Test size: 114 samples (about 20% of the 569 total)
Random state: 42
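A minimal sketch of a split that reproduces the sizes above. The `test_size=0.2` value is my assumption, inferred from 114 of 569 samples; whether the original split was also stratified is not stated, so this sketch leaves stratification off.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the features and labels as NumPy arrays.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: a 20% hold-out. The fixed random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (455, 30) (114, 30)
```

Fixing the random state means both models are later scored on exactly the same 114 held-out samples, which keeps the comparison fair.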
What I learned
  • Using a train/test split keeps the evaluation honest (no training on the test data).
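  • Logistic Regression benefits from feature scaling, so I used a pipeline with StandardScaler. Fitting the pipeline on the training set means the scaler only sees training data, so no test-set statistics leak into the model.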
  • A Decision Tree is easy to train but can overfit, so I capped its depth for a simple baseline.
  • Accuracy is a useful summary, but the confusion matrix shows which kinds of mistakes the model makes (false positives vs. false negatives).
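The points above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code behind the reported results: `max_depth=3`, `max_iter=1000`, and `test_size=0.2` are values I picked for the example, so the printed numbers may differ from the ones in Results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# In this dataset, label 0 = malignant and label 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed 20% hold-out
)

models = {
    # The scaler is fit on the training data only, inside the pipeline.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Assumption: depth capped at 3; the actual cap is not stated.
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # train on the training split only
    y_pred = model.predict(X_test)   # evaluate on unseen data
    acc = accuracy_score(y_test, y_pred)
    # Rows are true labels, columns are predicted labels, both ordered 0, 1.
    cm = confusion_matrix(y_test, y_pred)
    results[name] = (acc, cm)
    print(f"{name}: accuracy = {acc:.4f}")
    print(cm)
```

The off-diagonal cells of each matrix are the mistakes: the top-right cell counts malignant cases predicted as benign, and the bottom-left cell counts benign cases predicted as malignant.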
Results
Logistic Regression
Accuracy: 0.9825
Confusion matrix:
[Figure: Logistic Regression confusion matrix]
Decision Tree
Accuracy: 0.9386
Confusion matrix:
[Figure: Decision Tree confusion matrix]
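As a quick sanity check, the reported accuracies can be turned back into error counts. This sketch assumes both accuracies were measured on the 114-sample test set; the variable names are only for illustration.

```python
n_test = 114
reported = {"Logistic Regression": 0.9825, "Decision Tree": 0.9386}

errors = {}
for name, acc in reported.items():
    # Misclassified samples = (1 - accuracy) * test size, rounded to a whole count.
    errors[name] = round((1 - acc) * n_test)
    print(f"{name}: about {errors[name]} of {n_test} misclassified")

# Logistic Regression: about 2 of 114 misclassified
# Decision Tree: about 7 of 114 misclassified
```

So the gap between the two baselines is roughly five test samples, which is worth keeping in mind before reading too much into the difference.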
Notes
I’m treating these as baselines, not final models. The point is to learn the workflow: clean split, train, evaluate, then reflect on what the results mean.