Danial Amankos — ML Learning Portfolio

Project 2 — Baseline Models

Goal: build simple baseline classifiers and learn evaluation fundamentals. I compare Logistic Regression and a Decision Tree on the same dataset I used in Project 1.

Dataset: Breast Cancer Wisconsin (Diagnostic), loaded from scikit-learn: 569 samples, 30 numeric features, binary target (malignant vs. benign).

Setup

Train size: 455

Test size: 114

Random state: 42
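
A minimal sketch of a split that produces these sizes. Only the two sizes and the random state come from the setup above; the 80/20 ratio and the use of stratification are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the Breast Cancer Wisconsin data bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: test_size=0.2 and stratify=y are illustrative choices.
# Only the resulting sizes and random_state=42 are stated in the write-up.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape[0], X_test.shape[0])  # prints: 455 114
```

Stratifying keeps the malignant/benign ratio roughly the same in both parts, which matters more on small datasets like this one.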

What I learned

Using a train/test split keeps the evaluation honest (no training on the test data).
Logistic Regression benefits from feature scaling, so I used a pipeline with StandardScaler.
A Decision Tree is easy to train but can overfit, so I capped its depth for a simple baseline.
Accuracy is useful, but the confusion matrix helps me see the types of mistakes the model makes.
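
The points above can be sketched roughly as follows. This is not the exact project code: `test_size=0.2`, `max_iter=1000`, and `max_depth=3` are assumed values chosen for illustration, so the printed accuracies may differ from the results reported in this write-up.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # test_size is an assumption
)

# The scaler sits inside the pipeline, so its mean and variance are
# learned from the training data only and then reused on the test data.
log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# max_depth=3 is an illustrative cap, not the value used in the project.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)

scores = {}
for name, model in [("Logistic Regression", log_reg), ("Decision Tree", tree)]:
    model.fit(X_train, y_train)  # the test data is never seen during fitting
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.4f}")
```

Putting the scaler in the pipeline, rather than scaling the full dataset up front, is what keeps test-set statistics from leaking into training.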

Results

Logistic Regression

Accuracy: 0.9825 (112 of 114 test samples correct)

Confusion matrix:

Decision Tree

Accuracy: 0.9386 (107 of 114 test samples correct)

Confusion matrix:
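
A confusion matrix for either model can be computed and read with a sketch like the one below. The split ratio and `max_depth=3` are assumptions, so the counts it prints are not necessarily the ones from this project. In scikit-learn's copy of the dataset, class 0 is malignant and class 1 is benign.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # target: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # test_size is an assumption
)

# max_depth=3 is an illustrative value, not the project's setting.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes, in label order [0, 1].
cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
print(cm)

# Name the four cells by class to avoid ambiguity about which class is "positive".
malignant_as_malignant, malignant_as_benign = cm[0]
benign_as_malignant, benign_as_benign = cm[1]
print("Malignant predicted as benign (missed cancers):", malignant_as_benign)
print("Benign predicted as malignant (false alarms):", benign_as_malignant)
```

Two models with similar accuracy can differ in which of these two off-diagonal cells holds their errors, and for a cancer screen a missed malignant case is usually the costlier mistake.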

Notes

I’m treating these as baselines, not final models. The point is to learn the workflow: clean split, train, evaluate, then reflect on what the results mean.