MST0052 · Lecture 10 · Fall 2026
90 minutes · one model family · three workflows
01
Regression: same algorithm, within-node variance; leaves predict the mean.
Keeps splitting until every leaf is pure. Training error: zero.
A slightly different sample grows a wildly different tree.
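A minimal sketch of both claims, assuming the breast-cancer data and 80/20 stratified split used later in the deck. The two-resample comparison and the name `disagreement` are illustrative choices, not taken from the slides.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# No depth limit: the tree keeps splitting until every leaf is pure.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy = {train_acc:.3f}, test accuracy = {test_acc:.3f}")

# Instability (illustrative): fit two trees on two bootstrap resamples of the
# training set and measure how often their test predictions disagree.
rng = np.random.default_rng(0)
preds = []
for _ in range(2):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    resampled_tree = DecisionTreeClassifier(random_state=42)
    resampled_tree.fit(X_train[idx], y_train[idx])
    preds.append(resampled_tree.predict(X_test))
disagreement = np.mean(preds[0] != preds[1])
print(f"test points where the two trees disagree: {disagreement:.1%}")
```

The gap between training and test accuracy is the overfitting; the disagreement rate is a rough view of the variance.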
02
Bias: averaging inherits it, unchanged.
Variance: averaging slashes it.
The recipe · Bootstrap AGGregating
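The recipe can be sketched by hand in a few lines: resample rows with replacement, fit one unpruned tree per resample, then take a majority vote. This is an illustrative sketch, not the lecture's code; `n_trees = 25` is an assumed value, kept odd so a two-class vote cannot tie.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

rng = np.random.default_rng(42)
n_trees = 25  # assumption for illustration; odd, so the vote cannot tie
n = len(X_train)
votes = np.zeros((n_trees, len(X_test)), dtype=int)

for b in range(n_trees):
    # Bootstrap: draw n training rows with replacement.
    idx = rng.integers(0, n, size=n)
    tree = DecisionTreeClassifier(random_state=b).fit(X_train[idx], y_train[idx])
    votes[b] = tree.predict(X_test)

# Aggregate: labels are 0/1, so a mean vote above 0.5 means class 1 wins.
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
bagged_acc = np.mean(bagged_pred == y_test)
single_accs = (votes == y_test).mean(axis=1)
print(f"single trees: mean accuracy {single_accs.mean():.3f}")
print(f"bagged vote:  accuracy {bagged_acc:.3f}")
```

In practice `sklearn.ensemble.BaggingClassifier` does the same job; the loop is only there to make the two steps visible.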
03
Each tree gets its own bootstrap sample.
The strongest features still top every tree.
Classification: m = √p
Regression: m = p / 3
Each tree: slightly worse, with fewer features to pick from.
The forest: meaningfully better, because decorrelated errors average away.
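Why decorrelation matters can be written as one standard identity. The symbols are assumptions introduced here, not the slides' notation: B trees T_b, each with variance σ², and pairwise correlation ρ between any two trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Adding trees shrinks only the second term. The first term, ρσ², is a floor that bagging alone cannot lower; choosing each split from a random subset of m features reduces ρ, and with it the floor.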
No scaler — trees don't care about units. The shortest pipeline in the course.
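A sketch of that pipeline, assuming the breast-cancer data and split used elsewhere in the deck. The hyperparameter values are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# No scaler and no Pipeline object: raw features go straight into the forest.
forest = RandomForestClassifier(
    n_estimators=300,      # illustrative value
    max_features="sqrt",   # m = sqrt(p) candidate features per split
    random_state=42,
    n_jobs=-1,
)
forest.fit(X_train, y_train)
test_acc = forest.score(X_test, y_test)
print(f"test accuracy = {test_acc:.3f}")
```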
Reference — screenshot this
Tune with GridSearchCV — L7's machinery, unchanged.
GridSearchCV
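A hedged sketch of tuning a forest with `GridSearchCV`. The grid below is an assumption chosen to keep the run short, not the grid from the lecture.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical grid: two settings for m, two leaf sizes.
param_grid = {
    "max_features": ["sqrt", 0.5],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_grid,
    cv=5,
    scoring="f1",
    n_jobs=-1,
)
search.fit(X_train, y_train)  # the test set is not involved in tuning
print(search.best_params_, f"CV f1 = {search.best_score_:.3f}")
```

`search.best_estimator_` is refit on the full training set by default, so it can be scored on the test set once at the end.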
04
Biased toward high-cardinality features — and toward whichever correlated twin got picked first.
predict_proba
Pitfall 1 · OOB is a sanity check
Free — report it alongside CV.
Still touched exactly once (L7).
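A sketch of reporting the out-of-bag score next to cross-validation, using training data only. Both numbers are accuracies here, because `oob_score_` defaults to accuracy for classifiers.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# oob_score=True scores each row using only the trees that never saw it.
forest = RandomForestClassifier(
    n_estimators=300, oob_score=True, random_state=42, n_jobs=-1
)
forest.fit(X_train, y_train)
oob_acc = forest.oob_score_

# 5-fold CV on the same training data, for comparison.
cv_acc = cross_val_score(forest, X_train, y_train, cv=5, scoring="accuracy")
print(f"OOB accuracy = {oob_acc:.3f}")
print(f"CV accuracy  = {cv_acc.mean():.3f} ± {cv_acc.std():.3f}")
```

The OOB figure comes for free from the single fit; the test set stays untouched.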
Pitfall 2 · More trees never overfit — they just cost time
Pitfall 3 · Correlated features split the credit
Each looks half as important.
The story the plot tells is wrong.
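The effect is easy to reproduce on synthetic data. The sketch below is an assumption-laden toy, not the lecture's example: one informative feature, three noise features, then the same data with an exact copy of the informative feature appended.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)                 # the only informative feature
noise = rng.normal(size=(n, 3))             # three irrelevant features
y = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)

X_alone = np.column_stack([signal, noise])          # columns: signal, 3 noise
X_twin = np.column_stack([signal, noise, signal])   # last column: exact twin

def impurity_importances(X):
    forest = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
    return forest.fit(X, y).feature_importances_

imp_alone = impurity_importances(X_alone)
imp_twin = impurity_importances(X_twin)

print(f"signal alone:          {imp_alone[0]:.2f}")
print(f"signal with twin:      {imp_twin[0]:.2f}")
print(f"the twin itself:       {imp_twin[-1]:.2f}")
```

Nothing about the feature's predictive value changed between the two fits; only the bookkeeping did.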
The pattern · memorise this
05
load_breast_cancer() · 80/20 stratified · random_state=42 · 5-fold CV · f1
Workflow 1 · One unpruned tree
Workflow 2 · 300 bootstrap copies of the same tree
Workflow 3 · Bagging + random feature subsets
Test f1 · CV mean ± std underneath
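The three workflows can be sketched side by side under the stated setup (80/20 stratified split, `random_state=42`, 5-fold CV, f1). Anything beyond those settings, such as `max_features="sqrt"` and the dictionary keys, is an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "1 · one unpruned tree": DecisionTreeClassifier(random_state=42),
    "2 · bagging, 300 trees": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=300, random_state=42, n_jobs=-1
    ),
    "3 · random forest, 300 trees": RandomForestClassifier(
        n_estimators=300, max_features="sqrt", random_state=42, n_jobs=-1
    ),
}

results = {}
for name, model in models.items():
    # CV on the training set only; the test set is scored once per model.
    cv_f1 = cross_val_score(model, X_train, y_train, cv=5, scoring="f1")
    test_f1 = f1_score(y_test, model.fit(X_train, y_train).predict(X_test))
    results[name] = (test_f1, cv_f1.mean(), cv_f1.std())
    print(f"{name}: test f1 = {test_f1:.3f} "
          f"(CV {cv_f1.mean():.3f} ± {cv_f1.std():.3f})")
```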
Interpretation · two importances, two stories
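One way to compare two importances is sketched below. It assumes the two are the forest's built-in impurity importance and permutation importance on held-out data; the slide does not name them.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

forest = RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1)
forest.fit(X_train, y_train)

# Story 1: impurity importance, computed from the training-time splits.
impurity = forest.feature_importances_

# Story 2: permutation importance, the drop in held-out f1 when one
# column is shuffled.
perm = permutation_importance(
    forest, X_test, y_test, scoring="f1", n_repeats=10, random_state=42
)

top_impurity = names[np.argsort(impurity)[::-1][:5]]
top_perm = names[np.argsort(perm.importances_mean)[::-1][:5]]
print("top 5 by impurity:   ", list(top_impurity))
print("top 5 by permutation:", list(top_perm))
```

Where the two rankings differ, correlated features are a likely cause: shuffling one twin costs little while the other is still available.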
06
L12 returns to ensembles: boosting swaps averaging for sequential correction.
Even more randomness · ExtraTreesClassifier
Probabilities · CalibratedClassifierCV
Imbalance · class_weight='balanced_subsample'
Interpretation · one feature's effect
Design question · depth vs breadth
Regression · many targets, one forest
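Three of these pointers fit in one short sketch: `ExtraTreesClassifier`, `class_weight='balanced_subsample'`, and `CalibratedClassifierCV`. The settings are illustrative assumptions, including the choice of isotonic calibration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Even more randomness: split thresholds are drawn at random as well.
extra = ExtraTreesClassifier(n_estimators=300, random_state=42, n_jobs=-1)
extra_acc = extra.fit(X_train, y_train).score(X_test, y_test)

# Imbalance: class weights are recomputed on each tree's bootstrap sample.
weighted = RandomForestClassifier(
    n_estimators=100, class_weight="balanced_subsample", random_state=42
)

# Probabilities: wrap the forest and calibrate its scores with 5-fold CV.
calibrated = CalibratedClassifierCV(weighted, method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)

print(f"ExtraTrees test accuracy = {extra_acc:.3f}")
print("calibrated probabilities, first 3 rows:")
print(np.round(proba[:3], 3))
```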