Bias-Variance Trade-off
The decomposition of error
A framework that splits a model's error into bias (too simple), variance (too unstable), and irreducible noise — with a natural trade-off.
A model's true error decomposes into three pieces: bias (systematic miss; the model can't capture the true relationship), variance (instability; how much predictions swing when the training data changes), and irreducible error (noise inherent to the data that no model can remove). For squared-error loss, expected error = bias² + variance + noise.
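One way to see the decomposition directly is to simulate it. The sketch below is a minimal illustration under assumed conditions (a known true function sin(x), Gaussian noise, a depth-3 regression tree): refit the model on many independent training sets, then measure bias², variance, and noise at fixed test points; their sum approximates the expected squared error.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
sigma = 0.3                                    # irreducible noise level (assumed)
x_test = np.linspace(0, 6, 50).reshape(-1, 1)  # fixed evaluation points
f_true = np.sin(x_test).ravel()                # the true function at those points
preds = []
for _ in range(200):                           # 200 independent training sets
    x_tr = rng.uniform(0, 6, 80).reshape(-1, 1)
    y_tr = np.sin(x_tr).ravel() + rng.normal(0, sigma, 80)
    preds.append(DecisionTreeRegressor(max_depth=3).fit(x_tr, y_tr).predict(x_test))
preds = np.array(preds)                        # shape (200, 50)
bias_sq = ((preds.mean(axis=0) - f_true) ** 2).mean()
variance = preds.var(axis=0).mean()
print(f"bias^2={bias_sq:.3f}  variance={variance:.3f}  noise={sigma**2:.3f}")
print(f"sum={bias_sq + variance + sigma**2:.3f}")  # roughly the expected squared error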
A natural tension lives among the three. Make the model more complex (deeper tree, more parameters) and bias drops while variance rises. Simplify and the reverse happens. The "right complexity" minimizes their sum.
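A rough sketch of that sweep, assuming synthetic data and a plain decision tree whose max_depth serves as the complexity knob: as depth grows, the train score keeps climbing (bias falls) while the cross-validated score peaks and then slips (variance takes over).
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
depths = [1, 2, 4, 8, 16, None]                # None = grow until leaves are pure
train_s, val_s = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5,
)
for d, tr, va in zip(depths, train_s.mean(axis=1), val_s.mean(axis=1)):
    print(f"max_depth={str(d):>4}  train={tr:.3f}  val={va:.3f}  gap={tr - va:+.3f}")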
Diagnosis is simple in practice. High bias signs: low train and low validation accuracy, both close — underfitting. High variance signs: high train accuracy, much lower validation — overfitting. The fix differs: for bias, give the model more capacity; for variance, regularize or add data.
The dartboard cliché. High bias, low variance: every dart lands in the same off-center spot — systematic miss. Low bias, high variance: darts scattered everywhere, the mean near center but no single throw reliable. Low bias, low variance: tight cluster on bullseye — ideal. High both: scattered and off — the model's useless. Picking a model is managing this trade-off.
On a click-through prediction task, three models are compared:
1. Logistic regression (high bias): train AUC 0.71, val 0.70. Errors close, systematic underperformance — can't capture feature interactions.
2. Oversized random forest (high variance): train AUC 0.99, val 0.79. Memorized training data, generalization weak.
3. Tuned LightGBM (balanced): train 0.86, val 0.84. Learns and generalizes.
Ship the third. Fix model 1 by adding feature interactions; fix model 2 by reducing depth, lowering learning rate, adding regularization. The bias-variance lens tells you what kind of intervention is needed.
from sklearn.model_selection import cross_validate
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
# X, y: your feature matrix and labels (assumed to be defined already)
models = {
    "logistic (simple)": LogisticRegression(max_iter=1000),
    "rf shallow": RandomForestClassifier(max_depth=4, n_estimators=200),
    "rf deep": RandomForestClassifier(max_depth=None, n_estimators=200),
}
for name, m in models.items():
    # per-fold train and held-out scores, averaged across the 5 folds
    cv = cross_validate(m, X, y, cv=5, scoring="f1_macro", return_train_score=True)
    train = cv["train_score"].mean()
    val = cv["test_score"].mean()
    gap = train - val
    print(f"{name:20s} train={train:.3f} val={val:.3f} gap={gap:+.3f}")
# large gap → variance problem
# also low train → bias problem
- Diagnostic framework — use it on every model evaluation
Confusing bias with variance
High training accuracy says bias is low — but variance might still be killing you. Measure both before deciding.
One tool for both problems
Regularization lowers variance but slightly raises bias. Each problem has its own remedy — don't apply variance fixes to bias issues.
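A quick sketch of that cost, under assumed synthetic regression data and a ridge model on polynomial features: raising alpha (stronger regularization) shrinks the train/validation gap, i.e. variance, while nudging the train score down, i.e. bias.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=200, n_features=5, noise=25.0, random_state=0)
for alpha in [0.001, 1.0, 100.0]:              # weak → strong regularization
    model = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=alpha))
    cv = cross_validate(model, X, y, cv=5, scoring="r2", return_train_score=True)
    tr, va = cv["train_score"].mean(), cv["test_score"].mean()
    print(f"alpha={alpha:7.3f}  train R2={tr:.3f}  val R2={va:.3f}  gap={tr - va:+.3f}")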
Thinking more data fixes everything
More data lowers variance, barely touches bias. If the model architecture is wrong, no amount of data saves it.
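One way to check this, with assumed synthetic data and two deliberately contrasting models: sklearn's learning_curve typically shows the flexible model's large train/validation gap shrinking as the training set grows, while the simple model's scores sit close together and barely move, data or no data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=4000, n_features=30, n_informative=10,
                           random_state=0)
candidates = [
    ("logistic (high bias)", LogisticRegression(max_iter=1000)),
    ("deep forest (high variance)", RandomForestClassifier(n_estimators=100, max_depth=None)),
]
for name, model in candidates:
    sizes, train_s, val_s = learning_curve(
        model, X, y, train_sizes=[0.1, 0.4, 0.7, 1.0], cv=5)
    print(name)
    for n, tr, va in zip(sizes, train_s.mean(axis=1), val_s.mean(axis=1)):
        print(f"  n={int(n):4d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:+.3f}")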