AI Atlas
Beginner · ~2 min read · #decision-tree #classification #regression

Decision Tree

Recursive yes/no splits

Splits the data with a sequence of yes/no questions to make predictions — highly explainable, works for both classification and regression.

[Figure: example decision tree. The root asks "age > 35?"; the yes branch asks "income > 60K?" and leads to APPROVE or DENY leaves; the no branch leads to APPROVE. Caption: splits the data with yes/no questions to predict.]
Definition

A decision tree starts at a root, and at each node picks a feature and threshold ("age > 35?") that splits the data into two subsets. Splitting recurses until subsets are pure enough or the tree reaches its maximum depth. The leaves carry the final prediction — a class label or a numeric value.
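
Conceptually, a trained tree is nothing more than a chain of learned if/else checks. A minimal hand-written sketch mirroring the figure above (the thresholds are invented for illustration, not learned from data):

def predict(age: float, income: float) -> str:
    # Root node: feature "age", threshold 35
    if age > 35:
        # Internal node: feature "income", threshold 60K
        if income > 60_000:
            return "approve"  # leaf
        return "deny"         # leaf
    return "approve"          # leaf

print(predict(age=42, income=75_000))  # -> approve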

The split criterion is chosen greedily: in classification to maximize the drop in Gini impurity or entropy; in regression to maximize the drop in variance. This greedy choice doesn't always yield the global optimum, but it's fast.
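
The impurity drop itself is a few lines of arithmetic. A rough sketch of the Gini criterion (the labels and the candidate split are made up for illustration):

import numpy as np

def gini(labels: np.ndarray) -> float:
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
left, right = y[:3], y[3:]  # one candidate split
parent = gini(y)
children = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
print(f"impurity drop: {parent - children:.3f}")  # greedy search picks the split maximizing this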

Trees shine because they're transparent. You can draw a trained tree on paper and trace exactly why customer X was scored as risky. The downside: a single tree overfits easily. In practice trees are usually used as ensembles — Random Forest, Gradient Boosting.

Analogy

Think of an ER triage flow. "Is the patient breathing?" → yes leads to the next question; no leads to immediate ICU. "Fever above 39°C?" → yes leads to the infection branch, no to other suspicions. The clinician routes the patient down a sub-flow at every question. A decision tree is exactly that flow, learned from data and then applied uniformly to every new case.

Real-world example

A bank wants to automate credit-decline decisions, but transparency is required by law. A random forest would be more accurate, but "the model said so" doesn't fly. A single tree is trained with max_depth=5.

The root asks "is debt-to-income > 0.45?". Yes → next question is "any unpaid loans in the past 24 months?". Leaves carry "approve", "manual review", or "decline". The customer can be given the reason in writing: "debt/income 0.52 and one unpaid loan in 24 months → manual review." Explainability won; the last 2% of accuracy was traded away.

Code examples
scikit-learn · decision tree · Python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

# Placeholder data so the example runs end to end; swap in your own X, y
data = load_iris()
X, y = data.data, data.target
feature_names, class_names = list(data.feature_names), list(data.target_names)

# max_depth is the most critical hyperparameter — it fights overfitting
tree = DecisionTreeClassifier(
    max_depth=5,
    min_samples_leaf=20,
    random_state=42,
)

scores = cross_val_score(tree, X, y, cv=5, scoring="f1_macro")
print(f"F1 (5-fold): {scores.mean():.3f} ± {scores.std():.3f}")

tree.fit(X, y)
plt.figure(figsize=(12, 8))
plot_tree(tree, feature_names=feature_names, class_names=class_names, filled=True)
plt.savefig("tree.png", dpi=150, bbox_inches="tight")
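
To produce the written reason from the bank example, scikit-learn exposes the visited nodes through decision_path. A rough sketch reusing the fitted tree and placeholder data from the block above:

import numpy as np

sample = X[:1]  # one customer (placeholder row)
visited = tree.decision_path(sample).indices  # node ids from root to leaf
feat, thresh = tree.tree_.feature, tree.tree_.threshold

for node in visited:
    if feat[node] < 0:  # a negative feature id marks a leaf
        label = class_names[int(np.argmax(tree.tree_.value[node]))]
        print(f"=> decision: {label}")
    else:
        op = "<=" if sample[0, feat[node]] <= thresh[node] else ">"
        print(f"{feature_names[feat[node]]} {op} {thresh[node]:.2f}")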
When to use
  • You must explain individual predictions
  • Quick baseline and data exploration
  • Mixed numeric and categorical features, no need for feature scaling
  • You need robustness to outliers and some tolerance for missing values
When not to use
  • Accuracy is everything and you can afford an ensemble — pick Random Forest or Gradient Boosting
  • Severely imbalanced data without weighting
  • Tiny dataset where overfitting is hard to control
Common pitfalls

Unbounded depth

Without max_depth a tree memorizes the training data and flops on the test set. Always set max_depth and min_samples_leaf.
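
The cost of an unbounded tree is easy to demonstrate: compare train and test scores with and without a depth cap. A small sketch on placeholder data (the gap grows on noisier real-world datasets):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 5):  # None means unbounded growth
    t = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={t.score(X_tr, y_tr):.2f}  test={t.score(X_te, y_te):.2f}")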

Class imbalance

When the positive class is rare a tree drifts toward predicting the majority. Use class_weight='balanced' or resampling.
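
A minimal sketch of the weighting fix on a synthetic 95/5 imbalance (data invented for illustration):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

for cw in (None, "balanced"):
    t = DecisionTreeClassifier(max_depth=5, class_weight=cw, random_state=0)
    recall = cross_val_score(t, X, y, cv=5, scoring="recall").mean()  # recall of the rare class
    print(f"class_weight={cw}: minority recall = {recall:.2f}")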

Trusting a single tree

Tiny changes in the training data can flip the whole tree — high variance. For production, use ensembles.
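
The ensemble fix is usually a one-line swap. A sketch on placeholder data, trading some explainability for stability:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Averaging many decorrelated trees cancels out the variance of any single one
forest = RandomForestClassifier(n_estimators=300, max_depth=5, random_state=42)
scores = cross_val_score(forest, X, y, cv=5, scoring="f1_macro")
print(f"F1 (5-fold): {scores.mean():.3f} ± {scores.std():.3f}")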