Classification
Assigning a label from a fixed set
A supervised learning problem where the model picks one label from a finite set — for example 'spam vs. not spam'.
Classification is one of the two main families of supervised learning (the other being regression). The model is asked to pick one label from a predefined set for every input it sees. There are three flavors, depending on how many labels exist and how many apply per input: binary classification (spam/ham, malignant/benign), multi-class (dog/cat/bird/fish, exactly one per input), and multi-label (several at once: a movie can be both action and comedy).
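A minimal sketch of how the three flavors differ in the shape of their targets (toy arrays, purely illustrative):

import numpy as np

y_binary = np.array([0, 1, 1, 0])      # binary: one of two labels per input
y_multiclass = np.array([2, 0, 3, 1])  # multi-class: one of k labels per input
y_multilabel = np.array([[1, 0],       # multi-label: one indicator per label;
                         [1, 1],       # a row can have several 1s at once
                         [0, 1]])      # (e.g. both action and comedy)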
The output is rarely a hard label; it's usually a probability distribution over classes. The model says "87% spam, 13% ham" and you turn that into a decision with a threshold. The threshold moves with the cost of mistakes: in cancer screening or fraud detection you lower it, because a missed positive is the expensive error; in spam filtering you raise it, because blocking legitimate mail costs more than letting the odd spam through.
Classification is an old problem; logistic regression, decision trees, SVMs, k-NN, and neural networks all solve it. Choosing among them depends on dataset size, explainability needs, class imbalance, and so on.
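As a quick illustration, cross-validating a couple of candidates on the same folds is a common starting point. A sketch, assuming a feature matrix X and labels y are already loaded; the two models and the F1 metric are just examples:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Score both candidates on the same 5 folds; compare mean F1, not accuracy.
for candidate in [LogisticRegression(max_iter=1000), DecisionTreeClassifier()]:
    scores = cross_val_score(candidate, X, y, cv=5, scoring="f1")
    print(type(candidate).__name__, round(scores.mean(), 3))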
Picture a postal sorter in a mail room. They glance at every envelope and toss it into the right slot: "Marketing", "Bills", "Important", "Junk". With the patterns they learned years ago, they sort dozens a minute without thinking. Sometimes they hesitate: they're weighing probabilities, and that's exactly what your model produces. The threshold is the sorter's caution: when uncertain, send it to "Important", because the cost of misfiling that one is highest.
A news site wants to auto-flag hate speech in comments. Editors have hand-labeled 50,000 comments as "clean", "profanity", or "hate speech": a three-class classification problem.
The trained model outputs three probabilities per new comment. The product lead picks thresholds: above 40% "profanity" → send to moderator queue; above 25% "hate speech" → auto-hide. When a mistake surfaces, the lead tightens or relaxes a threshold. There is no single "safety band"; each class is tuned by its business cost.
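A minimal sketch of how those per-class thresholds might be applied; the class order and the probability matrix are made up for illustration:

import numpy as np

# Columns assumed to be ["clean", "profanity", "hate speech"],
# i.e. the shape of probs = model.predict_proba(new_comments)
probs = np.array([[0.90, 0.08, 0.02],
                  [0.35, 0.50, 0.15],
                  [0.45, 0.25, 0.30]])

to_moderator = probs[:, 1] > 0.40  # profanity above 40%: moderator queue
auto_hide = probs[:, 2] > 0.25     # hate speech above 25%: auto-hide
print(to_moderator)  # [False  True False]
print(auto_hide)     # [False False  True]

An end-to-end pipeline for the simpler binary spam case looks like this: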
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
# X: features, y: 0 (ham) or 1 (spam)
# stratify keeps the spam/ham ratio identical in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
# Probability predictions
probs = model.predict_proba(X_test)[:, 1]
# Pick threshold by cost (default 0.5)
threshold = 0.4
preds = (probs > threshold).astype(int)
# Always look at precision/recall/F1, not just accuracy
print(classification_report(y_test, preds, target_names=["ham", "spam"]))

When to use
- Output falls into a finite, discrete set of labels
- You have historical data with the correct answers
- A decision needs to be automated: archive, block, escalate
- Knowing per-class probabilities matters (for threshold tuning)
When not to use
- The output is a continuous number (use regression instead)
- Thousands of classes with only a few examples each
- Labels are too expensive to collect — try unsupervised or semi-supervised
Common pitfalls
Only looking at accuracy
If 99 of 100 emails are 'ham', a model that always says 'ham' scores 99% accuracy while finding zero spam. Precision, recall, F1, and the confusion matrix are non-negotiable.
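A quick demonstration with synthetic labels (the 99:1 split is made up):

import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 99 + [1])   # 99 ham, 1 spam
y_pred = np.zeros(100, dtype=int)   # a "model" that always predicts ham

print(accuracy_score(y_true, y_pred))  # 0.99, looks impressive
print(recall_score(y_true, y_pred))    # 0.0, catches zero spam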
Forgetting the threshold
Most libraries default to 0.5. In health, security, or fraud applications, that default is rarely right. Compute the cost of false positives versus false negatives and tune accordingly.
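One way to tune it, sketched below: sweep candidate thresholds and keep the cheapest. This reuses y_test and probs from the code above; the 1:10 cost ratio is an assumption for illustration:

import numpy as np
from sklearn.metrics import confusion_matrix

FP_COST, FN_COST = 1, 10  # assumed: a missed positive hurts 10x more

def total_cost(threshold):
    preds = (probs > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, preds, labels=[0, 1]).ravel()
    return fp * FP_COST + fn * FN_COST

best = min(np.linspace(0.05, 0.95, 19), key=total_cost)
print(best)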
Ignoring class imbalance
If the positive class is under 1%, a model can look 'good' by always predicting the negative class. Use SMOTE, class weights, or balanced sampling to correct the imbalance.
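A minimal sketch of the lightest-weight of those fixes, class weights, reusing X_train and y_train from the code above (SMOTE lives in the separate imbalanced-learn package):

from sklearn.linear_model import LogisticRegression

# "balanced" weights errors inversely to class frequency, so the rare
# positive class is not drowned out by the majority.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)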