AI Atlas
Beginner · ~2 min read · #logistic-regression #classification #probability

Logistic Regression

Linear classifier with calibrated probabilities

Despite the name, it's a classification algorithm — it models the probability of a class via the sigmoid function on a linear combination of features.

[Figure: Logistic regression — a sigmoid curve with asymptotes at p = 0 and p = 1 and the 0.5 threshold marked. Maps a linear score to a 0–1 probability via sigmoid.]
Definition

Logistic regression takes the unbounded numeric output of linear regression and squashes it through the sigmoid (logistic) function into a 0–1 probability — say, the probability of "spam". A threshold (often 0.5) turns it into a binary decision. The multinomial version uses softmax to do the same on N classes.
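A minimal sketch of that squashing step; the weights, bias, and input below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Squash an unbounded linear score into the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned weights, bias, and one feature vector
w = np.array([0.8, -0.5])
b = 0.1
x = np.array([2.0, 1.0])

z = w @ x + b          # unbounded linear score, as in linear regression
p = sigmoid(z)         # probability in (0, 1), e.g. P("spam")
label = int(p >= 0.5)  # the 0.5 threshold turns it into a decision

print(f"score={z:.2f}, p={p:.3f}, label={label}")
```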

Training maximizes the log-likelihood (equivalently, minimizes cross-entropy) rather than squared error, which yields much better-calibrated probabilities.
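A sketch of that loss, with hand-picked probabilities just to show its behavior:

```python
import numpy as np

def cross_entropy(y_true, p_pred):
    """Mean negative log-likelihood of the true labels."""
    p_pred = np.clip(p_pred, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y_true * np.log(p_pred)
                    + (1 - y_true) * np.log(1 - p_pred))

y = np.array([1, 0, 1, 1])
calibrated    = np.array([0.90, 0.10, 0.80, 0.95])  # right and measured
overconfident = np.array([0.99, 0.90, 0.99, 0.20])  # confidently wrong twice

loss_good = cross_entropy(y, calibrated)
loss_bad = cross_entropy(y, overconfident)
print(f"{loss_good:.3f} vs {loss_bad:.3f}")  # confident mistakes are punished hard
```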

Logistic regression is the workhorse of every domain that demands explainability: credit scoring, insurance risk, clinical decision support. Coefficients are interpreted via odds ratios — "a one-unit increase in this feature multiplies the odds of the event by exp(β)."
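The arithmetic behind that quote, using a made-up coefficient of 0.875:

```python
import numpy as np

beta = 0.875               # hypothetical learned coefficient
odds_ratio = np.exp(beta)  # each one-unit increase multiplies the odds by this

# Translating odds back into probabilities at a chosen baseline
p0 = 0.20                  # baseline probability of the event
odds0 = p0 / (1 - p0)      # 0.25
odds1 = odds0 * odds_ratio
p1 = odds1 / (1 + odds1)   # probability after a one-unit feature increase

print(f"odds ratio={odds_ratio:.2f}: p goes {p0:.2f} -> {p1:.2f}")
```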

Analogy

Like a doctor reasoning "is this patient at high risk of a heart attack?": weighing age, blood pressure, cholesterol, smoking, and family history, each with its own mental weight, then summing them into a probability. Below 20% they relax; above it, they order more tests. Logistic regression is the math of that reasoning: it learns the weights from data, while the weights themselves stay open to human interpretation.

Real-world example

An e-commerce site wants to send discount emails to cart abandoners. The model output: "probability this user comes back to complete the purchase within 24 hours". Low probability → send the email; high → skip (they'd return anyway, the discount is wasted).

Logistic regression turns 18 features (device, browse depth, cart value, prior orders…) into a probability. A threshold of 0.35 selects the email cohort. Showing coefficients to marketing yields lines like "having shopped premium multiplies the return-odds by 2.4" — the model produces insight, not just predictions.
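A sketch of that selection step, with invented probabilities (a real pipeline would produce them via predict_proba):

```python
import numpy as np

# Hypothetical predicted probabilities of returning within 24 hours
probs = np.array([0.10, 0.28, 0.40, 0.05, 0.33, 0.70, 0.22])

THRESHOLD = 0.35  # tuned to the campaign's economics, not left at 0.5

# Low probability of returning on their own -> worth sending the discount
send_email = probs < THRESHOLD
print(f"emailing {send_email.sum()} of {len(probs)} users")
```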

Code examples
scikit-learn · logistic regression (Python)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score
import numpy as np

# Synthetic data so the example runs end to end
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=500, C=1.0)),
])

pipe.fit(X_train, y_train)

# Probability of the positive class
probs = pipe.predict_proba(X_test)[:, 1]

# ROC-AUC evaluates the model independent of the threshold
print(f"ROC-AUC: {roc_auc_score(y_test, probs):.3f}")

# Odds-ratio interpretation: exp(coef) multiplies the odds per
# one-unit (scaled) increase in the feature
coefs = pipe.named_steps["clf"].coef_[0]
for name, c in zip(feature_names, coefs):
    print(f"{name}: coef={c:+.3f}, odds ratio={np.exp(c):.3f}")
When to use
  • Fast, explainable baseline for binary classification
  • Calibrated probabilities matter, not just the predicted class
  • Regulated/corporate environments where decisions must be defended
  • Roughly linearly separable data
When not to use
  • The class boundary is clearly nonlinear — try trees or kernel SVM
  • Very high-dimensional, complex relationships and accuracy is paramount
  • Heavily imbalanced classes without class weighting
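On the imbalanced-classes point, scikit-learn's class_weight="balanced" option reweights the loss inversely to class frequency. A quick sketch on synthetic, roughly 5%-positive data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic data with roughly 5% positives
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
balanced = LogisticRegression(max_iter=500, class_weight="balanced").fit(X_tr, y_tr)

# The balanced model trades precision for recall on the rare class
recall_plain = recall_score(y_te, plain.predict(X_te))
recall_balanced = recall_score(y_te, balanced.predict(X_te))
print(f"recall, plain:    {recall_plain:.3f}")
print(f"recall, balanced: {recall_balanced:.3f}")
```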
Common pitfalls

Skipping feature scaling

Regularized logistic regression is sensitive to feature scale: without scaling, the penalty hits some features far harder than others and convergence slows down. Always scale, e.g. with StandardScaler inside a Pipeline.
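A small demonstration of that scale sensitivity, using one synthetic feature shrunk by a factor of 1000 (the numbers are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
y = (x[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

normal = LogisticRegression().fit(x, y)       # feature on its natural scale
tiny = LogisticRegression().fit(x / 1000, y)  # same feature, shrunk 1000x

# L2 regularization caps the coefficient, so the shrunk feature never gets
# the ~1000x larger weight it needs: its probabilities collapse toward 0.5
p_normal = normal.predict_proba(x)[:, 1]
p_tiny = tiny.predict_proba(x / 1000)[:, 1]
print(f"max prob, natural scale: {p_normal.max():.3f}")
print(f"max prob, shrunk scale:  {p_tiny.max():.3f}")
```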

Leaving threshold at 0.5

On imbalanced data 0.5 is rarely right. If the positive class is 5% of samples, the optimal threshold is usually much lower. Pick it from the ROC or precision-recall curve instead.
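One common recipe, sketched here on synthetic 5%-positive data: sweep the precision-recall curve and keep the threshold that maximizes F1 (other objectives, such as a fixed precision floor, work the same way):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# F1 at every candidate threshold along the precision-recall curve
prec, rec, thresholds = precision_recall_curve(y_te, probs)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
best = np.argmax(f1[:-1])  # the final (precision=1, recall=0) point has no threshold
print(f"best threshold={thresholds[best]:.2f}, F1={f1[best]:.2f}")
```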

Assuming linear separability

Logistic regression draws a linear decision boundary. If the classes intertwine, no line (or hyperplane) can separate them and performance collapses. Validate the assumption before trusting the model.
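One quick validation sketch: compare logistic regression against a nonlinear model on the same data; synthetic half-moons make the gap obvious:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Two intertwined half-moons: no straight line separates them cleanly
X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

acc_linear = cross_val_score(LogisticRegression(max_iter=500), X, y, cv=5).mean()
acc_forest = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()

# A large gap here is evidence the linear-boundary assumption is violated
print(f"logistic regression: {acc_linear:.3f}")
print(f"random forest:       {acc_forest:.3f}")
```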