Linear Regression
Fitting a straight line
One of the most fundamental ML algorithms: it models the relationship between numeric inputs and a numeric target as a weighted sum (a line, plane, or hyperplane).
Linear regression assumes the target can be written as a weighted sum of inputs: y = β₀ + β₁x₁ + β₂x₂ + ... + ε. Training finds the coefficients that minimize the sum of squared errors between the predictions and the actual values. The classic approach, ordinary least squares (OLS), solves this in closed form via the normal equations.
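A minimal sketch of that closed-form solution in NumPy, on made-up synthetic data (the sample size, noise level, and true coefficients below are purely illustrative):

import numpy as np

# Synthetic data with known coefficients: y = 3.0 + 1.5·x₁ − 2.0·x₂ + noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 + 1.5 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# Prepend a column of ones for the intercept, then solve least squares
X_design = np.column_stack([np.ones(len(X_demo)), X_demo])
beta_hat, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)
print(beta_hat)  # close to [3.0, 1.5, -2.0]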
Its strength is simplicity and interpretability. Each coefficient directly tells you how much the target changes for a one-unit increase in that feature, holding the others fixed. The model is therefore both a predictor and an explanation, used everywhere from business analytics to epidemiology.
The linearity assumption is often restrictive; real-world relationships curve. Common workarounds include polynomial features, log transforms, and interaction terms. Beyond that, Ridge and Lasso regularization keep coefficients in check and fight overfitting.
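As an illustration (not the only recipe), a scikit-learn pipeline that folds degree-2 polynomial and interaction terms into the model and lets Lasso shrink the ones that don't help, reusing the synthetic X_demo, y_demo from the sketch above:

from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Squared and interaction terms let a "linear" model capture curvature;
# the L1 penalty pushes unhelpful coefficients to exactly zero
curved = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Lasso(alpha=0.1),
)
curved.fit(X_demo, y_demo)
print(curved.named_steps["lasso"].coef_)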
Like an economist saying "each extra year of schooling adds X to average salary." In their head: salary = constant + (years × coefficient). They derive the coefficient from thousands of past cases. The prediction isn't perfect, but it's directly interpretable: "two more years of school, salary on average rises by this much." Linear regression is the math version of that mental model.
A real estate firm builds a rent-prediction tool. Features: square meters, bedroom count, floor, building age, neighborhood, transit score. Target: monthly rent. Linear regression says: "+78 TRY per square meter, +1,250 TRY per bedroom, −35 TRY per year of building age, +4,000 TRY for neighborhood X."
Prediction quality isn't best-in-class (gradient boosting wins on raw accuracy), but the moment the realtors see those coefficients they validate the logic. "Is the neighborhood effect really that big?" — back to the data. Interpretability beats the last 5% of accuracy in many businesses.
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
# Coefficients are directly interpretable
for name, coef in zip(feature_names, model.coef_):
print(f"{name}: {coef:+.2f}")
print(f"Intercept: {model.intercept_:.2f}")
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: {mae:.2f}")
# Ridge for regularization — penalizes large coefficients
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

When to use
- Interpretability matters more than the last 5% of accuracy
- The relationship is approximately linear (or made so via transforms)
- You need a fast, transparent baseline
- Statistical significance tests and confidence intervals are needed
When to avoid
- The relationship is clearly nonlinear
- The data contains heavy outliers (squared error overweights them)
- Many highly correlated features (multicollinearity) — use Ridge/Lasso or a different model
Multicollinearity
Two features that are near-copies of each other make coefficients explode and become uninterpretable. Check the correlation matrix; drop one, or use Ridge.
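A quick check, assuming the X feature matrix and feature_names list from the code example above (the 0.9 cutoff is just a common rule of thumb):

import pandas as pd

# Flag feature pairs whose correlation is close to ±1 (near-duplicates)
corr = pd.DataFrame(X, columns=feature_names).corr()
pairs = [
    (a, b, round(corr.loc[a, b], 2))
    for a in corr.columns for b in corr.columns
    if a < b and abs(corr.loc[a, b]) > 0.9
]
print(pairs)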
Outlier sensitivity
OLS overweights outliers. A single bad point can tilt the entire fit. Use robust regression or clean outliers first.
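A sketch of the robust-regression route with scikit-learn's HuberRegressor, reusing the X_train, y_train split from the example above; the Huber loss is quadratic for small residuals and linear for large ones, so extreme points pull less:

from sklearn.linear_model import HuberRegressor

# epsilon controls where the loss switches from quadratic to linear;
# 1.35 is the library default and a common choice
robust = HuberRegressor(epsilon=1.35, max_iter=500).fit(X_train, y_train)
print(robust.coef_, robust.intercept_)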
Not testing the linearity assumption
You think the model fits, but the relationship actually curves and you have systematic error. Plot the residuals against the predictions; if they aren't random scatter around zero, the linearity assumption is violated.
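A minimal residual plot for the fitted model from the example above (matplotlib assumed to be installed); random scatter around zero is the healthy picture, while a curve or funnel shape means the assumption is violated:

import matplotlib.pyplot as plt

# Residuals vs. predictions: visible structure here means systematic error
preds = model.predict(X_test)
residuals = y_test - preds
plt.scatter(preds, residuals, alpha=0.5)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Predicted rent")
plt.ylabel("Residual (actual − predicted)")
plt.show()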