Linear Regression
Fitting a straight line
One of the most fundamental ML algorithms: it models the relationship between numeric inputs and a numeric target as a weighted sum (a line, plane, or hyperplane).
Linear regression assumes the target can be written as a weighted sum of inputs: y = β₀ + β₁x₁ + β₂x₂ + ... + ε. Training finds the coefficients that minimize the sum of squared errors between the predictions and the actual values. The classic approach, ordinary least squares (OLS), solves this in closed form via the normal equations.
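A minimal sketch of that closed-form solution in NumPy, on made-up synthetic data (the sample size, noise level, and true coefficients below are purely illustrative):

import numpy as np

# Synthetic data with known coefficients: y = 3.0 + 1.5·x₁ − 2.0·x₂ + noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 + 1.5 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# Prepend a column of ones for the intercept, then solve least squares
X_design = np.column_stack([np.ones(len(X_demo)), X_demo])
beta_hat, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)
print(beta_hat)  # close to [3.0, 1.5, -2.0]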
Its strength is simplicity and interpretability. Each coefficient directly tells you how much the target changes for a one-unit increase in that feature, holding the others fixed. The model is therefore both a predictor and an explanation, used everywhere from business analytics to epidemiology.
The linearity assumption is often restrictive; real-world relationships curve. Common workarounds include polynomial features, log transforms, and interaction terms. Beyond that, Ridge and Lasso regularization keep coefficients in check and fight overfitting.
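As an illustration (not the only recipe), a scikit-learn pipeline that folds degree-2 polynomial and interaction terms into the model and lets Lasso shrink the ones that don't help, reusing the synthetic X_demo, y_demo from the sketch above:

from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Squared and interaction terms let a "linear" model capture curvature;
# the L1 penalty pushes unhelpful coefficients to exactly zero
curved = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Lasso(alpha=0.1),
)
curved.fit(X_demo, y_demo)
print(curved.named_steps["lasso"].coef_)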
Like an economist saying "each extra year of schooling adds X to average salary." In their head: salary = constant + (years × coefficient). They derive the coefficient from thousands of past cases. The prediction isn't perfect, but it's directly interpretable: "two more years of school, salary on average rises by this much." Linear regression is the math version of that mental model.
A real estate firm builds a rent-prediction tool. Features: square meters, bedroom count, floor, building age, neighborhood, transit score. Target: monthly rent. Linear regression says: "+78 TRY per square meter, +1,250 TRY per bedroom, −35 TRY per year of building age, +4,000 TRY for neighborhood X."
Prediction quality isn't best-in-class (gradient boosting wins on raw accuracy), but the moment the realtors see those coefficients they validate the logic. "Is the neighborhood effect really that big?" — back to the data. Interpretability beats the last 5% of accuracy in many businesses.
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
# Coefficients are directly interpretable
for name, coef in zip(feature_names, model.coef_):
print(f"{name}: {coef:+.2f}")
print(f"Intercept: {model.intercept_:.2f}")
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: {mae:.2f}")
# Ridge for regularization — penalizes large coefficients
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

When to use
- Interpretability matters more than the last 5% of accuracy
- The relationship is approximately linear (or made so via transforms)
- You need a fast, transparent baseline
- Statistical significance tests and confidence intervals are needed
When to avoid
- The relationship is clearly nonlinear
- The data contains heavy outliers (squared error overweights them)
- Many highly correlated features (multicollinearity) — use Ridge/Lasso or a different model
Multicollinearity
Two features that are near-copies of each other make coefficients explode and become uninterpretable. Check the correlation matrix; drop one, or use Ridge.
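A quick check, assuming the X feature matrix and feature_names list from the code example above (the 0.9 cutoff is just a common rule of thumb):

import pandas as pd

# Flag feature pairs whose correlation is close to ±1 (near-duplicates)
corr = pd.DataFrame(X, columns=feature_names).corr()
pairs = [
    (a, b, round(corr.loc[a, b], 2))
    for a in corr.columns for b in corr.columns
    if a < b and abs(corr.loc[a, b]) > 0.9
]
print(pairs)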
Outlier sensitivity
OLS overweights outliers. A single bad point can tilt the entire fit. Use robust regression or clean outliers first.
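A sketch of the robust-regression route with scikit-learn's HuberRegressor, reusing the X_train, y_train split from the example above; the Huber loss is quadratic for small residuals and linear for large ones, so extreme points pull less:

from sklearn.linear_model import HuberRegressor

# epsilon controls where the loss switches from quadratic to linear;
# 1.35 is the library default and a common choice
robust = HuberRegressor(epsilon=1.35, max_iter=500).fit(X_train, y_train)
print(robust.coef_, robust.intercept_)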
Not testing the linearity assumption
You think the model fits, but the relationship actually curves and you have systematic error. Plot the residuals against the predictions; if they aren't random scatter around zero, the linearity assumption is violated.
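A minimal residual plot for the fitted model from the example above (matplotlib assumed to be installed); random scatter around zero is the healthy picture, while a curve or funnel shape means the assumption is violated:

import matplotlib.pyplot as plt

# Residuals vs. predictions: visible structure here means systematic error
preds = model.predict(X_test)
residuals = y_test - preds
plt.scatter(preds, residuals, alpha=0.5)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Predicted rent")
plt.ylabel("Residual (actual − predicted)")
plt.show()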