Forecasting
Predicting future values from historical data — one of the most common ML applications: sales, traffic, demand, prices.
Forecasting is the branch of ML concerned with projecting future values of a time series. It differs from plain regression because time is special: the future depends on the past, seasonality and trend are expected, and train/test sets cannot be split randomly — a random split would leak the future into training.
Three main approach families:
- Statistical models (ARIMA, ETS, SARIMA): autoregression, moving averages, seasonality decomposition. Strong baselines with little data, and explainable.
- Machine learning (gradient boosting, linear models + lag features): feed in past values plus exogenous features (weather, campaigns, holidays). Often the best cost/quality choice in practice.
- Deep learning (LSTM, Temporal Fusion Transformer, N-BEATS): shines with long dependencies and many series forecast in parallel. Expensive; usually overkill on small data.
A forecast should be a distribution, not just a point. Saying "tomorrow will be 1,250" is far less useful than "with 80% probability, between 1,100 and 1,400". Quantile regression, Prophet's intervals, and conformal prediction give you those bounds.
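A minimal sketch of one of those options, split conformal prediction: compute residuals on a held-out calibration set and widen the point forecast by their quantile. It assumes roughly exchangeable residuals (an approximation for time series), and all numbers here are synthetic:

```python
import numpy as np

def conformal_interval(y_cal, pred_cal, pred_new, alpha=0.2):
    """Symmetric (1 - alpha) interval from calibration residuals."""
    resid = np.abs(np.asarray(y_cal) - np.asarray(pred_cal))
    n = len(resid)
    # Finite-sample corrected quantile of the absolute residuals
    q = np.quantile(resid, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return pred_new - q, pred_new + q

rng = np.random.default_rng(0)
y_cal = rng.normal(1250, 50, 365)            # a year of daily "orders"
pred_cal = y_cal + rng.normal(0, 30, 365)    # imperfect point forecasts
lo, hi = conformal_interval(y_cal, pred_cal, pred_new=1250.0)
print(f"80% interval: [{lo:.0f}, {hi:.0f}]")
```

The appeal: it wraps any point forecaster, with a coverage guarantee under the exchangeability assumption.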
Like a sailor predicting weather. They check past temperature, wind, pressure; layer in seasonal patterns (summer warm, winter cold); read the trend; and look at the clouds on the horizon. "70% chance of rain tomorrow" — not a single answer, a probability. Forecasting is the math behind that intuition.
An e-commerce shop wants to forecast daily orders to size operations, dispatch riders, and plan inventory. Three models are compared:
- SARIMA: captures weekly and yearly seasonality. Test MAPE 8.2%.
- LightGBM + lag features: orders at 7, 30, 365 days + holiday flag + campaign flag + weather. Test MAPE 6.4%.
- Temporal Fusion Transformer: same features and more, attention for long range. Test MAPE 5.9% but 100× the compute.
LightGBM wins on cost/quality. Adding quantile regression yields 10/50/90 percentile forecasts; ops uses the 90th to plan worst-case capacity.
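The quantile trick can be sketched as one LightGBM model per quantile, via the `objective="quantile"` loss. The data here is synthetic; in the real pipeline the features would be the lag and calendar columns described above:

```python
import numpy as np
import lightgbm as lgb

# Synthetic "orders"-like target (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = 1000 + 100 * X[:, 0] + rng.normal(0, 50, 1000)

# One model per quantile: 10th, 50th, 90th percentile forecasts
models = {
    q: lgb.LGBMRegressor(
        objective="quantile", alpha=q, n_estimators=200, learning_rate=0.05
    ).fit(X, y)
    for q in [0.1, 0.5, 0.9]
}

x_new = X[:3]
p10 = models[0.1].predict(x_new)
p90 = models[0.9].predict(x_new)  # ops plans worst-case capacity off this one
print(np.column_stack([p10, p90]).round(0))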
```python
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_percentage_error

df = pd.read_csv("orders.csv", parse_dates=["date"]).sort_values("date")

# Lag features (no leakage from the future!)
for lag in [1, 7, 14, 30, 365]:
    df[f"lag_{lag}"] = df["orders"].shift(lag)
df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df = df.dropna()

# Time-based split — NEVER random
cutoff = "2025-09-01"
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]

features = [c for c in df.columns if c not in ["date", "orders"]]
model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(train[features], train["orders"])

preds = model.predict(test[features])
mape = mean_absolute_percentage_error(test["orders"], preds)
print(f"Test MAPE: {mape:.3f}")
```

The same series through Prophet, which produces uncertainty intervals out of the box:

```python
from prophet import Prophet
import pandas as pd

# Prophet expects columns named ds (date) and y (value)
df = pd.read_csv("orders.csv", parse_dates=["date"]).rename(
    columns={"date": "ds", "orders": "y"}
)

m = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    seasonality_mode="multiplicative",
)
m.add_country_holidays(country_name="TR")
m.fit(df)

future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
# yhat: point forecast, yhat_lower/yhat_upper: 80% interval by default
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(10))
```

Where it's used:
- Demand forecasting for ops, inventory, staffing
- Financial planning: revenue, spend, cash flow
- Post-A/B-test extrapolation
- Anomaly detection — surprise vs expectation

When to skip it:
- Past doesn't resemble the future (post-structural break, e.g. pandemic onset)
- Very short series — too little signal
- One-off decisions — model setup is overkill
Data leakage
A feature from the future leaks into training (e.g. 'orders later this week'). Lag-feature engineering needs careful guards. Every variable must be knowable at forecast time.
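The guard in pandas is `shift()`: a positive shift moves values forward in time, so a lag feature at row t only ever sees t-1 and earlier. A tiny sketch:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], name="orders")
# lag_1 at row t is the value at t-1, so the feature only sees the past
print(s.shift(1).tolist())  # [nan, 10.0, 20.0, 30.0]
```

A negative shift (`shift(-1)`) would do the opposite — pull tomorrow's value into today's row, which is exactly the leak described above.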
Random cross-validation
Plain k-fold mixes eras: the model trains on future data and is validated on the past → inflated scores. Use TimeSeriesSplit or rolling-origin evaluation.
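scikit-learn's `TimeSeriesSplit` produces expanding windows where every training fold ends before its test fold begins:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time steps, oldest first
splits = list(TimeSeriesSplit(n_splits=4).split(X))
for train_idx, test_idx in splits:
    # The training window always ends before the test window starts
    print(f"train ..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

With 20 samples and 4 splits, the folds expand: train up to t=3 then test 4..7, and so on until train up to t=15 and test 16..19.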
Reporting only point forecasts
Decisions need uncertainty. Use quantile regression, conformal prediction, or MC dropout to produce intervals.