Forecasting
Predicting future values from historical data — one of the most common ML applications: sales, traffic, demand, prices.
Forecasting is the branch of ML concerned with projecting future values of a time series. It differs from plain regression because time is special: the future depends on the past, seasonality and trend are expected, and train/test sets cannot be split randomly — a random split would leak the future into training.
Three main approach families:
- Statistical models (ARIMA, ETS, SARIMA): autoregression, moving averages, seasonality decomposition. Strong baselines with little data, and explainable.
- Machine learning (gradient boosting, linear models + lag features): feed in past values plus exogenous features (weather, campaigns, holidays). Often the best cost/quality choice in practice.
- Deep learning (LSTM, Temporal Fusion Transformer, N-BEATS): shines with long dependencies and many series forecast in parallel. Expensive; usually overkill on small data.
A forecast should be a distribution, not just a point. Saying "tomorrow will be 1,250" is far less useful than "with 80% probability, between 1,100 and 1,400". Quantile regression, Prophet's intervals, and conformal prediction give you those bounds.
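A minimal sketch of one of those options, split conformal prediction: compute residuals on a held-out calibration set and widen the point forecast by their quantile. It assumes roughly exchangeable residuals (an approximation for time series), and all numbers here are synthetic:

```python
import numpy as np

def conformal_interval(y_cal, pred_cal, pred_new, alpha=0.2):
    """Symmetric (1 - alpha) interval from calibration residuals."""
    resid = np.abs(np.asarray(y_cal) - np.asarray(pred_cal))
    n = len(resid)
    # Finite-sample corrected quantile of the absolute residuals
    q = np.quantile(resid, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return pred_new - q, pred_new + q

rng = np.random.default_rng(0)
y_cal = rng.normal(1250, 50, 365)            # a year of daily "orders"
pred_cal = y_cal + rng.normal(0, 30, 365)    # imperfect point forecasts
lo, hi = conformal_interval(y_cal, pred_cal, pred_new=1250.0)
print(f"80% interval: [{lo:.0f}, {hi:.0f}]")
```

The appeal: it wraps any point forecaster, with a coverage guarantee under the exchangeability assumption.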
Like a sailor predicting weather. They check past temperature, wind, pressure; layer in seasonal patterns (summer warm, winter cold); read the trend; and look at the clouds on the horizon. "70% chance of rain tomorrow" — not a single answer, a probability. Forecasting is the math behind that intuition.
An e-commerce shop wants to forecast daily orders to size operations, dispatch riders, and plan inventory. Three models are compared:
- SARIMA: captures weekly and yearly seasonality. Test MAPE 8.2%.
- LightGBM + lag features: orders at 7, 30, 365 days + holiday flag + campaign flag + weather. Test MAPE 6.4%.
- Temporal Fusion Transformer: same features and more, attention for long range. Test MAPE 5.9% but 100× the compute.
LightGBM wins on cost/quality. Adding quantile regression yields 10/50/90 percentile forecasts; ops uses the 90th to plan worst-case capacity.
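The quantile trick can be sketched as one LightGBM model per quantile, via the `objective="quantile"` loss. The data here is synthetic; in the real pipeline the features would be the lag and calendar columns described above:

```python
import numpy as np
import lightgbm as lgb

# Synthetic "orders"-like target (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = 1000 + 100 * X[:, 0] + rng.normal(0, 50, 1000)

# One model per quantile: 10th, 50th, 90th percentile forecasts
models = {
    q: lgb.LGBMRegressor(
        objective="quantile", alpha=q, n_estimators=200, learning_rate=0.05
    ).fit(X, y)
    for q in [0.1, 0.5, 0.9]
}

x_new = X[:3]
p10 = models[0.1].predict(x_new)
p90 = models[0.9].predict(x_new)  # ops plans worst-case capacity off this one
print(np.column_stack([p10, p90]).round(0))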
```python
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_percentage_error

df = pd.read_csv("orders.csv", parse_dates=["date"]).sort_values("date")

# Lag features (no leakage from the future!)
for lag in [1, 7, 14, 30, 365]:
    df[f"lag_{lag}"] = df["orders"].shift(lag)
df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df = df.dropna()

# Time-based split — NEVER random
cutoff = "2025-09-01"
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]

features = [c for c in df.columns if c not in ["date", "orders"]]
model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(train[features], train["orders"])

preds = model.predict(test[features])
mape = mean_absolute_percentage_error(test["orders"], preds)
print(f"Test MAPE: {mape:.3f}")
```

The same series through Prophet, which produces uncertainty intervals out of the box:

```python
from prophet import Prophet
import pandas as pd

# Prophet expects columns named ds (date) and y (value)
df = pd.read_csv("orders.csv", parse_dates=["date"]).rename(
    columns={"date": "ds", "orders": "y"}
)

m = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    seasonality_mode="multiplicative",
)
m.add_country_holidays(country_name="TR")
m.fit(df)

future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
# yhat: point forecast, yhat_lower/yhat_upper: 80% interval by default
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(10))
```

Where it's used:
- Demand forecasting for ops, inventory, staffing
- Financial planning: revenue, spend, cash flow
- Post-A/B-test extrapolation
- Anomaly detection — surprise vs expectation

When to skip it:
- Past doesn't resemble the future (post-structural break, e.g. pandemic onset)
- Very short series — too little signal
- One-off decisions — model setup is overkill
Data leakage
A feature from the future leaks into training (e.g. 'orders later this week'). Lag-feature engineering needs careful guards. Every variable must be knowable at forecast time.
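The guard in pandas is `shift()`: a positive shift moves values forward in time, so a lag feature at row t only ever sees t-1 and earlier. A tiny sketch:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], name="orders")
# lag_1 at row t is the value at t-1, so the feature only sees the past
print(s.shift(1).tolist())  # [nan, 10.0, 20.0, 30.0]
```

A negative shift (`shift(-1)`) would do the opposite — pull tomorrow's value into today's row, which is exactly the leak described above.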
Random cross-validation
Plain k-fold mixes eras: the model trains on future data and is validated on the past → inflated scores. Use TimeSeriesSplit or rolling-origin evaluation.
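scikit-learn's `TimeSeriesSplit` produces expanding windows where every training fold ends before its test fold begins:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time steps, oldest first
splits = list(TimeSeriesSplit(n_splits=4).split(X))
for train_idx, test_idx in splits:
    # The training window always ends before the test window starts
    print(f"train ..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

With 20 samples and 4 splits, the folds expand: train up to t=3 then test 4..7, and so on until train up to t=15 and test 16..19.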
Reporting only point forecasts
Decisions need uncertainty. Use quantile regression, conformal prediction, or MC dropout to produce intervals.