Practical Forecasting Guide
ARIMA, Prophet, LightGBM, or deep learning? A practical guide to choosing the right approach for a forecasting problem.
Establish a baseline first
The most common mistake is to start a forecasting project with "let's use LSTM". Establish a simple baseline first:
- Naive: "tomorrow equals today."
- Seasonal naive: "next Monday looks like last Monday."
- Moving average: "average of the last 7 days."
These often embarrass you with how well they perform. Measure whatever ARIMA / LightGBM / LSTM gains against the baseline. "5% MAPE" is meaningless on its own; "naive baseline 12%, my model 5%, a 58% improvement" tells the story.
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error
df = pd.read_csv("orders.csv", parse_dates=["date"]).sort_values("date")
# Naive: tomorrow = today
naive = df["orders"].shift(1).dropna()
actual = df["orders"][1:]
print(f"Naive MAPE: {mean_absolute_percentage_error(actual, naive):.3f}")
# Seasonal naive: this Mon = last Mon
seasonal_naive = df["orders"].shift(7).dropna()
actual = df["orders"][7:]
print(f"Seasonal naive MAPE: {mean_absolute_percentage_error(actual, seasonal_naive):.3f}")
# These are the numbers to beat.
Which model when?
ARIMA / SARIMA
The classical statistician's choice. Decomposes trend and seasonality, autoregresses for forecast.
When: single series, small to medium data (1k-10k points), explainability matters, few or no exogenous variables. When not: many series (1,000 SKUs × ARIMA = a maintenance nightmare), high frequency, lots of complex exogenous variables.
Prophet
Facebook's open-source library. Trend + seasonality + holidays out of the box. Quick wins.
from prophet import Prophet
import pandas as pd
df = pd.DataFrame({"ds": dates, "y": orders})
m = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    seasonality_mode="multiplicative",
)
m.add_country_holidays(country_name="TR")
m.fit(df)
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(10))
When: clear yearly + weekly seasonality, holidays matter, you want quick results. When not: many or complex exogenous effects (weather, marketing budget, price). add_regressor works, but LightGBM dominates here.
LightGBM + lag features (the modern default)
Tabular ML adapted to time series. Lags as features, calendar features, exogenous variables. Best cost/quality in most production cases.
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_percentage_error
df = pd.read_csv("orders.csv", parse_dates=["date"]).sort_values("date")
for lag in [1, 7, 14, 30, 365]:
    df[f"lag_{lag}"] = df["orders"].shift(lag)
df["ma_7"] = df["orders"].rolling(7).mean()
df["ma_30"] = df["orders"].rolling(30).mean()
df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df["is_weekend"] = df["dow"].isin([5, 6]).astype(int)
# Exogenous features plug in here, e.g.:
# df["weather_temp"] = ...
# df["campaign_active"] = ...
df = df.dropna()
cutoff = "2025-09-01"
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]
features = [c for c in df.columns if c not in ["date", "orders"]]
model = lgb.LGBMRegressor(
    n_estimators=1000,
    learning_rate=0.05,
    num_leaves=63,
    min_child_samples=20,
)
model.fit(
    train[features], train["orders"],
    eval_set=[(test[features], test["orders"])],
    callbacks=[lgb.early_stopping(50)],
)
preds = model.predict(test[features])
mape = mean_absolute_percentage_error(test["orders"], preds)
print(f"Test MAPE: {mape:.3f}")
# Feature importance: what's driving things?
import pandas as pd
imp = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)
print(imp.head(10))
When: rich exogenous variables, many series (thousands of SKUs), complex interactions, accuracy first. When not: very short series (under 100 points, overfit risk) or mandatory explainability.
Deep learning (LSTM, TFT, N-BEATS)
When long dependencies, many parallel series, and complex multivariate interactions matter.
When: thousands of series, long-range patterns (e.g. hourly energy), shared learning across series. When not: small data, mandatory explainability, no deep-learning experience on the team. LightGBM usually matches it at 10× lower cost.
Correct cross-validation
Plain k-fold puts the future in the training set, inflating scores. Use TimeSeriesSplit or a rolling forecasting origin:
from sklearn.model_selection import TimeSeriesSplit
tss = TimeSeriesSplit(n_splits=5, test_size=30)
for fold, (train_idx, test_idx) in enumerate(tss.split(df)):
    train_data = df.iloc[train_idx]
    test_data = df.iloc[test_idx]
    print(f"Fold {fold}: train {train_data['date'].min()} to {train_data['date'].max()}, "
          f"test {test_data['date'].min()} to {test_data['date'].max()}")
Each fold trains on the past, validates on the future. Error rates reflect production reality.
Forecast intervals (uncertainty)
Decisions are made on intervals, not points. "Tomorrow exactly 1,250 sales" is far less useful than "80% probability between 1,100 and 1,400".
Quantile regression
LightGBM trains separate quantile models:
import lightgbm as lgb
quantiles = {0.1: None, 0.5: None, 0.9: None}
for q in quantiles:
    quantiles[q] = lgb.LGBMRegressor(
        objective="quantile",
        alpha=q,
        n_estimators=500,
        learning_rate=0.05,
    )
    quantiles[q].fit(train[features], train["orders"])
preds_low = quantiles[0.1].predict(test[features])
preds_med = quantiles[0.5].predict(test[features])
preds_high = quantiles[0.9].predict(test[features])
# 80% probability the actual value is in [low, high]
Conformal prediction
A modern method with mathematical guarantees: works on any underlying predictor; "with 95% confidence the actual value lies in this interval".
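A minimal split-conformal sketch on synthetic data. The regressor and the 90% target are illustrative; note that plain split conformal assumes exchangeability, so time series in practice need adapted variants (e.g. blocked or adaptive calibration).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Split conformal: calibrate the interval width on held-out residuals.
# Works with any point forecaster; shown here with a generic regressor.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1, 1000)

X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)

# Calibration: a quantile of absolute residuals gives the half-width
alpha = 0.1  # target 90% coverage
resid = np.abs(y_cal - model.predict(X_cal))
q = np.quantile(resid, 1 - alpha)

X_new = rng.normal(size=(5, 3))
point = model.predict(X_new)
lower, upper = point - q, point + q
print(np.c_[lower, point, upper])  # [low, point, high] per row
```

The appeal is that the guarantee does not depend on the model being well specified, only on the calibration data resembling the future.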
Common mistakes
Data leakage
Lag features are the most common leak. Before adding "today's orders" as a feature for "tomorrow", ask: at the moment I want to forecast, would this feature actually be available? If today's orders aren't done yet, no.
A subtle case: "this week's running total" is a leak if the week is still in progress at forecast time.
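One way to make the availability question mechanical is to tie every feature to the forecast horizon. A sketch, assuming a hypothetical 7-day-ahead horizon on synthetic data:

```python
import numpy as np
import pandas as pd

# Leakage-safe lag features: to forecast h steps ahead, every lag must
# be shifted by at least h. lag_1 is a leak for a 7-day-ahead forecast.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=60, freq="D"),
    "orders": rng.integers(80, 120, 60),
})

horizon = 7  # forecasting 7 days ahead
for lag in [7, 14, 28]:
    assert lag >= horizon, f"lag_{lag} would leak for horizon {horizon}"
    df[f"lag_{lag}"] = df["orders"].shift(lag)

# Rolling stats must also be shifted: a 7-day mean ending today
# is not yet known when forecasting next week
df["ma_7"] = df["orders"].shift(horizon).rolling(7).mean()
print(df.dropna().head(3))
```

The assert turns the "would this be available?" question into something the pipeline enforces rather than something you remember.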
Drifting seasonality
Black Friday's date shifts each year. "Same date last year" misleads. Tie holidays to logical events ("N days before/after Black Friday"), not absolute dates.
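A sketch of an event-relative feature. The Black Friday dates below are the actual US ones for 2023-2025; the helper function is ours:

```python
import pandas as pd

# Event-relative feature: signed days from Black Friday, not calendar date.
black_fridays = pd.to_datetime(["2023-11-24", "2024-11-29", "2025-11-28"])

df = pd.DataFrame({"date": pd.date_range("2024-11-01", "2024-12-15", freq="D")})

def days_to_nearest_event(date, events):
    # Signed distance: negative before the event, positive after
    deltas = (date - events).days
    return min(deltas, key=abs)

df["days_from_bf"] = [days_to_nearest_event(d, black_fridays) for d in df["date"]]
print(df[df["days_from_bf"].abs() <= 2])
```

Trained on this feature, the model sees "2 days before Black Friday" as the same situation every year, whichever date it falls on.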
Structural breaks
Pandemic onset, product launch, price change: moments where the past doesn't predict the future. Retrain post-break; consider excluding pre-break data.
Outliers
A 10× spike (system glitch, data load error) wrecks the model. Unless event-driven, clean or winsorize.
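A winsorizing sketch: clip to quantile thresholds so a one-off glitch cannot dominate training. The 1%/99% cutoffs are illustrative.

```python
import numpy as np
import pandas as pd

# Simulated daily series with one data-load glitch
rng = np.random.default_rng(0)
orders = pd.Series(rng.normal(1000, 50, 365))
orders.iloc[100] = 12000  # the glitch

# Clip to the 1st/99th percentile instead of deleting the row
lo, hi = orders.quantile([0.01, 0.99])
cleaned = orders.clip(lower=lo, upper=hi)
print(orders.iloc[100], "->", cleaned.iloc[100])
```

Clipping rather than dropping keeps the row (and its calendar features) in the training set while neutralizing the spike.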
Going to production
Backtest before deployment
Test the model on each of the last 12 months: is performance consistent, or has it degraded recently? If concept drift is visible, raise retrain frequency.
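A rolling-origin backtest sketch over the last 12 monthly folds. fit_and_forecast is a placeholder for your real pipeline (a seasonal-naive stand-in here), and the data is synthetic:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=730, freq="D"),
    "orders": 100 + rng.normal(0, 10, 730),
})

def fit_and_forecast(train, horizon):
    # Placeholder model: seasonal naive (repeat the last observed week)
    last_week = train["orders"].iloc[-7:].to_numpy()
    return np.resize(last_week, horizon)

results = []
month_starts = pd.date_range(end=df["date"].max(), periods=12, freq="MS")
for start in month_starts:
    train = df[df["date"] < start]                 # only the past
    test = df[(df["date"] >= start) &
              (df["date"] < start + pd.offsets.MonthBegin(1))]
    if train.empty or test.empty:
        continue
    preds = fit_and_forecast(train, len(test))
    actual = test["orders"].to_numpy()
    results.append({"month": start.strftime("%Y-%m"),
                    "mape": np.mean(np.abs(actual - preds) / actual)})

print(pd.DataFrame(results))  # consistent across months, or degrading?
```

A per-month table like this is what reveals drift; a single aggregate score would hide it.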
Retrain strategy
- Full retrain: weekly/monthly on the entire dataset
- Online update: refresh with the last k weeks
- Triggered retrain: kick off when recent error crosses a threshold
Most production systems use weekly + triggered.
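The triggered part can be as small as a rolling error window with a threshold. A sketch; the class, the 15% threshold, and the 14-day window are all illustrative:

```python
from collections import deque
import numpy as np

class RetrainTrigger:
    """Flag a retrain when rolling MAPE crosses a threshold."""

    def __init__(self, threshold_mape=0.15, window=14):
        self.threshold = threshold_mape
        self.errors = deque(maxlen=window)

    def observe(self, actual, predicted):
        self.errors.append(abs(actual - predicted) / abs(actual))
        return self.should_retrain()

    def should_retrain(self):
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        return float(np.mean(self.errors)) > self.threshold

trigger = RetrainTrigger(threshold_mape=0.15, window=14)
for day in range(30):
    # Simulated drift: forecasts degrade sharply after day 20
    actual, predicted = 100.0, (95.0 if day < 20 else 70.0)
    if trigger.observe(actual, predicted):
        print(f"day {day}: retrain triggered")
        break
```

The window guards against retraining on a single bad day; only sustained degradation trips the trigger.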
Monitoring
Forecast quality is a distribution, not one number. Track:
- MAPE overall
- Residual distribution (histogram)
- Performance per season (Black Friday, holidays separately)
- Worst-error days
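A monitoring sketch that slices error instead of averaging it away. The prediction log here is synthetic; in production it comes from your serving layer:

```python
import numpy as np
import pandas as pd

# Synthetic prediction log: one row per day with actual and predicted
rng = np.random.default_rng(0)
log = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=120, freq="D"),
    "actual": rng.normal(1000, 80, 120),
})
log["predicted"] = log["actual"] + rng.normal(0, 60, 120)
log["ape"] = (log["actual"] - log["predicted"]).abs() / log["actual"]

print("overall MAPE:", log["ape"].mean().round(3))
# Per-weekday slice: a model can look fine overall and fail every Monday
print(log.groupby(log["date"].dt.dayofweek)["ape"].mean().round(3))
# Worst-error days: candidates for outlier or structural-break review
print(log.nlargest(5, "ape")[["date", "actual", "predicted", "ape"]])
```

The same groupby pattern extends to seasons or campaign periods, which is how Black Friday gets its own line on the dashboard.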
Quick decision table
| Situation | First pick |
|---|---|
| One series, small, seasonal | Prophet |
| One series, small, no exogenous | ARIMA / SARIMA |
| Many series, rich exogenous | LightGBM + lag |
| Many series, long dependencies | TFT or LightGBM |
| Hourly/minute high frequency | LightGBM (simple) or DeepAR |
| One-off analysis | Prophet (fast result) |
| Production at scale | LightGBM + reranker |
Continue reading
- Forecasting: the concept and where it applies.
- Time Series: the structural properties of ordered data.
- Cross-Validation: how to do CV correctly on time-ordered data.
- Gradient Boosting: the workhorse of practical forecasting.
- Anomaly Detection: measuring the gap between expectation and reality.