Practical Forecasting Guide
ARIMA, Prophet, LightGBM, or deep learning? A practical guide to choosing the right approach for a forecasting problem.
Establish a baseline first
The most common mistake is to start a forecasting project with "let's use LSTM". Establish a simple baseline first:
- Naive: "tomorrow equals today."
- Seasonal naive: "next Monday looks like last Monday."
- Moving average: "average of the last 7 days."
These often embarrass you with how well they perform. Measure whatever ARIMA / LightGBM / LSTM gains against the baseline. "5% MAPE" is meaningless on its own; "naive baseline 12%, my model 5%, a 58% improvement" tells the story.
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error
df = pd.read_csv("orders.csv", parse_dates=["date"]).sort_values("date")
# Naive: tomorrow = today
naive = df["orders"].shift(1).dropna()
actual = df["orders"][1:]
print(f"Naive MAPE: {mean_absolute_percentage_error(actual, naive):.3f}")
# Seasonal naive: this Mon = last Mon
seasonal_naive = df["orders"].shift(7).dropna()
actual = df["orders"][7:]
print(f"Seasonal naive MAPE: {mean_absolute_percentage_error(actual, seasonal_naive):.3f}")
# These are the numbers to beat.
Which model when?
ARIMA / SARIMA
The classical statistician's choice. Decomposes trend and seasonality, autoregresses for forecast.
When: single series, small to medium data (1k-10k points), explainability matters, few or no exogenous variables. When not: many series (1,000 SKUs × ARIMA = a maintenance nightmare), high frequency, lots of complex exogenous variables.
Prophet
Facebook's open-source library. Trend + seasonality + holidays out of the box. Quick wins.
from prophet import Prophet
import pandas as pd
df = pd.DataFrame({"ds": dates, "y": orders})
m = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    seasonality_mode="multiplicative",
)
m.add_country_holidays(country_name="TR")
m.fit(df)
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(10))
When: clear yearly + weekly seasonality, holidays matter, you want quick results. When not: many or complex exogenous effects (weather, marketing budget, price). add_regressor works, but LightGBM dominates here.
LightGBM + lag features (the modern default)
Tabular ML adapted to time series. Lags as features, calendar features, exogenous variables. Best cost/quality in most production cases.
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_percentage_error
df = pd.read_csv("orders.csv", parse_dates=["date"]).sort_values("date")
for lag in [1, 7, 14, 30, 365]:
    df[f"lag_{lag}"] = df["orders"].shift(lag)
df["ma_7"] = df["orders"].rolling(7).mean()
df["ma_30"] = df["orders"].rolling(30).mean()
df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df["is_weekend"] = df["dow"].isin([5, 6]).astype(int)
# Exogenous features plug in here, e.g.:
# df["weather_temp"] = ...
# df["campaign_active"] = ...
df = df.dropna()
cutoff = "2025-09-01"
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]
features = [c for c in df.columns if c not in ["date", "orders"]]
model = lgb.LGBMRegressor(
    n_estimators=1000,
    learning_rate=0.05,
    num_leaves=63,
    min_child_samples=20,
)
model.fit(
    train[features], train["orders"],
    eval_set=[(test[features], test["orders"])],
    callbacks=[lgb.early_stopping(50)],
)
preds = model.predict(test[features])
mape = mean_absolute_percentage_error(test["orders"], preds)
print(f"Test MAPE: {mape:.3f}")
# Feature importance: what's driving things?
import pandas as pd
imp = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)
print(imp.head(10))
When: rich exogenous variables, many series (thousands of SKUs), complex interactions, accuracy first. When not: very short series (under 100 points, overfit risk) or mandatory explainability.
Deep learning (LSTM, TFT, N-BEATS)
When long dependencies, many parallel series, and complex multivariate interactions matter.
When: thousands of series, long-range patterns (e.g. hourly energy), shared learning across series. When not: small data, mandatory explainability, no deep-learning experience on the team. LightGBM usually matches it at 10× lower cost.
Correct cross-validation
Plain k-fold puts the future in the training set, inflating scores. Use TimeSeriesSplit or a rolling forecasting origin:
from sklearn.model_selection import TimeSeriesSplit
tss = TimeSeriesSplit(n_splits=5, test_size=30)
for fold, (train_idx, test_idx) in enumerate(tss.split(df)):
    train_data = df.iloc[train_idx]
    test_data = df.iloc[test_idx]
    print(f"Fold {fold}: train {train_data['date'].min()} to {train_data['date'].max()}, "
          f"test {test_data['date'].min()} to {test_data['date'].max()}")
Each fold trains on the past, validates on the future. Error rates reflect production reality.
Forecast intervals (uncertainty)
Decisions are made on intervals, not points. "Tomorrow exactly 1,250 sales" is far less useful than "80% probability between 1,100 and 1,400".
Quantile regression
LightGBM trains separate quantile models:
import lightgbm as lgb
quantiles = {0.1: None, 0.5: None, 0.9: None}
for q in quantiles:
    quantiles[q] = lgb.LGBMRegressor(
        objective="quantile",
        alpha=q,
        n_estimators=500,
        learning_rate=0.05,
    )
    quantiles[q].fit(train[features], train["orders"])
preds_low = quantiles[0.1].predict(test[features])
preds_med = quantiles[0.5].predict(test[features])
preds_high = quantiles[0.9].predict(test[features])
# 80% probability the actual value is in [low, high]
Conformal prediction
A modern method with mathematical guarantees: works on any underlying predictor; "with 95% confidence the actual value lies in this interval".
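A minimal split-conformal sketch on synthetic data. The regressor and the 90% target are illustrative; note that plain split conformal assumes exchangeability, so time series in practice need adapted variants (e.g. blocked or adaptive calibration).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Split conformal: calibrate the interval width on held-out residuals.
# Works with any point forecaster; shown here with a generic regressor.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1, 1000)

X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)

# Calibration: a quantile of absolute residuals gives the half-width
alpha = 0.1  # target 90% coverage
resid = np.abs(y_cal - model.predict(X_cal))
q = np.quantile(resid, 1 - alpha)

X_new = rng.normal(size=(5, 3))
point = model.predict(X_new)
lower, upper = point - q, point + q
print(np.c_[lower, point, upper])  # [low, point, high] per row
```

The appeal is that the guarantee does not depend on the model being well specified, only on the calibration data resembling the future.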
Common mistakes
Data leakage
Lag features are the most common leak. Before adding "today's orders" as a feature for "tomorrow", ask: at the moment I want to forecast, would this feature actually be available? If today's orders aren't done yet, no.
A subtle case: "this week's running total" is a leak if the week is still in progress at forecast time.
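One way to make the availability question mechanical is to tie every feature to the forecast horizon. A sketch, assuming a hypothetical 7-day-ahead horizon on synthetic data:

```python
import numpy as np
import pandas as pd

# Leakage-safe lag features: to forecast h steps ahead, every lag must
# be shifted by at least h. lag_1 is a leak for a 7-day-ahead forecast.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=60, freq="D"),
    "orders": rng.integers(80, 120, 60),
})

horizon = 7  # forecasting 7 days ahead
for lag in [7, 14, 28]:
    assert lag >= horizon, f"lag_{lag} would leak for horizon {horizon}"
    df[f"lag_{lag}"] = df["orders"].shift(lag)

# Rolling stats must also be shifted: a 7-day mean ending today
# is not yet known when forecasting next week
df["ma_7"] = df["orders"].shift(horizon).rolling(7).mean()
print(df.dropna().head(3))
```

The assert turns the "would this be available?" question into something the pipeline enforces rather than something you remember.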
Drifting seasonality
Black Friday's date shifts each year. "Same date last year" misleads. Tie holidays to logical events ("N days before/after Black Friday"), not absolute dates.
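A sketch of an event-relative feature. The Black Friday dates below are the actual US ones for 2023-2025; the helper function is ours:

```python
import pandas as pd

# Event-relative feature: signed days from Black Friday, not calendar date.
black_fridays = pd.to_datetime(["2023-11-24", "2024-11-29", "2025-11-28"])

df = pd.DataFrame({"date": pd.date_range("2024-11-01", "2024-12-15", freq="D")})

def days_to_nearest_event(date, events):
    # Signed distance: negative before the event, positive after
    deltas = (date - events).days
    return min(deltas, key=abs)

df["days_from_bf"] = [days_to_nearest_event(d, black_fridays) for d in df["date"]]
print(df[df["days_from_bf"].abs() <= 2])
```

Trained on this feature, the model sees "2 days before Black Friday" as the same situation every year, whichever date it falls on.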
Structural breaks
Pandemic onset, product launch, price change: moments where the past doesn't predict the future. Retrain post-break; consider excluding pre-break data.
Outliers
A 10× spike (system glitch, data load error) wrecks the model. Unless event-driven, clean or winsorize.
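A winsorizing sketch: clip to quantile thresholds so a one-off glitch cannot dominate training. The 1%/99% cutoffs are illustrative.

```python
import numpy as np
import pandas as pd

# Simulated daily series with one data-load glitch
rng = np.random.default_rng(0)
orders = pd.Series(rng.normal(1000, 50, 365))
orders.iloc[100] = 12000  # the glitch

# Clip to the 1st/99th percentile instead of deleting the row
lo, hi = orders.quantile([0.01, 0.99])
cleaned = orders.clip(lower=lo, upper=hi)
print(orders.iloc[100], "->", cleaned.iloc[100])
```

Clipping rather than dropping keeps the row (and its calendar features) in the training set while neutralizing the spike.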
Going to production
Backtest before deployment
Test the model on each of the last 12 months: is performance consistent, or has it degraded recently? If concept drift is visible, raise retrain frequency.
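A rolling-origin backtest sketch over the last 12 monthly folds. fit_and_forecast is a placeholder for your real pipeline (a seasonal-naive stand-in here), and the data is synthetic:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=730, freq="D"),
    "orders": 100 + rng.normal(0, 10, 730),
})

def fit_and_forecast(train, horizon):
    # Placeholder model: seasonal naive (repeat the last observed week)
    last_week = train["orders"].iloc[-7:].to_numpy()
    return np.resize(last_week, horizon)

results = []
month_starts = pd.date_range(end=df["date"].max(), periods=12, freq="MS")
for start in month_starts:
    train = df[df["date"] < start]                 # only the past
    test = df[(df["date"] >= start) &
              (df["date"] < start + pd.offsets.MonthBegin(1))]
    if train.empty or test.empty:
        continue
    preds = fit_and_forecast(train, len(test))
    actual = test["orders"].to_numpy()
    results.append({"month": start.strftime("%Y-%m"),
                    "mape": np.mean(np.abs(actual - preds) / actual)})

print(pd.DataFrame(results))  # consistent across months, or degrading?
```

A per-month table like this is what reveals drift; a single aggregate score would hide it.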
Retrain strategy
- Full retrain: weekly/monthly on the entire dataset
- Online update: refresh with the last k weeks
- Triggered retrain: kick off when recent error crosses a threshold
Most production systems use weekly + triggered.
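The triggered part can be as small as a rolling error window with a threshold. A sketch; the class, the 15% threshold, and the 14-day window are all illustrative:

```python
from collections import deque
import numpy as np

class RetrainTrigger:
    """Flag a retrain when rolling MAPE crosses a threshold."""

    def __init__(self, threshold_mape=0.15, window=14):
        self.threshold = threshold_mape
        self.errors = deque(maxlen=window)

    def observe(self, actual, predicted):
        self.errors.append(abs(actual - predicted) / abs(actual))
        return self.should_retrain()

    def should_retrain(self):
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        return float(np.mean(self.errors)) > self.threshold

trigger = RetrainTrigger(threshold_mape=0.15, window=14)
for day in range(30):
    # Simulated drift: forecasts degrade sharply after day 20
    actual, predicted = 100.0, (95.0 if day < 20 else 70.0)
    if trigger.observe(actual, predicted):
        print(f"day {day}: retrain triggered")
        break
```

The window guards against retraining on a single bad day; only sustained degradation trips the trigger.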
Monitoring
Forecast quality is a distribution, not one number. Track:
- MAPE overall
- Residual distribution (histogram)
- Performance per season (Black Friday, holidays separately)
- Worst-error days
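A monitoring sketch that slices error instead of averaging it away. The prediction log here is synthetic; in production it comes from your serving layer:

```python
import numpy as np
import pandas as pd

# Synthetic prediction log: one row per day with actual and predicted
rng = np.random.default_rng(0)
log = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=120, freq="D"),
    "actual": rng.normal(1000, 80, 120),
})
log["predicted"] = log["actual"] + rng.normal(0, 60, 120)
log["ape"] = (log["actual"] - log["predicted"]).abs() / log["actual"]

print("overall MAPE:", log["ape"].mean().round(3))
# Per-weekday slice: a model can look fine overall and fail every Monday
print(log.groupby(log["date"].dt.dayofweek)["ape"].mean().round(3))
# Worst-error days: candidates for outlier or structural-break review
print(log.nlargest(5, "ape")[["date", "actual", "predicted", "ape"]])
```

The same groupby pattern extends to seasons or campaign periods, which is how Black Friday gets its own line on the dashboard.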
Quick decision table
| Situation | First pick |
|---|---|
| One series, small, seasonal | Prophet |
| One series, small, no exogenous | ARIMA / SARIMA |
| Many series, rich exogenous | LightGBM + lag |
| Many series, long dependencies | TFT or LightGBM |
| Hourly/minute high frequency | LightGBM (simple) or DeepAR |
| One-off analysis | Prophet (fast result) |
| Production at scale | LightGBM + reranker |
Continue reading
- Forecasting: the concept and where it applies.
- Time Series: the structural properties of ordered data.
- Cross-Validation: how to do CV correctly on time-ordered data.
- Gradient Boosting: the workhorse of practical forecasting.
- Anomaly Detection: measuring the gap between expectation and reality.