Time Series
Ordered observations over time
A sequence of observations indexed by time — daily sales, hourly temperature, millisecond heartbeats. The order matters and demands special analysis.
A time series is a sequence of observations recorded at (usually) regular time intervals. Unlike plain tabular data, order is meaningful: the value at time t is likely related to t-1. That structure brings both richness (seasonality, trend, autocorrelation) and constraints (no random splits).
Standard components: trend (long-term direction), seasonality (recurring weekly/monthly/yearly pattern), cyclical (multi-year business cycles), noise (random fluctuation). Decomposing the series into these is the starting point of analysis.
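A minimal synthetic sketch of the additive view (observed = trend + seasonality + noise); the numbers are illustrative, not from any real dataset:

import numpy as np
import pandas as pd

# Illustrative toy series: linear trend + weekly seasonality + noise
rng = np.random.default_rng(42)
idx = pd.date_range("2022-01-01", periods=730, freq="D")
trend = np.linspace(100, 160, len(idx))              # long-term upward drift
weekly = 10 * np.sin(2 * np.pi * idx.dayofweek / 7)  # recurring weekly pattern
noise = rng.normal(0, 5, len(idx))                   # random fluctuation
toy = pd.Series(trend + weekly + noise, index=idx, name="orders")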
Time series can be stationary or non-stationary. A stationary series keeps its statistical properties (mean, variance, autocorrelation) constant over time; a non-stationary one has a trend or changing variance. Classical statistical models (ARIMA) require stationarity, often achieved via differencing, log transforms, or removing the seasonal component.
Modern ML approaches (gradient boosting, deep learning) don't require stationarity. You build a tabular dataset enriched with lag features, calendar features, and exogenous variables, and treat forecasting as a regression problem.
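A sketch of that regression framing with LightGBM (named below for the shop example); the cutoff date is an illustrative assumption, and the feature names assume the feature-engineering snippet further down has already run:

import lightgbm as lgb

# Assumes df holds "date", "orders", and the lag/calendar features built below
features = ["lag_1", "lag_7", "lag_14", "lag_30", "dow", "month", "is_weekend"]
train = df[df["date"] < "2024-01-01"].dropna()   # past only: chronological split
test = df[df["date"] >= "2024-01-01"].dropna()   # the future stays held out

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(train[features], train["orders"])
preds = model.predict(test[features])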
A heartbeat trace. A single value tells you little; the rhythm over the previous minutes, the wave shape, any irregularity — those tell a doctor a lot. Order carries information. Shuffle the data and information is destroyed. Time-series analysis works the same way.
A shop's three years of daily orders are a time series. Visible patterns:
1. Trend: ~18% annual growth.
2. Yearly seasonality: December peak (holidays), Jan/Feb trough.
3. Weekly seasonality: Mon/Tue slightly down, Wed/Thu normal, Fri-Sun peaks.
4. Event effects: spikes around Black Friday, religious holidays, Valentine's Day.
Decompose, then model each component: trend with a linear or exponential fit, seasonality with Fourier terms or dummy variables, holidays with indicator variables. Prophet does this automatically; a LightGBM model gets the same information via lag, calendar, and holiday features you build by hand.
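A minimal Prophet sketch (the US holiday calendar and the 90-day horizon are assumptions, not from the shop's data):

import pandas as pd
from prophet import Prophet

# Prophet expects two columns: "ds" (timestamp) and "y" (value)
pdf = pd.read_csv("orders.csv", parse_dates=["date"]).rename(columns={"date": "ds", "orders": "y"})
m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.add_country_holidays(country_name="US")      # assumption: US holiday calendar
m.fit(pdf)
future = m.make_future_dataframe(periods=90)   # forecast 90 days ahead
forecast = m.predict(future)

The exploratory snippet below plots the raw series, decomposes it, and tests for stationarity: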
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
import matplotlib.pyplot as plt
df = pd.read_csv("orders.csv", parse_dates=["date"], index_col="date")
df["orders"].plot(figsize=(12, 4), title="Daily orders")
plt.savefig("series.png")
# period=7 isolates the weekly pattern in daily data; use period=365 for the yearly cycle
result = seasonal_decompose(df["orders"], model="additive", period=7)
result.plot()
plt.savefig("decomposition.png")
# Stationarity test (ADF)
stat, pvalue, *_ = adfuller(df["orders"].dropna())
print(f"ADF p-value: {pvalue:.4f}")
# ADF null hypothesis: unit root (non-stationary)
# p < 0.05 → reject the null: treat as stationary
# p ≥ 0.05 → cannot reject: non-stationary; differencing may help
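# Sketch: first differencing often removes trend; re-test stationarity afterwards
diff = df["orders"].diff().dropna()
stat_d, pvalue_d, *_ = adfuller(diff)
print(f"ADF p-value after differencing: {pvalue_d:.4f}")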
df["rolling_mean"] = df["orders"].rolling(30).mean()
df["rolling_std"] = df["orders"].rolling(30).std()import pandas as pd
df = pd.read_csv("orders.csv", parse_dates=["date"]).sort_values("date")
for lag in [1, 7, 14, 30]:
    df[f"lag_{lag}"] = df["orders"].shift(lag)
df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df["is_weekend"] = df["dow"].isin([5, 6]).astype(int)
df["day_of_year"] = df["date"].dt.dayofyear
df["ma_7"] = df["orders"].rolling(7).mean()
df["ma_30"] = df["orders"].rolling(30).mean()- Timestamped, order-meaningful data
- Demand, finance, sensor telemetry
- Anomaly detection — deviations from expectation
- Forecasting backbone
When not to use
- Tabular data with no temporal dimension
- Very short series — too little structure
- Highly irregular timestamps — resample first
Random splitting
A random train/test split puts the future in training and the past in validation: inflated offline scores, broken production behavior. Always split chronologically.
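One way to keep every split chronological is scikit-learn's TimeSeriesSplit; a sketch, assuming the rows are already sorted by date:

from sklearn.model_selection import TimeSeriesSplit

# Each fold trains on an earlier window and validates on the window after it
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(df):
    train_fold, val_fold = df.iloc[train_idx], df.iloc[val_idx]
    # fit on train_fold, evaluate on val_fold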
Missing dates
Daily data should have one row per day. Missing days silently break lag features. Resample to a daily grid (resample('D') or asfreq('D')) and fill the gaps explicitly.
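A sketch of making the daily grid explicit before computing lags; whether to fill with zeros or interpolate depends on what a missing day means:

df = df.set_index("date").asfreq("D")     # one row per calendar day; gaps become NaN
print(f"{df['orders'].isna().sum()} missing days")
df["orders"] = df["orders"].fillna(0)     # zero if no row means no orders; else .interpolate()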
One model fits all
1000 SKUs, each with its own series: neither one pooled model nor 1000 separate models is automatically optimal. Look at hierarchical / global modeling; each has tradeoffs. A sketch of the global approach follows.
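A common middle ground is one global model with the series identifier as a feature; a sketch, where orders_by_sku.csv and the column names are hypothetical:

import lightgbm as lgb
import pandas as pd

# All series stacked in one frame: columns sku, date, orders (hypothetical file)
df = pd.read_csv("orders_by_sku.csv", parse_dates=["date"]).sort_values(["sku", "date"])
df["lag_7"] = df.groupby("sku")["orders"].shift(7)   # lags computed per series, not globally
df["dow"] = df["date"].dt.dayofweek
df["sku"] = df["sku"].astype("category")             # LightGBM handles categoricals natively

train = df.dropna(subset=["lag_7"])
model = lgb.LGBMRegressor(n_estimators=500)
model.fit(train[["sku", "lag_7", "dow"]], train["orders"])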