Time Series Analysis

Most statistical methods assume your observations are independent — that the order doesn't matter. Time series data is the opposite: it's a sequence of measurements through time (a stock price, monthly rainfall, daily case counts), and each point is intimately connected to the ones around it. Yesterday tells you a lot about today. That temporal dependence is both the challenge and the signal, and it needs its own discipline.

I worked with time series at CSIRO, modelling how the El Niño–Southern Oscillation links to commodity volatility and risk. This page is the practical core: how to decompose a series, make it analysable, model it, and — the part people get wrong most — forecast and evaluate it honestly.

When order matters

The defining feature of time series is temporal dependence: a value is correlated with its own past (this is autocorrelation, below). That single fact breaks the independence assumption behind the statistics and regression pages — you can't just shuffle the rows, and a naïve model will badly understate its own uncertainty.
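A small simulation can make the "understates its own uncertainty" point concrete. The sketch below assumes an AR(1) process with coefficient 0.7 (an illustrative choice, as are the sample size and replication count) and compares how much the sample mean really varies across repeated series with the textbook standard error that assumes independent rows. For an AR(1) process the large-sample variance of the mean is inflated by roughly $(1 + \varphi)/(1 - \varphi)$ relative to the independent case.

```python
import random
import statistics


def simulate_ar1(n, phi, rng, sigma=1.0):
    """Simulate n points of x_t = phi * x_{t-1} + noise, after a burn-in."""
    x = 0.0
    series = []
    for t in range(n + 100):              # the first 100 steps are burn-in
        x = phi * x + rng.gauss(0.0, sigma)
        if t >= 100:
            series.append(x)
    return series


rng = random.Random(0)
n, phi, replications = 200, 0.7, 300      # illustrative settings, not from the text

sample_means = []
naive_standard_errors = []
for _ in range(replications):
    series = simulate_ar1(n, phi, rng)
    sample_means.append(statistics.fmean(series))
    # Textbook standard error of the mean, which assumes independent rows.
    naive_standard_errors.append(statistics.stdev(series) / n ** 0.5)

actual_spread = statistics.stdev(sample_means)            # how much the mean really varies
claimed_spread = statistics.fmean(naive_standard_errors)  # what the naive formula claims
understatement = actual_spread / claimed_spread
print(f"true spread of the mean is {understatement:.1f}x the naive standard error")
```

The stronger the autocorrelation, the larger that ratio becomes, which is why intervals computed as if the rows were independent come out too narrow.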

The goals are also distinct. Sometimes you want to understand the structure (what's the trend, is there a cycle?); usually you want to forecast — predict future values from the past. Both start the same way: pull the series apart into the patterns hiding inside it.

Trend, seasonality, noise

The foundational move is decomposition — separating a series into three interpretable parts:

Trend — the long-term direction (sales growing over years, a warming baseline).
Seasonality — patterns that repeat on a fixed period (higher retail every December, daily traffic peaks, an annual climate cycle).
Residual / noise — what's left once trend and seasonality are removed; the irregular part, ideally random.

Decomposition is the first thing to do with any series, because it makes the structure visible and tells you what you're dealing with. A forecast is, in essence, projecting the trend and seasonality forward and being honest about the noise.

Decomposition. A raw series is the sum of a slow trend, a repeating seasonal cycle, and irregular noise. Pulling them apart is the first step in understanding — and forecasting — any time series.
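As a rough illustration of the additive split described above, here is a dependency-free sketch of the classical method: a centred moving average estimates the trend, the average detrended value at each position in the cycle estimates the seasonality, and the remainder is the residual. The function name, the synthetic monthly data, and the period of 12 are assumptions made for the example, and the function expects at least two full cycles of data. In practice a library routine (for example `seasonal_decompose` in statsmodels) would normally be used instead.

```python
import math
import random
import statistics


def decompose_additive(series, period):
    """Classical additive split: series = trend + seasonal + residual."""
    n = len(series)
    half = period // 2
    trend = [None] * n                       # undefined near both ends
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the window spans period + 1 points, so the two
            # end points get half weight to keep the average centred.
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    # Average the detrended values for each position in the cycle.
    by_phase = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_phase[t % period].append(series[t] - trend[t])
    phase_means = [statistics.fmean(values) for values in by_phase]
    centre = statistics.fmean(phase_means)   # force the seasonal part to average zero
    seasonal = [phase_means[t % period] - centre for t in range(n)]

    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


rng = random.Random(0)
monthly = [
    0.5 * t + 10 * math.sin(2 * math.pi * t / 12) + rng.gauss(0.0, 1.0)
    for t in range(72)                       # six made-up years of monthly data
]
trend, seasonal, residual = decompose_additive(monthly, period=12)
print("trend mid-series:", round(trend[36], 1))
print("seasonal swing:", round(min(seasonal), 1), "to", round(max(seasonal), 1))
```

If the seasonal swing grows with the level of the series, a multiplicative form (or an additive decomposition of the logged series) is usually the better fit.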

Stationarity

The central technical concept is stationarity: a series is stationary if its statistical properties (its mean, its variance, and how strongly it correlates with its own past) don't change over time. Most classical methods require it, because you can't reliably model a moving target. A series with a trend, a seasonal cycle, or growing variance is non-stationary and must be tamed first.

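The standard fix is differencing: model the change from one step to the next rather than the raw level, which removes a trend. Growing variance needs a different tool, usually a log or similar transform applied before differencing. You test for stationarity formally rather than eyeballing it. The usual choice is the ADF (augmented Dickey-Fuller) test, whose null hypothesis is that the series has a unit root (is non-stationary), so a small p-value is evidence for stationarity.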
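A minimal sketch of differencing, assuming a synthetic series with a linear trend of slope 0.3 (the data, the helper names, and the half-versus-half comparison are all illustrative). Comparing the mean of the first and second half is only a crude stand-in for a formal test such as ADF, which a library (for example `adfuller` in statsmodels) would provide.

```python
import random
import statistics


def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def half_means(series):
    """Mean of the first and second half: a crude check for a drifting level."""
    middle = len(series) // 2
    return statistics.fmean(series[:middle]), statistics.fmean(series[middle:])


rng = random.Random(1)
trended = [0.3 * t + rng.gauss(0.0, 1.0) for t in range(200)]   # made-up data

print("raw halves:        ", half_means(trended))               # far apart: the level drifts
print("differenced halves:", half_means(difference(trended)))   # both close to the slope
```

Differencing once removes a linear trend; applying `difference` twice removes a quadratic one, and the number of passes is the $d$ in ARIMA.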

Autocorrelation

Time series has its own diagnostic: autocorrelation — the correlation of the series with a lagged copy of itself. "How related is today to 7 days ago?" The ACF (autocorrelation function) and PACF (partial autocorrelation function) plots are the read-out, and they're how you both detect structure (a spike at lag 12 screams yearly seasonality in monthly data) and choose model parameters.

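Reading ACF/PACF is a core skill: the shape of these plots tells you how many past terms a model needs. As a rule of thumb, a PACF that cuts off sharply after lag $p$ points to an AR($p$) term, and an ACF that cuts off after lag $q$ points to an MA($q$) term. It's the time-series analyst's equivalent of the residual plot: the picture that tells you what the data is doing.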
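To show what those plots are built from, here is a dependency-free sketch that computes the sample ACF directly and derives the PACF from it with the Durbin-Levinson recursion. The function names and the simulated AR(1) series with coefficient 0.7 are assumptions for the example. Plotting libraries (for example `plot_acf` and `plot_pacf` in statsmodels) do this work and draw the confidence bands as well.

```python
import random
import statistics


def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    mean = statistics.fmean(series)
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t - lag] for t in range(lag, len(series))) / denominator
        for lag in range(max_lag + 1)
    ]


def pacf_from_acf(rho):
    """Partial autocorrelations at lags 1..K from rho[0..K] (Durbin-Levinson)."""
    previous = []                 # coefficients of the order k-1 autoregression
    partial = []
    for k in range(1, len(rho)):
        numerator = rho[k] - sum(previous[j] * rho[k - 1 - j] for j in range(k - 1))
        denominator = 1.0 - sum(previous[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = numerator / denominator
        current = [previous[j] - phi_kk * previous[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        previous = current
        partial.append(phi_kk)
    return partial


rng = random.Random(5)
x, series = 0.0, []
for _ in range(1000):
    x = 0.7 * x + rng.gauss(0.0, 1.0)     # simulated AR(1), purely illustrative
    series.append(x)

rho = acf(series, max_lag=5)
print("ACF  lags 1-5:", [round(r, 2) for r in rho[1:]])                  # decays gradually
print("PACF lags 1-5:", [round(p, 2) for p in pacf_from_acf(rho)])       # one spike, then near zero
```

A gradually decaying ACF with a single PACF spike is the signature of an AR(1) process, which is what was simulated here.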

AR, MA, and ARIMA

The classic workhorse family combines three simple ideas, and the whole thing is captured by the name ARIMA($p, d, q$):

AR (AutoRegressive, order $p$): predict the value from its own recent values. Today is a weighted sum of the last $p$ days, plus a constant and a fresh shock.
I (Integrated, order $d$): the number of times you differenced to reach stationarity.
MA (Moving Average, order $q$): predict from the last $q$ forecast errors, so a shock keeps echoing for $q$ steps and then drops out. Despite the name, this is not the rolling-average smoother used to estimate a trend.

An AR(p) model, the most intuitive piece, is just a regression on the past:

$$X_t = c + \sum_{i=1}^{p} \varphi_i\, X_{t-i} + \varepsilon_t$$
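For comparison, the MA($q$) piece and the combined ARMA($p, q$) model are conventionally written in the same notation as follows. The coefficients $\theta_j$ and the mean $\mu$ are symbols introduced here for illustration. ARIMA($p, d, q$) is the second equation applied to the series after it has been differenced $d$ times.

```latex
X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}

X_t = c + \sum_{i=1}^{p} \varphi_i\, X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
```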

You pick $(p, d, q)$ from the ACF/PACF plots and information criteria (AIC/BIC again: fit vs complexity), fit by maximum likelihood, and, crucially, check the residuals. They should look like white noise: if they are still autocorrelated, the model missed structure, the forecasts leave predictable signal on the table, and the prediction intervals can't be trusted. Residual diagnostics are non-negotiable.
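The fit-then-check loop can be sketched on the simplest case. The example below assumes an AR(1) model and, to stay dependency-free, fits it by conditional least squares rather than the maximum likelihood mentioned above (the two are close for long series). It then checks how many residual autocorrelations fall outside an approximate 95% white-noise band of $\pm 1.96/\sqrt{n}$. The simulated data, the true values $c = 1.0$ and $\varphi = 0.7$, and the helper names are all assumptions for the example.

```python
import random
import statistics


def fit_ar1(series):
    """Least-squares estimates (c, phi) for x_t = c + phi * x_{t-1} + error."""
    past, present = series[:-1], series[1:]
    mean_past, mean_present = statistics.fmean(past), statistics.fmean(present)
    covariance = sum((a - mean_past) * (b - mean_present) for a, b in zip(past, present))
    variance = sum((a - mean_past) ** 2 for a in past)
    phi = covariance / variance
    return mean_present - phi * mean_past, phi


def lag_autocorrelation(values, lag):
    """Sample autocorrelation of values at a single lag."""
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, len(values)))
    return numerator / sum(d * d for d in deviations)


rng = random.Random(2)
series, x = [], 0.0
for _ in range(2000):
    x = 1.0 + 0.7 * x + rng.gauss(0.0, 1.0)     # true c = 1.0, phi = 0.7 (illustrative)
    series.append(x)

c_hat, phi_hat = fit_ar1(series)
residuals = [series[t] - (c_hat + phi_hat * series[t - 1]) for t in range(1, len(series))]

band = 1.96 / len(residuals) ** 0.5              # approximate 95% band for white noise
flagged = [lag for lag in range(1, 21) if abs(lag_autocorrelation(residuals, lag)) > band]
print(f"c = {c_hat:.2f}, phi = {phi_hat:.2f}, residual lags outside band: {flagged}")
```

With 20 lags checked at the 95% level, about one lag landing just outside the band is expected by chance. A cluster of flagged lags, or a large spike at a seasonal lag, is the signal that structure was missed.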

Seasonality

When the data has a repeating cycle, and climate, retail, and operational data almost always do, you extend to SARIMA, which adds seasonal AR, MA, and differencing terms at the seasonal lag (12 for monthly data with a yearly cycle, 7 for daily data with a weekly cycle). The trap is misidentifying the period: assuming the wrong cycle length wrecks the model. Seasonal subseries plots and the ACF (a spike at the seasonal lag) are how you pin it down rather than guess.
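One way to check the period rather than assume it is to look for the lag with the strongest autocorrelation and then difference at that lag. The sketch below assumes a made-up daily series with a weekly shape and no trend (a trend would need removing first, since it dominates the ACF). The helper names and candidate lag range are illustrative.

```python
import random
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at a single lag."""
    mean = statistics.fmean(series)
    deviations = [x - mean for x in series]
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, len(series)))
    return numerator / sum(d * d for d in deviations)


def strongest_lag(series, candidate_lags):
    """Return the candidate lag with the largest autocorrelation."""
    return max(candidate_lags, key=lambda lag: autocorrelation(series, lag))


def seasonal_difference(series, period):
    """Subtract the value one full cycle earlier: x[t] - x[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


rng = random.Random(3)
weekly_shape = [5, 3, 2, 2, 3, 6, 9]                      # made-up day-of-week pattern
daily = [weekly_shape[t % 7] + rng.gauss(0.0, 0.2) for t in range(140)]

period = strongest_lag(daily, candidate_lags=range(2, 31))
deseasonalised = seasonal_difference(daily, period)
print("detected period:", period)
print("spread before:", round(statistics.pstdev(daily), 2),
      "after:", round(statistics.pstdev(deseasonalised), 2))
```

On real data this is a sanity check, not a verdict: the candidate should still be confirmed against the calendar and a seasonal subseries plot.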

Forecasting honestly

This is where time series most often goes wrong, and the mistake is subtle: you cannot evaluate a forecast with an ordinary random train/test split. Shuffling rows lets the model peek at the future to predict the past — a leak that flatters the score and lies about real performance.

Instead you split in time: train on the past, test on the future it never saw. Better still is backtesting with a rolling origin, where you repeatedly train up to a point and forecast the next stretch, then slide forward. That shows how the model performs across many periods, not one lucky window. And be honest about horizon: forecast accuracy decays the further out you go, so a good one-step-ahead score tells you little about a twelve-step forecast.
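A rolling-origin backtest can be sketched in a few lines. The example below uses an expanding training window and reports mean absolute error separately for each step ahead, so the decay with horizon is visible. The function names, the naive last-value forecaster standing in for a real model, and the simulated random walk are all assumptions for illustration.

```python
import random
import statistics


def rolling_origin_splits(n_points, initial_train, horizon, step=1):
    """Yield (train_end, test_indices); training data is always series[:train_end]."""
    train_end = initial_train
    while train_end + horizon <= n_points:
        yield train_end, list(range(train_end, train_end + horizon))
        train_end += step


def naive_forecast(history, horizon):
    """Placeholder model: repeat the last observed value for every step ahead."""
    return [history[-1]] * horizon


def backtest(series, forecaster, initial_train, horizon, step=1):
    """Mean absolute error at each step ahead, averaged over all forecast origins."""
    errors_by_step = [[] for _ in range(horizon)]
    for train_end, test_indices in rolling_origin_splits(len(series), initial_train, horizon, step):
        predictions = forecaster(series[:train_end], horizon)    # sees the past only
        for step_ahead, index in enumerate(test_indices):
            errors_by_step[step_ahead].append(abs(series[index] - predictions[step_ahead]))
    return [statistics.fmean(errors) for errors in errors_by_step]


rng = random.Random(4)
level, series = 0.0, []
for _ in range(500):
    level += rng.gauss(0.0, 1.0)          # a random walk, purely for illustration
    series.append(level)

mae = backtest(series, naive_forecast, initial_train=100, horizon=12)
print("MAE 1 step ahead:  ", round(mae[0], 2))
print("MAE 12 steps ahead:", round(mae[11], 2))
```

Any model can be swapped in for `naive_forecast` as long as it takes the history and a horizon and returns one prediction per step. Keeping the naive baseline in the comparison is a cheap way to see whether a more complex model earns its keep.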

Modern approaches

ARIMA is the foundation, but the toolkit has grown. Exponential smoothing (ETS) is a simple, robust classical alternative. Prophet handles multiple seasonalities and holidays with little tuning. And machine-learning and deep-learning models (gradient boosting on lag features, LSTMs, transformers) can capture complex non-linear patterns when you have enough data — though for many real problems a well-fitted ARIMA or ETS is still hard to beat, and far easier to explain. As ever: the simplest model that does the job.
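To give a feel for how little machinery the classical alternatives need, here is a sketch of simple exponential smoothing, the most basic member of the ETS family (no trend or seasonal component). The function name, the smoothing weight of 0.5, and the demand figures are made up for the example. In practice the weight is chosen by minimising forecast error rather than set by hand.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation (the flat forecast)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]                       # initialise at the first observation
    for value in series[1:]:
        # New level = weighted blend of the latest value and the old level.
        level = alpha * value + (1.0 - alpha) * level
    return level


demand = [12, 15, 14, 16, 19, 18]           # made-up numbers
forecast = simple_exponential_smoothing(demand, alpha=0.5)
print("next-step forecast:", forecast)      # 17.46875
```

A higher `alpha` reacts faster to recent changes, a lower one smooths more heavily, and at `alpha=1` the method collapses to the naive last-value forecast.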

Where it shows up in my work

From climate signals to operational forecasts

I worked with time series at CSIRO, building autoregressive models that linked the El Niño–Southern Oscillation to commodity volatility and conflict risk — exactly this discipline: decompose the signal, handle the seasonality and non-stationarity, model the temporal structure, and be honest about how far ahead the forecast can reach. The lesson that stuck is the one most people skip — evaluate in time, never on a shuffled split, and quote the uncertainty.

It generalises straight to government work: anything measured over time — case volumes, demand, operational metrics — is a forecasting problem, and the same rigour (stationarity, backtesting, widening intervals) is what separates a forecast a decision-maker can trust from a confident-looking line that misleads.

Refresh in 60 seconds

Forecast-evaluation and ARIMA guidance on this page reflects current practitioner and academic references on backtesting and common pitfalls, alongside hands-on work.