/knowledge/computational-statistics
Computational Statistics
What you do when the equations have no answer. Trade pen-and-paper derivations for raw computing power — and solve, by simulation and resampling, problems that classical theory can't touch.
- Studied
- Computational StatisticsMaster of Data Science
- When
- UniMelb, 2023–2024
- Applied in
- Bootstrap CIs · simulation
- Read / Refreshed
- ~15 min read2026-06-25
Classical statistics gives beautiful closed-form answers — but only for the problems neat enough to have them. The standard error of a mean, the formula for a confidence interval: these exist because someone could do the algebra. The moment your statistic is unusual, your model is complex, or your assumptions don't hold, the algebra runs out. Computational statistics is the answer: when you can't derive the result, compute it — by simulation, resampling, and iteration.
It's the bridge from the statistics page's theory to what actually runs on real, messy data, and it's the machinery under Bayesian inference. The unifying idea is simple and a little subversive: trade mathematical elegance for brute-force computing, and let the computer find the answer the maths couldn't.
01
When the maths runs out
The classical formulas of the statistics page rest on assumptions — usually that data is normally distributed and your statistic is something simple like a mean. Reality routinely breaks both: a skewed distribution, a small sample, or a quantity like a median, a correlation, or a ratio for which no tidy standard-error formula exists.
Rather than give up (or pretend the assumptions hold), computational statistics reframes the question. Instead of deriving how a statistic behaves, you simulate it — generate the relevant randomness many times and watch what happens. With enough computing, the empirical answer is as good as the analytic one would have been, and it works for problems no formula can reach.
02
Monte Carlo methods
The foundational technique is the Monte Carlo method: estimate a quantity you can't compute directly by drawing many random samples and averaging. Want the expected value of some function of a random variable? Don't integrate — sample it times and take the mean:
Draw N samples, apply f, average. By the Law of Large Numbers this converges to the true expectation as N grows.
It works because of the Law of Large Numbers — averages converge to the truth as samples accumulate — and its error shrinks at a predictable rate of , so to halve the error you need four times the samples. From estimating π by throwing random darts at a square, to pricing financial options, to integrating in high dimensions where ordinary numerical integration collapses, Monte Carlo is the workhorse. The price is compute; the reward is answers to otherwise intractable problems.
03
The bootstrap
The single most useful idea in computational statistics is the bootstrap, and it sounds like cheating. You want to know how much a statistic (say, a median) would vary if you could collect many fresh samples — but you only have one sample. The bootstrap's trick: treat your sample as if it were the population, and draw new samples from it.
- From your dataset of points, resample points with replacement — some originals appear twice, some not at all.
- Compute your statistic on this bootstrap sample.
- Repeat thousands of times. The spread of those values estimates the statistic's standard error, and their percentiles give a confidence interval.
That's it — and it works for any statistic, with no formula and no distributional assumptions. It gives you the standard errors and confidence intervals from the statistics page for quantities that have no analytic ones. Resampling your single dataset really does reveal how much your estimate would have wobbled — one of those rare ideas that feels too good to be true and simply isn't.
04
Permutation tests
The same resampling spirit gives a beautifully direct way to run a hypothesis test without any formula. A permutation test asks "is the difference between these two groups real, or could it be chance?" by simulating chance directly: if the group labels truly didn't matter (the null hypothesis), you could shuffle them and see no difference.
So you shuffle the labels thousands of times, recompute the difference each time, and build the distribution of differences you'd expect under pure chance. Where your real difference falls in that distribution is your p-value — computed, not derived, and valid without assuming normality or anything else. It's the p-value from first principles.
05
The EM algorithm
Some models have a chicken-and-egg structure: there's a hidden (latent) variable you'd need to know to estimate the parameters, but you'd need the parameters to figure out the latent variable. The classic case is a mixture of groups — you want each group's parameters, but you don't know which group each point belongs to.
The Expectation-Maximisation (EM) algorithm breaks the deadlock by alternating, like a two-step dance, until it stabilises:
- E-step — given the current parameters, estimate the hidden variable (e.g. each point's probability of belonging to each group).
- M-step — given those estimates, update the parameters to their best values.
Each round increases the likelihood, so it converges. It's the engine behind Gaussian mixture models and many clustering and missing-data methods — and it's the same alternate-and-converge pattern as k-means, which is essentially a hard version of EM.
06
Numerical optimisation
Most computational statistics ends in an optimisation — usually finding the maximum-likelihood parameters — and those rarely have a closed-form solution. So you solve them numerically, with the gradient-based methods from the calculus page: start somewhere, follow the slope toward the optimum, iterate. Fitting a GLM, a complex likelihood, or almost any modern model is exactly this under the hood — the computer hill-climbing to the best parameters because no formula hands them over.
07
MCMC, in one line
The most powerful member of the family gets its own treatment on the Bayesian page, but it belongs here too: Markov Chain Monte Carlo is computational statistics solving the hardest problem of all — drawing samples from a distribution you only know up to a constant, so you can compute with a posterior that has no closed form. It's the reason Bayesian inference became practical, and it's Monte Carlo and simulation taken to their logical end.
08
The catch: randomness
All of this leans on a steady supply of random numbers — and computers are deterministic, so they can't make truly random ones. They use pseudo-random generators: algorithms that produce sequences statistically indistinguishable from random but fully determined by a starting seed. That's a feature, not a bug: set the seed and your "random" simulation is exactly reproducible — the same reproducibility discipline as everywhere else, applied to randomness itself. (The other practical catch is cost: more samples means more compute, so techniques like variance reduction aim to get the same accuracy from fewer draws.)
09
Where it shows up in my work
10