Computational Statistics

Classical statistics gives beautiful closed-form answers — but only for the problems neat enough to have them. The standard error of a mean, the formula for a confidence interval: these exist because someone could do the algebra. The moment your statistic is unusual, your model is complex, or your assumptions don't hold, the algebra runs out. Computational statistics is the answer: when you can't derive the result, compute it — by simulation, resampling, and iteration.

It's the bridge from the statistics page's theory to what actually runs on real, messy data, and it's the machinery under Bayesian inference. The unifying idea is simple and a little subversive: trade mathematical elegance for brute-force computing, and let the computer find the answer the maths couldn't.

When the maths runs out

The classical formulas of the statistics page rest on assumptions — usually that data is normally distributed and your statistic is something simple like a mean. Reality routinely breaks both: a skewed distribution, a small sample, or a quantity like a median, a correlation, or a ratio for which no tidy standard-error formula exists.

Rather than give up (or pretend the assumptions hold), computational statistics reframes the question. Instead of deriving how a statistic behaves, you simulate it — generate the relevant randomness many times and watch what happens. With enough computing, the empirical answer is as good as the analytic one would have been, and it works for problems no formula can reach.

Monte Carlo methods

The foundational technique is the Monte Carlo method: estimate a quantity you can't compute directly by drawing many random samples and averaging. Want the expected value of some function of a random variable? Don't integrate — sample it $N$ times and take the mean:

\mathbb{E}[f(X)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)

Draw N samples, apply f, average. By the Law of Large Numbers this converges to the true expectation as N grows.

It works because of the Law of Large Numbers — averages converge to the truth as samples accumulate — and its error shrinks at a predictable rate of $1/\sqrt{N}$ , so to halve the error you need four times the samples. From estimating π by throwing random darts at a square, to pricing financial options, to integrating in high dimensions where ordinary numerical integration collapses, Monte Carlo is the workhorse. The price is compute; the reward is answers to otherwise intractable problems.

The bootstrap

The single most useful idea in computational statistics is the bootstrap, and it sounds like cheating. You want to know how much a statistic (say, a median) would vary if you could collect many fresh samples — but you only have one sample. The bootstrap's trick: treat your sample as if it were the population, and draw new samples from it.

From your dataset of $n$ points, resample $n$ points with replacement — some originals appear twice, some not at all.
Compute your statistic on this bootstrap sample.
Repeat thousands of times. The spread of those values estimates the statistic's standard error, and their percentiles give a confidence interval.

That's it — and it works for any statistic, with no formula and no distributional assumptions. It gives you the standard errors and confidence intervals from the statistics page for quantities that have no analytic ones. Resampling your single dataset really does reveal how much your estimate would have wobbled — one of those rare ideas that feels too good to be true and simply isn't.

The bootstrap. From one original sample, draw many resamples (with replacement) and compute the statistic on each. The distribution of those values gives the standard error and a confidence interval — no formula required.

Permutation tests

The same resampling spirit gives a beautifully direct way to run a hypothesis test without any formula. A permutation test asks "is the difference between these two groups real, or could it be chance?" by simulating chance directly: if the group labels truly didn't matter (the null hypothesis), you could shuffle them and see no difference.

So you shuffle the labels thousands of times, recompute the difference each time, and build the distribution of differences you'd expect under pure chance. Where your real difference falls in that distribution is your p-value — computed, not derived, and valid without assuming normality or anything else. It's the p-value from first principles.

The EM algorithm

Some models have a chicken-and-egg structure: there's a hidden (latent) variable you'd need to know to estimate the parameters, but you'd need the parameters to figure out the latent variable. The classic case is a mixture of groups — you want each group's parameters, but you don't know which group each point belongs to.

The Expectation-Maximisation (EM) algorithm breaks the deadlock by alternating, like a two-step dance, until it stabilises:

E-step — given the current parameters, estimate the hidden variable (e.g. each point's probability of belonging to each group).
M-step — given those estimates, update the parameters to their best values.

Each round increases the likelihood, so it converges. It's the engine behind Gaussian mixture models and many clustering and missing-data methods — and it's the same alternate-and-converge pattern as k-means, which is essentially a hard version of EM.

Numerical optimisation

Most computational statistics ends in an optimisation — usually finding the maximum-likelihood parameters — and those rarely have a closed-form solution. So you solve them numerically, with the gradient-based methods from the calculus page: start somewhere, follow the slope toward the optimum, iterate. Fitting a GLM, a complex likelihood, or almost any modern model is exactly this under the hood — the computer hill-climbing to the best parameters because no formula hands them over.

MCMC, in one line

The most powerful member of the family gets its own treatment on the Bayesian page, but it belongs here too: Markov Chain Monte Carlo is computational statistics solving the hardest problem of all — drawing samples from a distribution you only know up to a constant, so you can compute with a posterior that has no closed form. It's the reason Bayesian inference became practical, and it's Monte Carlo and simulation taken to their logical end.

The catch: randomness

All of this leans on a steady supply of random numbers — and computers are deterministic, so they can't make truly random ones. They use pseudo-random generators: algorithms that produce sequences statistically indistinguishable from random but fully determined by a starting seed. That's a feature, not a bug: set the seed and your "random" simulation is exactly reproducible — the same reproducibility discipline as everywhere else, applied to randomness itself. (The other practical catch is cost: more samples means more compute, so techniques like variance reduction aim to get the same accuracy from fewer draws.)

Where it shows up in my work

Refresh in 60 seconds

When the maths has no closed form, compute the answer by simulation and resampling instead of deriving it.
Monte Carlo: estimate by sampling — $\mathbb{E}[f(X)] \approx \frac{1}{N}\sum f(x_i)$ ; error shrinks like $1/\sqrt{N}$ .
The bootstrap: resample your one dataset with replacement thousands of times → standard errors & CIs for any statistic, no assumptions.
Permutation tests: shuffle labels to simulate the null → a p-value from first principles.
EM: alternate E-step (estimate hidden variable) and M-step (update parameters) for latent-variable models. Numerical optimisation fits most models (MLE by gradient methods).
It all runs on pseudo-random numbers — set the seed for reproducibility. MCMC is this taken to its limit (Bayesian posteriors).