Reproducibility & Analytical Pipelines

Here's a test that quietly separates good analysis from fragile analysis: if someone handed you the same raw data in a year, could you reproduce your exact result — the same number, the same chart, the same conclusion? For a surprising amount of real-world work the honest answer is "no", because the result lived in a tangle of manual steps, hand-edited spreadsheets, and a notebook run out of order that nobody could re-run today.

Reproducibility is the discipline of making sure you can — that an analysis is a repeatable, auditable process rather than a one-off act of craft. It's the least glamorous topic in this whole section and, in any setting where the work has to be defended, one of the most important. This page is the practical kit for getting there.

Could you re-run it?

It helps to separate two related ideas. Reproducibility means same data, same code, same result: anyone can re-run your analysis and get what you got. Replicability is the stronger claim: a fresh study, working from newly collected data, reaches the same conclusion. Reproducibility is the achievable, foundational one, and it buys you several things at once:

Trust — a result that can be re-run is one that can be checked, and a result that can't is just an assertion.
Auditability — when someone asks "how did you get this?", you can show the exact path from raw data to number.
Maintainability — when the data refreshes next quarter, you re-run rather than rebuild from memory.
Collaboration — including with your future self, who will remember none of today's undocumented decisions.

Why it's a real problem

This isn't a hypothetical worry. Across science there's a recognised reproducibility crisis — a large fraction of published findings can't be reproduced, sometimes not even by their original authors, often because the exact data and code weren't preserved in a runnable state. Studies re-running published analysis notebooks have found that many simply fail to execute top to bottom.

The usual culprits are mundane and entirely avoidable: a notebook whose cells were run out of order, a manual edit nobody recorded, a dependency that silently updated, a file path that only existed on one laptop. None is dramatic; together they make work impossible to reconstruct. The good news is that the fixes are equally mundane — a handful of habits, below, remove almost all of it.

Version control: the foundation

Version control (Git is the standard) tracks every change to your code over time: what changed, when, by whom, and why. It's the single highest-value habit in the list, because it turns "the analysis" from a mutable pile of files into a recorded history that you can rewind to any earlier point.
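One way to connect a result to that recorded history is to stamp every output with the commit that produced it. The sketch below is a minimal illustration, not a prescribed method: the helper names current_commit and stamp are hypothetical, and it assumes the git command-line tool is installed. If it isn't, or the folder isn't a repository, it falls back to "unknown".

```python
import subprocess


def current_commit(repo_dir: str = ".") -> str:
    """Return the Git commit hash for repo_dir, or "unknown" if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],  # prints the hash of the checked-out commit
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not a repository with commits.
        return "unknown"
    return result.stdout.strip()


def stamp(value: float, repo_dir: str = ".") -> dict:
    """Bundle a result with the commit of the code that produced it."""
    return {"value": value, "commit": current_commit(repo_dir)}
```

Saving the stamped dictionary next to the output means that "which version of the code made this number?" has a recorded answer, provided the working tree was committed before the run.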

Environments: beating 'works on my machine'

Code doesn't run in a vacuum — it depends on a specific Python or R version and specific package versions, and those change. An analysis that worked last year can break or, worse, silently produce different numbers after a library updates. "It works on my machine" is the sound of an un-reproducible analysis.

The fix is to capture the environment: pin exact dependency versions (a requirements.txt, environment.yml, or lockfile) so anyone can recreate the same setup, and for full isolation use a container (Docker) that bundles the whole computational environment. Then "same code" really does mean same code, running the same way.
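As a rough illustration of what "capturing the environment" records, the sketch below lists the running Python version and every installed package as a name==version pin, using only the standard library. The function name snapshot_environment is hypothetical, and in practice a dedicated tool (pip freeze, conda, or a lockfile manager) would normally do this job; the point is only to show the information being pinned.

```python
import platform
from importlib import metadata


def snapshot_environment() -> dict:
    """Return the Python version and a sorted list of 'name==version' pins."""
    pins = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # some installs have no readable metadata
            continue
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:
            pins.add(f"{name}=={version}")
    return {
        "python": platform.python_version(),
        "packages": sorted(pins, key=str.lower),
    }
```

Writing the "packages" list to a file, one pin per line, gives something in the same shape as a requirements.txt, which can be committed alongside the code.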

Pipelines, not manual steps

The deepest shift is to stop thinking of analysis as a sequence of things you do and start thinking of it as a pipeline — a coded, automated path from raw data to final output, where every step is a script and nothing is touched by hand. Raw data in, report out, one command, no manual intervention.

The two ways to run an analysis. Manual: hand-edited steps with hidden, unrecorded decisions — fragile and unrepeatable. Pipeline: each stage is code, chained end to end, re-runnable with one command from the same raw input.

This is the idea behind Reproducible Analytical Pipelines (RAP), a movement that began in government precisely to make official statistics auditable and re-runnable. A good pipeline is also idempotent — run it twice and you get the same result, with no leftover state from last time mucking up the next run.
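A toy version of such a pipeline might look like the sketch below. Everything in it is assumed for illustration: the stage names (load, clean, summarise), the region and amount columns, and the JSON output are hypothetical, not part of RAP itself. What it shows is the shape: each stage is a function, one call runs the whole chain, and the output is overwritten rather than appended to, so a second run leaves an identical file.

```python
import csv
import json
from pathlib import Path


def load(raw_path: Path) -> list:
    """Read the raw CSV into a list of row dictionaries."""
    with raw_path.open(newline="") as f:
        return list(csv.DictReader(f))


def clean(rows: list) -> list:
    """Drop rows with a missing amount and convert the rest to floats."""
    return [
        {"region": r["region"], "amount": float(r["amount"])}
        for r in rows
        if (r.get("amount") or "").strip()
    ]


def summarise(rows: list) -> dict:
    """Total the amounts per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


def run_pipeline(raw_path: Path, out_path: Path) -> None:
    """Raw data in, report out, in one call."""
    totals = summarise(clean(load(raw_path)))
    # Overwrite (never append) and sort keys so a re-run yields identical bytes.
    out_path.write_text(json.dumps(totals, sort_keys=True, indent=2))
```

Because the cleaning rule ("drop rows with no amount") lives in code, it is recorded and repeatable, where the same decision made by hand in a spreadsheet would be invisible.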

Seeds & determinism

Many methods use randomness: a train/test split, a k-means initialisation, a bootstrap, a neural network's starting weights. Run them twice and you get slightly different answers, which quietly breaks reproducibility. The fix is a random seed: fix the seed and the "random" sequence becomes identical on every run, so results are repeatable while still being statistically valid. (Exact repeatability also assumes the same library versions, which is one more reason to pin the environment.) Set it once, record it, and an entire class of "why did the number change?" mysteries disappears.
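To make that concrete, here is a small bootstrap written with Python's standard random module. The function name bootstrap_means and its parameters are hypothetical; the technique on show is simply passing a fixed seed to a private generator, so the same seed always produces the same resamples.

```python
import random
import statistics


def bootstrap_means(data: list, n_resamples: int, seed: int) -> list:
    """Return the mean of each bootstrap resample, repeatable for a given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(data, k=len(data))  # resample with replacement
        means.append(statistics.fmean(sample))
    return means
```

Calling bootstrap_means(data, 1000, seed=42) twice in the same environment returns the same list both times. Using a private random.Random instance, rather than seeding the global generator, keeps other code that draws random numbers from shifting the sequence.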

Documentation & data lineage

Finally, code that runs isn't the same as code that's understandable. The last mile is making the decisions legible:

A README — what this does, how to run it, what it expects. The first thing your future self will look for.
A data dictionary — what each field means, its units, its valid values. Ambiguity here is where misinterpretation creeps in.
Data lineage / provenance — where the data came from and every transformation applied to it. This is the analyst's chain of custody, and it ties straight to data governance: a number you can trace is a number you can defend.
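A lightweight way to start on lineage is a small record saved beside the raw file: a checksum that proves which file was used, plus the source and the ordered list of transformations. The sketch below assumes a hypothetical record layout (file, sha256, source, transformations) and hypothetical function names. It is one possible shape, not a standard.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lineage_record(raw_path: Path, source: str, steps: list) -> dict:
    """Describe where a raw file came from and what was done to it."""
    return {
        "file": raw_path.name,
        "sha256": sha256_of(raw_path),  # changes if the raw file changes at all
        "source": source,
        "transformations": list(steps),  # ordered, as applied
    }
```

If the checksum in the record no longer matches the file on disk, the raw data has changed since the analysis was run, which is exactly the kind of silent edit that lineage exists to catch.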

Where it shows up in my work

A number you can defend

In a government setting, analysis isn't done when the number is produced — it's done when the number can be re-run, audited, and defended, sometimes long after the fact and sometimes by someone else entirely. That makes reproducibility a core professional obligation, not a nicety. The habits on this page — version control, pinned environments, a coded pipeline instead of hand-edited steps, a fixed seed, and clear lineage — are exactly what turns "I got this number once" into "I can show you precisely how this number was made."

It's the operational backbone under everything else in this section: the data preparation, the evaluation, and the governance only count if the path from raw data to conclusion is recorded and re-runnable. Trustworthy analysis and reproducible analysis are, in the end, the same thing.

The reproducibility-crisis framing, notebook cautions, environment pinning, and Reproducible Analytical Pipelines (RAP) reflect current references on reproducible research and government analysis alongside hands-on work.