
Fairness & Bias in Machine Learning

A model trained on a biased world learns the bias and applies it at scale, with the authority of maths. Making a model fair is harder than it sounds — partly because 'fair' has several definitions that can't all be true at once.

Studied: Fairness & Bias in Machine Learning
In practice · equitable decisions
When: Gov analysis · ongoing
Applied in: Equitable model-assisted calls
Read / Refreshed: ~15 min read · 2026-06-26

A machine-learning model learns patterns from data — and if that data reflects an unfair world, the model learns the unfairness and reproduces it, faster, cheaper, and wrapped in a veneer of mathematical objectivity that makes it harder to challenge. Algorithmic bias is not a hypothetical: risk-assessment tools, hiring filters, and lending models have all been shown to treat groups of people systematically differently. Fairness in ML is the technical discipline of detecting and reducing that — and it's genuinely hard, for a reason that surprises most people: "fair" has several precise definitions, and you can't satisfy them all at once.

This sits next to the data governance page — that one is the policy and ethics; this is the machinery: how bias gets in, how to measure fairness, the mathematical impossibility at the core, and where you can intervene. It matters anywhere a model-assisted decision affects people, which in a government setting is much of the point.

01

Bias with consequences

The word "bias" here doesn't mean the bias-variance kind from the modelling page — it means systematic unfairness toward a group of people, usually one defined by a protected attribute (race, sex, age, disability). The danger is specific: a model applies its learned bias consistently and at scale, to everyone, instantly, while looking neutral. A biased human decision-maker affects the people they meet; a biased model can affect millions and is much harder to argue with, because "the algorithm said so" carries false authority.

02

How bias gets in

Bias rarely comes from a malicious modeller. It seeps in through the data and the framing, mostly invisibly:

  • Historical bias — the data faithfully records a world that was already unequal. A hiring model trained on who got hired before learns the past's prejudices as if they were merit.
  • Representation bias — some groups are under-sampled, so the model works worse for them (the coverage problem with human stakes).
  • Measurement bias — the label itself is a flawed proxy. "Re-arrested" is not the same as "committed a crime": a model trained on arrests learns where and whom the police arrest, not just who offends.

The throughline: the model is an accurate mirror of biased data. Garbage in, bias out — and the model then amplifies and entrenches it.
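Measurement bias is easy to reproduce in a small simulation. The numbers here are invented for illustration. Two hypothetical groups offend at exactly the same rate, but one is policed more heavily, so its offences are more likely to end in an arrest. The arrest label a model would be trained on then shows different "base rates", and that difference comes from the measurement, not from the behaviour.

```python
import random

def simulate_labels(n_per_group=100_000, offend_rate=0.10, arrest_prob=None, seed=0):
    """Simulate a true outcome (offending) and an observed proxy label (arrest).

    Hypothetical assumptions: both groups offend at the same rate, but the
    probability that an offence leads to an arrest differs by group.
    """
    if arrest_prob is None:
        arrest_prob = {"A": 0.3, "B": 0.6}  # assumed: group B is policed twice as heavily
    rng = random.Random(seed)
    stats = {}
    for group, p_arrest in arrest_prob.items():
        offended = arrested = 0
        for _ in range(n_per_group):
            offends = rng.random() < offend_rate
            # The training label only records offences that were caught.
            caught = offends and rng.random() < p_arrest
            offended += offends
            arrested += caught
        stats[group] = {
            "true_rate": offended / n_per_group,
            "label_rate": arrested / n_per_group,
        }
    return stats

for group, s in simulate_labels().items():
    print(f"{group}: true offending {s['true_rate']:.3f}, arrest label {s['label_rate']:.3f}")
```

With these assumed rates, both groups offend about 10% of the time. The arrest label, however, comes out at roughly 0.03 for group A and 0.06 for group B. A model fitted to that label would learn to treat B as about twice as "risky".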

03

The proxy trap: you can't just delete the variable

The intuitive first fix — "just don't give the model race or sex" — does not work, and understanding why is the single most important idea on this page. The protected attribute is almost always encoded redundantly in the other features through proxies.

[Figure: race (removed) ✗ → postcode, name, school, spending → model → bias flows through anyway]
The proxy trap. Removing the protected attribute (race) doesn't remove its influence — postcode, name, school, and shopping patterns all correlate with it, so the model reconstructs the protected attribute from its proxies and the bias flows through anyway.

Postcode correlates with race; first name signals gender; the school you attended, your shopping patterns, your phrasing — any of them can let a model reconstruct the protected attribute it was never given, and discriminate through the back door. This is why fairness can't be achieved by blindness; you have to actively measure outcomes across groups and intervene, which means the analysis is anything but simple.
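A simple check for the proxy trap is to ask how well the "removed" attribute can be predicted from the features that remain. The sketch below uses synthetic data with an invented correlation between a hypothetical postcode field and group membership. It applies a deliberately crude predictor: the majority group in each postcode, scored on the same data it was fitted on. If even this recovers the group well above chance, a real model with many more features can too.

```python
import random
from collections import Counter, defaultdict

def make_synthetic(n=10_000, seed=0):
    """Synthetic (postcode, group) records with assumed residential segregation:
    80% of people live in their group's usual area, the rest anywhere."""
    rng = random.Random(seed)
    home = {"B": range(0, 3), "A": range(3, 10)}  # hypothetical areas per group
    rows = []
    for _ in range(n):
        group = "B" if rng.random() < 0.3 else "A"
        if rng.random() < 0.8:
            postcode = f"P{rng.choice(home[group])}"
        else:
            postcode = f"P{rng.randrange(10)}"
        rows.append((postcode, group))
    return rows

def proxy_accuracy(rows):
    """Predict group as the majority group within each postcode.

    Returns (accuracy of that rule, accuracy of always guessing the overall majority).
    """
    by_postcode = defaultdict(Counter)
    for postcode, group in rows:
        by_postcode[postcode][group] += 1
    majority = {pc: counts.most_common(1)[0][0] for pc, counts in by_postcode.items()}
    hits = sum(majority[pc] == g for pc, g in rows)
    baseline = Counter(g for _, g in rows).most_common(1)[0][1]
    return hits / len(rows), baseline / len(rows)

acc, base = proxy_accuracy(make_synthetic())
print(f"group recovered from postcode alone: {acc:.1%} (baseline {base:.1%})")
```

With these made-up parameters, the postcode rule recovers the group a little over 90% of the time, against a 70% baseline. The model was never given the group, yet it can reconstruct it.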

04

Defining 'fair': the metrics

To fix fairness you must first define it — and there are several reasonable, mutually competing definitions. The main group-fairness criteria:

  • Demographic parity — each group gets positive outcomes at the same rate (equal approval rates across groups), regardless of whether the groups differ in how often the positive outcome is actually warranted.
  • Equal opportunity — among those who genuinely should get the positive outcome, each group receives it at the same rate (equal true-positive rates).
  • Equalised odds — stricter: equal true-positive and false-positive rates across groups.

Each encodes a different, defensible notion of fairness — and that's exactly where the trouble starts, because they can pull against each other.
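All three criteria can be measured directly from true labels, predictions and group membership. Below is a minimal plain-Python sketch; the function name and the toy data are hypothetical. The example runs a perfect classifier on two groups with different base rates, which shows the tension directly. Equal opportunity and equalised odds are satisfied, but demographic parity fails.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true-positive rate and false-positive rate.

    y_true, y_pred: sequences of 0/1; groups: sequence of group labels.
    """
    counts = defaultdict(lambda: {"n": 0, "sel": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["sel"] += p
        if t == 1:
            c["pos"] += 1
            c["tp"] += p
        else:
            c["neg"] += 1
            c["fp"] += p
    nan = float("nan")
    return {
        g: {
            "selection_rate": c["sel"] / c["n"],             # demographic parity compares these
            "tpr": c["tp"] / c["pos"] if c["pos"] else nan,  # equal opportunity compares these
            "fpr": c["fp"] / c["neg"] if c["neg"] else nan,  # equalised odds also compares these
        }
        for g, c in counts.items()
    }

# Toy data: group A has a 50% base rate, group B 20%; predictions are perfect.
y_true = [1] * 5 + [0] * 5 + [1] * 2 + [0] * 8
groups = ["A"] * 10 + ["B"] * 10
y_pred = list(y_true)

for g, r in group_rates(y_true, y_pred, groups).items():
    print(g, r)
```

Both groups get a true-positive rate of 1.0 and a false-positive rate of 0.0. The selection rates, however, are 0.5 and 0.2. Forcing those to match would mean approving some negatives in B or rejecting some positives in A, and either change breaks the other two criteria.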

05

The impossibility result

Here is the deep, sobering fact at the heart of the field: when groups have different base rates (different underlying rates of the outcome), you cannot satisfy all of these criteria at once. This is a mathematical impossibility, formalised by Chouldechova and by Kleinberg and colleagues, not an engineering gap. Three properties are in conflict: calibration (a given score means the same probability of the outcome in every group), equal false-positive rates, and equal false-negative rates. They can't all hold together unless the base rates are identical or the model is perfect.
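Chouldechova's version of the result can be checked with arithmetic. Let p be a group's base rate. From the confusion-matrix definitions, the false-positive rate is tied to p, the positive predictive value (PPV, the precision of a positive prediction) and the false-negative rate:

FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR)

In Chouldechova's formulation, the calibration-like condition is equal PPV across groups (predictive parity). The sketch below fixes PPV and FNR to be equal for two hypothetical groups and shows that different base rates then force different FPRs. The numbers are illustrative only.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False-positive rate forced by a given base rate, PPV and FNR.

    Follows from PPV = TP / (TP + FP), with TP = p(1 - FNR) and FP = (1 - p)FPR,
    all expressed as fractions of the group's population.
    """
    p = base_rate
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

# Same PPV and FNR in both groups (predictive parity plus equal FNR); base rates differ.
ppv, fnr = 0.7, 0.3
for group, p in {"A": 0.5, "B": 0.2}.items():
    print(f"group {group}: base rate {p:.0%} -> FPR {implied_fpr(p, ppv, fnr):.3f}")
```

Group A is forced to an FPR of 0.300 and group B to 0.075. Equalising FPR would require PPV or FNR to differ between the groups instead. The only escapes are matching base rates or a perfect classifier (PPV = 1, FNR = 0, which makes FPR = 0 for everyone).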

06

Where to intervene

Once you've chosen a fairness definition and measured the disparity, mitigation can act at three stages of the pipeline:

  • Pre-processing — fix the data before training: reweight under-represented groups, re-sample, or transform features to reduce the disparity at the source.
  • In-processing — build fairness into the training itself, adding a fairness constraint or penalty to the objective so the model optimises accuracy and fairness together (e.g. adversarial debiasing).
  • Post-processing — adjust the model's outputs after the fact, e.g. using group-specific thresholds to equalise the chosen metric.
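Two of these interventions are simple enough to sketch.

- **Pre-processing.** The first function computes weights in the style of Kamiran and Calders' reweighing, w(g, y) = P(g) · P(y) / P(g, y). These weights make group and label statistically independent in the weighted training data.
- **Post-processing.** The second function picks, for each group, the highest score threshold that reaches a target true-positive rate. This is an equal-opportunity style adjustment.

The function names, the 0.8 target and the toy scores are all hypothetical. Real mitigation would be fitted and checked on held-out data.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight for each (group, label) cell: P(group) * P(label) / P(group, label).

    Under-represented combinations get weights above 1, over-represented ones below 1.
    """
    n = len(labels)
    pg, py = Counter(groups), Counter(labels)
    pgy = Counter(zip(groups, labels))
    return {(g, y): (pg[g] / n) * (py[y] / n) / (c / n) for (g, y), c in pgy.items()}

def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold t such that predicting score >= t gives TPR >= target_tpr."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    for k, s in enumerate(pos_scores, start=1):
        if k / len(pos_scores) >= target_tpr:  # k positives caught so far
            return s
    return pos_scores[-1]

groups = ["A"] * 6 + ["B"] * 6
labels = [1, 1, 1, 0, 0, 0] + [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2] + [0.7, 0.4, 0.45, 0.2, 0.1, 0.05]

# Pre-processing: per-(group, label) weights to pass to a learner as sample weights.
for cell, w in sorted(reweighing_weights(groups, labels).items()):
    print(cell, round(w, 3))

# Post-processing: a separate threshold per group so each reaches TPR >= 0.8.
for g in ("A", "B"):
    idx = [i for i, gi in enumerate(groups) if gi == g]
    t = threshold_for_tpr([scores[i] for i in idx], [labels[i] for i in idx], 0.8)
    print(g, "threshold", t)
```

Positives are rarer in group B, so (B, 1) is up-weighted to 1.25 while (A, 1) drops to about 0.833. After weighting, both groups have the same positive rate. The per-group thresholds come out at 0.6 for A and 0.4 for B. B's lower threshold also lets through the negative scored 0.45, a small example of the accuracy cost such adjustments carry.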

None is a silver bullet, and every one trades some accuracy or one fairness notion for another — which is why fairness work is inseparable from explanation (you have to see what the model is doing) and from a documented, defensible decision about which trade-off you accepted and why.

07

Where it shows up in my work

08

Refresh in 60 seconds

The proxy/redundant-encoding trap, the group-fairness metrics, the impossibility result, and the COMPAS case reflect current fairness-in-ML references alongside hands-on work.