Fairness & Bias in Machine Learning

A machine-learning model learns patterns from data, and if that data reflects an unfair world, the model learns the unfairness and reproduces it: faster, cheaper, and wrapped in a veneer of mathematical objectivity that makes it harder to challenge. Algorithmic bias is not hypothetical. Risk-assessment tools, hiring filters, and lending models have all been shown to treat groups of people systematically differently. Fairness in ML is the technical discipline of detecting and reducing that unequal treatment. It's genuinely hard, for a reason that surprises most people: "fair" has several precise definitions, and you can't satisfy them all at once.

This sits next to the data governance page — that one is the policy and ethics; this is the machinery: how bias gets in, how to measure fairness, the mathematical impossibility at the core, and where you can intervene. It matters anywhere a model-assisted decision affects people, which in a government setting is much of the point.

Bias with consequences

The word "bias" here doesn't mean the bias-variance kind from the modelling page — it means systematic unfairness toward a group of people, usually one defined by a protected attribute (race, sex, age, disability). The danger is specific: a model applies its learned bias consistently and at scale, to everyone, instantly, while looking neutral. A biased human decision-maker affects the people they meet; a biased model can affect millions and is much harder to argue with, because "the algorithm said so" carries false authority.

How bias gets in

Bias rarely comes from a malicious modeller. It seeps in through the data and the framing, mostly invisibly:

Historical bias — the data faithfully records a world that was already unequal. A hiring model trained on who got hired before learns the past's prejudices as if they were merit.
Representation bias — some groups are under-sampled, so the model works worse for them (the coverage problem with human stakes).
Measurement bias — the label itself is a flawed proxy. "Re-arrested" is not the same as "committed a crime", but a model trained on arrests learns policing patterns, not crime.

The throughline: the model is an accurate mirror of biased data. Garbage in, bias out — and the model then amplifies and entrenches it.
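Representation bias in particular hides behind a single headline metric, so the first practical check is a disaggregated evaluation: score the model separately for each group. Below is a minimal Python sketch. The function name `accuracy_by_group` and the toy labels are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: (count, accuracy)} so a weak spot for one group is visible."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: (total[g], correct[g] / total[g]) for g in total}


# Hypothetical toy data: group "b" is under-represented (3 of 10 rows)
# and the model is noticeably less accurate on it.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
groups = ["a"] * 7 + ["b"] * 3

for group, (n, acc) in accuracy_by_group(y_true, y_pred, groups).items():
    print(f"{group}: n={n}, accuracy={acc:.2f}")
```

Overall accuracy here is 80%, which looks respectable; the per-group view shows that group "a" is perfect and group "b" is right only a third of the time.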

The proxy trap: you can't just delete the variable

The intuitive first fix — "just don't give the model race or sex" — does not work, and understanding why is the single most important idea on this page. The protected attribute is almost always encoded redundantly in the other features through proxies.

The proxy trap. Removing the protected attribute (race) doesn't remove its influence — postcode, name, school, and shopping patterns all correlate with it, so the model reconstructs the protected attribute from its proxies and the bias flows through anyway.

Postcode correlates with race; first name signals gender; the school you attended, your shopping patterns, your phrasing: any of them can let a model reconstruct the protected attribute it was never given, and discriminate through the back door. This is why fairness can't be achieved by blindness. You have to actively measure outcomes across groups and intervene, which usually means you still need the protected attribute for auditing, even when the model never uses it as an input.
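To make the proxy trap concrete, here is a minimal Python sketch on hypothetical synthetic data. Each of four made-up postcodes is 90% one group, a stand-in for residential segregation. A simple postcode-to-group lookup, learned by majority vote, shows how much of the protected attribute the proxy carries on its own. A real model trained on postcode for any target correlated with group can exploit that same information. The postcodes, group labels and the 90% figure are all invented for illustration.

```python
import random
from collections import Counter, defaultdict

random.seed(0)  # fixed seed so the toy run is repeatable

# Hypothetical synthetic population: each postcode is dominated by one group.
# "group" plays the role of the protected attribute.
POSTCODES = {"P1": "x", "P2": "x", "P3": "y", "P4": "y"}  # postcode -> majority group
people = []
for _ in range(10_000):
    postcode = random.choice(list(POSTCODES))
    majority = POSTCODES[postcode]
    minority = "y" if majority == "x" else "x"
    group = majority if random.random() < 0.9 else minority
    people.append((postcode, group))

# Measure how much the proxy reveals: learn a postcode -> group lookup by
# majority vote, then see how often it recovers each person's group.
counts = defaultdict(Counter)
for postcode, group in people:
    counts[postcode][group] += 1
guess = {pc: c.most_common(1)[0][0] for pc, c in counts.items()}

recovered = sum(guess[pc] == group for pc, group in people) / len(people)
print(f"protected attribute recovered from postcode alone: {recovered:.0%}")
```

With this setup the lookup recovers the group for roughly nine people in ten, from a feature that looks perfectly neutral. Adding more weak proxies (name, school, purchases) only pushes that figure up.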

Defining 'fair': the metrics

To fix fairness you must first define it — and there are several reasonable, mutually competing definitions. The main group-fairness criteria:

Demographic parity: each group gets positive outcomes at the same rate (equal approval rates across groups), regardless of how often members of each group actually merit the positive outcome.
Equal opportunity: among those who genuinely should get the positive outcome, each group is correctly identified at the same rate (equal true-positive rates).
Equalised odds: stricter, requiring equal true-positive and false-positive rates across groups.

Each encodes a different, defensible notion of fairness — and that's exactly where the trouble starts, because they can pull against each other.
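The three criteria reduce to comparing a few per-group rates. Below is a minimal Python sketch that reports each criterion as the largest between-group gap, where 0 means the criterion holds. The function names are hypothetical. Summarising equalised odds as the worse of the TPR and FPR gaps is one common convention, not the only one.

```python
def _rate(values):
    """Mean of a list of 0/1 values; NaN if the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true-positive rate and false-positive rate."""
    rates = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        rates[g] = {
            "selection": _rate([p for _, p in rows]),
            "tpr": _rate([p for t, p in rows if t == 1]),
            "fpr": _rate([p for t, p in rows if t == 0]),
        }
    return rates


def fairness_gaps(y_true, y_pred, groups):
    """Largest between-group difference for each criterion (0 = satisfied)."""
    rates = group_rates(y_true, y_pred, groups)

    def gap(key):
        values = [r[key] for r in rates.values()]
        return max(values) - min(values)

    return {
        "demographic_parity": gap("selection"),
        "equal_opportunity": gap("tpr"),
        # one common summary: the worse of the TPR gap and the FPR gap
        "equalised_odds": max(gap("tpr"), gap("fpr")),
    }


# Hypothetical example: a perfect classifier on two groups whose base rates
# differ (3 of 4 positives in "a", 1 of 4 in "b").
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
y_pred = list(y_true)
groups = ["a"] * 4 + ["b"] * 4
print(fairness_gaps(y_true, y_pred, groups))
```

The toy case shows the tension directly. A classifier that makes no mistakes at all satisfies equal opportunity and equalised odds exactly, yet has a 0.5 demographic-parity gap, because it faithfully reproduces the groups' different base rates. Forcing parity would mean deliberately introducing errors.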

The impossibility result

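Here is the deep, sobering fact at the heart of the field: when groups have different base rates (the share of each group for whom the outcome is actually positive), you cannot satisfy all the fairness criteria simultaneously. It's a mathematical impossibility (formalised by Chouldechova and by Kleinberg and colleagues), not an engineering gap. Calibration, equal false-positive rates, and equal false-negative rates can't all hold at once unless the base rates are identical or the model is perfect. The COMPAS recidivism tool is the standard illustration. ProPublica found that Black defendants who did not reoffend were far more likely than white defendants who did not reoffend to be labelled higher-risk, which means unequal false-positive rates. The vendor, Northpointe, answered that a given score carried the same reoffending risk in both groups, which is calibration. Both claims were true, because the recorded base rates differed, and the impossibility result says both properties could not hold together.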
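One short route to the result is the identity from Chouldechova's version. It follows from the definitions of PPV (precision) and the true-positive rate. Given base rate p, positive predictive value PPV and false-negative rate FNR, the false-positive rate is forced to be FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR). The Python sketch below just evaluates that identity for two hypothetical groups. The function name and all numbers are illustrative assumptions.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False-positive rate forced by a group's base rate p, PPV and FNR.

    From TP = (1 - FNR) * p * N and PPV = TP / (TP + FP):
        FP  = TP * (1 - PPV) / PPV
        FPR = FP / ((1 - p) * N) = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical numbers: the model has the same PPV and the same FNR in both
# groups; only the base rate differs.
for name, base_rate in [("group A", 0.5), ("group B", 0.2)]:
    print(name, round(implied_fpr(base_rate, ppv=0.7, fnr=0.3), 3))
```

Holding PPV at 0.7 and FNR at 0.3 in both groups forces an FPR of 0.3 in group A (base rate 0.5) but 0.075 in group B (base rate 0.2). Equalising the FPRs would require giving up equal PPV or equal FNR. No amount of engineering escapes that, because it is arithmetic.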

Where to intervene

Once you've chosen a fairness definition and measured the disparity, mitigation can act at three stages of the pipeline:

Pre-processing — fix the data before training: reweight under-represented groups, re-sample, or transform features to reduce the disparity at the source.
In-processing — build fairness into the training itself, adding a fairness constraint or penalty to the objective so the model optimises accuracy and fairness together (e.g. adversarial debiasing).
Post-processing — adjust the model's outputs after the fact, e.g. using group-specific thresholds to equalise the chosen metric.
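Two of these interventions are small enough to sketch in plain Python. For pre-processing, the sketch uses reweighing in the style of Kamiran and Calders. Each (group, label) cell gets weight P(group)·P(label)/P(group, label), so that group and label look statistically independent to a learner that accepts sample weights. For post-processing, it picks a per-group score threshold that reaches a target true-positive rate, which targets equal opportunity. In-processing is left out because it needs a full training loop. Function names, scores and labels are hypothetical, and this is an illustration rather than a production implementation.

```python
import math
from collections import Counter


def reweighing_weights(groups, labels):
    """Pre-processing: weight each (group, label) cell by P(group) * P(label)
    / P(group, label), so group and label look independent to the learner."""
    n = len(labels)
    g_count, y_count = Counter(groups), Counter(labels)
    gy_count = Counter(zip(groups, labels))
    return [
        (g_count[g] / n) * (y_count[y] / n) / (gy_count[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


def threshold_for_tpr(scores, labels, target_tpr):
    """Post-processing: the highest threshold that still gives this group at
    least target_tpr on its true positives (predict 1 when score >= threshold)."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    k = max(1, math.ceil(target_tpr * len(positives)))  # positives we must catch
    return positives[k - 1]


# Hypothetical data: group "a" has a higher base rate than group "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
for g in ("a", "b"):
    pos = sum(w for w, gg, y in zip(weights, groups, labels) if gg == g and y == 1)
    tot = sum(w for w, gg in zip(weights, groups) if gg == g)
    print(g, "weighted positive rate:", round(pos / tot, 3))

# Hypothetical scores: the model ranks group "b" lower overall, so a single
# shared cut-off would catch fewer of b's true positives.
scores = {"a": [0.9, 0.8, 0.6, 0.3], "b": [0.55, 0.4, 0.35, 0.1]}
group_labels = {"a": [1, 1, 1, 0], "b": [1, 1, 0, 0]}
thresholds = {g: threshold_for_tpr(scores[g], group_labels[g], 1.0) for g in scores}
print(thresholds)
```

After reweighing, both groups have a weighted positive rate of 0.5 instead of 0.75 and 0.25. For the thresholds, a shared cut-off of 0.6 would catch every true positive in group "a" and none in group "b". Per-group cut-offs of 0.6 and 0.4 equalise the true-positive rate, at the cost of treating identical scores differently depending on group. That is exactly the kind of trade-off that needs to be documented.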

None is a silver bullet, and every one trades some accuracy or one fairness notion for another — which is why fairness work is inseparable from explanation (you have to see what the model is doing) and from a documented, defensible decision about which trade-off you accepted and why.

Where it shows up in my work

Equitable, and able to prove it

Any model that informs a decision about people carries this responsibility, and in government it's acute: a model-assisted call that's systematically worse for one group isn't just a technical flaw, it's a fairness and accountability failure. The most valuable thing this gives me is knowing the proxy trap. Dropping a sensitive attribute doesn't make a model fair, because the model can reconstruct that attribute from postcode and other proxies, so fairness has to be measured across groups, not assumed.

And the impossibility result reframes the whole conversation honestly: there's no objectively "fair" model, so the real work is choosing which fairness criterion fits the context, naming the trade-off out loud, and being able to defend it — exactly the kind of value judgement that shouldn't be hidden inside an algorithm. It ties straight to explainability (you can't audit fairness you can't see), feature engineering (where proxies live), and governance (the policy around it).

Refresh in 60 seconds

A model trained on a biased world learns and amplifies the bias — at scale, with false authority. "Bias" here = unfairness to a protected group, not bias-variance.
Bias enters via historical, representation, and measurement bias — the data mirrors an unequal world.
The proxy trap: deleting race/sex doesn't help — postcode, name, etc. reconstruct it. Fairness needs measurement across groups, not blindness.
Fairness metrics: demographic parity (equal positive-outcome rates), equal opportunity (equal TPR), equalised odds (equal TPR and FPR). They compete.
Impossibility result: with different base rates you can't satisfy all at once (COMPAS — both sides were right). Fairness is a value choice, not an optimisation.
Mitigate at pre- / in- / post-processing — each trades off accuracy or another fairness notion. Document the choice.

The proxy/redundant-encoding trap, the group-fairness metrics, the impossibility result, and the COMPAS case reflect current fairness-in-ML references alongside hands-on work.