Explainable AI & Interpretability

A model that's accurate but can't say why is a problem the moment its decision affects a person. Explainability is the discipline of getting a reason out of a black box — and knowing when that reason can be trusted.

Studied: Explainable AI & Interpretability (In practice · defensible decisions)
When: Gov analysis · ongoing
Applied in: Justifying a model's call
Read / Refreshed: ~15 min read · 2026-06-26

The most accurate models — the boosted ensembles and deep networks — are also the most opaque. They give an answer with no reason attached. That's fine when the stakes are low, and a serious problem the moment the output affects a person's life: a loan, a benefit, an investigation, a risk score. Explainable AI (XAI) is the discipline of prising a human-understandable reason out of a black box — and, just as importantly, of knowing when that reason is real and when it's a comforting fiction.

It's a topic I care about directly, because in any accountable setting a decision you can't explain is a decision you can't defend. This page is the practical landscape: why explanation matters, the tools that produce it (feature importance, LIME, SHAP, counterfactuals), and the crucial caveat that an explanation can itself be misleading.

01

Why a reason matters

Explainability isn't a nicety bolted on at the end; it serves several concrete purposes at once:

  • Trust: people (rightly) won't act on a recommendation they don't understand.
  • Debugging: an explanation reveals when a model is right for the wrong reasons (the famous husky-versus-wolf case, where the classifier had learned to detect snow in the background rather than the animal).
  • Accountability: when a decision affects someone, they deserve a reason, and increasingly the law agrees (a "right to explanation").
  • Fairness: explanation is how you catch a model leaning on something it shouldn't, which makes it the gateway to the fairness question.

02

The accuracy-interpretability trade-off

The uncomfortable tension at the heart of the field: the more flexible a model, the harder it tends to be to interpret. A linear regression tells you exactly how each feature moves the prediction; a 500-tree gradient boosting model is often more accurate on complex data and far more opaque. You often can't have maximum accuracy and full transparency at once.

There are two broad responses, and the right one depends on the stakes. Either use an intrinsically interpretable model from the start (accepting some accuracy cost for transparency), or use the black box and apply post-hoc explanation tools to interpret it afterward. The higher the stakes and the stronger the accountability requirement, the more the first option earns its keep.

03

Glass-box models

The simplest path to an explanation is to use a model that is the explanation. These intrinsically interpretable ("glass-box") models wear their reasoning on the surface:

  • Linear / logistic regression — each coefficient is a direct, readable statement of a feature's effect.
  • A single decision tree — a flowchart of rules you can literally follow.
  • Rule lists — "if X and Y then Z", as transparent as it gets.
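To make "the model is the explanation" concrete, here is a minimal sketch of reading a logistic regression directly. The feature names, coefficients, and applicant are all invented for illustration; nothing here is fitted to real data.

```python
import math

# Hypothetical coefficients for a loan-approval logistic regression.
# These numbers are invented for illustration, not fitted to real data.
INTERCEPT = -1.0
COEFFICIENTS = {
    "income_10k": 0.40,        # per 10,000 of annual income
    "missed_payments": -0.90,  # per missed payment in the last year
    "years_at_address": 0.10,  # per year at the current address
}

def explain(applicant):
    """Return per-feature log-odds contributions and the approval probability."""
    contributions = {
        name: COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return contributions, probability

applicant = {"income_10k": 4.0, "missed_payments": 2, "years_at_address": 3.0}
contributions, probability = explain(applicant)

# Print the contributions, largest effect first.
for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>18}: {value:+.2f} log-odds")
print(f"approval probability: {probability:.2f}")
```

Each printed line is the whole explanation for that feature: coefficient times value, in log-odds. No second model is needed to interpret the first.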

There's a strong argument — made forcefully by researchers like Cynthia Rudin — that for high-stakes decisions you should prefer an inherently interpretable model and not reach for a black box plus a post-hoc explanation at all, because the explanation might not faithfully reflect what the model actually did. Sometimes the small accuracy gain of the black box isn't worth the loss of genuine transparency.

04

Global vs local explanations

When you do need to explain a black box, the first distinction is the scope of the question:

[Figure: global (the whole model) versus local (one prediction: "why THIS one?")]
Two different questions. A global explanation describes the model's overall behaviour — which features matter across all predictions. A local explanation justifies one specific prediction — why this case got this outcome. You usually need both.
  • Global — how does the model behave overall? Which features matter most across all its decisions?
  • Local — why did the model make this one prediction for this case?

The distinction matters because a person affected by a decision wants a local explanation ("why was my application declined?"), while an auditor or developer wants the global picture. Different tools serve each.

05

Feature importance — and its traps

The most common global explanation is feature importance: a ranking of which inputs the model relies on most. It's a useful first look — but it comes with sharp traps. With correlated features, importance can be split arbitrarily between them or misattributed, so a genuinely important factor looks weak (or vice versa). And importance tells you a feature matters, not which direction it pushes or for whom. Treat a raw importance ranking as a starting hypothesis, not a conclusion.
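The correlated-features trap is easy to reproduce. The sketch below uses permutation importance (shuffle one column, measure how much the error rises) on synthetic data where one feature is an exact copy of another. The data, the stand-in model, and the feature names are all assumptions made for the demonstration.

```python
import random

rng = random.Random(0)
N = 2000

# Synthetic data: "income_b" is an exact copy of "income_a" (perfectly
# correlated), while "tenure" is independent of both.
income_a = [rng.gauss(0, 1) for _ in range(N)]
income_b = list(income_a)
tenure = [rng.gauss(0, 1) for _ in range(N)]
rows = list(zip(income_a, income_b, tenure))

# The target depends on income with weight 1.0 and on tenure with weight 0.8.
target = [a + 0.8 * t for a, _, t in rows]

def model(row):
    """Stand-in 'black box' that splits the income effect across both copies."""
    a, b, t = row
    return 0.5 * a + 0.5 * b + 0.8 * t

def mse(data):
    """Mean squared error of the model on `data` against the fixed target."""
    return sum((model(r) - y) ** 2 for r, y in zip(data, target)) / len(data)

def permutation_importance(column):
    """Increase in MSE when one column is shuffled, breaking its link to y."""
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(permuted) - mse(rows)

importance = {
    name: permutation_importance(i)
    for i, name in enumerate(["income_a", "income_b", "tenure"])
}
for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>9}: {value:.2f}")
```

Income drives the target more strongly than tenure (weight 1.0 against 0.8), yet each income copy ranks below tenure, because shuffling one copy leaves the other in place to carry half the signal. The ranking is a fact about this model and this feature set, not about which underlying factor matters most.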

06

LIME & SHAP: explaining one prediction

The two dominant tools for local explanation of any black box:

  • LIME (Local Interpretable Model-agnostic Explanations): to explain one prediction, it samples perturbed variations around that case, weights them by proximity, and fits a simple, interpretable model (typically a sparse linear one) that mimics the black box just there. Intuitive, but the explanation can be unstable: because the perturbations are sampled at random and the neighbourhood size is a tunable choice, re-running it can give a somewhat different story.
  • SHAP (SHapley Additive exPlanations): the current standard. It borrows Shapley values from cooperative game theory to fairly divide a prediction's "credit" among the features: treating each feature as a player, it takes each one's marginal contribution to the prediction, averaged (with the Shapley weighting) over all possible subsets of the other features. The attributions sum to the difference between this prediction and a baseline, so the result is theoretically grounded and consistent, and it gives both local attributions (why this case) and, by aggregating, a global view.
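The LIME idea can be sketched in a few dozen lines: perturb around one point, weight the samples by proximity, and fit a weighted linear model. This is a simplified illustration and not the `lime` library (which also works in an interpretable binary representation and selects features); the black-box function, the sampling scale, and the kernel width are all assumptions.

```python
import math
import random

def black_box(x1, x2):
    """Stand-in for an opaque model: nonlinear in x1, linear in x2."""
    return x1 ** 2 + 3.0 * x2

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lime_style_explanation(point, seed, n_samples=500, scale=0.3, width=0.5):
    """Fit a proximity-weighted linear model around `point`; return its slopes."""
    rng = random.Random(seed)
    xtwx = [[0.0] * 3 for _ in range(3)]  # accumulates X^T W X
    xtwy = [0.0] * 3                      # accumulates X^T W y
    for _ in range(n_samples):
        d1, d2 = rng.gauss(0, scale), rng.gauss(0, scale)
        y = black_box(point[0] + d1, point[1] + d2)
        w = math.exp(-(d1 * d1 + d2 * d2) / (width * width))  # closer = heavier
        features = [1.0, d1, d2]
        for i in range(3):
            xtwy[i] += w * features[i] * y
            for j in range(3):
                xtwx[i][j] += w * features[i] * features[j]
    _intercept, slope_x1, slope_x2 = solve(xtwx, xtwy)
    return slope_x1, slope_x2

run_a = lime_style_explanation((1.0, 0.0), seed=1)
run_b = lime_style_explanation((1.0, 0.0), seed=2)
print("run A slopes:", run_a)
print("run B slopes:", run_b)
```

Both runs land near the true local slopes at (1, 0), which are 2 for x1 and 3 for x2, but they don't agree exactly. That run-to-run wobble is the instability described above, and it grows as the sample count shrinks or the neighbourhood widens.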

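LIME is model-agnostic: it treats the model as a black box and explains it from the outside, so it works on anything from a random forest to a neural net. SHAP has a model-agnostic estimator too (Kernel SHAP), alongside faster model-specific variants such as Tree SHAP for tree ensembles. SHAP's consistency guarantees have made it the default for serious work, though exact Shapley values are exponentially expensive in the number of features, so in practice they are approximated.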
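For a handful of features, Shapley values can be computed straight from the definition, which makes the "fair division of credit" idea tangible. The sketch below is that brute-force computation, not the `shap` library. The model, the instance, and the all-zeros baseline are hypothetical, and replacing "absent" features with a baseline value is one convention among several.

```python
from itertools import combinations
from math import factorial

def predict(x):
    """Hypothetical model with an interaction between income and guarantor."""
    return 2.0 * x["income"] + 1.0 * x["tenure"] + 3.0 * x["income"] * x["has_guarantor"]

instance = {"income": 1.0, "tenure": 2.0, "has_guarantor": 1.0}
baseline = {"income": 0.0, "tenure": 0.0, "has_guarantor": 0.0}

def coalition_value(present):
    """Prediction when only the features in `present` take the instance's values."""
    mixed = {
        name: (instance[name] if name in present else baseline[name])
        for name in instance
    }
    return predict(mixed)

def shapley_values(features):
    """Exact Shapley values: weighted average marginal gain over all subsets."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                gain = coalition_value(present | {feature}) - coalition_value(present)
                total += weight * gain
        phi[feature] = total
    return phi

phi = shapley_values(list(instance))
for name, value in phi.items():
    print(f"{name:>14}: {value:+.2f}")
print("sum of attributions:", sum(phi.values()))
print("prediction - baseline:", predict(instance) - predict(baseline))
```

The interaction term's 3 points are split evenly between income and has_guarantor, and the attributions add up exactly to the gap between this prediction and the baseline. The loop visits every subset, so the cost doubles with each added feature, which is why practical tools approximate.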

07

Counterfactual explanations

Often the most useful explanation for a person isn't a list of feature weights but an answer to "what would have to be different?" A counterfactual explanation says: "your loan was declined; had your income been $5,000 higher, it would have been approved." It's actionable, intuitive, and sidesteps the need to expose the model's internals — you just show the nearest version of the input that flips the decision. For the human on the receiving end, that's frequently the explanation that actually helps.
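A counterfactual search needs nothing from the model except its yes/no answer. Here is a minimal sketch that looks for the smallest income increase that flips a decline; the scoring rule, threshold, step size, and search cap are invented, and only income is treated as something the applicant can change.

```python
def approve(applicant):
    """Stand-in black box: only its yes/no output is used by the search."""
    score = applicant["income_k"] - 10 * applicant["missed_payments"]
    return score >= 40

def income_counterfactual(applicant, step_k=1, max_increase_k=50):
    """Smallest income increase (in thousands) that flips a decline, or None."""
    if approve(applicant):
        return 0  # already approved, nothing needs to change
    for increase in range(step_k, max_increase_k + 1, step_k):
        candidate = dict(applicant, income_k=applicant["income_k"] + increase)
        if approve(candidate):
            return increase
    return None  # no flip found within the search cap

applicant = {"income_k": 45, "missed_payments": 1}
needed = income_counterfactual(applicant)

if needed is None:
    print("Declined; no income increase within the search range changes that.")
elif needed == 0:
    print("Approved.")
else:
    print(f"Declined; approved if income were ${needed * 1000:,} higher.")
```

Restricting the search to features a person can realistically change is a design choice that matters: a counterfactual such as "have fewer past missed payments" may flip the decision but gives the applicant nothing to act on.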

08

When explanations mislead

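The most important caveat in the whole field: an explanation is itself a model, and it can be wrong. Post-hoc methods are approximations of what the black box did, not the genuine article, and that gap creates real dangers. Several have already surfaced on this page: a LIME explanation can change from one run to the next, feature importance can be misattributed among correlated inputs, and a post-hoc account may not faithfully reflect what the model actually did.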

09

Where it shows up in my work

10

Refresh in 60 seconds

The global/local distinction, SHAP-vs-LIME comparison, and the "explanations can mislead" caution (and the prefer-interpretable-models argument) reflect current XAI references alongside hands-on work.