Anomaly Detection

Most analysis is about the typical case — the average, the trend, the pattern. Anomaly detection is about the opposite: finding the rare points that don't fit, because in a great many domains the unusual case is the one that matters. A fraudulent transaction, a security breach, a sensor about to fail, a record that warrants a closer look — they're all needles in a haystack, and the haystack is enormous.

It's a discipline I lean on directly in intelligence and integrity work, where the whole job is often to surface the few cases worth investigating from a sea of normal ones. This page is the practical landscape: what an anomaly is, the methods, and the trade-off that quietly decides whether a detector is actually useful.

Finding the unusual

An anomaly (or outlier) is a data point that deviates so much from the rest that it likely came from a different process. The premise is that "different" often means "interesting" — the deviation is a signal of fraud, error, failure, or threat, not just noise. The goal isn't to model the anomalies (you usually can't — they're rare and varied); it's to model what normal looks like well enough that the abnormal stands out.

That framing is the key to the whole field. You learn the shape of "normal" from the bulk of the data, then flag whatever falls far outside it. Everything below is a different way of defining "far outside".

Three kinds of anomaly

Anomalies come in three flavours, and the distinction changes the method:

Point anomalies — a single value that's extreme on its own (a $1,000,000 transaction on a normal account). The simplest case.
Contextual anomalies — a value that's only odd in context. 30°C is normal in summer, anomalous in winter; the number is fine, the context isn't. Time and place matter.
Collective anomalies — a group of points that's abnormal together even though each is individually fine (a sudden burst of small transactions, a coordinated pattern of logins).

Knowing which you're hunting for matters: a method that catches point anomalies will sail straight past a contextual one. Context and sequence (the time-series view) often have to be built in deliberately.
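One way to build context in is to score each value against its own context group instead of the whole dataset. The sketch below is only an illustration: the temperature readings are made up, `contextual_flags` is a hypothetical helper, and the cut-off of 2.0 is an arbitrary assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

def contextual_flags(readings, threshold=2.0):
    """Flag (context, value) pairs whose z-score inside their own context exceeds threshold."""
    by_context = defaultdict(list)
    for context, value in readings:
        by_context[context].append(value)
    # Mean and sample standard deviation per context (needs >= 2 values per context).
    stats = {c: (mean(v), stdev(v)) for c, v in by_context.items()}
    flags = []
    for context, value in readings:
        mu, sigma = stats[context]
        z = (value - mu) / sigma if sigma > 0 else 0.0
        flags.append(abs(z) > threshold)
    return flags

# Made-up readings in degrees Celsius: 30 is routine in summer, odd in winter.
readings = (
    [("summer", t) for t in (28, 30, 31, 29, 32, 30, 29, 31)]
    + [("winter", t) for t in (2, 4, 3, 5, 1, 3, 4, 30)]
)
flagged = [r for r, f in zip(readings, contextual_flags(readings)) if f]

# The same value scored against all readings at once looks unremarkable.
all_values = [v for _, v in readings]
global_z = (30 - mean(all_values)) / stdev(all_values)
```

Here only the winter reading of 30 is flagged, while its z-score against the pooled data stays below 1, which is why a context-free point detector would miss it.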

Why it's hard

Three properties make anomaly detection genuinely difficult:

They're rare by definition. Extreme class imbalance makes accuracy useless: if anomalies are 0.1% of the data, a detector that flags nothing is 99.9% accurate and 100% worthless. This is the same base-rate trap from the probability page.
They're usually unlabelled — you rarely have a clean set of known anomalies to learn from, so most of the work is unsupervised: define normal, flag deviations.
They evolve — fraudsters change tactics, systems drift, so today's normal isn't tomorrow's. A static detector decays.
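The accuracy trap is easy to check with arithmetic. The sketch below assumes a toy dataset with a 0.1% anomaly rate (10 anomalies in 10,000 records) and a detector that never flags anything; `confusion_metrics` is a hypothetical helper, and reporting precision as 0.0 when nothing is flagged is a convention chosen here.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels where 1 means anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Assumed toy data: 10 anomalies among 10,000 records.
y_true = [1] * 10 + [0] * 9990
flag_nothing = [0] * 10000

accuracy, precision, recall = confusion_metrics(y_true, flag_nothing)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")  # accuracy=0.999 recall=0.000
```

Accuracy rewards the detector for the 9,990 easy negatives; recall shows it found none of the cases that matter.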

Statistical methods

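The simplest detectors are statistical: assume a distribution for "normal" and flag what's improbable under it. For roughly bell-shaped data, the z-score flags points more than a few standard deviations from the mean (3 is a common cut-off); for skewed data, the IQR rule (points more than 1.5× the interquartile range below the first quartile or above the third) is more robust, because quartiles are barely moved by the extremes they're meant to catch. Both look at one variable at a time, so they miss points that are only unusual in combination. They are fast, transparent, and a fine first pass.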
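Both rules fit in a few lines of standard-library Python. This is a minimal sketch on made-up data; the function names, the z-score cut-off of 3 and the IQR multiplier of 1.5 are conventional choices assumed here, and `statistics.quantiles` uses its default "exclusive" method.

```python
from statistics import mean, stdev, quantiles

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below the first quartile or above the third."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up data: eleven ordinary values and one extreme one.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

print(zscore_outliers(data))  # [95]
print(iqr_outliers(data))     # [95]
```

Note that the extreme value inflates the mean and standard deviation it is judged against (its z-score here is only about 3.2), whereas the quartiles barely move. That is the sense in which the IQR rule is the more robust of the two.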

Distance and density

A more general idea: an anomaly is a point that sits far from its neighbours, in a low-density region. This connects directly to clustering — anomalies are the points that don't belong to any dense group. Two well-used methods:

Local Outlier Factor (LOF) — compares a point's local density to its neighbours'. It's clever because it's local: it can flag a point that's in a sparse region even if globally it isn't the most extreme, catching outliers that sit between clusters.
DBSCAN — the density clustering method that labels low-density points as noise; those noise points are your anomalies, found for free.
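To make the "local" idea concrete, here is a brute-force sketch of the LOF calculation in plain Python. It is a simplification under stated assumptions: each point uses exactly `k` neighbours (ties at the k-th distance are not expanded), there are no duplicate points, and the two grids plus the extra point are made-up data. `lof_scores` is a hypothetical helper, not a library function.

```python
import math

def lof_scores(points, k=3):
    """Local Outlier Factor per point (brute force, O(n^2)); scores near 1 mean 'like its neighbours'."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and the distance to the k-th one.
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance to the neighbours.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbour density divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]

# Made-up data: a tight 3x3 grid, a loose 3x3 grid, and one point just outside the tight grid.
dense = [(x, y) for x in range(3) for y in range(3)]
sparse = [(100 + 10 * x, 100 + 10 * y) for x in range(3) for y in range(3)]
outlier = (6, 1)
points = dense + sparse + [outlier]

scores = lof_scores(points, k=3)
most_anomalous = points[scores.index(max(scores))]
```

The extra point is 4 units from its nearest neighbour, closer than any point in the loose grid is to its own neighbours (10 units), so a global distance ranking would not put it first. LOF does, because it is sparse relative to the tight grid next to it, while every grid point scores close to 1.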

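The trade-off: distance-based methods struggle in very high dimensions (the curse of dimensionality again: distances concentrate, so every point looks about equally far from every other), and they are sensitive to feature scale, so standardising features and reducing dimensions first often helps.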
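The distance-concentration effect behind the curse of dimensionality can be seen numerically. The sketch below assumes uniformly random points in a unit hypercube and measures the relative contrast, (farthest − nearest) / nearest, from one query point; `distance_contrast` is a hypothetical helper and the dimensions tried are arbitrary.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Relative gap between the farthest and nearest of n random points from a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

contrasts = {d: distance_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dim={d:5d}  contrast={c:.2f}")
```

In two dimensions the farthest point is many times farther away than the nearest; in a thousand dimensions the two distances differ by only a small fraction, so "far from its neighbours" stops discriminating.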

Model-based detection

The most popular modern approaches learn a model of normal and score deviation from it:

Isolation Forest — the clever, widely-used default. Instead of modelling density, it randomly splits the data and notes that anomalies are easy to isolate: a weird point gets cut off from the rest in just a few random splits, while normal points take many. The shorter the path to isolate a point, the more anomalous it is. Fast, scales well, and needs little tuning.
Autoencoders — a neural network trained to compress and reconstruct normal data. Show it an anomaly and it reconstructs it badly (it never learned that shape), so a high reconstruction error flags the outlier. Powerful for complex, high-dimensional data like images or sequences.

Isolation Forest's intuition. A normal point sits deep inside the crowd and takes many random cuts to isolate; an anomaly sits alone and is separated in just a few. Fewer cuts to isolate ⇒ more anomalous.
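The "fewer cuts" idea can be sketched in plain Python. This is a minimal illustration rather than a production implementation: it follows the usual isolation-forest recipe (random feature, random split value, depth limit of log2 of the sample size, score 2^(−mean path / c(n))), the data are synthetic, and all function names and default parameters are assumptions made here.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively cut the rows on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # cannot cut on this feature; stop here for simplicity
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, tree, depth=0):
    """Number of cuts needed to reach x's leaf, plus an adjustment for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    return path_length(x, left if x[feature] < split else right, depth + 1)

def isolation_scores(data, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1] per row; higher means easier to isolate. Needs >= 2 rows."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm))
            for x in data]

# Synthetic data: a Gaussian blob plus one point far outside it.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(128)]
data.append((8.0, 8.0))

scores = isolation_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
```

The far-away point should receive the highest score because random cuts separate it within a split or two, while points inside the blob need many more.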

The alert-fatigue trade-off

Here's the trade-off that decides whether a detector is actually useful, and it's the precision/recall tension from the statistics page in its most consequential form. Most detectors have a sensitivity dial (the "contamination" or threshold):

Turn it up → catch more real anomalies (high recall) but drown in false alarms (low precision).
Turn it down → fewer false alarms (high precision) but miss real ones (low recall).
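The dial can be illustrated by sweeping a threshold over scored, labelled examples. The scores and labels below are invented for the sketch, higher scores are assumed to mean "more anomalous", and turning the sensitivity up corresponds to lowering the threshold.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Invented anomaly scores with ground-truth labels (1 = real anomaly).
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.50, 0.30, 0.25, 0.20, 0.15, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0,    0,    0,    1,    0]

# Lower threshold = higher sensitivity.
sweep = {t: precision_recall(scores, labels, t) for t in (0.90, 0.50, 0.15)}
for t, (p, r) in sweep.items():
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, moving the threshold from 0.90 to 0.15 raises recall from 0.40 to 1.00 while precision falls from 1.00 to about 0.45: every extra real anomaly caught is paid for in false alarms.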

Judging without labels

Evaluation is hard precisely because you usually lack labels — if you knew the anomalies, you wouldn't need to detect them. In practice you validate on whatever labelled subset you have (past confirmed cases), use the precision/recall/F1 family rather than accuracy, and lean on domain experts to confirm a sample of what's flagged. Above all, you tune to the real cost: in most settings a missed anomaly and a false alarm have very different prices, and the threshold should reflect that, not a default.
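Tuning to the real cost can be sketched as picking the threshold that minimises total cost on a labelled validation subset. Everything specific here is an assumption for illustration: the scores and labels are invented, the cost figures are arbitrary units, and `best_threshold` is a hypothetical helper that only tries the observed score values as candidates.

```python
def total_cost(scores, labels, threshold, cost_miss, cost_false_alarm):
    """Cost of flagging every score >= threshold, given per-error costs."""
    misses = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    false_alarms = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return misses * cost_miss + false_alarms * cost_false_alarm

def best_threshold(scores, labels, cost_miss, cost_false_alarm):
    """Cheapest threshold among the observed score values."""
    candidates = sorted(set(scores))
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_miss, cost_false_alarm),
    )

# Invented validation scores with confirmed labels (1 = real anomaly).
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.50, 0.30, 0.25, 0.20, 0.15, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0,    0,    0,    1,    0]

# A miss is far more expensive than a false alarm: flag aggressively.
costly_miss = best_threshold(scores, labels, cost_miss=50, cost_false_alarm=1)
# False alarms are the expensive part (scarce analyst time): flag conservatively.
costly_alarm = best_threshold(scores, labels, cost_miss=1, cost_false_alarm=5)

print(costly_miss, costly_alarm)  # 0.15 0.9
```

The same detector and the same scores yield very different operating points once the two kinds of error are priced, which is the argument against leaving the threshold at a default.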

Where it shows up in my work

Refresh in 60 seconds

Anomaly detection finds rare points that don't fit — and "different" is often "important". Model normal, flag deviations.
Three kinds: point, contextual (odd for the context), collective (odd as a group). Hard because anomalies are rare, unlabelled, and evolving (base-rate trap).
Methods: statistical (z-score/IQR, but univariate), distance/density (LOF, DBSCAN), model-based (isolation forest — fewer cuts to isolate; autoencoders — high reconstruction error).
The key trade-off is precision vs recall via a sensitivity dial, and the risk is alert fatigue: too many false positives get the system ignored.
Combine model scores with domain rules; tune the threshold to what humans can triage and to the real cost of a miss vs a false alarm.
Evaluate with precision/recall/F1 on whatever labels you have, plus expert review — never accuracy. Standardise & reduce dimensions first.

Method comparisons and the alert-fatigue framing reflect current anomaly-detection references (isolation forest / LOF practice, alert-fatigue research) alongside hands-on work.