Survival Analysis

Not 'will it happen?' but 'how long until it does?' The twist that makes this its own field: at the end of your study, the event hasn't happened to everyone yet. Throwing those unfinished cases away biases everything; survival analysis keeps them.

Studied: Survival Analysis (Advanced · time-to-event)
When: Statistics coursework
Applied in: Duration & timing questions
Read / Refreshed: ~15 min read · 2026-06-26

Many of the most important questions aren't "will this happen?" but "how long until it happens?" — how long until a patient relapses, a customer churns, a machine fails, a case is resolved, a person re-offends. Survival analysis (or time-to-event analysis) is the branch of statistics built for exactly these questions, and it exists as its own field because of one peculiar, unavoidable feature of the data: when your study ends, the event hasn't happened to everyone yet — and what you do with those unfinished cases changes everything.

It's a genuinely distinct tool worth knowing, because the obvious approaches all quietly fail on time-to-event data. This page builds it up: why ordinary methods break, the central idea of censoring, the two functions that describe survival, and the two workhorse methods — Kaplan-Meier and the Cox model.

01

How long until…?

At first glance you might reach for tools you already have, and each fails in an instructive way. Treat it as a regression on "time until event"? You can't: for many subjects the event hasn't happened, so their time is unknown. Treat it as classification ("did the event happen by time T?")? You throw away the rich information of when, and the answer depends arbitrarily on where you draw T.

The data has a special structure — a duration, plus whether the event has actually occurred yet — that needs purpose-built methods. The crux of all of them is how they handle the cases that haven't finished.
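As a minimal sketch of that structure, each subject can be reduced to a pair: how long they were observed, and whether the event was seen in that window. The `Observation` class, the `observe` helper, and the numeric times below are illustrative names and values, not part of any library or dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    duration: float  # time under observation
    event: bool      # True if the event was seen, False if still event-free


def observe(start: float, event_time: Optional[float], study_end: float) -> Observation:
    """Turn raw times into a (duration, event) pair for one subject."""
    if event_time is not None and event_time <= study_end:
        return Observation(event_time - start, True)
    # No event seen before the study closed: we only know a lower bound.
    return Observation(study_end - start, False)


finished = observe(start=0, event_time=5, study_end=12)
unfinished = observe(start=3, event_time=None, study_end=12)
print(finished)
print(unfinished)
```

Every method on this page takes some version of these two columns as input.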

02

Censoring: the idea that defines the field

Censoring is the heart of survival analysis. A subject is right-censored when you know they survived up to a certain point, but not what happened after — because the study ended, or they dropped out, before the event occurred. You don't know their true event time; you only know it's longer than what you observed.

[Figure: timeline of study subjects. Labels: "study ends"; ● event; ▸ censored.]
Censoring. Some subjects experience the event during the study (dot). Others are still event-free when observation ends and are right-censored (arrow). Their time isn't missing or zero: it's a real lower bound (the event, if it happens, happens later), and survival methods use exactly that partial information.

Censored cases carry real information ("survived at least this long"), and the cardinal sin is to mishandle them. Drop them and you bias the result, because you systematically lose the longest survivors. Treat them as if the event happened at the censoring time and you understate survival too, because each censored time is only a lower bound on the true event time. Treat them as subjects who will never have the event and you err the other way. The whole machinery below exists to use that partial information correctly.
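A small simulation can make the size of the problem concrete. This is a sketch under invented assumptions: event times are exponential with a true mean of 10, and each subject's follow-up limit is drawn uniformly between 0 and 20. It compares two tempting shortcuts with the true mean.

```python
import random

random.seed(0)
n = 5000
true_times = [random.expovariate(1 / 10) for _ in range(n)]  # true mean is 10
censor_times = [random.uniform(0, 20) for _ in range(n)]     # hypothetical follow-up limits

# What the analyst actually sees: the earlier of the two times, plus a flag.
observed = [min(t, c) for t, c in zip(true_times, censor_times)]
event = [t <= c for t, c in zip(true_times, censor_times)]

true_mean = sum(true_times) / n

# Shortcut 1: drop censored subjects and average only the observed event times.
event_only = [o for o, e in zip(observed, event) if e]
mean_dropped = sum(event_only) / len(event_only)

# Shortcut 2: pretend every censoring time was an event time.
mean_as_event = sum(observed) / n

print(f"true mean:               {true_mean:.2f}")
print(f"censored cases dropped:  {mean_dropped:.2f}")
print(f"censoring read as event: {mean_as_event:.2f}")
```

In this setup both shortcut means come out well below the true mean, since neither one credits a censored subject with the time they would have survived beyond their last observation.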

03

Two ways to describe survival

Survival is described by two complementary functions. The survival function:

S(t) = \Pr(T > t)

This is the probability of surviving (not having the event) beyond time t. It starts at 1 and can only fall, toward 0, as t grows. The hazard function h(t) takes a different angle: it's the instantaneous rate of the event at time t, given you've survived that far. It is the risk right now for those still at risk. Survival answers "what fraction last this long?"; hazard answers "for a survivor, how dangerous is this moment?" They're two views of the same process, and different methods model one or the other.
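The link between the two functions can be written down explicitly. What follows is the standard definition of the hazard for a continuous event time, where f(t) denotes the probability density of T (a symbol introduced here for the purpose):

```latex
h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}
     = -\frac{d}{dt} \log S(t),
\qquad
S(t) = \exp\!\left( -\int_0^t h(u)\, du \right)
```

For example, a constant hazard h(t) = \lambda gives S(t) = e^{-\lambda t}, so knowing either function determines the other.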

04

Kaplan-Meier: estimating the curve

The Kaplan-Meier estimator is the workhorse for estimating S(t) from data, and it handles censoring elegantly. It produces the familiar step curve: survival stays flat, then drops a step at each time an event actually occurs, with the size of each drop set by how many were still at risk just before. Censored subjects don't cause a drop; they simply leave the "at risk" pool at their censoring time, so they correctly contribute to the denominator up to that point and no further.

That's the clever bit: by only stepping down at observed events and shrinking the at-risk count as censored cases exit, Kaplan-Meier recovers a trustworthy survival curve from data riddled with unfinished cases, provided censoring is unrelated to a subject's risk of the event (non-informative censoring). The result is the single most recognisable picture in the field, and the standard way to show "what fraction are still event-free over time".
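The stepping rule described above can be sketched in a few lines of plain Python. At each distinct event time the running estimate is multiplied by (1 - d / n), where d is the number of events at that time and n is the number still at risk just before it. The function name `kaplan_meier` and the toy data are made up for illustration; for real work an established survival library would be the safer choice.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_estimate)] at each distinct event time."""
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Step down only when an event is observed at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Toy data: six subjects, two of them censored (event=False).
durations = [2, 3, 3, 5, 6, 8]
events = [True, True, False, True, False, True]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Note that the subject censored at time 3 still counts in the at-risk denominator for the event at time 3, and the subject censored at time 6 produces no step at all. This sketch follows the usual convention that a subject censored at the same time as an event is treated as still at risk for it.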

05

Comparing groups: the log-rank test

Often the real question is comparative: does group A survive longer than group B (treatment vs control, one cohort vs another)? You plot a Kaplan-Meier curve for each and compare them with the log-rank test, a hypothesis test for whether two (or more) survival curves differ more than chance would explain. It's the survival-analysis counterpart to comparing group means, built to respect censoring. It tells you whether the curves differ, but not by how much, and it cannot adjust for other factors, which is where the Cox model comes in.
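For the two-group case, the test statistic can be sketched directly. At each distinct event time it compares the events observed in group A with the number expected if both groups shared one hazard, then sums the differences and their variances. The function name `logrank_test` and the toy data are illustrative, and the p-value uses the chi-square distribution with one degree of freedom, computed here through `math.erfc`.

```python
import math


def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(durations_a, events_a)]
    data += [(t, e, 1) for t, e in zip(durations_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)         # events at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    # Assumes variance > 0, i.e. both groups are at risk at some event time.
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value


# Toy data: group A fails early, group B fails late, no censoring.
chi2, p_value = logrank_test(
    [1, 2, 3, 4, 5], [True] * 5,
    [6, 7, 8, 9, 10], [True] * 5,
)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

A small p-value says the separation between the curves is unlikely under a shared hazard; it says nothing about how large the difference is.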

06

The Cox proportional hazards model

To ask "how does each factor affect survival, holding the others constant?" you need a regression, and the Cox proportional hazards model is the dominant one. Rather than model the survival curve directly, it models the hazard, where each covariate's effect can be written as a simple multiplier. Its form:

h(t \mid \mathbf{x}) = h_0(t)\, \exp(\beta_1 x_1 + \cdots + \beta_p x_p)

The beauty is that it's semi-parametric: the baseline hazard h_0(t), how risk changes over time in general, is left unspecified, so you make no assumption about the shape of the baseline survival curve. You only estimate the \beta coefficients, the effect of each covariate. Exponentiating a coefficient gives a hazard ratio: e^{\beta} = 2 means a one-unit increase in that factor doubles the instantaneous risk at every time point; a ratio below 1 means the factor is protective. That single, interpretable number ("this factor multiplies the risk by X") is why the Cox model is everywhere in medicine, reliability, and social science.
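To show how a coefficient can be estimated without ever specifying the baseline hazard, here is a minimal sketch for a single covariate. It maximises the Cox partial likelihood by Newton-Raphson, using the Breslow form if event times are tied. The function name `fit_cox_one_covariate`, the fixed iteration count, and the toy data are all assumptions made for illustration; a production fit would use an established library and report standard errors as well.

```python
import math


def fit_cox_one_covariate(durations, events, x, iterations=25):
    """Estimate beta for one covariate by maximising the Cox partial likelihood."""
    beta = 0.0
    for _ in range(iterations):
        gradient = information = 0.0
        for t_i, e_i, x_i in zip(durations, events, x):
            if not e_i:
                continue  # censored subjects appear only inside risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(durations, x) if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(weights)
            s1 = sum(w * x_j for w, x_j in zip(weights, risk))
            s2 = sum(w * x_j ** 2 for w, x_j in zip(weights, risk))
            gradient += x_i - s1 / s0
            information += s2 / s0 - (s1 / s0) ** 2
        if information == 0:
            break
        beta += gradient / information  # Newton-Raphson step
    return beta


# Toy data: x = 1 marks a hypothetical risk factor; all events observed.
durations = [1, 2, 3, 4, 5, 6, 7, 8]
events = [True] * 8
x = [1, 1, 0, 1, 0, 1, 0, 0]

beta = fit_cox_one_covariate(durations, events, x)
hazard_ratio = math.exp(beta)
print(f"beta={beta:.3f}, hazard ratio={hazard_ratio:.2f}")
```

The baseline hazard never appears in the code: it cancels out of each risk-set comparison, which is exactly why it can be left unspecified.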

07

The proportional-hazards assumption

The Cox model buys its flexibility with one key assumption, hidden in the name: proportional hazards. It assumes a covariate's effect is a constant multiplier on the hazard at all times — the hazard ratio between two groups doesn't change as time passes.
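The assumption follows directly from the model's form. Taking two subjects with covariate vectors \mathbf{x}_a and \mathbf{x}_b (labels introduced here for illustration) and writing \boldsymbol{\beta}^\top \mathbf{x} for \beta_1 x_1 + \cdots + \beta_p x_p, the ratio of their hazards is:

```latex
\frac{h(t \mid \mathbf{x}_a)}{h(t \mid \mathbf{x}_b)}
  = \frac{h_0(t)\, \exp(\boldsymbol{\beta}^\top \mathbf{x}_a)}
         {h_0(t)\, \exp(\boldsymbol{\beta}^\top \mathbf{x}_b)}
  = \exp\!\big( \boldsymbol{\beta}^\top (\mathbf{x}_a - \mathbf{x}_b) \big)
```

The baseline hazard cancels, so the ratio contains no t at all. If the real effect of a covariate grows or fades over time, a single hazard ratio cannot describe it.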

08

Where it shows up in my work

09

Refresh in 60 seconds

The censoring framing, Kaplan-Meier/log-rank pairing, and the Cox model with its proportional-hazards caveat reflect current survival-analysis references alongside statistics coursework.