Network & Graph Analysis

Most analysis treats records as independent: one row per person, per case, per transaction. But some of the most valuable information isn't in the rows; it's in the connections between them. Who calls whom, which accounts move money to which, which people appear together. Network analysis takes those relationships as the primary object of study, and a surprising amount of insight that's invisible row-by-row becomes obvious once you look at the structure.

It's a discipline I lean on directly in intelligence work, where the question is often "who is the key connector here, and what's the hidden group?" — a question you literally cannot answer without modelling the links. This page is the practical toolkit: how to represent a network, the handful of measures that find the important nodes, how to find communities, and the traps that make network analysis lie.

Relationships as data

A graph (or network) is just two things: a set of nodes (the entities — people, accounts, places) and a set of edges (the relationships between them). That's it — and yet representing data this way unlocks questions that tabular data can't express: how far apart are two people through intermediaries? Who sits at the centre? Which tightly-knit group does someone belong to?

The shift in mindset is the whole point: stop asking "what are this node's attributes?" and start asking "what is this node's position in the structure?" Two people with identical profiles can play utterly different roles depending on who they're connected to — and the role is often what matters.

Nodes, edges & structure

Edges carry meaning, and the kind of edge changes the analysis:

Directed vs undirected — "called" has a direction (A → B); "appeared together" doesn't. Direction matters for who influences whom.
Weighted vs unweighted — an edge can carry a strength (number of calls, dollars moved), not just existence.
Paths & connectivity — a path is a route along edges between two nodes; the shortest path (degrees of separation) underpins several of the measures below. A network can also break into disconnected components.
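
To make the vocabulary concrete, here is a minimal sketch in plain Python. The call records and the function names `shortest_path_length` and `connected_components` are invented for illustration; the point is only how directed/weighted edges, shortest paths, and components look once they are data.

```python
from collections import deque

# Hypothetical call records: (caller, callee, number_of_calls).
calls = [("A", "B", 5), ("B", "C", 1), ("C", "A", 2), ("D", "E", 7)]

# Directed, weighted graph: directed[u][v] = weight of the edge u -> v.
directed = {}
for src, dst, weight in calls:
    directed.setdefault(src, {})[dst] = weight
    directed.setdefault(dst, {})  # make sure every node has an entry

# Undirected, unweighted view: keep only "are these two linked at all?"
undirected = {node: set() for node in directed}
for src, dst, _ in calls:
    undirected[src].add(dst)
    undirected[dst].add(src)


def shortest_path_length(graph, start, goal):
    """Number of hops on a shortest path, or None if goal is unreachable."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return distance[node]
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None


def connected_components(graph):
    """Split an undirected graph into its disconnected pieces."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        components.append(component)
    return components


print(shortest_path_length(directed, "A", "C"))    # follows edge direction
print(shortest_path_length(undirected, "A", "C"))  # ignores direction
print(connected_components(undirected))
```

Following direction, A reaches C only via B (two hops); ignoring direction, they are direct neighbours. D and E form their own component, so no path connects them to A at all.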

With that vocabulary, the central practical question becomes: which nodes are important, and why? "Important" has several distinct meanings, and the art is picking the one that matches your question — that's centrality.

Who's busy: degree centrality

The simplest measure is degree centrality — just count a node's connections. The degree $k_i$ of node $i$ is the number of edges touching it. High degree means a hub: someone connected to many others.

It's a fine first pass — and often misleading on its own. A node can have a hundred connections but sit on the edge of the network, while a node with three connections sits at its only bridge. Degree counts quantity, not position. The next two measures fix that.
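
A small sketch of that limitation, using a made-up edge list: node `H` is a hub with six connections, while `B` has only two but is (by construction) the only link between the two halves. The normalisation by $n - 1$ is a common convention, included here as an assumption rather than something this page depends on.

```python
# Hypothetical undirected network: two stars joined only through node "B".
edges = [
    ("H", "h1"), ("H", "h2"), ("H", "h3"), ("H", "h4"), ("H", "h5"),
    ("H", "B"), ("B", "R"),
    ("R", "r1"), ("R", "r2"), ("R", "r3"),
]


def degree(edge_list):
    """Count the edges touching each node (the degree k_i)."""
    counts = {}
    for u, v in edge_list:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


degrees = degree(edges)
n = len(degrees)

# Normalised degree: share of the other n - 1 nodes this node touches.
normalised = {node: k / (n - 1) for node, k in degrees.items()}

# Rank nodes from busiest to quietest.
ranking = sorted(degrees, key=degrees.get, reverse=True)

print(ranking[:3])
print(degrees["H"], degrees["B"])
```

Degree ranks `H` first and puts `B` well down the list, even though every route between the two stars runs through `B`.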

Who's the broker: betweenness centrality

Betweenness centrality measures how often a node lies on the shortest path between others. Formally, it sums, over all pairs of other nodes, the fraction of shortest paths that pass through node $v$:

$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$, and $\sigma_{st}(v)$ the number of those passing through $v$. A high-betweenness node is a broker or bridge: information, money, or influence has to flow through it to get between parts of the network. These are often the most consequential nodes of all (remove one and the network can fracture), even when their raw degree is modest.

Figure: betweenness finds the broker. The highlighted node has only modest degree, but every path between the left cluster and the right cluster runs through it, making it a cut-point whose removal splits the network. Degree alone would miss it.
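
The same situation can be sketched in code. The version below uses Brandes' algorithm, which is one standard way to compute betweenness on an unweighted graph; the toy network (two triangles joined through a node `X`) is invented, and I count each unordered pair once, which is the usual convention for undirected graphs.

```python
from collections import deque


def betweenness(graph):
    """Betweenness for an unweighted, undirected graph {node: set_of_neighbours}."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}


# Hypothetical network: two triangles whose only connection is node "X".
graph = {
    "a1": {"a2", "a3"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2", "X"},
    "X": {"a3", "b1"},
    "b1": {"X", "b2", "b3"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"},
}

scores = betweenness(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, len(graph[node]), scores[node])  # node, degree, betweenness
```

`X` has degree 2, the same as the least-connected nodes, yet it tops the betweenness table because all nine left-to-right pairs have to pass through it.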

Who's influential: eigenvector & PageRank

Eigenvector centrality captures a subtler idea: it's not just how many connections you have, but how important they are. You're influential if you're connected to influential people. That's circular by design, and the maths resolves the circularity elegantly — a node's score is proportional to the sum of its neighbours' scores:

$$x_i = \frac{1}{\lambda} \sum_{j \in N(i)} x_j$$

Written for the whole network this is $A\mathbf{x} = \lambda \mathbf{x}$: the scores are an eigenvector of the network's adjacency matrix $A$, specifically the one belonging to its largest eigenvalue (hence the name, and a neat callback to the linear-algebra page). PageRank, the algorithm that originally ranked the web, is a famous variant: a page is important if important pages link to it. The same logic finds the quietly powerful node that degree and betweenness don't capture: not the busiest, not the bridge, but the one embedded among the influential.
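
One way to resolve the circularity in practice is power iteration: start with equal scores, repeatedly replace each score with the sum of its neighbours' scores, and rescale. The sketch below is a minimal version under my own assumptions: the toy network is invented, and I add each node's own score at every step (iterating on $A + I$, which has the same eigenvectors as $A$) purely to stop the scores oscillating.

```python
def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration on an undirected graph {node: set_of_neighbours}."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # New score = own score + neighbours' scores (one step of A + I).
        new = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        # Rescale to unit length so the numbers don't blow up.
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    return x


# Hypothetical network: a tightly connected core c1..c4, a node Q attached
# to two core members, and a chain p1 - p2 - p3 hanging off the core.
graph = {
    "c1": {"c2", "c3", "c4", "Q"},
    "c2": {"c1", "c3", "c4", "Q"},
    "c3": {"c1", "c2", "c4"},
    "c4": {"c1", "c2", "c3", "p1"},
    "Q": {"c1", "c2"},
    "p1": {"c4", "p2"},
    "p2": {"p1", "p3"},
    "p3": {"p2"},
}

scores = eigenvector_centrality(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, len(graph[node]), round(scores[node], 3))
```

`Q` and `p2` both have degree 2, but `Q`'s neighbours sit in the core while `p2`'s sit on the fringe, so `Q` scores far higher. That gap is exactly what degree cannot see.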

Finding the groups: community detection

Beyond individual nodes, networks have community structure — clusters of nodes more densely connected to each other than to the rest. Finding them reveals the cells, factions, or interest groups inside a network, and it's the graph cousin of clustering.

The standard objective is modularity: a partition of the network scores high when there are many edges within groups and few between them, compared to what you'd expect by chance. Algorithms like Louvain and Leiden optimise it efficiently, and label-propagation methods offer a fast alternative. The output — "these twenty nodes form a tight group barely connected to the rest" — is often the single most operationally useful thing network analysis produces.
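
To show what "many edges within, few between, compared to chance" means numerically, here is a small sketch that scores a given partition. It uses the usual formulation for an unweighted, undirected graph, $Q = \sum_c \left[ L_c / m - (d_c / 2m)^2 \right]$, where $m$ is the total number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the summed degree of its nodes. It only evaluates partitions; it is not Louvain or Leiden, which search for the partition that maximises this score. The network and the three candidate partitions are made up.

```python
def modularity(edges, communities):
    """Modularity of a partition of an unweighted, undirected edge list."""
    m = len(edges)
    # Map each node to the index of the community it belongs to.
    label = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # L_c: edges fully inside community c
    degree_sum = [0] * len(communities)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Hypothetical network: two triangles joined by a single edge a3 - b1.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a3", "b1"),
]

natural = modularity(edges, [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}])
mixed = modularity(edges, [{"a1", "a2", "b1"}, {"a3", "b2", "b3"}])
lumped = modularity(edges, [{"a1", "a2", "a3", "b1", "b2", "b3"}])

print(natural, mixed, lumped)
```

The split along the two triangles scores highest, putting everything in one group scores zero, and a split that cuts across the triangles scores below zero, meaning it is worse than chance.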

Where it misleads

Networks are seductive and easy to over-read. The traps:

Centrality isn't importance. A high score is a structural fact, not a verdict. The most central node might be a switchboard operator, not a kingpin. Centrality nominates nodes for attention; it doesn't convict them.
Missing-edge bias. Your network is only the connections you happened to record. A single missing edge (an unobserved relationship) can completely change who looks central, and the absence of an edge in your data is rarely evidence that no relationship exists.
The hairball. A big network drawn as a tangle of crossing lines looks impressive and shows nothing. Lean on the measures (centrality tables, community labels), not the pretty-but-unreadable picture.
Spurious nodes. A shared taxi rank or a customer-service line can make unrelated people look connected. Garbage edges produce confident-looking, wrong structure.
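
The last trap is easy to demonstrate. In the sketch below, a made-up `HELPLINE` number ties otherwise unrelated callers into one apparent group; dropping it before analysis changes the structure completely. The data and the `ignore` parameter are my own illustration, and deciding which nodes count as spurious remains a judgement call the code can't make for you.

```python
def components(edges, ignore=()):
    """Connected components of an undirected edge list.

    Edges touching any node in `ignore` are skipped, so a node whose only
    links went through an ignored node drops out of the result entirely.
    """
    neighbours = {}
    for u, v in edges:
        if u in ignore or v in ignore:
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical call records; HELPLINE is a customer-service number.
calls = [
    ("ann", "bob"), ("cat", "dan"),
    ("ann", "HELPLINE"), ("cat", "HELPLINE"), ("eve", "HELPLINE"),
]

with_helpline = components(calls)
without_helpline = components(calls, ignore={"HELPLINE"})

print(with_helpline)
print(without_helpline)
```

With the helpline included, everyone looks like one connected group; without it, only the two real pairs remain.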

Where it shows up in my work

Finding the connector & the hidden group

In intelligence work, link analysis is often the core task: take people, accounts, and transactions, model them as a graph, and ask who is the key connector and what is the hidden community. That's exactly the choice between degree (the busy hub), betweenness (the broker whose removal fractures the network), and eigenvector/PageRank (the quietly well-connected), plus community detection to surface the cell that isn't obvious case-by-case.

And the pitfalls are exactly where the discipline pays off: treating centrality as a lead, not a conclusion, remembering that a missing edge can move the whole picture, and refusing to be impressed by a hairball. It connects to the rest of the section too — communities are clustering on a graph, eigenvector centrality is linear algebra, and the leads it produces feed straight into structured analysis.

Refresh in 60 seconds

Centrality definitions, modern community-detection methods, and the centrality-isn't-importance caution reflect current network-analysis references alongside hands-on link analysis.