Knowledge Graphs

Most data lives in tables: neat rows and columns, one record per thing. But a great deal of what we actually want to know is about relationships: who works for whom, which company owns which, how this account connects to that person. Tables handle that awkwardly (endless joins, and some multi-step questions become impractical to phrase). A knowledge graph stores information the way it naturally connects, as a web of entities linked by relationships, turning "how does everything relate?" from a painful query into a natural one.

It's the structured, factual cousin of the network analysis page (that one studies the shape of a network; this one is about storing and querying meaning), and it's increasingly the backbone for grounding AI systems in real facts. This page covers the practical side: how facts become a graph, how the graph is built and queried, where it pays off, and what it costs.

Facts as a graph

The shift in representation is the whole point. In a table, "Acme Corp is headquartered in Adelaide" is a cell in a row; the connection to "Adelaide is in South Australia" lives in a different table, and tying them together means a join. In a knowledge graph, Acme, Adelaide, and South Australia are all nodes, directly linked by labelled edges — and following the chain from a company to its state is just walking two edges. The relationships are first-class data, not something you reconstruct on demand.

Triples: the atom of knowledge

The fundamental unit is the triple: a single fact expressed as subject → predicate → object. "Acme → headquartered_in → Adelaide." "Adelaide → located_in → South Australia." "Jane → works_for → Acme." Each triple is one edge between two nodes, and a whole knowledge graph is just an enormous pile of these triples — millions or billions of them — woven into a connected web.

Figure: Knowledge as triples. Each fact is a subject → predicate → object link. Stack enough of them and the entities interconnect into a graph you can traverse (Jane works for Acme, which is in Adelaide, which is in South Australia), a chain a table would need several joins to follow.
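To make the triple idea concrete, here is a minimal Python sketch that stores the three example facts as plain tuples and walks the chain from Jane to her employer's state. The `outgoing` index and the `follow` helper are illustrative names of my own, not part of any particular graph library, and a real triple store would do far more than this.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) tuple, using the examples from the text.
triples = [
    ("Acme", "headquartered_in", "Adelaide"),
    ("Adelaide", "located_in", "South Australia"),
    ("Jane", "works_for", "Acme"),
]

# Index outgoing edges by subject so that walking an edge is a dictionary lookup.
outgoing = defaultdict(list)
for subject, predicate, obj in triples:
    outgoing[subject].append((predicate, obj))


def follow(start, predicates):
    """Walk one edge per predicate, in order, and return the set of nodes reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {
            obj
            for node in frontier
            for pred, obj in outgoing.get(node, [])
            if pred == predicate
        }
    return frontier


state = follow("Jane", ["works_for", "headquartered_in", "located_in"])
print(state)  # {'South Australia'}
```

The three-step walk is the same chain that would take several joins across separate tables; here each step is one lookup in the index.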

This is how Wikidata stores the structured knowledge behind Wikipedia, and how Google's Knowledge Graph powers the fact boxes beside its search results. The triple is simple, but at scale it's astonishingly expressive.

Ontologies: the agreed vocabulary

A pile of triples is only coherent if everyone agrees what the entity and relationship types mean. That agreed vocabulary is the ontology (or schema): it defines the classes of thing (Person, Organisation, Place) and the valid relationships between them (a Person can work_for an Organisation; an Organisation can be headquartered_in a Place). The neat way to put it: data + an ontology = a knowledge graph — the ontology is what tells the machine what each node and edge actually is, lifting the graph from a tangle of strings to a structure with meaning.

The ontology is also what lets you unify heterogeneous sources: map two different databases onto the same shared vocabulary and their facts merge into one connected graph, even if they originally called the same thing by different names.
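A rough sketch of both jobs the ontology does, assuming a toy vocabulary: checking that a triple connects the right classes, and mapping two sources' differing field names onto the shared predicates. Everything here (`ENTITY_CLASS`, `ALLOWED`, `PREDICATE_ALIASES`, and the source field names such as `employer` and `hq_city`) is hypothetical and chosen only for illustration; real ontologies are usually expressed in a dedicated schema language rather than Python dictionaries.

```python
# Hypothetical ontology: which class each entity belongs to, and which
# (subject class, predicate, object class) combinations are allowed.
ENTITY_CLASS = {
    "Jane": "Person",
    "Acme": "Organisation",
    "Adelaide": "Place",
}
ALLOWED = {
    ("Person", "works_for", "Organisation"),
    ("Organisation", "headquartered_in", "Place"),
}

# Hypothetical mapping from each source's own field names to the shared vocabulary.
PREDICATE_ALIASES = {
    "employer": "works_for",
    "employed_by": "works_for",
    "hq_city": "headquartered_in",
}


def normalise(triple):
    """Rewrite a source-specific predicate into the shared vocabulary."""
    subject, predicate, obj = triple
    return (subject, PREDICATE_ALIASES.get(predicate, predicate), obj)


def is_valid(triple):
    """Check the triple's class signature against the ontology."""
    subject, predicate, obj = triple
    signature = (ENTITY_CLASS.get(subject), predicate, ENTITY_CLASS.get(obj))
    return signature in ALLOWED


source_a = [("Jane", "employer", "Acme")]
source_b = [("Acme", "hq_city", "Adelaide"), ("Adelaide", "works_for", "Jane")]

merged = {normalise(t) for t in source_a + source_b}
accepted = sorted(t for t in merged if is_valid(t))
rejected = sorted(t for t in merged if not is_valid(t))

print(accepted)  # [('Acme', 'headquartered_in', 'Adelaide'), ('Jane', 'works_for', 'Acme')]
print(rejected)  # [('Adelaide', 'works_for', 'Jane')]
```

The rejected triple is syntactically fine but claims a Place works for a Person, which the ontology does not permit. That is the sense in which the schema adds meaning beyond the raw strings.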

Building the graph: the hard part

Constructing a knowledge graph from real, messy sources is where the genuine difficulty lives, and it leans on tools from across this section:

Entity & relation extraction — pulling structured triples out of unstructured text (a report saying "Jane joined Acme" → the works_for triple). This is an NLP task, increasingly done with LLMs.
Entity resolution — the crux. "J. Smith", "Jane Smith", and "Smith, J." may be one person or three, and the graph is only as good as its ability to decide. Getting this wrong fragments one entity into many or conflates distinct ones — the same record-linkage problem as the data-preparation page, with higher stakes because errors corrupt the whole connected structure.

Get extraction and resolution right and the graph is powerful; get them wrong and it's confidently misleading. The construction cost — and keeping it accurate — is the central practical challenge.
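As a small illustration of the entity-resolution step, the sketch below reduces each name mention to a crude key (surname plus first initial) and groups mentions that share it. The `blocking_key` function and the sample mentions are assumptions made for the example. Sharing a key only makes mentions candidates for merging; deciding whether they really are one person needs further evidence that this sketch does not attempt.

```python
from collections import defaultdict


def blocking_key(name):
    """Reduce a name to (surname, first initial) so that variant spellings collide."""
    name = name.replace(".", "").strip()
    if "," in name:
        # "Smith, J" style: surname comes first.
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        # "Jane Smith" style: the last word is taken as the surname.
        parts = name.split()
        if not parts:
            return ("", "")
        given, surname = " ".join(parts[:-1]), parts[-1]
    return (surname.lower(), given[:1].upper())


mentions = ["J. Smith", "Jane Smith", "Smith, J.", "Robert Jones"]

# Group mentions by key; each group is a set of merge candidates, not a verdict.
candidates = defaultdict(list)
for mention in mentions:
    candidates[blocking_key(mention)].append(mention)

for key, group in candidates.items():
    print(key, group)
```

All three Smith variants land in the same group, which is exactly the ambiguity described above: the key cannot tell whether they are one person or three.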

Querying: the questions tables can't answer

The pay-off is the queries. Graph query languages (SPARQL for RDF graphs, Cypher for property graphs) let you ask multi-hop questions that traverse relationships — precisely the questions a table struggles with. "Which suppliers of companies that Jane has worked for are based overseas?" is a natural graph traversal (Jane → companies → their suppliers → filter by location) but a nightmare of joins in SQL.
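The traversal in that example can be sketched in plain Python over a list of triples. The company and supplier names, the predicates (`worked_for`, `supplies`, `based_in`) and the `HOME` country are all invented for illustration. In practice this would be a single SPARQL or Cypher query against a graph store rather than hand-written loops.

```python
# Hypothetical facts, invented for this example.
triples = [
    ("Jane", "worked_for", "Acme"),
    ("Jane", "worked_for", "Birchwood"),
    ("Kestrel Parts", "supplies", "Acme"),
    ("Moana Tools", "supplies", "Acme"),
    ("Lumen Freight", "supplies", "Birchwood"),
    ("Orbit Metals", "supplies", "Cobalt Co"),  # Jane never worked for Cobalt Co
    ("Kestrel Parts", "based_in", "Australia"),
    ("Moana Tools", "based_in", "New Zealand"),
    ("Lumen Freight", "based_in", "Singapore"),
    ("Orbit Metals", "based_in", "Japan"),
]

HOME = "Australia"  # assumption: "overseas" means based anywhere other than here


def objects(subject, predicate):
    """Follow edges forwards: everything `subject` points to via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """Follow edges backwards: everything that points to `obj` via `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Hop 1: Jane -> companies. Hop 2: companies -> their suppliers.
companies = objects("Jane", "worked_for")
suppliers = {s for company in companies for s in subjects("supplies", company)}

# Hop 3: keep suppliers with a known location that is not the home country.
overseas = []
for supplier in sorted(suppliers):
    locations = objects(supplier, "based_in")
    if locations and HOME not in locations:
        overseas.append(supplier)

print(overseas)  # ['Lumen Freight', 'Moana Tools']
```

Each hop is one set comprehension over the edges, and adding a fourth hop would be one more line. The equivalent SQL would need a further join for every hop.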

This is the real reason to reach for a knowledge graph: when the connections are the question. Finding chains, paths, and indirect links across many relationships is what the structure is built for — and it's where the network-analysis measures (centrality, communities, link prediction) can then be applied on top.

Knowledge graphs + LLMs: GraphRAG

A knowledge graph's newest role is grounding large language models. Where ordinary RAG retrieves text passages, GraphRAG retrieves structured facts and their connections from a knowledge graph and feeds them to the LLM. Because the facts are explicit, typed, and traceable, this can sharply reduce hallucination and answer multi-hop questions an LLM would otherwise fudge — the graph supplies verifiable structure, the LLM supplies fluent language over it. It's a fast-emerging pattern precisely because it pairs each system's strength against the other's weakness.
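A minimal sketch of the retrieval half of this pattern, assuming the entity mentioned in the question has already been identified: collect the facts within a couple of hops of that entity and lay them out as context for the model. The `neighbourhood` and `build_prompt` helpers, the prompt wording, and the sample facts about Bob and Globex are my own illustrative choices. Real GraphRAG systems vary considerably, and the LLM call itself is left out.

```python
from collections import deque

triples = [
    ("Jane", "works_for", "Acme"),
    ("Acme", "headquartered_in", "Adelaide"),
    ("Adelaide", "located_in", "South Australia"),
    ("Bob", "works_for", "Globex"),  # unrelated fact that should not be retrieved
]


def neighbourhood(entity, max_hops=2):
    """Collect triples within max_hops of entity, following edges in either direction."""
    seen = {entity}
    queue = deque([(entity, 0)])
    facts = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for triple in triples:
            s, p, o = triple
            if node not in (s, o):
                continue
            if triple not in facts:
                facts.append(triple)
            other = o if s == node else s
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return facts


def build_prompt(question, entity):
    """Format the retrieved facts as explicit, traceable context for an LLM."""
    lines = [f"- {s} {p} {o}" for s, p, o in neighbourhood(entity)]
    return (
        "Answer using only these facts:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


print(build_prompt("Which city is Jane's employer based in?", "Jane"))
```

Because every line of context is a single typed fact, an answer can be traced back to the specific triples that supported it, which is harder to do with retrieved free-text passages.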

The honest costs

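Knowledge graphs are powerful but far from free:

Build and maintenance cost: extraction and resolution are expensive to set up, and keeping the graph accurate is ongoing work.
Entity-resolution errors: a wrong merge or a missed match corrupts the connected structure, not just one record.
Staleness: facts go out of date, and the graph is only as current as its last update.
Rigid schemas: the ontology fixes which types and relationships exist, so new kinds of fact mean changing the schema.

Use one when relationships are central to the questions being asked, not for everything.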

Where it shows up in my work

A queryable picture from scattered facts

In intelligence work the core task is often exactly what a knowledge graph is built for: linking entities — people, organisations, accounts, events — across many scattered sources into one connected, queryable picture, then asking how they relate. The multi-hop questions ("who connects to this person through which organisations?") are the ones that matter and the ones a table can't easily answer.

What I hold onto is that the value lives or dies on entity resolution: getting "is this the same person?" right is the difference between a clarifying graph and a misleading one. The structure also feeds straight into network analysis (centrality and communities on top of the graph) and into GraphRAG for grounding LLM tools in verifiable facts. It pairs with governance too: a graph of who-relates-to-what is powerful, and so demands care about access and accuracy.

Refresh in 60 seconds

A knowledge graph stores facts as entities linked by relationships — for when the connections are the question (tables handle that badly).
The atom is the triple: subject → predicate → object. Stack billions into a connected web (Wikidata, Google's Knowledge Graph).
An ontology (schema) defines the entity/relation types — data + ontology = knowledge graph; it also unifies heterogeneous sources.
Building it: entity/relation extraction (NLP/LLM) + entity resolution (the crux — "same thing or not?"; errors corrupt everything).
Query with SPARQL/Cypher for multi-hop questions that tables handle badly. GraphRAG grounds LLMs in typed, traceable facts (which can cut hallucination).
Costs: expensive to build/maintain, entity-resolution errors, staleness, rigid schemas. Use it when relationships are central — not for everything.

The triple/ontology foundations, entity-resolution emphasis, SPARQL querying, and the emerging GraphRAG pattern reflect current knowledge-graph references alongside hands-on entity-linking work.