Data Governance, Privacy & Ethics
The rules that make data trustworthy. Not the glamorous part — but in government, health, and policing, getting this wrong is how analysis becomes a liability instead of an asset.
- Studied: Data Governance, Privacy & Ethics (In practice · gov & research data)
- When: Gov · research · ongoing
- Applied in: Sensitive gov data, probity
- Read / Refreshed: ~16 min read · 2026-06-25
Every other page makes you better at using data. This one is about being allowed and trusted to use it. Data governance is the set of rules, roles, and processes that keep an organisation's data accurate, secure, compliant, and trustworthy. It's the least glamorous topic in this section and, for the work I do, one of the most important — because handling sensitive government, health, and policing data well isn't optional, and getting it wrong turns good analysis into a serious liability.
It threads through everything else: the data you process, the systems you store it in, the models you build. Governance is the discipline that makes all of it defensible. Here's the practical shape of it — including the FAIR principles, which are the modern backbone of good data management.
01
Trust as infrastructure
Governance answers a deceptively simple question: can we trust this data, and are we handling it responsibly? It defines who owns and is accountable for each dataset (stewardship), what the rules are for quality, access, and retention, and how those rules are enforced. Done well it's invisible; done badly it shows up as contradictory numbers, a privacy breach, or a decision no one can defend.
The mindset shift is to treat data as a managed asset with obligations attached, not a free-floating resource. In a regulated setting that's not bureaucracy for its own sake — it's what lets the analysis hold up when someone asks "where did this come from, who could see it, and should you have used it?"
02
Quality and lineage
Trust starts with data quality — is the data accurate, complete, consistent, and current? — and with lineage: the documented path of where data came from and every transformation it passed through. Lineage is what makes an analysis auditable: you can trace any number back to its source and reproduce how it was derived.
This is the reproducibility discipline from the data-processing page, raised to an organisational standard. In government work it's not a nicety: when a figure feeds a decision that affects people, "here is exactly where this came from and what we did to it" is the difference between a defensible result and an indefensible one.
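One way to picture lineage is a log that records, for every transformation, a fingerprint of what went in and what came out. The sketch below is a minimal illustration using only the standard library. The `LineageLog` class, the `fingerprint` helper, and the file name are hypothetical names invented for this example, not part of any particular governance tool.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(rows):
    """Return a stable SHA-256 hash of a list of JSON-serialisable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LineageLog:
    source: str                                  # where the data came from
    steps: list = field(default_factory=list)    # one entry per transformation

    def apply(self, description, func, rows):
        """Run one transformation and record its input and output."""
        result = func(rows)
        self.steps.append({
            "step": description,
            "input_hash": fingerprint(rows),
            "output_hash": fingerprint(result),
            "rows_in": len(rows),
            "rows_out": len(result),
        })
        return result


# Hypothetical extract with one incomplete row.
raw = [{"region": "North", "count": 12}, {"region": "South", "count": None}]
log = LineageLog(source="hypothetical_extract.csv")
clean = log.apply(
    "drop rows with missing count",
    lambda rs: [r for r in rs if r["count"] is not None],
    raw,
)
```

Because each step stores the hash of its input, anyone re-running the pipeline can confirm they started from the same data and arrived at the same result, which is the auditable trail the section describes.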
03
The FAIR principles
The modern backbone of good data management is a set of four principles known by the acronym FAIR — Findable, Accessible, Interoperable, Reusable. Born in scientific research (and now adopted across government and industry), they describe what it takes for data to be genuinely useful beyond the moment and the person that created it:
- Findable — data and its metadata are easy to discover, with a persistent unique identifier and rich, searchable description. You can't use what you can't find.
- Accessible — once found, it can be retrieved through a clear, standard protocol, with authentication and authorisation where needed. Access is defined, not ad hoc.
- Interoperable — it uses shared standards, formats, and vocabularies so it can be combined with other data and read by other systems. The opposite of a locked silo.
- Reusable — it's richly documented and clearly licensed, so others (including future-you) can understand and reuse it correctly.
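The four principles can be read as a checklist against a dataset's metadata. The sketch below is a deliberately simplified model: the `DatasetRecord` fields and the mapping from each principle to fields are my own assumptions for illustration, and real FAIR assessments look at far more than whether a field is filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    title: str
    identifier: Optional[str] = None       # Findable: persistent unique ID
    description: Optional[str] = None      # Findable: searchable description
    access_protocol: Optional[str] = None  # Accessible: e.g. "https"
    data_format: Optional[str] = None      # Interoperable: shared format
    licence: Optional[str] = None          # Reusable: clear licence


# Which fields each principle depends on in this simplified model.
FAIR_FIELDS = {
    "Findable": ["identifier", "description"],
    "Accessible": ["access_protocol"],
    "Interoperable": ["data_format"],
    "Reusable": ["licence"],
}


def fair_gaps(record):
    """Return {principle: [missing fields]} for principles with gaps."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [f for f in fields if not getattr(record, f)]
        if missing:
            gaps[principle] = missing
    return gaps


record = DatasetRecord(
    title="Hypothetical service-usage extract",
    identifier="doi:10.0000/example",   # placeholder, not a real DOI
    description="Monthly counts by region",
    data_format="text/csv",
)
print(fair_gaps(record))
# → {'Accessible': ['access_protocol'], 'Reusable': ['licence']}
```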
04
Privacy
When data is about people, privacy becomes a legal and ethical duty, not a preference. In Australia the Australian Privacy Principles (Schedule 1 of the Privacy Act 1988) set the baseline, and a few ideas do most of the work in practice:
- Data minimisation: collect and keep only what you actually need. The data you don't hold can't be breached or misused.
- Purpose limitation: use data for the purpose it was collected for, not whatever turns out to be convenient later.
- De-identification: strip or mask the identifying fields so records can't readily be tied to individuals, while knowing the limits. Combining "anonymous" datasets can re-identify people, so de-identification reduces risk; it does not guarantee anonymity.
Privacy isn't a checkbox at the end; it's a constraint you design in from the start (privacy by design) — which shapes what you collect and how you model long before any analysis.
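The limit of de-identification is easy to see in a small example. The sketch below drops direct identifiers and then counts how many records share each combination of the remaining linkable fields, which is the idea behind k-anonymity. The field names, the sample records, and the choice of which fields count as identifying are all assumptions made for illustration.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}          # assumed identifying fields
QUASI_IDENTIFIERS = ("postcode", "birth_year")  # assumed linkable fields


def de_identify(records):
    """Return copies of the records with direct identifiers removed."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
            for r in records]


def smallest_group(records, keys=QUASI_IDENTIFIERS):
    """Size of the rarest quasi-identifier combination (the k in k-anonymity)."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values(), default=0)


# Fabricated sample records, for illustration only.
people = [
    {"name": "A", "email": "a@example.org", "postcode": "2600", "birth_year": 1980, "visits": 3},
    {"name": "B", "email": "b@example.org", "postcode": "2600", "birth_year": 1980, "visits": 1},
    {"name": "C", "email": "c@example.org", "postcode": "2913", "birth_year": 1975, "visits": 7},
]
safe = de_identify(people)
k = smallest_group(safe)   # 1: one person is unique on postcode + birth year
```

Even with names and emails removed, a group of size 1 means that anyone holding another dataset with postcode and birth year could single that person out, which is why de-identified data still needs access controls.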
05
Access and security
Governance decides not just whether data is correct but who can touch it. The guiding rule is least privilege: each person gets access to exactly the data their role requires, and no more. Data is classified by sensitivity (public, internal, confidential, protected), and the controls scale with the classification.
This is the "never trust by default" security mindset applied to data: encrypt it in transit and at rest, log who accessed what, and assume that the cost of a breach of sensitive records is severe. In policing and health contexts, access control isn't IT hygiene — it's a core part of the public's trust.
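A rough sketch of how classification, least privilege, and access logging fit together is below. It assumes a simple linear clearance model where each role has a maximum classification it may read. Real least-privilege schemes also check need-to-know per dataset, and the role names here are hypothetical.

```python
from datetime import datetime, timezone
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PROTECTED = 3


# Hypothetical roles mapped to the highest classification each may read.
ROLE_CLEARANCE = {
    "public_web": Classification.PUBLIC,
    "analyst": Classification.CONFIDENTIAL,
    "case_officer": Classification.PROTECTED,
}

access_log = []


def can_read(role, dataset_name, classification):
    """Allow access only if the role's clearance covers the classification.

    Every attempt is logged, whether it was allowed or denied.
    """
    clearance = ROLE_CLEARANCE.get(role)   # unknown roles get no access
    allowed = clearance is not None and clearance >= classification
    access_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "dataset": dataset_name,
        "allowed": allowed,
    })
    return allowed
```

Denying unknown roles by default and logging refusals as well as grants is the "never trust by default" stance in miniature: the log answers "who accessed what" and also "who tried".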
06
Ethics and fairness
Beyond "are we allowed?" sits the harder question: "should we, and is it fair?" Data and models can encode and amplify the biases in the world that produced them — a model trained on biased historical data will faithfully reproduce that bias, now wearing the authority of "the algorithm". In government and health, where decisions touch real lives, that's not abstract: an unfair model or a misleading analysis can do real harm to real people.
So ethics belongs in the workflow, not in a postscript: ask who could be harmed, check models for disparate impact across groups, be honest about limitations (the communication discipline), and keep a human accountable for consequential decisions. Being technically correct and being responsible are not the same thing, and the second is the higher bar.
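Checking for disparate impact can start with something very simple: compare the rate of favourable outcomes across groups. The sketch below computes each group's selection rate and the ratio of the lowest to the highest. The group labels and decisions are fabricated for illustration, and a single ratio is a prompt to investigate, not a verdict on fairness.

```python
from collections import defaultdict


def selection_rates(outcomes):
    """outcomes: iterable of (group, selected) pairs -> {group: rate}."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in outcomes:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}


def disparate_impact_ratio(outcomes):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(outcomes)
    highest = max(rates.values())
    if highest == 0:
        return 1.0   # nobody selected in any group, so no disparity to measure
    return min(rates.values()) / highest


# Fabricated decisions: group A selected 8 of 10, group B selected 4 of 10.
decisions = ([("A", True)] * 8 + [("A", False)] * 2 +
             [("B", True)] * 4 + [("B", False)] * 6)
ratio = disparate_impact_ratio(decisions)   # 0.4 / 0.8 = 0.5
```

A ratio of 0.5 here means group B receives the favourable outcome half as often as group A, which is exactly the kind of gap a human reviewer should be asked to explain.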
07
Making it real
Principles only matter if they're operationalised. In practice that means assigning data stewards accountable for specific domains, maintaining a data catalogue (the inventory that makes data findable and documents its lineage — the practical face of FAIR), setting retention rules so data is kept no longer than needed, and defining the policies for quality, access, and classification. Governance is the unglamorous scaffolding that lets an organisation trust its own data.
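As a small illustration of operationalising one of these rules, the sketch below models a data catalogue entry with a named steward and a retention period, then flags entries held past their limit. The `CatalogueEntry` fields, dataset names, and retention periods are hypothetical. Actual retention periods come from legislation and records authorities, not from code.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogueEntry:
    name: str
    steward: str          # the person or team accountable for this dataset
    collected_on: date
    retention_days: int   # how long the data may be kept


def overdue_for_disposal(entries, today):
    """Return entries held longer than their retention period."""
    return [e for e in entries
            if today > e.collected_on + timedelta(days=e.retention_days)]


# Hypothetical catalogue with made-up retention periods.
catalogue = [
    CatalogueEntry("survey_2019", "research_team", date(2019, 3, 1), 365 * 5),
    CatalogueEntry("incident_log", "ops_team", date(2024, 1, 15), 365 * 7),
]
flagged = overdue_for_disposal(catalogue, today=date(2025, 1, 1))
```

Passing `today` in as an argument, instead of reading the clock inside the function, keeps the check reproducible: the same catalogue and date always flag the same entries.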
08
Governing AI
As models drive more decisions, governance is extending to AI governance: managing the risks of the models themselves, not just the data. The themes are transparency (can you explain how a decision was reached?), accountability (who is responsible when it's wrong?), fairness (is the impact equitable?), and human oversight of consequential automated decisions. It's the ethics and reproducibility threads of this whole section, pointed at the model — and it's fast becoming a formal requirement, especially in the public sector.
09
Where it shows up in my work
10
Refresh in 60 seconds
Reflects current data-governance and FAIR-principles guidance (the original FAIR paper in Scientific Data; government FAIR-readiness and Australian privacy practice) alongside hands-on government work.