Data Governance, Privacy & Ethics

Every other page makes you better at using data. This one is about being allowed and trusted to use it. Data governance is the set of rules, roles, and processes that keep an organisation's data accurate, secure, compliant, and trustworthy. It's the least glamorous topic in this section and, for the work I do, one of the most important — because handling sensitive government, health, and policing data well isn't optional, and getting it wrong turns good analysis into a serious liability.

It threads through everything else: the data you process, the systems you store it in, the models you build. Governance is the discipline that makes all of it defensible. Here's the practical shape of it, including the FAIR principles for managing data well.

Trust as infrastructure

Governance answers a deceptively simple question: can we trust this data, and are we handling it responsibly? It defines who owns and is accountable for each dataset (stewardship), what the rules are for quality, access, and retention, and how those rules are enforced. Done well it's invisible; done badly it shows up as contradictory numbers, a privacy breach, or a decision no one can defend.

The mindset shift is to treat data as a managed asset with obligations attached, not a free-floating resource. In a regulated setting that's not bureaucracy for its own sake — it's what lets the analysis hold up when someone asks "where did this come from, who could see it, and should you have used it?"

Quality and lineage

Trust starts with data quality — is the data accurate, complete, consistent, and current? — and with lineage: the documented path of where data came from and every transformation it passed through. Lineage is what makes an analysis auditable: you can trace any number back to its source and reproduce how it was derived.

This is the reproducibility discipline from the data-processing page, raised to an organisational standard. In government work it's not a nicety: when a figure feeds a decision that affects people, "here is exactly where this came from and what we did to it" is the difference between a defensible result and an indefensible one.
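To make lineage concrete, here is a minimal sketch. It assumes each transformation is a Python function over JSON-serialisable records. `LineageLog`, `fingerprint`, the two step functions and the source file name are hypothetical illustrations, not any particular lineage tool. Each step records a short SHA-256 fingerprint of what went in and what came out, so you can trace a final figure back through the chain to its source.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


def fingerprint(data: Any) -> str:
    """Short, stable SHA-256 fingerprint of JSON-serialisable data."""
    payload = json.dumps(data, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class LineageLog:
    source: str                                   # where the raw data came from
    steps: list[dict] = field(default_factory=list)

    def apply(self, name: str, func: Callable[[Any], Any], data: Any) -> Any:
        """Run one transformation and record fingerprints of its input and output."""
        result = func(data)
        self.steps.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        return result


# Hypothetical transformation steps, for illustration only.
def drop_missing(rows):
    return [r for r in rows if r["cases"] is not None]


def sum_by_region(rows):
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["cases"]
    return totals


raw = [
    {"region": "North", "cases": 12},
    {"region": "South", "cases": None},
    {"region": "North", "cases": 5},
]
log = LineageLog(source="hypothetical_extract.csv")
clean = log.apply("drop_missing", drop_missing, raw)
totals = log.apply("sum_by_region", sum_by_region, clean)
```

Here `totals` holds only "North". `log.steps` shows that "South" disappeared at `drop_missing`, which is exactly the kind of question lineage should let you answer. A production pipeline would also record code versions, parameters and timestamps. The sketch leaves those out to stay short.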

The FAIR principles

The modern backbone of good data management is a set of four principles known by the acronym FAIR — Findable, Accessible, Interoperable, Reusable. Born in scientific research (and now adopted across government and industry), they describe what it takes for data to be genuinely useful beyond the moment and the person that created it:

Findable — data and its metadata are easy to discover, with a persistent unique identifier and rich, searchable description. You can't use what you can't find.
Accessible — once found, it can be retrieved through a clear, standard protocol, with authentication and authorisation where needed. Access is defined, not ad hoc.
Interoperable — it uses shared standards, formats, and vocabularies so it can be combined with other data and read by other systems. The opposite of a locked silo.
Reusable — it's richly documented and clearly licensed, so others (including future-you) can understand and reuse it correctly.

FAIR: Findable, Accessible, Interoperable, Reusable. Rich metadata sits at the centre, because good metadata is what makes all four possible. Crucially, FAIR is about data being well managed and well described, which is not the same as being 'open': restricted data can still be FAIR.
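One way to see how metadata underpins all four principles is to sketch a catalogue record and check it for gaps. This is a minimal illustration built on a home-grown record structure. `DatasetRecord`, `FAIR_FIELDS`, `fair_gaps` and every field value are hypothetical. Real catalogues usually follow a published metadata standard, such as the W3C's DCAT vocabulary, rather than an ad hoc class.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    identifier: str                  # F: persistent, globally unique ID
    title: str                       # F
    description: str                 # F: rich, searchable description
    keywords: list[str] = field(default_factory=list)        # F
    access_protocol: str = ""        # A: how to retrieve it, e.g. "https"
    access_conditions: str = ""      # A: who may access it, and how to ask
    formats: list[str] = field(default_factory=list)         # I: standard formats
    vocabularies: list[str] = field(default_factory=list)    # I: shared code lists
    licence: str = ""                # R: terms of reuse
    provenance: str = ""             # R: origin and processing history


# Which metadata fields support each principle (an illustrative mapping).
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "description", "keywords"],
    "Accessible": ["access_protocol", "access_conditions"],
    "Interoperable": ["formats", "vocabularies"],
    "Reusable": ["licence", "provenance"],
}


def fair_gaps(rec):
    """Return the empty metadata fields under each FAIR principle."""
    return {
        principle: [name for name in names if not getattr(rec, name)]
        for principle, names in FAIR_FIELDS.items()
    }


rec = DatasetRecord(
    identifier="hypothetical-catalogue:dataset/0001",
    title="Emergency department presentations by region",
    description="Monthly counts of ED presentations, aggregated by region.",
    keywords=["health", "emergency department"],
    access_protocol="https",
    access_conditions="Approved staff only; request via the data steward",
    formats=["CSV"],
)
gaps = fair_gaps(rec)
```

This example record is restricted to approved staff, yet it has no gaps under Findable or Accessible. It is missing a shared vocabulary, a licence and provenance. That shows FAIR measures how well data is described and managed, not whether it is open.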

Privacy

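When data is about people, privacy becomes a legal and ethical duty, not a preference. In Australia, the Australian Privacy Principles in the Privacy Act 1988 set the baseline for federal agencies and many private organisations (state and territory agencies are generally covered by their own privacy laws), and a few ideas do most of the work in practice: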

Data minimisation — collect and keep only what you actually need. The data you don't hold can't be breached or misused.
Purpose limitation — use data for the purpose it was collected for, not whatever turns out to be convenient later.
De-identification — remove or mask the fields that directly identify people, while knowing its limits: combining "anonymous" datasets can re-identify individuals, so de-identification reduces risk rather than guaranteeing anonymity.

Privacy isn't a checkbox at the end; it's a constraint you design in from the start (privacy by design), shaping what you collect and how you model long before any analysis begins.
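The gap between "de-identified" and "anonymous" is easy to demonstrate. This sketch assumes records are Python dicts with made-up field names. It applies data minimisation, drops direct identifiers, and then measures k-anonymity. That is the size of the smallest group of records sharing the same quasi-identifiers: fields like postcode, age band and sex that don't identify anyone on their own but can in combination. k-anonymity is one common risk measure, swapped in here for illustration. The Australian Privacy Principles don't prescribe it, and a good k alone doesn't guarantee safety.

```python
from collections import Counter

# Hypothetical field names, for illustration only.
DIRECT_IDENTIFIERS = {"name", "medicare_no"}
QUASI_IDENTIFIERS = ("postcode", "age_band", "sex")


def minimise(rows, needed):
    """Data minimisation: keep only the fields the stated purpose needs."""
    return [{k: r[k] for k in needed} for r in rows]


def deidentify(rows):
    """Drop direct identifiers. Quasi-identifiers remain and still carry risk."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS} for r in rows]


def k_anonymity(rows, quasi=QUASI_IDENTIFIERS):
    """Size of the smallest group of records sharing the same quasi-identifier values.

    k == 1 means at least one record is unique on those fields, so anyone holding
    another dataset with the same fields could link it back to a person.
    """
    groups = Counter(tuple(r[q] for q in quasi) for r in rows)
    return min(groups.values())


# Made-up records.
records = [
    {"name": "A", "postcode": "2600", "age_band": "30-39", "sex": "F", "diagnosis": "X"},
    {"name": "B", "postcode": "2600", "age_band": "30-39", "sex": "F", "diagnosis": "Y"},
    {"name": "C", "postcode": "2612", "age_band": "80-89", "sex": "M", "diagnosis": "Z"},
]
deid = deidentify(records)
```

Record C is the only one with its combination of postcode, age band and sex, so `k_anonymity(deid)` is 1. Removing names alone did not make that row anonymous. Generalising or suppressing quasi-identifiers raises k, at some cost to analytic detail.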

Access and security

Governance decides not just whether data is correct but who can touch it. The guiding rule is least privilege: each person gets access to exactly the data their role requires, and no more. Data is classified by sensitivity (for example public, internal, confidential, protected), and the controls scale with the classification.

This is the "never trust by default" security mindset applied to data: encrypt it in transit and at rest, log who accessed what, and assume that the cost of a breach of sensitive records is severe. In policing and health contexts, access control isn't IT hygiene — it's a core part of the public's trust.
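Least privilege and classification combine naturally into a deny-by-default check. This minimal sketch assumes a simple in-memory catalogue. The roles and dataset names are hypothetical. The four `Level` values mirror the example scheme above rather than any official marking system. Access is granted only when the role both needs the dataset and holds sufficient clearance, and every attempt is logged.

```python
from datetime import datetime, timezone
from enum import IntEnum


class Level(IntEnum):
    """Illustrative sensitivity levels, lowest to highest."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PROTECTED = 3


# Hypothetical catalogue: each dataset carries a classification.
DATASETS = {
    "ed_presentations": Level.CONFIDENTIAL,
    "staff_records": Level.CONFIDENTIAL,
    "linked_health_justice": Level.PROTECTED,
}

# Hypothetical roles: a clearance level plus the specific datasets the role needs.
ROLES = {
    "analyst": {
        "clearance": Level.CONFIDENTIAL,
        "needs": {"ed_presentations", "linked_health_justice"},
    },
}

AUDIT_LOG = []


def can_access(user, role, dataset):
    """Deny by default: allow only if the role needs the dataset AND is cleared for it."""
    spec = ROLES.get(role)
    level = DATASETS.get(dataset)
    allowed = (
        spec is not None
        and level is not None
        and dataset in spec["needs"]
        and spec["clearance"] >= level
    )
    # Log every attempt, granted or not, so "who accessed what" is answerable later.
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user, "role": role, "dataset": dataset, "allowed": allowed,
    })
    return allowed
```

Clearance alone is not enough. The analyst is cleared for `staff_records` but does not need it, so the request is refused. `linked_health_justice` is needed but sits above the analyst's clearance, so it is refused too. Real systems would normally enforce this in the database or identity platform and write the audit trail to tamper-resistant storage, not a Python list.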

Ethics and fairness

Beyond "are we allowed?" sits the harder question: "should we, and is it fair?" Data and models can encode and amplify the biases in the world that produced them — a model trained on biased historical data will faithfully reproduce that bias, now wearing the authority of "the algorithm". In government and health, where decisions touch real lives, that's not abstract: an unfair model or a misleading analysis can do real harm to real people.

So ethics belongs in the workflow, not a postscript: ask who could be harmed, check models for disparate impact across groups, be honest about limitations (the communication discipline), and keep a human accountable for consequential decisions. Being technically correct and being responsible are not the same thing, and the second is the higher bar.
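One concrete disparate-impact check compares how often a model produces a positive outcome for each group. This sketch assumes binary predictions and a single group attribute. The data are made up, and the ratio is only one of many fairness metrics, not a complete audit.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Share of positive (1) predictions within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for g, p in zip(groups, predictions):
        totals[g] += 1
        positives[g] += p
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(groups, predictions):
    """Lowest group selection rate divided by the highest; 1.0 means parity."""
    rates = selection_rates(groups, predictions).values()
    highest = max(rates)
    return 1.0 if highest == 0 else min(rates) / highest


groups = ["A"] * 4 + ["B"] * 4
predictions = [1, 1, 1, 0, 1, 0, 0, 0]   # made-up model outputs: 1 = flagged
```

Here group A is flagged at a rate of 0.75 and group B at 0.25, giving a ratio of about 0.33. A common rule of thumb, the US "four-fifths rule" from employment selection guidelines, treats ratios below 0.8 as a warning sign worth investigating. Whether a positive prediction is a benefit or a burden also matters: being offered a service and being flagged for scrutiny are read in opposite directions, which changes which group the gap disadvantages.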

Making it real

Principles only matter if they're operationalised. In practice that means assigning data stewards accountable for specific domains, maintaining a data catalogue (the inventory that makes data findable and documents its lineage — the practical face of FAIR), setting retention rules so data is kept no longer than needed, and defining the policies for quality, access, and classification. Governance is the unglamorous scaffolding that lets an organisation trust its own data.
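Retention rules are easy to state and easy to forget, so it helps to check them mechanically. This minimal sketch assumes each holding is tagged with a record class. The classes and periods in `RETENTION_DAYS` are placeholders. Real periods come from your organisation's records-disposal authorities, not from code.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Placeholder retention periods per record class, in days (hypothetical values).
RETENTION_DAYS = {
    "raw_extract": 90,
    "analysis_output": 7 * 365,
}


@dataclass
class Holding:
    name: str
    record_class: str
    created: date


def due_for_disposal(holdings, today):
    """Return holdings kept longer than their record class allows."""
    return [
        h for h in holdings
        if today - h.created > timedelta(days=RETENTION_DAYS[h.record_class])
    ]


holdings = [
    Holding("extract_jan", "raw_extract", date(2024, 1, 10)),
    Holding("report_2023", "analysis_output", date(2023, 6, 30)),
]
flagged = due_for_disposal(holdings, today=date(2024, 6, 1))
```

On 1 June 2024, the raw extract is 143 days old against a 90-day rule, so it is flagged; the report is not. The function only flags holdings for an approved disposal process and deletes nothing. An unknown record class raises `KeyError` on purpose, because data with no retention rule is itself a governance gap.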

Governing AI

As models drive more decisions, governance is extending to AI governance: managing the risks of the models themselves, not just the data. The themes are transparency (can you explain how a decision was reached?), accountability (who is responsible when it's wrong?), fairness (is the impact equitable?), and human oversight of consequential automated decisions. It's the ethics and reproducibility threads of this whole section, pointed at the model — and it's fast becoming a formal requirement, especially in the public sector.
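Some of these themes can be built into how automated decisions are recorded. This minimal sketch uses assumed names: `DecisionRecord`, `finalise`, and the model and case identifiers are all hypothetical. Each record captures which model produced the score and its main factors (transparency). A consequential decision cannot be finalised without a named human reviewer (accountability and human oversight). Fairness checks would sit alongside this, run across many records rather than one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionRecord:
    case_id: str
    model_version: str              # transparency: which model produced the score
    score: float
    top_factors: list[str]          # transparency: main drivers of the score
    consequential: bool             # does the outcome materially affect a person?
    reviewer: Optional[str] = None  # accountability: the human who signed off
    outcome: Optional[str] = None
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def finalise(record, proposed, reviewer=None):
    """Apply an outcome; consequential decisions need a named human reviewer."""
    if record.consequential and not reviewer:
        raise PermissionError(
            f"{record.case_id}: consequential decision requires human review"
        )
    record.reviewer = reviewer
    record.outcome = proposed
    return record
```

Calling `finalise` on a consequential record without a reviewer raises `PermissionError` and leaves the outcome unset. A low-stakes record can be finalised automatically.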

Where it shows up in my work

The licence to operate

In government, governance is the licence to do the work at all. The sensitive data I handle comes with hard obligations, and the discipline on this page is what keeps the analysis both useful and defensible: lineage so every figure is traceable, least-privilege access and classification on sensitive records, privacy and de-identification handled properly, and a constant eye on fairness because the decisions affect people. Integrating sources like ABS, health, and other government data only works inside this framework.

FAIR is the part that ties my research background to my current work: making data findable, well-described, and reusable — without making it open — is exactly what responsible analytics in a sensitive setting requires. It's the quiet foundation under everything else in this section: the maths and models only earn trust when the data beneath them is governed well.


Reflects current data-governance and FAIR-principles guidance (the original FAIR paper in Scientific Data; government FAIR-readiness and Australian privacy practice) alongside hands-on government work.