Case study — Flagship

Signal

Building compliance into the request path — a governance layer for AI-assisted government data.

Live demo ↗ · View on GitHub ↗

Status: Live · v1.14
Tests: 128 green
Jurisdictions: SA + NYC
Open datasets: ~1,900

Python · FastAPI · Pydantic v2 · NumPy · SciPy · Modal · Docker · LLM · EU AI Act · DTA v2.0 · GitHub Actions

The problem

From 15 June 2026, AI governance becomes mandatory across the Australian Public Service. The Digital Transformation Agency's Policy for the responsible use of AI in government (Version 2.0) requires every agency to designate accountable officials, keep a register of in-scope AI use cases, and publish AI transparency statements, with mandatory AI use-case impact assessments following by 15 December 2026. The EU AI Act adds risk classification and traceability on top, and the Privacy Act 1988 (Cth) reforms add a disclosure duty for automated decisions.

The rules are easy to agree with and hard to actually meet. Most teams treat the record as paperwork — something you assemble after the fact, when an auditor asks. That breaks down the moment you look closely, because the facts you need are freshest at the instant the decision is made and they decay quickly. Which model version answered? What was the exact data window? Did anyone actually check the spike before it went out? Reconstruct that a month later and you are guessing.

What it does

Signal is a small product I built to test a different idea: make the compliance record a side effect of answering, not a separate task. If the system cannot answer without writing the record, the record can never be missing.

You ask it a question — how theft is trending in Adelaide over the last eighteen months — and it returns a plain-language summary backed by real numbers: the trend direction and whether it is statistically significant, the month-on-month and year-on-year change, the seasonal pattern, a short forecast, the top offence categories, and any months unusual enough to flag for review. It runs over two jurisdictions — South Australia Police and the New York City Police Department — on the same governed path, and the same portal behind the SA figures publishes around 1,900 open datasets that Signal can search, trend, and map. Every lookup is governed.

Every answer carries a decision id. That id resolves to a full audit entry through a public endpoint, so anyone can trace any answer back to the model that produced it, the data that informed it, and whether a human needs to look. The audit trail is not a hidden log file — it is part of the product you can see and click.
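As a rough illustration of what "resolving" a decision id could involve, here is a minimal sketch of the lookup behind such an endpoint. It is not Signal's actual code. It assumes the log is stored as one JSON object per line, and the function name `resolve_decision` and the field name `decision_id` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_decision(decision_id: str, log_path: Path) -> Optional[dict]:
    """Return the audit entry for decision_id, or None if it is not in the log.

    Assumption: the log holds one JSON object per line, each with a
    "decision_id" key. Both the layout and the key name are illustrative.
    """
    if not log_path.exists():
        return None
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the file
            entry = json.loads(line)
            if entry.get("decision_id") == decision_id:
                return entry
    return None
```

A web route would wrap a function like this and return a not-found response when it yields `None`. A linear scan is enough for a sketch; a larger log would want an index.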

The design

The heart of it is one module, a decision log: a typed schema for an AI-assisted decision and an append-only writer that puts one decision per line in a plain text file. Nothing exotic — you can read it with grep or load it straight into pandas.
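The shape of such a module can be sketched in a few lines. This is an illustration under stated assumptions, not the project's schema: it uses a standard-library dataclass to stay dependency-free (the stack listed above names Pydantic v2), it assumes each line is a JSON object, and every field name is hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class DecisionRecord:
    """One AI-assisted decision. All field names here are illustrative."""
    use_case: str
    model_version: str
    data_window: str
    risk_tier: str
    needs_human_review: bool
    decision_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_decision(record: DecisionRecord, log_path: Path) -> str:
    """Append one decision as a single JSON line and return its id.

    The file is opened in append mode only, so this function never
    rewrites or truncates existing lines.
    """
    line = json.dumps(asdict(record), sort_keys=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return record.decision_id
```

A file written this way can be searched with grep, and `pandas.read_json(path, lines=True)` loads it as a DataFrame.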

The important decision was where to put the logging. In Signal the analyst physically cannot return an answer without first writing the audit entry. The two steps are welded together in the request path. There is no code path that answers a user and forgets to log, because answering is logging. Compliance stops being a discipline people have to remember and becomes a property of the system.
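One way to picture that welding is a single function in which the log write sits between the analysis and the return. The sketch below is hypothetical: `answer_with_audit`, the `analyse` callable, and the entry fields are all assumptions made for illustration.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


def answer_with_audit(
    question: str,
    analyse: Callable[[str], dict],
    log_path: Path,
) -> dict:
    """Compute an answer, write its audit entry, and only then return it.

    If the write raises, the exception propagates and no answer is
    returned, so there is no path that answers without logging.
    """
    result = analyse(question)
    decision_id = uuid.uuid4().hex
    entry = {"decision_id": decision_id, "question": question, "result": result}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return {"decision_id": decision_id, **result}
```

The design choice is that callers never see a result object that lacks a decision id, because the only code that builds one is the code that has already written the log line.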

Checking the AI, not just logging it

The summaries are phrased by a language model from the computed figures, never from the raw data. That raises the question an auditor asks first: how do you know the model did not make a number up? Signal checks every summary against the statistics before it reaches anyone. The check is deterministic and runs without calling the model again: every figure in the summary has to appear in the computed numbers, and the sentence describing the trend cannot contradict the computed direction.

A summary that fails is rejected, the plain deterministic version is sent in its place, and the rejection is written to the same audit log. Each answer carries a faithfulness score you can see on the result and in the audit trail, and a live model card reports the average score and how often the model was overruled. The model is allowed to phrase the answer. It is never trusted to invent one.
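The two rules and the fallback can be sketched as follows. This is a simplified illustration, not Signal's checker: the score definition (share of quoted numbers found among the computed figures), the word lists, and all function names are assumptions.

```python
import re

NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")
# Illustrative word lists; a real checker would need a richer vocabulary.
UP_WORDS = {"rising", "rose", "increasing", "increased", "upward"}
DOWN_WORDS = {"falling", "fell", "declining", "declined",
              "decreasing", "decreased", "downward"}


def check_summary(summary: str, figures: dict, direction: str):
    """Return (score, ok) for a model-written summary.

    score is the share of numbers in the summary that match a computed
    figure. Magnitudes are compared, so a computed -10.0 supports "10%".
    ok also requires that no trend word contradicts the computed direction.
    """
    allowed = {round(abs(float(v)), 6) for v in figures.values()}
    quoted = [round(float(tok.replace(",", "")), 6)
              for tok in NUMBER.findall(summary)]
    supported = sum(1 for q in quoted if q in allowed)
    score = supported / len(quoted) if quoted else 1.0
    words = set(re.findall(r"[a-z]+", summary.lower()))
    contradicts = bool(
        (direction == "down" and words & UP_WORDS)
        or (direction == "up" and words & DOWN_WORDS)
    )
    return score, (score == 1.0 and not contradicts)


def publish(summary: str, fallback: str, figures: dict, direction: str) -> dict:
    """Send the model's wording only if it passes; otherwise the fallback."""
    score, ok = check_summary(summary, figures, direction)
    return {"text": summary if ok else fallback,
            "faithfulness": score,
            "rejected": not ok}
```

The check never calls a model, so the same summary and figures always give the same verdict, and the `rejected` flag is what would be written to the audit log.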

Statistics worth trusting

A percentage change makes a good headline and a poor conclusion. Theft in a suburb can be down ten per cent on last month and still be doing nothing unusual, because monthly counts wander on their own. So the analyst does not stop at the percentage — it asks whether the movement is real.

It runs a Mann-Kendall test, the standard non-parametric check for a monotonic trend in a monthly series. The test works on the ordering of the values rather than their size, so it does not assume the counts follow any particular distribution, and it returns a p-value. That lets the answer say “the decline is statistically significant” or “this is within normal variation” instead of leaving the reader to guess. A Sen slope estimates how steep the trend is from the median of every pairwise slope, so one odd month cannot tilt the line, and it comes with a confidence interval. A seasonal decomposition separates the recurring swing from the underlying trend and is marked as indicative rather than settled when there is less than two full years of data. A short forecast projects the next few months with a widening prediction interval.
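For readers who want to see the two trend statistics concretely, here is a textbook sketch in plain Python. It is not the project's implementation (the stack lists NumPy and SciPy). It uses the normal approximation with a tie correction for the p-value and leaves out the confidence interval on the slope.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(series):
    """Return (S, z, two-sided p) for the Mann-Kendall trend test."""
    n = len(series)
    # S counts later-minus-earlier pairs that go up, minus those that go down.
    s = sum((b > a) - (b < a) for a, b in combinations(series, 2))
    # Variance of S under the null hypothesis, corrected for tied values.
    ties = {}
    for value in series:
        ties[value] = ties.get(value, 0) + 1
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties.values())) / 18
    if var_s == 0:
        return s, 0.0, 1.0  # no variation at all, so no evidence of a trend
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return s, z, p


def sen_slope(series):
    """Median of all pairwise slopes, in units per time step."""
    slopes = [(series[j] - series[i]) / (j - i)
              for i, j in combinations(range(len(series)), 2)]
    return median(slopes)
```

Because the slope is a median, a single extreme month changes only a handful of the pairwise slopes and leaves the estimate where it was.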

None of this is decoration. Every figure is computed before the language model phrases anything, every one is checked by the faithfulness test, and every one is written to the audit log. On the dashboard the same numbers appear as a forecast cone, a month-by-year seasonal heat map, and a short statistical reading beside the answer.

Mapping to the DTA policy

The DTA policy does not ask for free text. It asks for specific artefacts, and Signal produces each one from the same log rather than as separate paperwork.

Accountable official & use-case owner

Each decision records who is accountable — the reviewer, the officer, the agency. Configured per deployment, so a real agency sees its own names.

Register of in-scope AI use cases

The log is the register. A live endpoint rolls it up by use case: how many decisions, the risk tier, the share that needed human review, and the accountable reviewers. Never out of date, because it is computed from the decisions the product is making.
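A roll-up of this kind is a short aggregation over the log entries. The sketch below assumes each entry is a dict with `use_case`, `risk_tier`, `needs_human_review`, and `reviewer` keys; those names, and the function `build_register`, are hypothetical.

```python
from collections import defaultdict


def build_register(entries):
    """Group log entries by use case and summarise each group."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["use_case"]].append(entry)

    register = {}
    for use_case, items in groups.items():
        reviewed = sum(1 for e in items if e["needs_human_review"])
        register[use_case] = {
            "decisions": len(items),
            # A set, in case the tier recorded for a use case ever varies.
            "risk_tiers": sorted({e["risk_tier"] for e in items}),
            "human_review_share": reviewed / len(items),
            "reviewers": sorted({e["reviewer"] for e in items}),
        }
    return register
```

Because the register is recomputed from the entries on every call, it has no state of its own that could fall behind the log.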

AI transparency statement

Generated straight from the log — what AI is in use, for what, on what data, the risk class, the human oversight, and how the public can trace any answer. Generated, not hand-written, so it cannot drift from what the system actually does.

AI use-case impact assessment

Mandatory by December 2026; Signal generates it now, one per use case. Who is affected, the risks, the safeguards, the fairness considerations, the residual risk — citing the live faithfulness score and human-review rate, not boilerplate.

For the EU AI Act, a risk-tier field marks each decision as minimal, limited, high, or unacceptable, flagging high-risk uses for extra oversight. Three further rules sit in the analyst itself: it only ever sees aggregates, so no personal record enters the system; a statistically unusual month sets the human-review flag automatically; and every comparison between regions carries a plain fairness note — these are raw counts, not rates, and a gap can reflect population, reporting, or policing as much as real offending.
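The text does not say how "statistically unusual" is decided, so the sketch below substitutes one common choice as an assumption: a modified z-score based on the median and the median absolute deviation, with the conventional 3.5 cut-off. The enum mirrors the four tiers named above; the function names are hypothetical.

```python
from enum import Enum
from statistics import median


class RiskTier(str, Enum):
    MINIMAL = "minimal"
    LIMITED = "limited"
    HIGH = "high"
    UNACCEPTABLE = "unacceptable"


def unusual_months(counts, threshold=3.5):
    """Return the indexes of months whose modified z-score exceeds threshold.

    Assumption: modified z = 0.6745 * (x - median) / MAD. This stands in
    for whatever test the product actually uses.
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, c in enumerate(counts)
            if abs(0.6745 * (c - med) / mad) > threshold]


def needs_human_review(counts) -> bool:
    """Set the review flag automatically when any month is unusual."""
    return bool(unusual_months(counts))
```

The inputs are monthly aggregates only, which matches the rule that no personal record enters the analyst.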

Why this data

I work as a data analyst at South Australia Police, so I chose data from that same domain on purpose. It keeps the governance question concrete. Crime statistics are exactly the kind of sensitive, public-interest data where “how was this AI-assisted answer reached” is a real question with real consequences, not a hypothetical.

The data also taught me something. SA Police changed their offence classification partway through the period, so “theft and related offences” became simply “theft”. A trend that crossed that change would have fractured into two unrelated series. Handling it meant building a small harmonisation layer that maps both the old and new vocabularies onto one stable scheme, applied the same way to live data and the bundled snapshot. That quiet taxonomy work is most of what real public-sector data engineering actually is.
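A harmonisation layer of this kind can be as small as a lookup table and a strict mapping function. In the sketch below only the theft mapping comes from the text above; a real table would carry every offence category, and the function names and row layout are assumptions.

```python
from collections import defaultdict

# Old and new vocabulary on the left, one stable label on the right.
# Only the theft pair is taken from the case study.
CANONICAL = {
    "theft and related offences": "theft",
    "theft": "theft",
}


def harmonise(label: str) -> str:
    """Map a source label to its stable category, failing on unknowns."""
    key = " ".join(label.lower().split())  # normalise case and spacing
    try:
        return CANONICAL[key]
    except KeyError:
        raise ValueError(f"unmapped offence label: {label!r}") from None


def harmonise_counts(rows):
    """Sum (month, label, count) rows into {(month, stable label): count}."""
    totals = defaultdict(int)
    for month, label, count in rows:
        totals[(month, harmonise(label))] += count
    return dict(totals)
```

Raising on an unmapped label is deliberate: a silent pass-through would let a new vocabulary split a series in two again without anyone noticing.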

What I would do next

The honest limit is full agency hardening: authentication, a durable audit store, and a real owner behind the accountable-official field rather than a configurable placeholder. The explorer also samples very large datasets at a row cap rather than scanning them whole, which the product states plainly in the result. These are the next pieces of work rather than things already done.

The takeaway

Do not bolt governance on at the end and hope people fill in the form. Wire it into the path the work already takes, so the record writes itself, and let the register and the transparency statement fall out of that same record. A compliance trail you have to remember to keep is one you will eventually forget. One the system cannot operate without is one you can actually trust — and one you can show a regulator on the day the rules commence.

Signal is open source and live.
