Applied Data Science
The page that ties the rest together. Every other topic here is a tool; this is how they fit into a real project — from a vague question to a decision someone acts on.
- Studied
- Applied Data Science · Bachelor of Science · Data Science core (79)
- When
- UniMelb, 2019–2022
- Applied in
- Every project, end to end
- Read / Refreshed
- ~15 min read · 2026-06-25
Every other page in this section teaches a tool — the maths, the statistics, the models, the systems. Applied data science is the page about how those tools fit together into an actual project: the messy, end-to-end journey from a half-formed business question to a decision someone makes because of your work. The tools are necessary; knowing how to run the project around them is what makes a data scientist effective.
The single most important idea here — and the one that's genuinely mine — is problem-first, not model-first. Start from the decision that needs making and the smallest amount of data to make it well, then reach for the right tool rather than the fashionable one. This page is the synthesis of everything else, organised around that principle.
01
Decisions, not models
It's easy to think data science is about building models. It isn't — it's about improving decisions with data, and a model is just one possible means to that end. A great many problems are solved with a clear chart, a well-framed metric, or a simple query, no model in sight. Mistaking the tool (modelling) for the goal (a better decision) is the most common and most expensive error in the field.
So applied data science is judged by impact, not sophistication. A simple analysis that changes what someone does beats an elegant model that sits unused. That reframing — from "what can I build?" to "what decision can I improve, and what's the least I need to do it?" — is the whole mindset.
02
The project lifecycle
Real projects follow a recognisable arc, captured by frameworks like CRISP-DM (Cross-Industry Standard Process for Data Mining). Its six phases (business understanding, data understanding, data preparation, modelling, evaluation, and deployment) are a useful map, as long as you remember the most important thing about them: the process is a loop, not a line. You constantly circle back as what you learn in one phase reshapes an earlier one.
03
Framing the right question
The first phase decides whether the project succeeds, and it has nothing to do with code. Business understanding means translating a vague ask ("can we use AI here?") into a precise, answerable question tied to a decision: what choice will change based on the answer, what would "good" look like, and what's the simplest result that would be useful?
This is where most projects quietly fail — not in the modelling, but in solving the wrong problem precisely. A brilliant answer to the wrong question is worth nothing, so the discipline is to push back, clarify, and reframe before touching the data. Get this right and the rest is execution; get it wrong and no amount of technical skill saves you.
04
Data and modelling
The middle of the project is the craft the other pages cover. Applied data science is mostly about doing those steps in the right order and not skipping the unglamorous parts:
- Understand & prepare the data — explore it, then clean and shape it. This is the data processing work, and it's still the bulk of the effort; the data usually lives in a database you query with SQL.
- Model — pick the simplest method that fits the question, whether that's a regression, a machine-learning model, or just a well-chosen statistic. Start with a baseline.
- Evaluate — honestly, on held-out data, with a metric that matches the real-world cost of being wrong (the lesson from the statistics and ML pages). And evaluate against the decision, not just the leaderboard.
The applied skill isn't knowing every algorithm — it's choosing the least complex one that answers the question, and resisting the pull to over-engineer.
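To make "start with a baseline" and "a metric that matches the real-world cost of being wrong" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the synthetic data, the function names, the hand-picked threshold of 0.3, and the 5:1 cost ratio between a missed positive and a false alarm. In a real project those costs come from the decision owner, not from the analyst.

```python
import random


def expected_cost(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Average cost per case, pricing false negatives and false positives differently."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 0:
            total += cost_fp  # false alarm
        elif pred == 0 and truth == 1:
            total += cost_fn  # missed positive (assumed to be 5x worse)
    return total / len(y_true)


def majority_baseline(train_labels):
    """Baseline: always predict the most common label seen in training."""
    majority = 1 if 2 * sum(train_labels) >= len(train_labels) else 0
    return lambda x: majority


def threshold_rule(threshold):
    """Simplest candidate 'model': predict 1 when the feature reaches a threshold."""
    return lambda x: 1 if x >= threshold else 0


def evaluate(predict, rows):
    """Score a predictor on (feature, label) rows using the cost-weighted metric."""
    xs, ys = zip(*rows)
    return expected_cost(ys, [predict(x) for x in xs])


if __name__ == "__main__":
    # Synthetic data (an assumption): P(label = 1) rises with the feature x.
    rng = random.Random(0)
    rows = []
    for _ in range(1000):
        x = rng.random()
        rows.append((x, 1 if rng.random() < x else 0))

    # Hold out the last 20% and never touch it while choosing the method.
    train, test = rows[:800], rows[800:]

    baseline = majority_baseline([y for _, y in train])
    candidate = threshold_rule(0.3)  # hand-picked here; tune on train in practice

    print("baseline cost: ", evaluate(baseline, test))
    print("candidate cost:", evaluate(candidate, test))
```

The point of the comparison is the decision, not the score: if the candidate does not beat the baseline on held-out cost, the extra complexity has not earned its place.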
05
Deployment and monitoring
A result that never leaves your laptop changes nothing. Deployment is putting the work where it makes a difference — a dashboard a stakeholder uses, a report in a decision meeting, a model wired into a system. This is often the hardest, least-taught part, and where data science meets real engineering.
And deployment isn't the end, because a model is a product, not a deliverable. The world changes, so the data feeding the model drifts away from what it was trained on, and performance silently decays. So you monitor it in production and retrain when it slips — the data-drift lesson from the ML side. The job continues long after the first version ships.
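One way to notice drift before performance visibly decays is to compare the distribution of a feature in production against the training data. The sketch below uses the Population Stability Index (PSI), which is one common choice and not necessarily the method used on the ML page. The function name, the equal-width binning, and the 0.2 alert level (a widely quoted rule of thumb, not a law) are assumptions.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a current sample.

    Bins are equal-width over the reference range; values outside that range
    are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]   # stand-in for a training feature
    production = [v + 0.5 for v in training]   # same feature after the world moved

    score = psi(training, production)
    if score > 0.2:  # assumed alert threshold
        print(f"PSI = {score:.2f}: investigate drift, consider retraining")
```

A check like this runs on a schedule per feature; it flags that the inputs have moved, which is a prompt to re-evaluate the model, not proof that it has stopped working.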
06
Communicating the result
The most under-valued skill in the whole pipeline: a correct analysis nobody understands or trusts has zero impact. Communication — translating technical findings into a clear story a decision-maker can act on — is what converts good analysis into a good decision. It's important enough to have its own page (Science Communication), but it belongs in the lifecycle too: you should be thinking about how you'll explain the result from the very first phase, because it shapes what's worth doing.
07
Ethics and reproducibility
Working with data carries responsibility, and two threads run through every phase. Reproducibility means the whole path from raw data to result is code anyone can re-run to get the same answer — the standard from the data processing page, and what makes work auditable and trustworthy. Ethics means taking seriously the bias, privacy, fairness, and consequences of what you build — a model trained on biased data entrenches that bias, and in government and health work the stakes are real people. These aren't a final checklist; they're constraints you carry from the first question to the last deployment.
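For the reproducibility half, "code anyone can re-run to get the same answer" can be checked mechanically. This is a minimal sketch under stated assumptions: `run_pipeline` and `fingerprint` are hypothetical names, the cleaning step is a toy, and the input is assumed to contain at least one non-missing value. The idea is that every source of randomness is seeded locally and the result is hashed, so two runs can be compared exactly.

```python
import hashlib
import json
import random


def run_pipeline(raw_rows, seed=42):
    """Toy raw-data-to-result path with no hidden state."""
    rng = random.Random(seed)  # local seeded RNG instead of the global one
    cleaned = [r for r in raw_rows if r is not None]  # drop missing values
    sample = rng.sample(cleaned, k=min(3, len(cleaned)))
    return {"n_rows": len(cleaned), "sample_mean": sum(sample) / len(sample)}


def fingerprint(result):
    """Stable hash of a result, used to compare two runs exactly."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    raw = [4.0, None, 7.5, 1.0, None, 9.0, 3.5]
    first = fingerprint(run_pipeline(raw))
    second = fingerprint(run_pipeline(raw))
    print("reproducible:", first == second)
```

Recording the fingerprint alongside the code version and the seed gives a reviewer something concrete to verify when they re-run the work.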
08
Problem-first
Tie it all together and you get a philosophy, not just a process. Problem-first, not model-first: begin with the decision, find the minimum data and the simplest method to make it well, and value impact over sophistication. It's what lets the same person be useful in a research lab, a government intelligence team, and an engineering project — because the framework is constant even as the tools change.
09
Where it shows up in my work