A Single Source of Truth for Metric Definitions: Stop Dashboard Discrepancies
When MRR is up 8% in one dashboard and 3% in another, the problem isn’t the data; it’s that the definition lives in three places. Here’s how to fix it.
Dashboards disagree because the definition of each metric lives in more than one place, so each tool computes it differently. The fix is a single governed source of truth for metric definitions that every dashboard, query, and AI analyst reads from, instead of each one carrying its own private copy of the SQL.
This is the operational, problem-aware companion to our explainer on the metric definition concept; if you want the precise definition of the term, start there. This post is about the meeting where finance says MRR is up 3% and growth says it’s up 8% in the same week, and how you make that meeting stop happening.
Why every dashboard shows a different number
The cause is almost never bad data. The warehouse is usually fine. The problem is that “MRR” is not one calculation; it’s a dozen small decisions, and different people made them at different times in different tools.
Take monthly recurring revenue. To compute it you have to decide:
- Do annual plans get divided by 12, or counted in the month they bill?
- Do you include or exclude one-time setup fees and overage charges?
- How do refunds and credits reduce it, and in which period?
- Do you convert foreign currency at the transaction rate or a fixed monthly rate?
- Do you exclude internal test accounts and your own employees?
Each of those is a fork in the road. The growth dashboard took one path eighteen months ago when an analyst shipped it under deadline. The finance model took a different path because it has to tie out to the books. Both are “MRR.” Both are running real SQL against the same warehouse. They will never agree, because they were never the same calculation.
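To make the forks concrete, here’s a minimal Python sketch that turns three of them (annual normalization, one-time fees, internal accounts) into explicit parameters and runs two policies over the same rows. The LineItem and MrrPolicy types, their field names, and the sample data are hypothetical; refunds and currency conversion are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    account_id: str
    amount: float      # assumed already converted to the reporting currency
    interval: str      # "month", "year", or "one_time" (setup fees, overages)
    is_internal: bool  # test account or employee


@dataclass(frozen=True)
class MrrPolicy:
    normalize_annual: bool   # divide annual plans by 12, or count them in full?
    include_one_time: bool   # count setup fees and overages?
    exclude_internal: bool   # drop test and employee accounts?


def mrr(items: list[LineItem], policy: MrrPolicy) -> float:
    total = 0.0
    for item in items:
        if policy.exclude_internal and item.is_internal:
            continue
        if item.interval == "one_time" and not policy.include_one_time:
            continue
        if item.interval == "year" and policy.normalize_annual:
            total += item.amount / 12
        else:
            total += item.amount
    return total


items = [
    LineItem("acme", 100.0, "month", False),
    LineItem("globex", 1200.0, "year", False),
    LineItem("globex", 500.0, "one_time", False),  # setup fee
    LineItem("qa-test", 100.0, "month", True),     # internal test account
]

growth = MrrPolicy(normalize_annual=False, include_one_time=True, exclude_internal=False)
finance = MrrPolicy(normalize_annual=True, include_one_time=False, exclude_internal=True)

print(mrr(items, growth))   # 1900.0
print(mrr(items, finance))  # 200.0
```

Same rows, two defensible policies, 1,900 versus 200. Neither function has a bug; each one simply encodes a different answer to the same questions.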
You can see the same divergence in a single number computed two ways:
```sql
-- Growth's MRR: counts annual plans in full the month they bill
select date_trunc('month', billed_at) as month,
       sum(amount) as mrr
from subscriptions
where status = 'active'
group by 1;

-- Finance's MRR: normalizes annual to monthly, excludes setup fees and test accounts
select date_trunc('month', s.period_start) as month,
       sum(case when s."interval" = 'year' then s.amount / 12.0 else s.amount end) as mrr
from subscriptions s
join accounts a on a.id = s.account_id
where s.status = 'active'
  and s.line_type = 'recurring'
  and a.is_internal = false
group by 1;
```
Neither is wrong in isolation. They’re answering slightly different questions while using the same word. Multiply this across churn, conversion, active users, and “qualified lead” and you get an organization that argues about numbers instead of acting on them.
What a single source of truth actually is
A single source of truth is not a dashboard, and it’s not a spreadsheet everyone agreed to use. It’s a governed store where each metric is defined exactly once, in a form that tools call by name. When a semantic layer holds the definition, a dashboard asks for mrr by name and gets back the one canonical calculation, joins, filters, and currency handling included. Nobody re-implements it.
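Here’s a toy sketch of “defined once, resolved by name,” assuming a plain in-memory dictionary stands in for the semantic layer. Metric, MetricRegistry, define, and resolve are hypothetical names, not any vendor’s API, and the stored SQL is adapted from Finance’s query above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str  # the one canonical calculation, joins and filters included


class MetricRegistry:
    """Toy stand-in for a semantic layer: metrics are defined once and resolved by name."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def define(self, metric: Metric) -> None:
        # Defined once: a second definition under the same name is rejected.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def resolve(self, name: str) -> str:
        # Referenced everywhere: every caller gets the identical SQL.
        if name not in self._metrics:
            raise KeyError(f"no governed definition for {name!r}")
        return self._metrics[name].sql


registry = MetricRegistry()
registry.define(Metric(
    name="mrr",
    sql="""
select date_trunc('month', s.period_start) as month,
       sum(case when s."interval" = 'year' then s.amount / 12.0 else s.amount end) as mrr
from subscriptions s
join accounts a on a.id = s.account_id
where s.status = 'active'
  and s.line_type = 'recurring'
  and a.is_internal = false
group by 1
""".strip(),
))

# A dashboard and a notebook both ask for "mrr" and receive the same SQL.
dashboard_sql = registry.resolve("mrr")
notebook_sql = registry.resolve("mrr")
```

Because define refuses a second definition under the same name, there’s no second copy to drift. A real semantic layer also handles the hard parts this sketch skips, such as compiling joins, dimensions, and time grains.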
Three properties separate a real source of truth from a folder of SQL snippets.
Defined once, referenced everywhere. The calculation exists in one place. Every BI tool, notebook, scheduled report, and AI analyst resolves the metric through that place. There is no second copy to drift.
Owned. Every metric has a named owner and a last-reviewed date. “MRR” belongs to finance, “activation rate” belongs to growth. When someone wants to change what counts as a refund, there’s a person who signs off, not a Slack thread that scrolls away.
Versioned and reviewable. A change to a definition is a change to the business’s reported reality, so it should go through review the way code does. Modern metrics tools are built around this workflow. dbt’s Semantic Layer, powered by MetricFlow, defines metrics in YAML inside the dbt project, so definitions are version-controlled, reviewed in pull requests, and can be validated in CI before anything downstream sees the change. Cube takes the headless route: one definition sits behind several query APIs, so every consumer gets the identical computation. The shape differs by vendor, but the discipline is the same: one definition, owned, under version control.
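Owners and review dates only help if something checks them. This is a minimal sketch of a check that could run in CI next to the definitions, flagging metrics with no owner or a review older than some threshold. MetricRecord, audit, and the 180-day threshold are illustrative assumptions, not features of dbt or Cube.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class MetricRecord:
    name: str
    owner: Optional[str]           # e.g. "finance" or "growth"
    last_reviewed: Optional[date]


def audit(records: list[MetricRecord], today: date, max_age: timedelta) -> list[str]:
    """Return one message per problem: missing owner, never reviewed, or stale review."""
    problems = []
    for r in records:
        if not r.owner:
            problems.append(f"{r.name}: no owner")
        if r.last_reviewed is None:
            problems.append(f"{r.name}: never reviewed")
        elif today - r.last_reviewed > max_age:
            problems.append(f"{r.name}: last reviewed {r.last_reviewed.isoformat()}")
    return problems


records = [
    MetricRecord("mrr", "finance", date(2024, 5, 1)),
    MetricRecord("activation_rate", "growth", date(2023, 1, 15)),
    MetricRecord("qualified_lead", None, None),
]

problems = audit(records, today=date(2024, 6, 1), max_age=timedelta(days=180))
for problem in problems:
    print(problem)
# In CI, a non-empty problems list would be the signal to fail the build.
```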
Definition ownership is a people problem, not a tooling problem
Tools make a source of truth possible. They don’t make it true. The failure mode is buying a semantic layer and then letting analysts keep writing ad hoc MRR queries in notebooks because it’s faster than looking up the canonical one.
The thing that actually works is assigning ownership and routing changes through the owner. Finance owns revenue metrics. Growth owns funnel metrics. A change request to “active user” goes to the growth owner, who decides, documents the reason, and dates the decision. The point isn’t bureaucracy. The point is that when two numbers disagree six months from now, you can read the definition and the change history and know which one is canonical, instead of relitigating it from memory.
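The routing rule is simple enough to sketch. In this hypothetical model each metric carries its owner and a list of dated revisions with a reason, and a change from anyone but the owner is rejected, so the history doubles as the record of who decided what and why. OwnedMetric, Revision, and the events queries are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Revision:
    version: int
    sql: str
    reason: str        # why the definition changed
    approved_by: str
    decided_on: date


@dataclass
class OwnedMetric:
    name: str
    owner: str
    history: list[Revision] = field(default_factory=list)

    @property
    def current(self) -> Revision:
        # Latest approved revision (assumes at least one exists).
        return self.history[-1]

    def change(self, sql: str, reason: str, approver: str, decided_on: date) -> Revision:
        # Changes route through the owner: anyone else is rejected.
        if approver != self.owner:
            raise PermissionError(f"{approver} cannot change {self.name}; owner is {self.owner}")
        revision = Revision(len(self.history) + 1, sql, reason, approver, decided_on)
        self.history.append(revision)
        return revision


active_users = OwnedMetric(name="active_users", owner="growth")
active_users.change(
    "select count(distinct user_id) from events",
    reason="initial definition",
    approver="growth",
    decided_on=date(2024, 1, 10),
)
active_users.change(
    "select count(distinct user_id) from events where not is_internal",
    reason="exclude employees and test users",
    approver="growth",
    decided_on=date(2024, 3, 4),
)

for rev in active_users.history:
    print(rev.version, rev.decided_on, rev.reason)
```

In practice this record usually lives in version control and pull-request approvals rather than a Python object, but the invariant is the same: every change has an approver, a date, and a reason.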
A useful tell: if your team has a recurring meeting whose real purpose is reconciling numbers, you don’t have a source of truth yet. You have several sources and a standing negotiation.
Why AI analysts must ground in these definitions
This is where the stakes jump. A frontier model can write syntactically perfect, plausible-looking SQL for “what’s our MRR this quarter” without knowing a single one of your forks in the road. It will guess that annual plans divide by 12, or guess that they don’t. It will join naively against your subscriptions table and hand you a number that is directionally close and quietly wrong. Worse than a wrong number is a confident wrong number with clean-looking SQL attached.
An AI analyst that’s worth trusting has to do what a careful human does: read the canonical definition before writing the query. This is the grounding problem, and it’s the entire reason a source of truth has to be machine-readable, not just a Notion page humans skim. The model retrieves the definition for the metric in question, uses it in the generated SQL, and shows you which definition it used. If two definitions conflict, it asks which one you mean rather than picking silently. The same logic applies to any conversational analytics or text-to-SQL system that runs against your warehouse.
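A minimal sketch of that grounding step, assuming definitions have already been collected into a list and are matched by exact metric name (a real system would use semantic retrieval). Definition, Grounded, NeedsClarification, ground, and the sources and queries in the example are hypothetical. The behavior that matters is the branch: one consistent definition gets used and cited, while conflicting definitions produce a question instead of a silent pick. Asking when no definition exists at all is an added assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Definition:
    metric: str   # metric name the definition covers, e.g. "mrr"
    source: str   # where it lives; cited back in the answer
    sql: str      # the calculation to use in generated SQL


@dataclass(frozen=True)
class Grounded:
    sql: str
    cited: str


@dataclass(frozen=True)
class NeedsClarification:
    question: str


def ground(metric: str, definitions: list[Definition]) -> Union[Grounded, NeedsClarification]:
    # Retrieval is simplified to an exact name match for this sketch.
    matches = [d for d in definitions if d.metric == metric]
    if not matches:
        # Assumption: with no governed definition, ask rather than guess.
        return NeedsClarification(f"There is no governed definition of {metric!r}. How should it be calculated?")
    if len({d.sql for d in matches}) > 1:
        # Conflicting definitions: ask which one, never pick silently.
        sources = ", ".join(sorted(d.source for d in matches))
        return NeedsClarification(f"{metric!r} has conflicting definitions ({sources}). Which one do you mean?")
    chosen = matches[0]
    return Grounded(sql=chosen.sql, cited=chosen.source)


definitions = [
    Definition("mrr", "Notion: Finance / MRR", "select sum(mrr_amount) from finance_mrr"),
    Definition("mrr", "dbt: growth_mrr.yml", "select sum(amount) from subscriptions"),
    Definition("activation_rate", "Notion: Growth / Activation", "select avg(activated) from signups"),
]

for name in ("mrr", "activation_rate"):
    result = ground(name, definitions)
    if isinstance(result, Grounded):
        print(f"{name}: using definition from {result.cited}")
    else:
        print(f"{name}: {result.question}")
```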
The accuracy gap here is large and measurable. On BIRD-Interact, a benchmark built specifically around ambiguous business questions and loaded metric-definition documents, a frontier model used directly scores around 33%. The same model with retrieval and grounding on top scores 75.2%. The difference is almost entirely the source of truth: whether the system reads the definition first or guesses. We collect the broader picture in our roundup of text-to-SQL accuracy benchmarks.
How Datost handles this
Datost treats your metric definitions as the source of truth and binds every answer to them. You upload your definitions from wherever they live (Notion pages, Markdown files, a dbt project, a docs site), and Datost retrieves the relevant one for each question, uses it in the generated SQL, and cites it in the answer so you can audit the path from question to number.
When two definitions conflict (the “MRR per finance” versus “MRR per growth” problem), Datost asks which one to use instead of choosing for you. It joins across the warehouse, CRM, billing, and product analytics in a single query, all computed against the same definitions, so the number in a proactive Slack post matches the number in the dashboard, which matches the number an analyst would get by hand. Your team owns the truth. Datost is the interface that keeps every answer faithful to it.
See “Why grounding is the whole game” for the reasoning behind this design, the feature breakdown for what ships with it, and the comparisons if you’re weighing Datost against your current BI stack.