Semantic Layer — Datost Glossary

A semantic layer is a central definition store that maps human-readable business concepts (revenue, churn, MRR) to the underlying tables and SQL that compute them.

It is the place where your team agrees, once, what each business concept actually means in SQL. “Revenue” might join stripe_invoices.amount_paid with a refund correction from internal_credits, exclude tax, and convert to USD. “Active user” might mean an account with at least one session in the last 7 days, excluding internal Datost employees. The semantic layer encodes those decisions so that every dashboard, every notebook, and every AI analyst computes them the same way.
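As a rough illustration of the “active user” example above, the definition can be written once as a single SQL statement and run wherever it is needed. This is a minimal sketch using Python's built-in sqlite3 module; the sessions table, its columns (account_id, is_internal, started_at), and the demo rows are assumptions made for the example, not a real schema.

```python
import sqlite3

# Hypothetical schema: the `sessions` table and its columns are assumptions.
ACTIVE_USERS_SQL = """
SELECT COUNT(DISTINCT account_id)
FROM sessions
WHERE is_internal = 0                            -- exclude internal employees
  AND started_at >  datetime(:as_of, '-7 days')  -- at least one session in the last 7 days
  AND started_at <= :as_of
"""


def active_users(conn, as_of):
    """Count accounts with at least one session in the 7 days up to `as_of`."""
    return conn.execute(ACTIVE_USERS_SQL, {"as_of": as_of}).fetchone()[0]


def demo_connection():
    """Build an in-memory database with a few made-up session rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (account_id TEXT, is_internal INTEGER, started_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [
            ("acct_1", 0, "2024-01-14 09:00:00"),  # recent and external: counts
            ("acct_1", 0, "2024-01-13 09:00:00"),  # same account, counted once
            ("acct_2", 0, "2024-01-01 09:00:00"),  # outside the 7-day window
            ("acct_3", 1, "2024-01-14 09:00:00"),  # internal account, excluded
        ],
    )
    return conn
```

Calling active_users(demo_connection(), "2024-01-15 00:00:00") applies the window and the internal-account exclusion in one place, so any caller that reuses the statement gets the same count.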

Why it matters

Without a semantic layer, the definition lives in three places at once: in the analyst’s head, in whatever SQL was last shipped in a dashboard, and in a Notion doc nobody re-reads. Those three versions drift apart. Marketing presents “MRR is up 8%” while finance presents “MRR is up 3%” in the same week. Both are computing MRR, just from different definitions.

With a semantic layer, the metric is defined once. Tools call it by name. The definition is versioned, owned, and reviewable.

What it actually is, technically

In practice, a semantic layer is some combination of a metrics framework (Cube, or MetricFlow, which superseded the original dbt Metrics package), a tool’s internal modeling layer (Looker’s LookML, Sigma’s data models), and a structured set of metric-definition docs the AI analyst reads at query time.

The shape varies, but the function is the same. It turns “what is MRR?” into a deterministic SQL recipe.
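One way to picture “defined once, called by name” is a small registry that maps a metric name to a single owned, versioned SQL recipe. The sketch below is an assumption-laden illustration, not the format used by any of the tools named above: the Metric and MetricRegistry classes, the finance-analytics owner, and the subscriptions table in the SQL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    owner: str    # team accountable for the definition
    version: int  # bumped whenever the SQL changes
    sql: str      # the single agreed-upon recipe


class MetricRegistry:
    """Maps a metric name to exactly one definition."""

    def __init__(self):
        self._metrics = {}

    def define(self, metric):
        key = metric.name.lower()
        if key in self._metrics:
            # A second definition under the same name is the divergence
            # a semantic layer exists to prevent, so reject it.
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[key] = metric

    def get(self, name):
        return self._metrics[name.lower()]

    def sql_for(self, name):
        return self.get(name).sql


registry = MetricRegistry()
registry.define(
    Metric(
        name="MRR",
        owner="finance-analytics",  # hypothetical owner
        version=1,
        # Hypothetical table and columns, for illustration only.
        sql="SELECT SUM(monthly_amount_usd) FROM subscriptions WHERE status = 'active'",
    )
)
```

Every caller that asks for registry.sql_for("mrr") receives the same statement, and an attempt to register a competing “MRR” fails loudly instead of silently creating a second answer.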

How Datost uses the semantic layer

Datost grounds every answer in your metric definitions before generating SQL. Grounding is the mechanism that turns business context into correct SQL. Upload your “what counts as MRR” doc and your “how do we handle refunds” doc, and Datost retrieves the relevant ones for each question and cites them in the response. When two definitions conflict, Datost asks you which one applies instead of guessing. You get the same number every time, with a paper trail. Grounding accounts for the bulk of the accuracy gap on benchmarks like BIRD-Interact.
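The retrieve, cite, and ask-back flow described above can be sketched in a few lines. This is a toy illustration only and not Datost's implementation, which this page does not describe: the DefinitionDoc and Grounding types, the word-match retrieval, and the sample rules used to exercise it are all assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DefinitionDoc:
    title: str   # e.g. "What counts as MRR"
    metric: str  # the metric the doc defines, e.g. "mrr"
    rule: str    # the rule the doc states (hypothetical free text)


@dataclass
class Grounding:
    citations: list = field(default_factory=list)
    clarification: Optional[str] = None  # set when definitions conflict


def retrieve(docs, question):
    """Toy retrieval: keep docs whose metric name appears as a word in the question."""
    words = set(re.findall(r"[a-z0-9_]+", question.lower()))
    return [doc for doc in docs if doc.metric.lower() in words]


def ground(docs, question):
    """Cite the relevant docs, or ask back if two of them disagree on a metric."""
    relevant = retrieve(docs, question)
    rules_by_metric = {}
    for doc in relevant:
        rules_by_metric.setdefault(doc.metric.lower(), set()).add(doc.rule)

    for metric, rules in rules_by_metric.items():
        if len(rules) > 1:
            titles = [d.title for d in relevant if d.metric.lower() == metric]
            return Grounding(
                citations=titles,
                clarification=(
                    f"Found conflicting definitions of {metric} in: "
                    f"{', '.join(titles)}. Which one should apply?"
                ),
            )
    return Grounding(citations=[doc.title for doc in relevant])
```

With a single “What counts as MRR” doc loaded, ground(docs, "What was MRR last month?") returns that title as a citation and no clarification. Add a second doc that states a different MRR rule and the same call returns a clarification question instead of picking one silently.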