Glossary · Column Lineage

Column Lineage

Column lineage traces, for any column in any table, the chain of upstream columns and transformations that produced it.

Also known as: data lineage · column-level lineage

Column lineage answers a question every analyst has asked at 11pm: “where does this number actually come from?” For any column in any downstream table — say, analytics.revenue_by_account.mrr — column lineage traces the chain of SQL transformations all the way back to the source columns (stripe_invoices.amount_paid, internal_credits.refund_amount, etc.) that ultimately produced it.

Table-level lineage is the cheaper, less useful sibling: it tells you “this table depends on those tables” without telling you which columns. Column-level lineage is what you actually need when debugging.

Why it matters

Three real situations where column lineage is the difference between a quick fix and a multi-day investigation:

  1. A dashboard number looks wrong. You need to find every transformation between the source data and the dashboard so you can identify where the bug entered. Without column lineage, you grep through SQL files for hours.
  2. A source column changes type. You need to know every downstream column that touches it before you ship the migration. Without lineage, the change ships and three dashboards silently break.
  3. An auditor or compliance reviewer asks how a regulated number is computed. You need a defensible trace.

dbt, SQLMesh, and modern modeling tools generate column lineage as a byproduct of compilation. The lineage is sitting right there in your project; the question is whether anyone reads it.

What column lineage looks like in practice

A graph. Source columns at the leaves, downstream columns at the root, transformations on the edges. You click on analytics.revenue_by_account.mrr and see the tree of upstream columns and the SQL on each edge.

For audit and debugging, the graph beats any amount of documentation.

How Datost uses lineage

When Datost generates SQL, the answer ships with the lineage of every column referenced. You see not just the SQL, but where each column upstream came from. When a number is challenged, you can show the chain. When a metric is rebuilt in the semantic layer, the question “what depends on this column” is one click away, not a half-day investigation.