Text-to-SQL
Text-to-SQL is the task of translating a natural-language question into a SQL query that runs against a database and returns the answer.
Also known as: natural language to SQL · NL2SQL · AI SQL generation
Concretely, text-to-SQL means taking a question written in plain English, such as “what was our gross margin by plan last quarter?”, and producing the SQL query that returns the right answer when run against your warehouse. It is one of the most heavily studied corners of applied AI because the value is obvious: most people who need an answer from a database cannot or should not write SQL themselves.
Why it’s harder than it looks
The model has to do four things in sequence. First, understand the question, including the implicit references (“our,” “last quarter”) that depend on context the user did not state. Second, locate the relevant tables and columns in a schema that may have hundreds or thousands of objects with inconsistent naming. Third, write SQL that joins them correctly. Fourth, recognize when the question is too ambiguous and ask a clarifying question instead of guessing.
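To make the second step concrete, here is a minimal sketch of schema retrieval that ranks tables by how many words from the question appear in their table or column names. The schema, the table names, and the `rank_tables` helper are all hypothetical, and the token-overlap scoring is an assumption chosen for brevity; a production system would more likely use embeddings, column descriptions, or sample values.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumerics, so 'gross_margin' yields {'gross', 'margin'}."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}

def rank_tables(question: str, schema: dict[str, list[str]], top_k: int = 2) -> list[str]:
    """Return up to top_k tables whose names or column names share tokens with the question."""
    question_tokens = tokenize(question)
    scores = {}
    for table, columns in schema.items():
        name_tokens = tokenize(table)
        for column in columns:
            name_tokens |= tokenize(column)
        scores[table] = len(question_tokens & name_tokens)
    # Highest overlap first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(schema, key=lambda t: (-scores[t], t))
    return [t for t in ranked if scores[t] > 0][:top_k]

# Hypothetical warehouse schema, used only for illustration.
SCHEMA = {
    "subscriptions": ["id", "plan", "started_at", "customer_id"],
    "invoices": ["id", "subscription_id", "revenue", "cost", "gross_margin", "issued_at"],
    "support_tickets": ["id", "customer_id", "opened_at", "status"],
}

print(rank_tables("what was our gross margin by plan last quarter?", SCHEMA))
# prints ['invoices', 'subscriptions']
```

Note what the sketch does not handle: "our" and "last quarter" match nothing in the schema, which is exactly the implicit context the first step has to resolve separately.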
That fourth step is where most text-to-SQL systems quietly fail. A frontier LLM with no extra structure will typically write a query that is syntactically valid and looks plausible but uses the wrong column or rests on an undocumented assumption. Because the query still runs and returns a number, the wrong answer goes unnoticed.
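One way to picture the "ask instead of guess" behavior is a resolver that refuses to pick a column when a business term has more than one candidate. The sketch below is illustrative only: the `CANDIDATES` mapping, the column names, and the `resolve_term` function are hypothetical, and real systems would derive candidates from schema retrieval rather than a hand-written dictionary.

```python
from dataclasses import dataclass

@dataclass
class Resolution:
    needs_clarification: bool
    message: str  # the chosen column, or the question to send back to the user

# Hypothetical mapping from business terms to candidate columns.
CANDIDATES = {
    "margin": ["invoices.gross_margin", "invoices.net_margin"],
    "plan": ["subscriptions.plan"],
}

def resolve_term(term: str, candidates: dict[str, list[str]]) -> Resolution:
    """Pick a column only when exactly one candidate exists; otherwise ask the user."""
    options = candidates.get(term.lower(), [])
    if len(options) == 1:
        return Resolution(False, options[0])
    if not options:
        return Resolution(True, f"I could not find a column for '{term}'. Which field do you mean?")
    return Resolution(True, f"'{term}' could mean " + " or ".join(options) + ". Which one?")

print(resolve_term("plan", CANDIDATES).message)
print(resolve_term("margin", CANDIDATES).message)
```

The design choice here is that ambiguity produces a question, never a silent default, so a wrong guess cannot reach the generated SQL.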
The current accuracy bar
The hardest public benchmark is BIRD-Interact, accepted as an oral at ICLR 2026. It pits text-to-SQL systems against 22 deliberately ugly real-world Postgres databases with 600 ambiguous business questions. Frontier models like Claude Opus 4.6 score around 33% when used directly. Production systems that add schema retrieval, metric grounding, and clarification logic score much higher. Datost scores 75.2% on top of the same model. See the benchmark writeup for methodology.
How Datost approaches it
Datost treats text-to-SQL as a system problem, not a model problem. The same frontier model that scores around 33% alone scores 75.2% inside Datost because the surrounding system handles the hard parts: schema retrieval against your real warehouse, grounding in your team’s metric definitions, clarification when the question is ambiguous, and validation before the query runs. Every answer ships with the SQL attached so the analyst can audit it and the next person can build on it.
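As a rough illustration of what "validation before the query runs" can mean, the sketch below compiles a generated query against an empty copy of the schema and reports any error without touching real data. This is not Datost's implementation. It uses Python's built-in `sqlite3` as a stand-in for a Postgres warehouse, and the table definitions and the `validate_sql` helper are hypothetical.

```python
import sqlite3

# Hypothetical schema; only the structure matters, the tables stay empty.
DDL = """
CREATE TABLE subscriptions (id INTEGER, plan TEXT, started_at TEXT);
CREATE TABLE invoices (id INTEGER, subscription_id INTEGER, revenue REAL, cost REAL, issued_at TEXT);
"""

def validate_sql(sql: str, ddl: str = DDL) -> tuple[bool, str]:
    """Compile the query against an empty in-memory schema and return (ok, message)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        # EXPLAIN forces SQLite to parse and plan the statement, which
        # surfaces unknown tables and columns without reading any rows.
        conn.execute("EXPLAIN " + sql)
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()

good = (
    "SELECT s.plan, SUM(i.revenue - i.cost) AS gross_margin "
    "FROM invoices i JOIN subscriptions s ON s.id = i.subscription_id "
    "GROUP BY s.plan"
)
bad = "SELECT plan, SUM(margin) FROM invoices GROUP BY plan"

print(validate_sql(good))
print(validate_sql(bad))
```

A check like this only catches references to objects that do not exist. A query that uses a real but wrong column still passes, which is why validation is paired with metric grounding and clarification rather than replacing them.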
- Semantic Layer: A semantic layer is a central definition store that maps human-readable business concepts (revenue, churn, MRR) to the underlying tables and SQL that compute them.
- Metric Definition: A metric definition is the exact SQL or calculation that produces a business metric, plus the documented assumptions behind it.
- BIRD-Interact: BIRD-Interact is a text-to-SQL benchmark of 600 deliberately ambiguous business questions across 22 realistic Postgres databases, published at ICLR 2026.
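The first two terms above can be sketched as a small data structure: a metric definition bundles the SQL with its documented assumptions, and a semantic layer maps business vocabulary to those definitions. The class names, the example metric, its SQL, and its assumptions are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MetricDefinition:
    name: str
    sql: str                       # the exact SQL that computes the metric
    assumptions: tuple[str, ...] = ()  # documented caveats that travel with the SQL

class SemanticLayer:
    """Maps human-readable terms (and their aliases) to metric definitions."""

    def __init__(self) -> None:
        self._by_term: dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition, aliases: tuple[str, ...] = ()) -> None:
        for term in (metric.name, *aliases):
            self._by_term[term.lower()] = metric

    def lookup(self, term: str) -> Optional[MetricDefinition]:
        """Case-insensitive lookup; returns None for terms nobody has defined."""
        return self._by_term.get(term.strip().lower())

# Hypothetical metric, for illustration only.
mrr = MetricDefinition(
    name="MRR",
    sql="SELECT SUM(monthly_price) FROM subscriptions WHERE status = 'active'",
    assumptions=("Trials are excluded.", "Annual plans are normalized to a monthly price upstream."),
)

layer = SemanticLayer()
layer.register(mrr, aliases=("monthly recurring revenue",))

print(layer.lookup("Monthly Recurring Revenue").sql)
print(layer.lookup("churn"))
```

Returning `None` for an undefined term gives a text-to-SQL system a clear signal to ask the user rather than invent a calculation.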