Text-to-SQL — Datost Glossary

Text-to-SQL is the task of translating a natural-language question into a SQL query that runs against a database and returns the answer.

In practice, that means taking a question written in plain English (“what was our gross margin by plan last quarter?”) and producing the SQL query that, when run against your warehouse, returns the right answer. It is one of the most heavily studied corners of applied AI because the value is obvious: most people who need an answer from a database cannot or should not write SQL themselves.

Why it’s harder than it looks

The model has to do four things in sequence. First, understand the question, including the implicit references (“our,” “last quarter”) that depend on context the user did not state. Second, locate the relevant tables and columns in a schema that may have hundreds or thousands of objects with inconsistent naming. Third, write SQL that joins them correctly. Fourth, know when the question is too ambiguous and ask back instead of guessing.
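The second and fourth steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the schema, the substring matching, and the names `SCHEMA`, `locate_columns`, `resolve`, and `Clarification` are all hypothetical. The point is only that a term matching zero or several columns should produce a question back to the user rather than a guess.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy schema (table -> columns); a real warehouse would be far larger.
SCHEMA = {
    "subscriptions": ["plan", "gross_revenue", "net_revenue", "started_at"],
    "invoices": ["plan_code", "cogs", "issued_at"],
}


@dataclass
class Clarification:
    """A request to ask the user which column they meant."""
    term: str
    candidates: list[str]


def locate_columns(term: str) -> list[str]:
    """Return every table.column whose name contains the term (naive substring match)."""
    needle = term.lower()
    return [
        f"{table}.{column}"
        for table, columns in SCHEMA.items()
        for column in columns
        if needle in column
    ]


def resolve(term: str) -> str | Clarification:
    """Return the single matching column, or ask back when the match is not unique."""
    matches = locate_columns(term)
    if len(matches) == 1:
        return matches[0]
    # Zero or multiple matches: do not guess, surface the candidates instead.
    return Clarification(term=term, candidates=matches)
```

With this toy schema, `resolve("cogs")` resolves to a single column, while `resolve("revenue")` returns a `Clarification` listing both revenue columns, and `resolve("margin")` returns one with no candidates at all.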

That fourth step is where most text-to-SQL systems quietly fail. Instead of asking, a frontier LLM with no extra structure tends to guess: it writes a query that is syntactically valid and looks plausible but uses the wrong column or makes an undocumented assumption. The query runs, returns a number, and nobody notices that the answer is wrong.

The current accuracy bar

To see how accurate text-to-SQL really is, start with the public benchmarks. The hardest one is BIRD-Interact, accepted as an oral at ICLR 2026. It pits text-to-SQL systems against 22 deliberately ugly real-world Postgres databases with 600 ambiguous business questions. Frontier models like Claude Opus 4.6 score around 33% when used directly. Production systems that add schema retrieval, metric grounding, and clarification logic score much higher: Datost scores 75.2% on top of the same model. See the benchmark writeup for methodology.

How Datost approaches it

Datost treats text-to-SQL as a system problem, not a model problem. The same frontier model that scores around 33% alone scores 75.2% inside Datost because the surrounding system does the work the model skips on its own: schema retrieval against your real warehouse, grounding in your team’s metric definitions, clarification when the question is ambiguous, and validation before the query runs. Every answer ships with the SQL attached so the analyst can audit it and the next person can build on it.
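To make "validation before the query runs" concrete, here is one possible sketch. It is an assumption for illustration, not Datost's implementation: it compiles the generated SQL against an empty in-memory copy of the schema using Python's standard `sqlite3` module, so references to missing tables or columns fail before any real data is touched. SQLite is used only to keep the example self-contained; the `DDL` string and the `validate` function are hypothetical. A check like this catches references to objects that do not exist, but it cannot catch a query that uses a real but wrong column.

```python
from __future__ import annotations

import sqlite3

# Hypothetical DDL standing in for the warehouse schema.
DDL = """
CREATE TABLE subscriptions (plan TEXT, gross_revenue REAL, started_at TEXT);
"""


def validate(sql: str) -> tuple[bool, str]:
    """Compile the query against an empty schema copy; return (ok, error message)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(DDL)
        # EXPLAIN makes SQLite prepare the statement, which resolves table and
        # column names, without returning the query's own result rows.
        conn.execute(f"EXPLAIN {sql}")
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()
```

For example, `validate("SELECT plan, SUM(gross_revenue) FROM subscriptions GROUP BY plan")` passes, while a query that selects a nonexistent `revenue` column is rejected with SQLite's "no such column" error.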