RAG (Retrieval-Augmented Generation)
RAG is a pattern where an AI system retrieves relevant context from a knowledge source before generating an answer, instead of relying only on the model's training data.
Also known as: retrieval augmented generation · RAG pipeline
RAG stands for Retrieval-Augmented Generation. The pattern: when a user asks something, the system first retrieves relevant context from a knowledge source (vector DB, search index, structured store), then augments the model’s prompt with that context, then generates the answer.
The whole point is to ground the answer in real, current, specific information instead of relying on whatever the model happened to memorize during training.
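The retrieve, augment, generate loop can be sketched in a few lines. This is a minimal illustration, not a production design. It assumes a toy keyword-overlap retriever in place of real vector search and a caller-supplied `llm` function in place of a model API. The names `Doc`, `retrieve`, `augment`, and `rag_answer` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Doc:
    doc_id: str  # source identifier, reused for citations
    text: str


def tokenize(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens; a toy stand-in for embeddings.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: List[Doc], k: int = 2) -> List[Doc]:
    # Rank documents by word overlap with the question; keep the top k
    # that share at least one word with it.
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d.text)), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(d.text)]


def augment(question: str, chunks: List[Doc]) -> str:
    # Number each chunk so the model can cite it as [1], [2], ...
    sources = "\n".join(
        f"[{i}] ({d.doc_id}) {d.text}" for i, d in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def rag_answer(question: str, docs: List[Doc], llm: Callable[[str], str]) -> str:
    chunks = retrieve(question, docs)   # 1. retrieve
    prompt = augment(question, chunks)  # 2. augment
    return llm(prompt)                  # 3. generate
```

Passing `llm=lambda p: p` echoes the assembled prompt. That makes it easy to see exactly what the model would be grounded on before a real model is wired in.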
Why it exists
Frontier LLMs are strong at language and unreliable on facts. Ask them about specific entities, recent events, or your private data, and they often produce confident, convincing, wrong answers. RAG is the workaround: don’t ask the model to know your business; give it the relevant slice and let it reason over that.
The standard pieces
- A knowledge base: your docs, your schema, your tickets, whatever the model needs to read.
- A retrieval step: typically vector embeddings plus similarity search, sometimes hybrid with keyword search, sometimes structured queries.
- A prompt template: assembles the retrieved chunks alongside the user question.
- The LLM call: generates the answer, instructed to rely on what’s in the prompt rather than on what it memorized during training.
- Citations: references back to the retrieved sources so the answer is auditable.
The “RAG pipeline” everyone talks about is just those five pieces wired together.
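For the hybrid variant of the retrieval step, the vector ranking and the keyword ranking have to be merged somehow. The sketch below uses reciprocal rank fusion (RRF), one common merging technique chosen here for illustration; the text above does not say how the two are combined. The 2-d "embeddings", document ids, and the `k=60` constant (a commonly used RRF default) are assumptions.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_ranking(query_vec: List[float], doc_vecs: Dict[str, List[float]]) -> List[str]:
    # Dense retrieval: rank document ids by cosine similarity to the query.
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def keyword_ranking(query: str, doc_texts: Dict[str, str]) -> List[str]:
    # Sparse retrieval: rank by count of shared terms, dropping non-matches.
    terms = set(query.lower().split())
    hits = {d: len(terms & set(t.lower().split())) for d, t in doc_texts.items()}
    return sorted((d for d in hits if hits[d] > 0), key=lambda d: hits[d], reverse=True)


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    # Each ranking adds 1 / (k + rank) to a document's score; a larger k
    # flattens the gap between top positions so no single list dominates.
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Toy 2-d "embeddings" stand in for a real embedding model.
doc_vecs = {"orders_doc": [0.9, 0.1], "refunds_doc": [0.7, 0.7], "hr_doc": [0.0, 1.0]}
doc_texts = {
    "orders_doc": "order table columns",
    "refunds_doc": "refund sku policy",
    "hr_doc": "holiday policy",
}
fused = reciprocal_rank_fusion([
    vector_ranking([1.0, 0.0], doc_vecs),
    keyword_ranking("refund policy for sku", doc_texts),
])
print(fused)  # ['refunds_doc', 'hr_doc', 'orders_doc']
```

Note that `hr_doc` outranks `orders_doc` even though it is the weakest vector match. RRF rewards documents that both retrievers agree on, which is usually the point of going hybrid.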
RAG for AI data analytics
For an AI data analyst, RAG means retrieving the right pieces of the schema, the right metric definitions, and any business context docs before generating SQL. It’s necessary but not sufficient — you also need clarification logic (ask back when ambiguous), schema validation (make sure the SQL is legal), and result interpretation.
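One cheap form of the schema validation step is to compile the generated query against empty copies of the retrieved tables. The sketch below assumes the SQL is SQLite-compatible. A warehouse with a different dialect would need its own dry-run or EXPLAIN facility. `validate_sql` is a hypothetical helper. It catches hallucinated tables and columns and syntax errors, but not a query that is legal and answers the wrong question.

```python
import sqlite3
from typing import List, Optional


def validate_sql(sql: str, schema_ddl: List[str]) -> Optional[str]:
    """Return None if `sql` compiles against the schema, else SQLite's error.

    The schema is recreated as empty tables in an in-memory database, and
    EXPLAIN makes SQLite compile the query (resolving every table and
    column name) without running it against real data.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in schema_ddl:
            conn.execute(ddl)
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


schema = ["CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"]
print(validate_sql("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id", schema))
# None
print(validate_sql("SELECT revenue FROM orders", schema))
# no such column: revenue
```

The returned error string can be fed back to the model as a repair hint, or used to trigger the clarification path instead of returning a broken query.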
A pure “give the LLM the whole schema and hope” approach is not RAG; it’s just an LLM with a big prompt. The bigger the schema, the worse it gets: the model loses track of which columns matter, the prompt can overflow the context window, and the model starts hallucinating join keys. Real RAG retrieves only the relevant subset.
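Retrieving "only the relevant subset" of a schema can be sketched as ranking tables against the question and sending just the top few to the model. This is a minimal sketch under stated assumptions. The four-table `SCHEMA` is hypothetical, and word overlap stands in for the embedding similarity a real system would more likely use.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical warehouse schema: table -> (columns, short description).
SCHEMA: Dict[str, Tuple[List[str], str]] = {
    "orders": (["order_id", "customer_id", "amount", "created_at"],
               "one row per order with revenue amount"),
    "customers": (["customer_id", "region", "signup_date"],
                  "customer profile and region"),
    "support_tickets": (["ticket_id", "customer_id", "status"], "helpdesk tickets"),
    "web_sessions": (["session_id", "page", "duration"], "website analytics sessions"),
}


def words(text: str) -> Set[str]:
    # Split on non-alphanumerics so "customer_id" yields "customer" and "id".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant_tables(question: str, k: int = 2) -> List[str]:
    # Score each table by overlap between the question and its name,
    # description, and column names; keep the top k with a nonzero score.
    q = words(question)

    def score(table: str) -> int:
        columns, description = SCHEMA[table]
        return len(q & words(" ".join([table, description, *columns])))

    ranked = sorted(SCHEMA, key=score, reverse=True)
    return [t for t in ranked[:k] if score(t) > 0]


def schema_snippet(tables: List[str]) -> str:
    # Render only the selected tables for the SQL-generation prompt.
    return "\n".join(
        f"{t}({', '.join(SCHEMA[t][0])})  -- {SCHEMA[t][1]}" for t in tables
    )


tables = relevant_tables("What was revenue by region last month?")
print(tables)  # ['orders', 'customers']
print(schema_snippet(tables))
```

A fuller version would likely also pull in any tables needed to join the selected ones, since a correct join path is exactly what the model cannot guess from a partial schema.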
How Datost uses RAG
Datost runs retrieval across three sources for every question: your warehouse schema (which tables, columns, and types exist), your metric definitions (your docs about how revenue, churn, etc. are computed), and your business context (PRDs, runbooks, prior Slack threads you’ve connected). The retrieval result becomes part of the prompt that generates SQL. This grounding accounts for most of the accuracy gap on BIRD-Interact: the same frontier model scores 33% on its own versus 75% with the right context.
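A generic way to combine several sources is to query each one separately, with its own result quota, and label the results by source in the prompt. The sketch below illustrates that idea only; it is not Datost’s implementation. The `Retriever` signature, section names, quota, and canned results are all hypothetical.

```python
from typing import Callable, Dict, List

# A retriever takes (question, k) and returns up to k text snippets.
Retriever = Callable[[str, int], List[str]]


def build_context(question: str, sources: Dict[str, Retriever],
                  per_source_k: int = 3) -> str:
    # Query each source separately with its own quota so a large source
    # (for example, many Slack threads) cannot crowd the schema out.
    sections = []
    for name, retrieve in sources.items():
        hits = retrieve(question, per_source_k)[:per_source_k]
        if hits:
            body = "\n".join(f"- {h}" for h in hits)
            sections.append(f"{name}:\n{body}")
    return "\n\n".join(sections)


# Hypothetical retrievers returning canned results for illustration.
sources: Dict[str, Retriever] = {
    "Warehouse schema": lambda q, k: ["orders(order_id, customer_id, amount)"][:k],
    "Metric definitions": lambda q, k: ["revenue: SUM(orders.amount), refunds excluded"][:k],
    "Business context": lambda q, k: [f"Slack thread {i}" for i in range(50)][:k],
}
print(build_context("What was revenue last month?", sources, per_source_k=2))
```

Per-source quotas trade some recall for balance: a single flat top-k over all three sources could fill the prompt with whichever source has the most text.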
- Semantic Layer: A semantic layer is a central definition store that maps human-readable business concepts (revenue, churn, MRR) to the underlying tables and SQL that compute them.
- Metric Definition: A metric definition is the exact SQL or calculation that produces a business metric, plus the documented assumptions behind it.
- Text-to-SQL: Text-to-SQL is the task of translating a natural-language question into a SQL query that runs against a database and returns the answer.