RAG (Retrieval-Augmented Generation)

RAG is a pattern where an AI system retrieves relevant context from a knowledge source before generating an answer, instead of relying only on the model's training data.

The pattern has three steps. When a user asks something, the system first retrieves relevant context from a knowledge source (a vector database, a search index, or a structured store), then augments the model’s prompt with that context, and finally generates the answer.

The whole point is to ground the answer in real, current, specific information instead of relying on whatever the model happened to memorize during training.

Why it exists

Frontier LLMs are fluent with language but unreliable on facts that fall outside their training data. They can produce confident, convincing, wrong answers when you ask about specific entities, recent events, or your private data. RAG is the workaround: don’t ask the model to know your business; give it the relevant slice and let it reason over that.

The standard pieces

A knowledge base: your docs, your schema, your tickets, whatever the model needs to read.
A retrieval step: typically vector embeddings plus similarity search, sometimes combined with keyword search (hybrid retrieval), sometimes structured queries.
A prompt template: assembles the retrieved chunks alongside the user question.
The LLM call: generates the answer, with the prompt instructing the model to rely on the retrieved context rather than on what it memorized.
Citations: references back to the retrieved sources so the answer is auditable.

The “RAG pipeline” everyone talks about is just those five pieces wired together.
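
As a rough illustration, the five pieces above can be wired together in a few dozen lines. This is a minimal sketch, not a production pipeline: the documents, the function names (embed, retrieve, build_prompt, call_llm, answer) and the file paths are all hypothetical, the "embedding" is a toy bag-of-words vector standing in for a real embedding model, and call_llm is a placeholder where a real model client would go.

```python
import math
import re
from collections import Counter

# 1. Knowledge base: a few hypothetical documents keyed by source id.
KNOWLEDGE_BASE = {
    "docs/refunds.md": "Refunds are issued within 14 days of purchase.",
    "docs/shipping.md": "Standard shipping takes 3 to 5 business days.",
    "docs/accounts.md": "Users can reset a password from the login page.",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 2. Retrieval step: rank documents by similarity to the question.
def retrieve(question: str, k: int = 2) -> list:
    q = embed(question)
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


# 3. Prompt template: numbered chunks next to the user question.
def build_prompt(question: str, chunks: list) -> str:
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(chunks, 1)
    )
    return (
        "Answer using the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# 4. LLM call: placeholder only; swap in a real model client here.
def call_llm(prompt: str) -> str:
    return "(model output would appear here)"


# 5. Citations: return the retrieved source ids with the answer.
def answer(question: str) -> dict:
    chunks = retrieve(question)
    prompt = build_prompt(question, chunks)
    return {
        "answer": call_llm(prompt),
        "citations": [source for source, _ in chunks],
        "prompt": prompt,
    }
```

Calling answer("How long does shipping take?") ranks the shipping document first because it is the only one that shares a word with the question. A real embedding model would also match paraphrases that share no words at all, which is the main reason vector search is the usual choice for the retrieval step.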

RAG for AI data analytics

For an AI data analyst, RAG means retrieving the right pieces of the schema, the right metric definitions, and any business context docs before generating SQL. In other words, it is how you give an LLM the business context it needs to write correct SQL. It’s necessary but not sufficient. You also need clarification logic (asking back when the question is ambiguous), schema validation (making sure the SQL is legal against the actual schema), and result interpretation.
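
Two of those extra pieces, clarification logic and schema validation, can be sketched with the standard library alone. The sketch below makes several assumptions: the schema, the table of ambiguous terms, and the function names clarification_needed and validate_sql are hypothetical, and SQLite stands in for a real warehouse, whose SQL dialect will differ. Validation works by asking SQLite to compile the statement with EXPLAIN against an empty in-memory copy of the schema, so unknown tables and columns surface as errors before the query touches real data.

```python
import sqlite3
from typing import Optional

# Hypothetical schema, loaded into an empty in-memory SQLite database.
SCHEMA_DDL = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     amount REAL, created_at TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
"""

# Hypothetical terms that map to more than one metric definition.
AMBIGUOUS_TERMS = {
    "revenue": ["gross revenue", "net revenue"],
    "active": ["active in the last 7 days", "active in the last 30 days"],
}


def clarification_needed(question: str) -> Optional[str]:
    """Return a follow-up question if an ambiguous term is used unqualified."""
    lowered = question.lower()
    for term, options in AMBIGUOUS_TERMS.items():
        if term in lowered and not any(opt in lowered for opt in options):
            return f"By '{term}', do you mean " + " or ".join(options) + "?"
    return None


def validate_sql(sql: str) -> Optional[str]:
    """Return None if the SQL compiles against the schema, else the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_DDL)
        # EXPLAIN compiles the statement, so bad table or column names fail here.
        conn.execute(f"EXPLAIN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()
```

In a loop around the model, a non-empty result from clarification_needed would be sent back to the user before any SQL is generated, and a non-empty result from validate_sql could be fed back to the model as a repair hint.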

A pure “give the LLM the whole schema and hope” approach is not RAG. It’s just LLM-with-a-big-prompt, and the bigger the schema the worse it gets: the prompt can exceed the context window, the model loses track of which columns matter, and it hallucinates join keys. Real RAG retrieves only the relevant subset.
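
To make "only the relevant subset" concrete, here is a small sketch of schema retrieval. The schema, the function names relevant_tables and schema_snippet, and the scoring rule (count of exact word overlaps between the question and each table’s names) are all assumptions chosen for illustration; a real system would more likely score tables with embeddings over table and column descriptions.

```python
import re

# Hypothetical warehouse schema: table name -> column names.
SCHEMA = {
    "orders": ["id", "customer_id", "amount", "created_at"],
    "customers": ["id", "name", "region", "signup_date"],
    "web_sessions": ["id", "user_agent", "started_at", "page_count"],
    "inventory": ["sku", "warehouse", "quantity"],
}


def tokens(text: str) -> set:
    """Lowercase words; underscores act as separators, so snake_case splits."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevant_tables(question: str, k: int = 2) -> list:
    """Return up to k tables whose names share at least one word with the question."""
    q = tokens(question)
    scored = []
    for table, columns in SCHEMA.items():
        score = len(q & tokens(table + " " + " ".join(columns)))
        if score > 0:
            scored.append((score, table))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for _, table in scored[:k]]


def schema_snippet(tables: list) -> str:
    """Render only the selected tables for inclusion in the prompt."""
    return "\n".join(f"{t}({', '.join(SCHEMA[t])})" for t in tables)
```

For the question "Total order amount by customer region", this selects orders and customers and leaves web_sessions and inventory out of the prompt entirely. Exact word matching is crude (it does not connect "order" to "orders", for example), which is the gap an embedding-based retriever is meant to close.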

How Datost uses RAG

Datost runs retrieval across three sources for every question: your warehouse schema (tables, columns, and types), your metric definitions (your docs on how revenue, churn, and similar metrics are computed), and your business context (PRDs, runbooks, and prior Slack threads you’ve connected). The retrieval result becomes part of the prompt that generates SQL. This grounding accounts for most of the accuracy gap on BIRD-Interact: the same frontier model scores 33% alone and 75% with the right context.
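
The sketch below illustrates the general idea of retrieving from three sources and merging the results into one SQL-generation prompt. It is not Datost’s implementation: the source contents, the names SOURCES, top_matches and build_sql_prompt, the prompt layout, and the word-overlap scoring are all hypothetical and chosen only to keep the example self-contained.

```python
import re

# Hypothetical content for each of the three sources.
SOURCES = {
    "schema": [
        "orders(id, customer_id, amount, created_at)",
        "customers(id, name, region)",
        "inventory(sku, warehouse, quantity)",
    ],
    "metrics": [
        "revenue: sum of orders.amount excluding refunded orders",
        "churn: customers with no orders in the last 90 days",
    ],
    "context": [
        "Runbook: inventory counts are refreshed nightly",
        "PRD: revenue reporting is broken down by customers.region",
    ],
}


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def top_matches(question: str, entries: list, k: int = 2) -> list:
    """Return up to k entries that share at least one word with the question."""
    q = tokens(question)
    scored = sorted(
        ((len(q & tokens(entry)), entry) for entry in entries),
        key=lambda pair: -pair[0],
    )
    return [entry for score, entry in scored[:k] if score > 0]


def build_sql_prompt(question: str) -> str:
    """Retrieve from each source separately, then label each section in the prompt."""
    sections = []
    for name, entries in SOURCES.items():
        hits = top_matches(question, entries)
        if hits:
            sections.append(f"## {name}\n" + "\n".join(hits))
    return "\n\n".join(sections) + f"\n\nQuestion: {question}\nWrite one SQL query."
```

Retrieving per source and labeling each section keeps schema facts, metric definitions, and business notes distinguishable in the prompt, so a strong match in one source cannot crowd out the others.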