Retrieval-Augmented Generation. A technique that improves AI responses by first searching a knowledge base for relevant information, then feeding that context to the language model. Reduces hallucinations and keeps responses grounded in facts. The standard approach for building AI systems over private data.
RAG (Retrieval-Augmented Generation) is a technique that improves AI responses by first searching a knowledge base for relevant information, then feeding that information to the language model alongside the user's question. Instead of relying solely on what the model memorized during training, RAG grounds responses in actual source documents.
The typical RAG pipeline works like this: split your documents into chunks, convert each chunk into an embedding, and store the embeddings in a vector database. When a user asks a question, convert the question to an embedding, find the most similar document chunks, and include them in the prompt as context. The model then generates a response based on this retrieved context rather than its training data alone. This reduces hallucinations, because the model can draw on source text instead of guessing, and it lets you work with information the model was never trained on.
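As a rough illustration of the retrieval half of that pipeline, here is a minimal sketch in plain Python. It assumes a toy bag-of-words "embedding" and an in-memory list in place of a real embedding model and vector database; the function names (embed, build_index, retrieve, build_prompt) and the sample chunks are hypothetical and chosen only for this example. The resulting prompt would then be sent to whichever language model you use.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Stand-in for a vector database: keep each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Place the retrieved chunks in the prompt ahead of the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical document chunks, for illustration only.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Support is available by email on weekdays.",
]
index = build_index(chunks)
question = "How many days do refunds take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
```

Word-count vectors only match on shared words, so this sketch would miss paraphrases that a learned embedding model is meant to catch; the overall flow of index, retrieve, and prompt is the part it is intended to show.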
RAG has become the standard pattern for building AI applications over private data: company knowledge bases, documentation, legal archives, medical records. It's popular because it doesn't require fine-tuning the model itself, which means you can update your knowledge base without retraining. The challenges are in retrieval quality (finding the right documents), chunking strategy (how to split documents), and prompt design (helping the model use the context effectively). How well a RAG system performs depends largely on getting these three right.
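To make the chunking question concrete, here is one simple strategy sketched in Python: fixed-size word windows with some overlap, so that a sentence falling on a boundary still appears whole in at least one chunk. The function name chunk_words and the default sizes are assumptions for illustration, not recommended values; real systems often split on headings, paragraphs, or token counts instead.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Small sizes so the overlap is easy to see.
sample = " ".join(f"w{i}" for i in range(10))
small_chunks = chunk_words(sample, chunk_size=4, overlap=1)
```

Larger chunks carry more context per retrieved item but dilute the similarity match; smaller chunks match more precisely but may cut an answer in half, which is what the overlap is meant to soften.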
"We built a RAG system over our 50,000-page documentation library — employees can now ask questions in plain English and get answers with source citations."
A dense numerical representation of data (words, images, etc.).
A database optimized for storing and searching high-dimensional vectors (embeddings).
Connecting an AI model's outputs to verified, factual information sources.