Glossary

Retrieval-Augmented Generation (RAG)

Injecting retrieved documents into an LLM prompt to ground outputs.

Definition

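Retrieval-Augmented Generation is the practice of fetching relevant documents at query time and including them in the LLM's prompt, so the model's response is grounded in your data rather than relying only on its parametric memory (the knowledge stored in its weights during training). A typical RAG pipeline has three parts: an index over your documents (often a vector search index), a retrieval step that queries that index for the passages most relevant to the user's question, and a generation step in which the model answers using those passages as context.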
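The three parts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design. `embed` is a hypothetical bag-of-words stand-in for a real embedding model. The in-memory `INDEX` list stands in for a vector search index. The generation step stops at building the prompt that would be sent to whichever LLM you use. All names (`DOCS`, `embed`, `retrieve`, `build_prompt`) are illustrative and do not come from any library.

```python
import math
import re
from collections import Counter

# Illustrative corpus standing in for "your data".
DOCS = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available by email on weekdays.",
    "Annual plans can be cancelled from the billing page.",
]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Indexing: compute each document's vector once, ahead of query time.
INDEX = [(doc, embed(doc)) for doc in DOCS]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Retrieval step: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, passages: list[str]) -> str:
    """Generation input: number the passages so the model can cite them."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "How long is the refund window for annual plans?"
    # The generation step would send this prompt to an LLM; here it is only printed.
    print(build_prompt(question, retrieve(question)))
```

For the sample question, the refund-policy document has the highest similarity, so it appears in the prompt as source [1]. The billing-page document follows as [2]. Numbering the sources is what lets the model's answer carry citations a reader can check.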

Context

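RAG is the workhorse pattern for enterprise generative AI. It avoids the cost and rigidity of fine-tuning, because updating the model's knowledge means re-indexing documents rather than retraining. It also lets the same model serve many knowledge bases and can produce outputs that cite their source documents, so humans can verify them. Modern RAG stacks add hybrid retrieval (combining semantic and keyword search), reranking of the retrieved candidates, and, increasingly, agentic retrieval, in which an agent decides what to search for and how.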
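Hybrid retrieval has to merge a keyword ranking (for example from BM25) with a semantic ranking from a vector index. Reciprocal rank fusion (RRF) is one widely used way to do this. It is chosen here for illustration and is not necessarily the method any particular stack uses. Each document scores the sum of 1/(k + rank) over the ranked lists it appears in. The document IDs below are hypothetical, and `k = 60` is a commonly cited default rather than a required value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list
    that contains it, with ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["d3", "d1", "d7"]   # e.g. from a BM25 keyword index
semantic_hits = ["d1", "d4", "d3"]  # e.g. from a vector index
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))  # → ['d1', 'd3', 'd4', 'd7']
```

RRF uses only ranks, so it avoids comparing BM25 scores with cosine similarities, which live on different scales. A reranker, typically a cross-encoder that scores each query-passage pair jointly, would then reorder the top of the fused list before those passages reach the prompt.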