
Glossary
Retrieval-Augmented Generation (RAG)
Injecting retrieved documents into an LLM prompt to ground outputs.
Definition
Retrieval-Augmented Generation is the practice of fetching relevant documents at query time and including them in the LLM's prompt, so the model's response is grounded in your data rather than solely in its parametric memory (the knowledge encoded in its weights). RAG typically combines a vector search index, a retrieval step, and a generation step.
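As a rough illustration of the index, retrieval, and prompt-assembly steps described above, here is a minimal sketch in plain Python. The bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector index, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and the sample documents are all hypothetical. The generation step is left out: the assembled prompt would be passed to whichever LLM you use.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, docs):
    """Place the retrieved documents in the prompt as numbered sources."""
    sources = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical knowledge base, embedded once ahead of query time.
DOCS = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping to Europe takes five business days.",
]
INDEX = [(d, embed(d)) for d in DOCS]

query = "How long do refunds take?"
top_docs = retrieve(query, INDEX, k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

Numbering the sources in the prompt is one simple way to let the model cite what it used. It is a design choice made for this sketch, not something RAG requires.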
Context
RAG is the workhorse pattern for enterprise generative AI. It avoids the cost and rigidity of fine-tuning, lets the same model serve many knowledge bases, and can produce outputs that cite their sources so humans can verify them. Modern RAG stacks include hybrid (semantic + keyword) retrieval, reranking, and, increasingly, agentic retrieval, where an agent decides what to search and how.
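Hybrid retrieval needs some way to merge the semantic and keyword result lists. The entry does not say which method is used. One common option is reciprocal rank fusion (RRF), sketched below as an assumption. The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a conventional default) are illustrative only.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by more than one retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a semantic (vector) search and a keyword search.
semantic_hits = ["doc_b", "doc_a", "doc_c"]
keyword_hits = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

In this example `doc_a` and `doc_b` appear in both lists, so they rank ahead of documents found by only one retriever. A reranking step, if used, would then reorder the fused list.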
Related terms
Large Language Model (LLM): A neural network trained on massive text corpora to predict next tokens.
Fine-Tuning: Continued training of a base model on task-specific data.
Vector Database: A database optimized for semantic similarity search over embeddings.
Agentic AI: AI systems that autonomously execute multi-step tasks to reach a goal.