RAG (Retrieval-Augmented Generation)
AI technique that combines document retrieval with language model generation to produce citation-backed answers grounded in specific source material.
RAG (Retrieval-Augmented Generation) is an AI technique that enhances language model responses by first retrieving relevant documents from a knowledge base, then using those documents as context for generating answers. This produces responses that are grounded in specific source material and can provide citations.
How RAG Works
- Indexing — Documents are chunked into passages, converted to vector embeddings, and stored in a vector database
- Retrieval — When a query arrives, it is embedded and matched against stored vectors to find the most relevant passages
- Generation — The retrieved passages are provided as context to a language model, which generates an answer with citations to the source material
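The three steps above can be sketched end to end. This is a toy illustration under stated assumptions, not CaveauCRM's implementation: the bag-of-words `embed` function stands in for a trained embedding model, the in-memory `VectorIndex` for a real vector database, the client passages are invented, and the final prompt would be sent to a language model rather than assembled and left as a string.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. Real systems use a
    # trained neural model whose vectors capture semantic meaning.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Similarity between two sparse vectors, as a vector database computes it.
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class VectorIndex:
    """In-memory stand-in for a vector database."""
    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def add(self, passage: str) -> None:
        # Indexing: embed each passage and store it.
        self.entries.append((embed(passage), passage))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Retrieval: embed the query, rank stored passages by similarity.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [passage for _, passage in ranked[:k]]

index = VectorIndex()
for passage in [  # hypothetical client interactions
    "Invoice #104 was paid by Acme Corp on 2024-03-01.",
    "Support ticket: Acme Corp reported a login issue.",
    "Meeting notes: quarterly review with Beta Ltd.",
]:
    index.add(passage)

# Generation: retrieved passages, numbered so the model can cite them,
# become the context of a prompt that a real system would send to an LLM.
context = index.retrieve("When did Acme pay invoice 104?")
prompt = "Answer using only these sources:\n" + "\n".join(
    f"[{i + 1}] {p}" for i, p in enumerate(context)
)
```

The query shares more terms with the invoice passage than with the support ticket, so the invoice is ranked first and cited as source `[1]` — the same ranking a real embedding model performs on meaning rather than exact word overlap.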
Why RAG Matters
Without RAG, language models can only draw on their training data — which may be outdated, incomplete, or lacking domain-specific knowledge. RAG lets you augment the model with your own documents: internal reports, client communications, technical manuals, regulatory texts. The model generates answers using your data, not generic internet knowledge.
RAG in CaveauCRM
CaveauAI uses RAG to index every client interaction — support tickets, invoices, meetings, deals — into a searchable knowledge base. Users can ask natural language questions about any client and receive cited answers referencing specific interactions. This transforms scattered data across five platforms into a single queryable intelligence layer.
Key Components
- Chunking — Splitting documents into retrievable passages (typically 256-1024 tokens)
- Embeddings — Vector representations of text that capture semantic meaning
- Vector database — Storage optimised for similarity search across embedding vectors
- Retrieval — Finding the top-k most relevant chunks for a given query
- Context assembly — Combining retrieved chunks with metadata and provenance information
- Generation — Language model produces a response using retrieved context
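As a concrete illustration of the chunking component, here is a minimal sliding-window chunker. It is a sketch, not a production splitter: whitespace words stand in for model tokens, the window is scaled well below the typical 256-1024 token budget so the example stays small, and the `overlap` parameter is an assumed (though common) refinement that keeps context from being cut cleanly at chunk boundaries.

```python
def chunk(text: str, max_tokens: int = 50, overlap: int = 10) -> list[str]:
    # Whitespace words stand in for model tokens; a real pipeline would
    # count tokens with the embedding model's own tokenizer.
    tokens = text.split()
    step = max_tokens - overlap  # consecutive chunks share `overlap` tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

doc = " ".join(f"w{i}" for i in range(120))  # a 120-"token" document
parts = chunk(doc)  # three chunks: tokens 0-49, 40-89, 80-119
```

Each chunk then flows into the rest of the pipeline: it is embedded, stored in the vector database, and later retrieved and assembled into generation context.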