
RAG (Retrieval-Augmented Generation)

AI technique that combines document retrieval with language model generation to produce citation-backed answers grounded in specific source material.

RAG (Retrieval-Augmented Generation) is an AI technique that enhances language model responses by first retrieving relevant documents from a knowledge base, then using those documents as context for generating answers. This produces responses that are grounded in specific source material and can provide citations.

How RAG Works

  1. Indexing — Documents are chunked into passages, converted to vector embeddings, and stored in a vector database
  2. Retrieval — When a query arrives, it is embedded and matched against stored vectors to find the most relevant passages
  3. Generation — The retrieved passages are provided as context to a language model, which generates an answer with citations to the source material
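The three steps above can be sketched end-to-end in a few lines. This is a minimal illustration, not CaveauCRM's implementation: the bag-of-words "embedding", the example chunks, and the in-memory list standing in for a vector database are all assumptions made so the sketch stays self-contained and runnable.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse bag-of-words count vector.
    A real system would call a learned embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing: store each chunk alongside its vector (a list stands in
#    for a vector database here).
chunks = [
    "Invoice #1042 was paid by Acme Corp on 3 March.",
    "Support ticket: Acme Corp reported a login issue.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval: embed the query and pick the most similar stored chunk.
query = "When did Acme Corp pay their invoice?"
q_vec = embed(query)
top_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3. Generation: hand the retrieved passage to a language model as context.
prompt = f"Context: {top_chunk}\n\nQuestion: {query}\nCite the source passage."
```

The same shape scales up directly: swap the toy embedding for a real model, the list for a vector database, and return the top-k chunks rather than a single best match.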

Why RAG Matters

Without RAG, language models can only draw on their training data — which may be outdated, incomplete, or lacking domain-specific knowledge. RAG lets you augment the model with your own documents: internal reports, client communications, technical manuals, regulatory texts. The model generates answers using your data, not generic internet knowledge.

RAG in CaveauCRM

CaveauAI uses RAG to index every client interaction — support tickets, invoices, meetings, deals — into a searchable knowledge base. Users can ask natural language questions about any client and receive cited answers referencing specific interactions. This transforms scattered data across five platforms into a single queryable intelligence layer.

Key Components

  • Chunking — Splitting documents into retrievable passages (typically 256-1024 tokens)
  • Embeddings — Vector representations of text that capture semantic meaning
  • Vector database — Storage optimised for similarity search across embedding vectors
  • Retrieval — Finding the top-k most relevant chunks for a given query
  • Context assembly — Combining retrieved chunks with metadata and provenance information
  • Generation — Language model produces a response using retrieved context
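The first component, chunking, can be illustrated with a fixed-size window plus overlap, a common baseline strategy. The window and overlap sizes below are illustrative, and the sketch counts words rather than model tokens:

```python
def chunk(tokens: list[str], size: int = 256, overlap: int = 32) -> list[list[str]]:
    """Split a token sequence into fixed-size passages that overlap by
    `overlap` tokens, so text near a boundary appears in both neighbours."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

words = "the quick brown fox jumps over the lazy dog again".split()
passages = chunk(words, size=4, overlap=1)
# Each passage starts with the last word of the previous one.
```

Overlap trades a little index size for robustness: a sentence that straddles a chunk boundary is still retrievable as a whole from at least one passage.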