caveauAI Architecture

Under the hood: a corpus you can inspect.

caveauAI is a multi-stage retrieval architecture, not a chat wrapper: eight stages between document and cited answer, three deployment topologies, and a model layer you can swap. Built for engineers and CTOs who are seriously evaluating private RAG.

Eight stages from document to cited answer.

Each stage is independently observable, tunable, and replaceable. Stage boundaries are the points where compliance, audit, and customization happen.

01
Ingest Documents enter the pipeline in raw form — PDFs, scanned images, audio recordings, plain text, or structured exports from email and SharePoint. Audio is routed through GPU-accelerated speech-to-text, producing timestamped, speaker-segmented transcripts.
02
Normalize OCR for scanned material, structural parsing for native files, language detection, metadata extraction (author, date, document type). Output is a clean, machine-readable document object with provenance intact.
03
Chunk Documents are split at semantic boundaries — paragraphs, clauses, sections — not arbitrary character counts. Chunk size and overlap are tunable per corpus. Each chunk inherits parent-document metadata.
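The chunking step can be sketched as paragraph-boundary packing with a trailing overlap. This is a minimal illustration, not caveauAI's implementation; the parameter names and defaults here are assumptions.

```python
def chunk_document(text: str, max_chars: int = 800, overlap: int = 1) -> list[str]:
    """Split text at paragraph boundaries, packing paragraphs into chunks
    of up to roughly max_chars, repeating the last `overlap` paragraphs
    at the start of the next chunk so context survives the boundary."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and sum(len(p) for p in current) + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

In the real pipeline the boundary detector would also recognize clauses and section headings, and each chunk would carry its parent document's metadata.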
04
Embed Each chunk is encoded as a high-dimensional dense vector via a domain-appropriate embedding model. Embeddings capture semantic meaning, not keywords — enabling retrieval that survives paraphrase, translation, and synonym drift.
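Under the hood, "semantic meaning" reduces to geometry: retrieval over embeddings is nearest-neighbor search under a similarity measure, typically cosine. A toy sketch with hand-made 3-d vectors standing in for real model output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec: list[float], index: dict) -> list[str]:
    """index maps chunk_id -> vector; returns ids by descending similarity."""
    return sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
```

A paraphrased query lands near the original phrasing in this space, which is why retrieval survives synonym drift where keyword match would not.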
05
Index Vectors land in a per-tenant vector store, partitioned by corpus. Sparse keyword indices are maintained in parallel for hybrid retrieval. Metadata filters (date, author, document type) attach at index time.
06
Retrieve A query is embedded with the same model, then run as a hybrid search: dense vector similarity plus sparse keyword match. Metadata filters narrow scope. Top-k candidate chunks come back with similarity scores.
07
Rank Candidates are re-ranked by a cross-encoder model that scores query-chunk pairs directly. This second pass corrects for embedding-space limitations and surfaces the chunks that actually answer the question.
08
Cite A language model generates the answer grounded in the ranked chunks, with explicit citation back to the source passage. The user sees the answer and the evidence side by side.
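Grounded generation starts with prompt assembly: number the ranked chunks, attach their sources, and instruct the model to cite. A minimal sketch — the chunk shape (`source`/`text` keys) and prompt wording are assumptions, not caveauAI's actual template:

```python
def build_grounded_prompt(question: str, ranked_chunks: list[dict]) -> str:
    """Assemble a generation prompt that forces citation back to sources."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(ranked_chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite every claim as [n]. If the sources do not answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The `[n]` markers in the model's answer are what let the UI render answer and evidence side by side.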

Hybrid, filtered, re-ranked.

Pure vector search is fragile on long-tail queries. caveauAI combines three retrieval mechanisms to surface evidence that actually answers the question.

Hybrid

Dense + sparse retrieval

Vector similarity finds semantic matches even when wording differs. Keyword search catches exact identifiers, codes, and quoted phrases. Both run in parallel; results merge.
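One common way to merge parallel dense and sparse result lists is reciprocal-rank fusion (RRF) — shown here as an illustrative strategy, not necessarily the one caveauAI uses:

```python
def rrf_merge(dense_ranked: list[str], sparse_ranked: list[str], k: int = 60) -> list[str]:
    """Reciprocal-rank fusion: each list contributes 1/(k + rank) per document,
    so a document ranked high in either list floats to the top of the merge."""
    scores: dict[str, float] = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score calibration between the two retrievers, which is why it is a popular default for hybrid search.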

Filtered

Metadata-aware narrowing

Filter by date, author, document type, jurisdiction, client, or any custom field captured at ingest. Reduces noise on large corpora.
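Metadata narrowing is simple in principle: keep only chunks whose fields match every criterion. The chunk shape below (`text` plus a `meta` dict) is illustrative:

```python
def filter_chunks(chunks: list[dict], **criteria) -> list[dict]:
    """Keep only chunks whose metadata matches every criterion exactly."""
    return [
        c for c in chunks
        if all(c.get("meta", {}).get(key) == value for key, value in criteria.items())
    ]
```

In a production vector store these filters run inside the index rather than post-hoc, so narrowing by client or jurisdiction also cuts search cost.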

Re-ranked

Cross-encoder second pass

After initial retrieval, a smaller cross-encoder model scores query-chunk pairs to surface chunks that genuinely answer the question, not just chunks that match in vector space.
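The re-ranking pattern itself is just "score each (query, chunk) pair, sort, truncate." Here with a deliberately crude token-overlap stand-in for the cross-encoder, which in reality is a learned model scoring the pair jointly:

```python
def rerank(query: str, candidates: list[str], score_pair, top_n: int = 5) -> list[str]:
    """Re-order candidate chunks by a pairwise scorer.
    In production, score_pair is a cross-encoder; here, any callable."""
    return sorted(candidates, key=lambda c: score_pair(query, c), reverse=True)[:top_n]

def overlap_score(query: str, chunk: str) -> int:
    """Toy stand-in scorer: count of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))
```

The expensive pairwise pass only runs over the small top-k candidate set, which is what makes the two-stage retrieve-then-rank design tractable.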

No single-vendor lock-in.

caveauAI is provider-agnostic at the model layer. Embeddings, retrieval re-ranker, and generation can each be swapped without rebuilding the corpus.

Open-weight

Self-hosted open models

Run on Blue Note Logic GPU infrastructure or your own. Common choices: Llama, Mistral, Qwen — selected per corpus characteristics.

Hosted API

OpenAI, Anthropic, Azure OpenAI

Routed via a unified gateway with key management, rate limiting, and request logging. Per-tenant policy on which providers are permitted.

BYO key

Your provider, your contract

Bring your own API key from any supported provider. Useful when you have an existing enterprise agreement or compliance posture.

Fine-tuning

Path to custom models

When retrieval alone is no longer enough, Blue Note Logic can scope fine-tuning on a customer corpus. Available on Sovereign deployments.

Three boundaries. One workspace.

The application, API surface, and tool behavior are identical across topologies. What changes is where the data sits and who operates the infrastructure.

SaaS

Shared multi-tenant

Fastest path to running. Blue Note Logic operates the full stack in EU bare-metal datacenters with per-tenant corpus isolation.

  • EU datacenter (Hetzner / equivalent)
  • Per-tenant vector store partition
  • Shared application tier
  • Standard SLA
  • Best for: pilots, SMB, advocacy

Sovereign

On-prem / air-gap

Full caveauAI stack installed inside your perimeter. No outbound dependency on Blue Note Logic infrastructure or public model APIs.

  • Air-gap capable
  • Open-weight models only (or BYO)
  • Customer-operated or BNL-managed
  • Fine-tuning supported
  • Best for: government, defense, critical infrastructure

Three integration paths.

Detailed API documentation lives on the all-features page. Short version below.

REST API

Query, chat, document CRUD

Standard REST endpoints for corpus management, retrieval, chat completion, and document lifecycle. API key auth with scoped permissions.

MCP

Model Context Protocol

Expose caveauAI corpora to MCP-compatible agents (Claude Desktop, Cursor, custom agents). Grounds external assistants in your private knowledge.

Workflows

Webhooks + scheduled jobs

Agent Builder chains retrieval, analysis, and delivery into repeatable pipelines. Trigger on incoming email, file drop, or schedule.


Need to see this against your stack?

Blue Note Logic engineers will walk through caveauAI architecture against your existing data infrastructure, identity model, and compliance posture.
