caveauAI Architecture

Under the hood: a corpus you can inspect.

caveauAI is a multi-stage retrieval architecture, not a chat wrapper: eight stages between document and cited answer, three deployment topologies, and a model layer you can swap. Built for engineers and CTOs who are seriously evaluating private RAG.

Eight stages from document to cited answer.

Each stage is independently observable, tunable, and replaceable. Stage boundaries are the points where compliance, audit, and customization happen.

01
Ingest Documents enter the pipeline in raw form — PDFs, scanned images, audio recordings, plain text, or structured exports from email and SharePoint. Audio is routed through GPU-accelerated speech-to-text, producing timestamped, speaker-segmented transcripts.
02
Normalize OCR for scanned material, structural parsing for native files, language detection, metadata extraction (author, date, document type). Output is a clean, machine-readable document object with provenance intact.
03
Chunk Documents are split at semantic boundaries — paragraphs, clauses, sections — not arbitrary character counts. Chunk size and overlap are tunable per corpus. Each chunk inherits parent-document metadata.
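The chunking step can be sketched as paragraph-boundary packing with a trailing overlap. This is a minimal illustration, not caveauAI's implementation; the parameter names and defaults here are assumptions.

```python
def chunk_document(text: str, max_chars: int = 800, overlap: int = 1) -> list[str]:
    """Split text at paragraph boundaries, packing paragraphs into chunks
    of up to roughly max_chars, repeating the last `overlap` paragraphs
    at the start of the next chunk so context survives the boundary."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and sum(len(p) for p in current) + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

In the real pipeline the boundary detector would also recognize clauses and section headings, and each chunk would carry its parent document's metadata.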
04
Embed Each chunk is encoded as a high-dimensional dense vector via a domain-appropriate embedding model. Embeddings capture semantic meaning, not keywords — enabling retrieval that survives paraphrase, translation, and synonym drift.
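Under the hood, "semantic meaning" reduces to geometry: retrieval over embeddings is nearest-neighbor search under a similarity measure, typically cosine. A toy sketch with hand-made 3-d vectors standing in for real model output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec: list[float], index: dict) -> list[str]:
    """index maps chunk_id -> vector; returns ids by descending similarity."""
    return sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
```

A paraphrased query lands near the original phrasing in this space, which is why retrieval survives synonym drift where keyword match would not.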
05
Index Vectors land in a per-tenant vector store, partitioned by corpus. Sparse keyword indices are maintained in parallel for hybrid retrieval. Metadata filters (date, author, document type) attach at index time.
06
Retrieve A query is embedded with the same model, then run as a hybrid search: dense vector similarity plus sparse keyword match. Metadata filters narrow scope. Top-k candidate chunks come back with similarity scores.
07
Rank Candidates are re-ranked by a cross-encoder model that scores query-chunk pairs directly. This second pass corrects for embedding-space limitations and surfaces the chunks that actually answer the question.
08
Cite A language model generates the answer grounded in the ranked chunks, with explicit citation back to the source passage. The user sees the answer and the evidence side by side.
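Grounded generation starts with prompt assembly: number the ranked chunks, attach their sources, and instruct the model to cite. A minimal sketch — the chunk shape (`source`/`text` keys) and prompt wording are assumptions, not caveauAI's actual template:

```python
def build_grounded_prompt(question: str, ranked_chunks: list[dict]) -> str:
    """Assemble a generation prompt that forces citation back to sources."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(ranked_chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite every claim as [n]. If the sources do not answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The `[n]` markers in the model's answer are what let the UI render answer and evidence side by side.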

Hybrid, filtered, re-ranked.

Pure vector search is fragile on long-tail queries. caveauAI combines three retrieval mechanisms to surface evidence that actually answers the question.

Hybrid

Dense + sparse retrieval

Vector similarity finds semantic matches even when wording differs. Keyword search catches exact identifiers, codes, and quoted phrases. Both run in parallel; results merge.
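One common way to merge parallel dense and sparse result lists is reciprocal-rank fusion (RRF) — shown here as an illustrative strategy, not necessarily the one caveauAI uses:

```python
def rrf_merge(dense_ranked: list[str], sparse_ranked: list[str], k: int = 60) -> list[str]:
    """Reciprocal-rank fusion: each list contributes 1/(k + rank) per document,
    so a document ranked high in either list floats to the top of the merge."""
    scores: dict[str, float] = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score calibration between the two retrievers, which is why it is a popular default for hybrid search.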

Filtered

Metadata-aware narrowing

Filter by date, author, document type, jurisdiction, client, or any custom field captured at ingest. Reduces noise on large corpora.
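Metadata narrowing is simple in principle: keep only chunks whose fields match every criterion. The chunk shape below (`text` plus a `meta` dict) is illustrative:

```python
def filter_chunks(chunks: list[dict], **criteria) -> list[dict]:
    """Keep only chunks whose metadata matches every criterion exactly."""
    return [
        c for c in chunks
        if all(c.get("meta", {}).get(key) == value for key, value in criteria.items())
    ]
```

In a production vector store these filters run inside the index rather than post-hoc, so narrowing by client or jurisdiction also cuts search cost.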

Re-ranked

Cross-encoder second pass

After initial retrieval, a smaller cross-encoder model scores query-chunk pairs to surface chunks that genuinely answer the question, not just chunks that match in vector space.
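The re-ranking pattern itself is just "score each (query, chunk) pair, sort, truncate." Here with a deliberately crude token-overlap stand-in for the cross-encoder, which in reality is a learned model scoring the pair jointly:

```python
def rerank(query: str, candidates: list[str], score_pair, top_n: int = 5) -> list[str]:
    """Re-order candidate chunks by a pairwise scorer.
    In production, score_pair is a cross-encoder; here, any callable."""
    return sorted(candidates, key=lambda c: score_pair(query, c), reverse=True)[:top_n]

def overlap_score(query: str, chunk: str) -> int:
    """Toy stand-in scorer: count of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))
```

The expensive pairwise pass only runs over the small top-k candidate set, which is what makes the two-stage retrieve-then-rank design tractable.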

No single-vendor lock-in.

caveauAI is provider-agnostic at the model layer. Embeddings, retrieval re-ranker, and generation can each be swapped without rebuilding the corpus.

Open-weight

Self-hosted open models

Run on Blue Note Logic GPU infrastructure or your own. Common choices: Llama, Mistral, Qwen — selected per corpus characteristics.

Hosted API

OpenAI, Anthropic, Azure OpenAI

Routed via a unified gateway with key management, rate limiting, and request logging. Per-tenant policy on which providers are permitted.

BYO key

Your provider, your contract

Bring your own API key from any supported provider. Useful when you have an existing enterprise agreement or compliance posture.

Fine-tuning

Path to custom models

When retrieval alone is no longer enough, Blue Note Logic can scope fine-tuning on a customer corpus. Available on Sovereign deployments.

Three boundaries. One workspace.

The application, API surface, and tool behavior are identical across topologies. What changes is where the data sits and who operates the infrastructure.

SaaS

Shared multi-tenant

Fastest path to running. Blue Note Logic operates the full stack in EU bare-metal datacenters with per-tenant corpus isolation.

  • EU datacenter (Hetzner / equivalent)
  • Per-tenant vector store partition
  • Shared application tier
  • Standard SLA
  • Best for: pilots, SMB, advocacy

Sovereign

On-prem / air-gap

Full caveauAI stack installed inside your perimeter. No outbound dependency on Blue Note Logic infrastructure or public model APIs.

  • Air-gap capable
  • Open-weight models only (or BYO)
  • Customer-operated or BNL-managed
  • Fine-tuning supported
  • Best for: government, defense, critical infrastructure

Three integration paths.

Detailed API documentation lives on the all-features page. Short version below.

REST API

Query, chat, document CRUD

Standard REST endpoints for corpus management, retrieval, chat completion, and document lifecycle. API key auth with scoped permissions.

MCP

Model Context Protocol

Expose caveauAI corpora to MCP-compatible agents (Claude Desktop, Cursor, custom agents). Grounds external assistants in your private knowledge.

Workflows

Webhooks + scheduled jobs

Agent Builder chains retrieval, analysis, and delivery into repeatable pipelines. Trigger on incoming email, file drop, or schedule.


Need to see this against your stack?

Blue Note Logic engineers will walk through caveauAI architecture against your existing data infrastructure, identity model, and compliance posture.
