Vector Database
A specialised database designed to store, index, and search high-dimensional vector embeddings, enabling semantic similarity search at scale.
What is a Vector Database?
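A vector database is a storage system optimised for high-dimensional vectors: numerical representations of data (text, images, audio) produced by machine learning models. Unlike traditional databases that match on exact values, vector databases perform approximate nearest neighbour (ANN) searches, returning the stored vectors closest to a query vector under a distance measure such as cosine similarity or Euclidean distance. Because embedding models place related content close together, the nearest vectors correspond to items that are semantically similar to the query.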
When an embedding model converts a sentence like “How do I reset my password?” into a vector, a vector database can retrieve similar questions such as “I forgot my login credentials”, even though the two sentences share almost no words.
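The idea of nearest-neighbour search can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the three-dimensional vectors are hand-made stand-ins (real embeddings have hundreds or thousands of dimensions and come from a model), and the function names are hypothetical. It performs an exact brute-force scan, which is the result an ANN index approximates at lower cost.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, items, k=2):
    """Exact (brute-force) top-k search; ANN indexes approximate this result."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]

# Hand-made 3-dimensional vectors for illustration only (not real embeddings).
items = {
    "I forgot my login credentials": [0.9, 0.1, 0.0],
    "How do I change my billing address?": [0.1, 0.9, 0.1],
    "What are your opening hours?": [0.0, 0.2, 0.9],
}
# Assumed stand-in for the embedding of "How do I reset my password?"
query_vector = [0.8, 0.2, 0.1]

top = nearest(query_vector, items, k=1)
print(top)
```

With these toy vectors the credentials question ranks first because its vector points in nearly the same direction as the query, regardless of the words used.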
Key Capabilities
- Semantic search — Find content by meaning, not just keywords
- Hybrid search — Combine vector similarity with traditional metadata filters
- Real-time indexing — Add and query new vectors without rebuilding the entire index
- Scalability — Handle billions of vectors with sub-second query times
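Two of the capabilities above, hybrid search and real-time indexing, can be illustrated with a toy in-memory store. This is a sketch under stated assumptions: `TinyVectorStore`, its methods, and the `where` parameter are hypothetical names, the vectors are made up, and the linear scan stands in for a real ANN index. Production systems differ in how they combine filters with vector search.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Toy in-memory store: linear scan, no ANN index (illustration only)."""

    def __init__(self):
        self.records = []  # each record: (id, vector, metadata dict)

    def add(self, record_id, vector, metadata):
        # New vectors are searchable immediately; there is no index to rebuild here.
        self.records.append((record_id, vector, metadata))

    def search(self, query, k=3, where=None):
        # Hybrid search: apply the metadata filter first, then rank by similarity.
        where = where or {}
        candidates = [
            (cosine(query, vec), rid)
            for rid, vec, meta in self.records
            if all(meta.get(key) == value for key, value in where.items())
        ]
        candidates.sort(reverse=True)
        return [rid for _, rid in candidates[:k]]

store = TinyVectorStore()
store.add("faq-1", [0.9, 0.1], {"lang": "en"})
store.add("faq-2", [0.8, 0.3], {"lang": "de"})
store.add("faq-3", [0.1, 0.9], {"lang": "en"})

# Only English records are considered, even though faq-2 is close to the query.
english_hits = store.search([1.0, 0.0], k=2, where={"lang": "en"})

# A record added after the first query shows up in the next one.
store.add("faq-4", [1.0, 0.0], {"lang": "en"})
updated_hits = store.search([1.0, 0.0], k=2, where={"lang": "en"})
print(english_hits, updated_hits)
```

Filtering before ranking keeps the example short; a real database has to make the same combination efficient across very large collections, which is where its index design matters.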
Popular Vector Databases
The ecosystem has grown rapidly since 2023. Leading options include Pinecone (managed cloud), Weaviate (open source, hybrid search), Qdrant (Rust-based, performant), Milvus (open source, an LF AI & Data Foundation project), and pgvector (a PostgreSQL extension for teams already on Postgres).
The Blue Note Logic Perspective
Vector databases are the retrieval backbone of every CorpusAI deployment. We typically recommend pgvector for teams that want to keep their stack simple and have fewer than 10M vectors, and Qdrant or Weaviate for larger-scale production deployments. The choice of embedding model often matters more than the database, which is why we benchmark multiple models against each client's domain vocabulary before committing to an architecture.
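One common way to compare embedding models on domain queries is recall@k: the fraction of labelled queries whose expected document appears in the top k results. The sketch below is an assumption about what such a benchmark could look like, not a description of the actual process. The two "models" are hypothetical lookup tables of hand-made vectors, so the scores are illustrative and not measurements of any real model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall_at_k(embed, corpus, labelled_queries, k=1):
    """Fraction of queries whose expected document is among the top-k results."""
    doc_vectors = {doc_id: embed(text) for doc_id, text in corpus.items()}
    hits = 0
    for query_text, expected_id in labelled_queries:
        q = embed(query_text)
        ranked = sorted(doc_vectors, key=lambda d: cosine(q, doc_vectors[d]), reverse=True)
        hits += expected_id in ranked[:k]
    return hits / len(labelled_queries)

corpus = {
    "auth": "Reset your password from the login page",
    "billing": "Download invoices from the billing page",
}
labelled_queries = [
    ("I forgot my credentials", "auth"),
    ("Where is my bill?", "billing"),
]

# Hypothetical model outputs: hand-made 2-dimensional vectors, illustration only.
model_a = {
    corpus["auth"]: [1.0, 0.0],
    corpus["billing"]: [0.0, 1.0],
    "I forgot my credentials": [0.9, 0.2],
    "Where is my bill?": [0.6, 0.4],  # lands closer to "auth" than to "billing"
}
model_b = {
    corpus["auth"]: [1.0, 0.0],
    corpus["billing"]: [0.0, 1.0],
    "I forgot my credentials": [0.8, 0.1],
    "Where is my bill?": [0.2, 0.9],
}

scores = {
    "model_a": recall_at_k(model_a.__getitem__, corpus, labelled_queries),
    "model_b": recall_at_k(model_b.__getitem__, corpus, labelled_queries),
}
print(scores)
```

In a real evaluation, `embed` would call an actual embedding model and the labelled queries would be drawn from the client's own domain vocabulary; the scoring logic stays the same.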