What is a Vector Store?
A vector store is a specialized database that stores text as high-dimensional numerical vectors (embeddings) and enables fast similarity search. Instead of keyword matching, vector stores find semantically similar content based on meaning.

Vector embeddings capture the semantic meaning of text: similar concepts have similar vectors, even if they use different words.
Why Vector Embeddings?
Traditional keyword search fails when questions and answers use different terminology:

Keyword Search
Query: “revenue growth” → Only finds exact phrase “revenue growth”
Vector Search
Query: “revenue growth” → Finds “sales increase”, “income expansion”, etc.
ChromaDB in RAG Chat
RAG Chat uses ChromaDB as its vector store. ChromaDB is lightweight, fast, and perfect for local deployments.

Loading an Existing Vector Store

When the app starts, it checks for previously stored documents (app.py).
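A minimal sketch of this loading step, assuming the LangChain Chroma wrapper and an OPENAI_API_KEY in the environment (import paths vary across LangChain versions; the function name is illustrative, not the actual app.py code):

```python
import os

def load_vector_store(persist_directory: str = "db"):
    """Reload a previously persisted Chroma store, or return None if absent."""
    if not os.path.isdir(persist_directory):
        return None  # no documents have been uploaded yet

    # Imports kept inside the function so the sketch stays self-contained;
    # exact module paths depend on your LangChain version.
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
    # Pointing Chroma at the same persist_directory reloads the stored vectors.
    return Chroma(persist_directory=persist_directory,
                  embedding_function=embeddings)
```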
The db directory stores all your document embeddings persistently. This means your documents remain available even after restarting the application.

Creating a New Vector Store
When you upload your first document, ChromaDB creates a new vector store (app.py):
- New store: Creates a fresh ChromaDB instance with from_documents()
- Existing store: Adds new documents to the existing collection with add_documents()
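A hedged sketch of that branch, assuming LangChain's Chroma wrapper (the function name and signature are illustrative; import paths vary by LangChain version):

```python
def store_documents(chunks, vectordb=None, persist_directory="db"):
    """Add document chunks to ChromaDB, creating the store on first upload.

    `chunks` is a list of LangChain Document objects; `vectordb` is an
    existing Chroma instance, or None when no store exists yet.
    """
    if vectordb is None:
        # First upload: build a fresh store and embed every chunk.
        from langchain_community.vectorstores import Chroma
        from langchain_openai import OpenAIEmbeddings

        vectordb = Chroma.from_documents(
            documents=chunks,
            embedding=OpenAIEmbeddings(model="text-embedding-ada-002"),
            persist_directory=persist_directory,
        )
    else:
        # Later uploads: embed and append to the existing collection.
        vectordb.add_documents(chunks)
    return vectordb
```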
The Embedding Process
How Text Becomes Vectors
- Text chunk: “RAG combines retrieval with generation”
- OpenAI Embedding API: Converts text to a 1536-dimensional vector
- Vector: [0.023, -0.145, 0.891, ..., 0.234] (1536 numbers)
- Storage: ChromaDB stores the vector along with the original text
- Retrieval: Query vectors are compared to stored vectors using cosine similarity
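The steps above can be illustrated with a toy in-memory store. The 4-dimensional mock embedder below is a stand-in for the real 1536-dimensional OpenAI model (it just hashes characters, so it has no semantic understanding):

```python
import math

def mock_embed(text: str) -> list[float]:
    """Stand-in for the OpenAI embedding API (which returns 1536 dims)."""
    raw = [float(sum(ord(c) for c in text[i::4])) for i in range(4)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]  # unit length, so dot product = cosine

store = []  # ChromaDB keeps (vector, original text) pairs like this

def add(text: str) -> None:
    store.append((mock_embed(text), text))

def query(text: str, k: int = 1) -> list[str]:
    qv = mock_embed(text)
    # Rank stored chunks by cosine similarity to the query vector.
    scored = sorted(store,
                    key=lambda pair: -sum(a * b for a, b in zip(pair[0], qv)))
    return [original for _, original in scored[:k]]

add("RAG combines retrieval with generation")
add("the weather is sunny today")
```

Querying with the stored sentence itself returns that sentence first, because identical text produces an identical vector (cosine similarity 1.0).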
OpenAI Embeddings
RAG Chat uses OpenAI’s text-embedding-ada-002 model through OpenAIEmbeddings() (app.py).
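A sketch of constructing that embedder, assuming the langchain_openai package (import path varies by LangChain version; requires OPENAI_API_KEY):

```python
def build_embedder():
    """Construct the embedding model used for BOTH storing and querying."""
    from langchain_openai import OpenAIEmbeddings

    return OpenAIEmbeddings(model="text-embedding-ada-002")

# Usage (makes real API calls):
#   embedder = build_embedder()
#   doc_vectors = embedder.embed_documents(["chunk one", "chunk two"])
#   query_vector = embedder.embed_query("my question")  # 1536 floats
```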
The same embedding model must be used for both storing and querying to ensure vectors are in the same semantic space.
Similarity Search
When you ask a question, ChromaDB performs similarity search (app.py):
- Embeds your question using OpenAI’s embedding model
- Computes similarity between the question vector and all stored chunk vectors
- Returns the top-k most similar chunks (default k=4)
- These chunks become the context for the LLM
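A sketch of the retrieval call, assuming a loaded Chroma instance `vectordb` (the helper name is illustrative):

```python
def retrieve_context(vectordb, question: str, k: int = 4):
    """Return the k chunk texts most similar to the question.

    `vectordb` is any object with a Chroma-style similarity_search method.
    """
    # Chroma embeds the question, compares it to every stored vector,
    # and returns the top-k Documents (chunk text + metadata).
    docs = vectordb.similarity_search(question, k=k)
    return [doc.page_content for doc in docs]
```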
Cosine Similarity
ChromaDB uses cosine similarity to measure how close two vectors are:

- 1.0: Identical meaning
- 0.8-0.9: Very similar
- 0.5-0.7: Somewhat related
- < 0.5: Not very similar
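Cosine similarity itself is simple to compute; a self-contained version:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors: ~1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: ~0.0
```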
Persistence
The db directory contains all your vector store data.
Your uploaded documents persist across sessions. Delete the db directory to reset the vector store.

Advantages of Persistence
Fast Startup
No need to re-process documents every time
Incremental Updates
Add new documents without losing existing ones
Cost Savings
Avoid redundant embedding API calls
Reliability
Data survives application restarts
Code Flow Example
Documents flow from the loader, through the text splitter and the embedding model, into the persisted ChromaDB store, where they become available for retrieval.

Performance Considerations
Embedding Costs
OpenAI charges per token for embeddings:

- Model: text-embedding-ada-002
- Cost: ~$0.0001 per 1K tokens
- Example: A 100-page document might cost ~$0.15 to embed
Embeddings are only computed once per document chunk. Queries use the same model but only embed the question (very cheap).
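A quick back-of-the-envelope check using the ~$0.0001/1K-token rate quoted above (the helper name is illustrative):

```python
RATE_PER_1K_TOKENS = 0.0001  # text-embedding-ada-002 rate quoted above

def embedding_cost(tokens: int) -> float:
    """Estimated one-time embedding cost in dollars for a token count."""
    return tokens / 1000 * RATE_PER_1K_TOKENS

print(embedding_cost(1_000_000))  # 1M tokens -> ~$0.10
```

A query embeds only the question, typically well under 100 tokens, which is why per-question cost is negligible.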
Retrieval Speed
ChromaDB is optimized for fast retrieval:

- Small collections (< 10K chunks): Near-instant retrieval
- Medium collections (10K-100K chunks): Milliseconds
- Large collections (> 100K chunks): Consider approximate nearest neighbor (ANN) indices
Next Steps
Document Processing
Learn how documents are split into chunks before embedding
RAG Overview
Understand the complete RAG pipeline