Overview
The vector search pipeline consists of three main stages:Embedding Generation
Query text converted to 768-dimensional vector using BGE-base-en-v1.5
Vector Search
Pinecone finds top 8 semantically similar verses using cosine similarity
Hybrid Ranking
Semantic scores combined with keyword matching for optimal relevance
BGE Embedding Model
GitaChat uses BAAI/bge-base-en-v1.5, one of the top-performing embedding models on the MTEB benchmark.Model Configuration
Model Specifications
Model ID:
BAAI/bge-base-en-v1.5Key Properties:- Dimensions: 768 (optimal balance of accuracy and performance)
- Max Tokens: 512 tokens
- Architecture: BERT-based encoder
- Training: Contrastive learning on diverse text pairs
backend/config.py:32-33):Model Initialization
The embedding model is loaded once on application startup to avoid cold starts.Client Setup (Startup Warmup (
backend/clients.py:23-24):backend/main.py:68-73):Instruction Prefix
BGE models perform best with query-specific instruction prefixes.Query Encoding (Why It Matters:
backend/model.py:38-42):- Improves retrieval accuracy by 5-10%
- Aligns query representation with passage representation
- Recommended by BGE authors for asymmetric search
Pinecone Integration
Pinecone provides the vector database infrastructure for fast, scalable similarity search.Index Configuration
Vector Storage
Index Structure:
- Namespace: Default (single namespace for all verses)
- Dimensions: 768 (matching BGE output)
- Metric: Cosine similarity
- Records: ~700 verses (all 18 chapters)
backend/clients.py:16-18):Vector Metadata
Each vector includes rich metadata for filtering and display.Metadata Schema:Metadata Benefits:
- Enables exact verse retrieval by chapter/verse
- Includes full text for keyword matching
- Pre-computed summaries reduce AI calls
- No need for separate document store
Vector Upload Process
Batch Upsert
Verses are uploaded to Pinecone in batches for efficiency.Batch Upload Function (Batch Size: 100 vectors per batch (
backend/utils.py:95-99):config.py:37)Semantic Search Algorithm
The core search functionality combines vector similarity with intelligent re-ranking.Query Processing
Match Function
The primary search function that orchestrates the entire pipeline.Function Signature (Complete Implementation (
backend/model.py:36):backend/model.py:36-119):Hybrid Ranking Algorithm
GitaChat uses a sophisticated hybrid approach that combines semantic and keyword signals.Step 1: Semantic Search
Vector Similarity: Pinecone returns top 8 verses by cosine similarityQuery Parameters (Semantic Score Range: 0.0 to 1.0 (cosine similarity)
backend/model.py:45-47):- 0.8+: Highly relevant match
- 0.6-0.8: Good semantic alignment
- Below 0.6: Weak semantic match
Step 2: Keyword Boost Calculation
Exact Term Matching: Boost verses containing query keywordsImplementation (Keyword Boost Range: 0.0 to 1.0
backend/model.py:69-79):- 1.0: All query terms present in verse
- 0.5: Half of query terms found
- 0.0: No keyword matches
Step 3: Score Combination
Weighted Combination: Semantic score + 15% keyword boostFormula (Why 15% Weight?:
backend/model.py:81-84):- Semantic search remains primary signal (85%)
- Keyword matching breaks ties and boosts exact matches
- Prevents keyword over-optimization
- Balances conceptual and lexical matching
Step 4: Final Ranking
Sort by Combined Score: Re-order results by hybrid scoreImplementation (Result Selection:
backend/model.py:86-87):- Best Match: Highest combined score becomes primary result
- Related Verses: Next 3 unique verses shown as alternatives
- Deduplication: Same verse never appears twice
Exact Verse Retrieval
For direct verse access (e.g.,/verse/2/47), GitaChat uses metadata filtering instead of semantic search.
Get Verse Function
Fast Lookup: Retrieve specific verse by chapter and numberImplementation (Key Features:
backend/model.py:10-33):- Dummy Vector: Uses zero vector since filtering by metadata
- Metadata Filter: Exact match on chapter and verse
- Fast: No embedding computation needed
- Cached Commentary: Returns pre-computed summary
Performance Characteristics
Latency Breakdown
Typical Query Times:
- Embedding generation: 20-50ms
- Pinecone query: 30-80ms
- Keyword processing: 5-10ms
- Total search time: 60-150ms
- Model preloading eliminates cold starts
- Pinecone indexes enable sub-100ms search
- Async processing prevents blocking
Accuracy Metrics
Observed Performance:
- Top-1 Accuracy: ~85% (best match relevant)
- Top-3 Accuracy: ~95% (related verse relevant)
- Keyword Boost Impact: +8% accuracy improvement
- BGE model trained on diverse datasets
- Instruction prefix improves retrieval
- Hybrid ranking reduces false positives
Scalability
Current Scale:
- 700+ vectors in index
- ~1,000 queries/day
- 99.9% uptime
- Pinecone: Supports millions of vectors
- BGE model: CPU-bound (20-50ms per query)
- Rate limiting: 30 requests/min per IP
Cost Analysis
Per-Query Costs:
- Pinecone query: ~$0.0001
- Embedding computation: Free (self-hosted)
- OpenAI commentary: $0.002-0.004 (separate)
- Pinecone: ~$3-5/month
- Compute (Railway): ~$10-20/month
Advanced Use Cases
Batch Verse Loading
All Verses Endpoint: Load entire corpus for client-side searchImplementation (Use Case: Powers frontend full-text search and browse features
Caching: Loaded once on startup, stored in
backend/main.py:24-56):all_verses_cacheMulti-Vector Search
Future Enhancement: Search across multiple embedding modelsPotential Implementation:
Troubleshooting
Poor Search Results
Symptoms: Irrelevant verses returnedDebugging Steps:
- Check embedding model loaded correctly
- Verify instruction prefix applied
- Inspect semantic scores (below 0.5 = poor match)
- Test keyword boost calculation
- Review Pinecone index metadata
Slow Query Performance
Symptoms: Over 500ms query latencyCommon Causes:
- Model not preloaded (cold start)
- Pinecone connection timeout
- High Pinecone region latency
- CPU thread contention
- Verify model warmup in startup logs
- Check Pinecone region matches deployment
- Review CPU thread configuration
Empty Results
Symptoms: No matches foundPossible Issues:
- Pinecone index empty
- Metadata filter too restrictive
- Query embedding failed
- Dimension mismatch (768)
Incorrect Verse Retrieved
Symptoms: Wrong chapter/verse returnedVerification:Common Fix: Re-upload vectors with correct metadata