How it works
Semantic search uses dense vector embeddings to find documents with similar meaning to the query, enabling conceptual matching beyond keyword overlap.

Search process
- Query embedding - Convert query text to dense vector using embedding model
- Vector search - Find nearest neighbors in vector database index
- Result retrieval - Return top-k most similar documents
- Optional RAG - Generate answer using retrieved documents as context
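The steps above can be sketched end to end. This is a toy illustration, not the actual implementation: a stand-in hashing embedder replaces a real SentenceTransformers model, and a brute-force similarity scan replaces a vector database index.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Stand-in embedder: hashes character trigrams into a dense unit vector.
    # A real pipeline would call a SentenceTransformers model here.
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

def search(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    # 1. Embed the query, 2. score every document by cosine similarity
    # (vectors are unit-norm, so a dot product suffices), 3. return top-k.
    q = embed(query)
    doc_vecs = np.stack([embed(d) for d in docs])
    scores = doc_vecs @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [docs[i] for i in order]
```

A production system performs the same three stages, but the nearest-neighbor step runs against an approximate index (e.g., in Pinecone) rather than a linear scan.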
When to use semantic search
Traditional keyword search fails when:

- The query uses synonyms (“automobile” vs “car”)
- Documents use different terminology for the same concepts
- The user doesn’t know the exact terms used in the documents
Key features
- Supports any SentenceTransformers model
- Optional semantic diversification removes near-duplicate results
- Integrates with Groq or OpenAI for RAG answer generation
- Metadata filters narrow results by category, date, source, or custom fields
Implementation
Configuration
Required settings
Pinecone API authentication key
Target index name for search
Optional settings
Document organization namespace for logical partitioning
Embedding model configuration (must match indexing model)
Optional LLM configuration for answer generation
Example configuration
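A sketch of what a configuration might look like, expressed as a Python dict. The field names here are hypothetical; consult your pipeline's reference for the exact keys.

```python
# Hypothetical configuration shape covering the settings listed above.
config = {
    "pinecone_api_key": "YOUR_API_KEY",      # required: Pinecone authentication
    "index_name": "docs-index",              # required: target index for search
    "namespace": "product-docs",             # optional: logical partition
    "embedding_model": "all-MiniLM-L6-v2",   # must match the model used at indexing time
    "llm": {                                 # optional: enables RAG answer generation
        "provider": "groq",
        "model": "llama-3.1-8b-instant",
    },
}
```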
Search parameters
Search query text to embed and match against documents
Number of results to return
Metadata filters for pre-filtering (e.g., {"category": "tech"})
Namespace selecting an isolated document collection within the index
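Metadata pre-filtering restricts the candidate set before vector scoring. A minimal sketch of exact-match filtering, using a hypothetical record shape (`id` plus a `metadata` dict):

```python
def apply_filters(records: list[dict], filters: dict) -> list[dict]:
    # Keep only records whose metadata matches every filter key exactly.
    # Real vector databases support richer operators (ranges, $in, etc.).
    return [
        r for r in records
        if all(r.get("metadata", {}).get(k) == v for k, v in filters.items())
    ]

records = [
    {"id": "a", "metadata": {"category": "tech"}},
    {"id": "b", "metadata": {"category": "news"}},
]
apply_filters(records, {"category": "tech"})  # keeps only record "a"
```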
Embedding consistency
Always ensure:

- Query embedder matches document embedder
- Same model version is used
- Consistent preprocessing (normalization, truncation)
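One way to keep preprocessing consistent is to route both queries and documents through a single function. A minimal sketch; the normalization steps and truncation limit here are illustrative, not prescribed:

```python
MAX_CHARS = 512  # assumed truncation limit; must equal the one used at indexing time

def preprocess(text: str) -> str:
    # Apply the SAME normalization for queries and documents:
    # collapse whitespace, lowercase, then truncate.
    return " ".join(text.split()).lower()[:MAX_CHARS]
```

Embedding `preprocess(query)` and `preprocess(document)` with the same model version guarantees the two sides live in the same vector space.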
RAG integration
If an LLM is configured, the pipeline can generate answers using retrieved documents as context. This combines retrieval accuracy with generation fluency for question-answering applications.

Database support
Semantic search is available across all supported vector databases:

- Pinecone - Managed vector database with namespaces
- Weaviate - Open-source vector search with collections
- Qdrant - High-performance search with payload filtering
- Milvus - Scalable vector database with partition-key isolation
- Chroma - Lightweight vector store for local development
Related features
Hybrid search
Combine dense and sparse retrieval with fusion
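A common fusion scheme for combining dense and sparse result lists is reciprocal rank fusion (RRF); a minimal sketch, not necessarily the fusion method this feature uses:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores 1/(k + rank) in every ranked list it appears in;
    # summed scores determine the fused order. k=60 is the customary default.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```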
Reranking
Cross-encoder second-stage scoring
Diversity filtering
Post-retrieval redundancy reduction
MMR
Maximal marginal relevance for diversity
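MMR greedily picks results that are relevant to the query yet dissimilar to what has already been selected. A minimal sketch over unit-norm vectors, for illustration only:

```python
import numpy as np

def mmr(query_vec: np.ndarray, doc_vecs: np.ndarray,
        lam: float = 0.5, top_k: int = 3) -> list[int]:
    # lam trades off relevance (to the query) against redundancy
    # (similarity to already-selected documents). Vectors are unit-norm,
    # so dot products are cosine similarities.
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        best, best_score = candidates[0], -np.inf
        for i in candidates:
            relevance = float(doc_vecs[i] @ query_vec)
            redundancy = max((float(doc_vecs[i] @ doc_vecs[j]) for j in selected),
                             default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low `lam`, a near-duplicate of an already-selected document is skipped in favor of a less similar one, which is exactly the redundancy reduction described above.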