How it works
Diversity filtering over-fetches candidates from the vector database, then applies post-processing to select diverse results.
Pipeline architecture
- Query embedding - Convert query text to dense vector
- Over-fetch - Retrieve 3x top_k candidates from database
- Re-embedding - Generate embeddings for retrieved documents
- Diversity filtering - Apply MMR or clustering method
- Limit - Return top_k diverse documents
- Optional RAG - Generate answer using diverse documents
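The six steps above can be sketched end to end. This is a minimal, self-contained illustration: the embedder and vector store are toy stand-ins (hash-based vectors, brute-force cosine search), and all function names are illustrative, not the library's actual API.

```python
import numpy as np

def embed(text):
    """Toy embedder: bucket character codes into a small unit vector."""
    vec = np.zeros(8)
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def overfetch(query_vec, corpus, limit):
    """Toy vector store: brute-force cosine search, returning `limit` texts."""
    return sorted(corpus, key=lambda d: -(embed(d) @ query_vec))[:limit]

def mmr_select(query_vec, docs, top_k, lam=0.5):
    """Greedy MMR over re-embedded candidates (steps 3-5 of the pipeline)."""
    vecs = {d: embed(d) for d in docs}          # re-embedding step
    selected, remaining = [], list(docs)
    while remaining and len(selected) < top_k:
        def score(d):
            rel = vecs[d] @ query_vec
            red = max((vecs[d] @ vecs[s] for s in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def diverse_search(query, corpus, top_k, multiplier=3):
    q = embed(query)                                       # 1. query embedding
    candidates = overfetch(q, corpus, top_k * multiplier)  # 2. over-fetch 3x
    return mmr_select(q, candidates, top_k)                # 3-5. re-embed, filter, limit
```

The optional RAG step (6) would simply pass the returned documents as context to an LLM prompt.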
Why diversity matters
Standard semantic search returns the k most similar documents, which often results in redundant information (e.g., 5 similar paragraphs from the same source). Diversity filtering ensures results cover different perspectives, sources, or aspects of the query topic.
Diversity methods
MMR (Maximal Marginal Relevance)
MMR - Default method
How it works: Balances query relevance with inter-document diversity using a lambda parameter.
Formula:
MMR(d) = λ × sim(d, query) - (1 - λ) × max_sim(d, selected)
Configuration:
- max_documents - Maximum documents to return
- lambda_param - Relevance-diversity trade-off (default: 0.5)
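The MMR formula transcribes directly into code. A minimal sketch, assuming embeddings are unit-normalized so a dot product equals cosine similarity; `lam` stands in for lambda_param:

```python
import numpy as np

def mmr_score(d_vec, query_vec, selected_vecs, lam=0.5):
    """MMR(d) = lam * sim(d, query) - (1 - lam) * max_sim(d, selected)."""
    relevance = float(d_vec @ query_vec)
    # Redundancy: highest similarity to any already-selected document
    redundancy = max((float(d_vec @ s) for s in selected_vecs), default=0.0)
    return lam * relevance - (1 - lam) * redundancy
```

With nothing selected yet, the score reduces to scaled relevance; a candidate identical to an already-selected document is penalized by the full redundancy term.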
Clustering-based
Clustering - Topic coverage method
How it works: Groups retrieved documents into N clusters using embeddings, then samples M documents from each cluster.
Configuration:
- num_clusters - Number of topic clusters (default: 3)
- samples_per_cluster - Docs per cluster (default: 2)
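The cluster-then-sample step can be sketched over pre-computed embeddings. K-means is hand-rolled here to keep the example dependency-free; a real implementation would typically use a library routine, and the function names are illustrative:

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign points to nearest center, recompute."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_select(doc_vecs, num_clusters=3, samples_per_cluster=2):
    """Return indices of up to samples_per_cluster documents per cluster."""
    X = np.asarray(doc_vecs, dtype=float)
    labels = kmeans_labels(X, num_clusters)
    picked = []
    for j in range(num_clusters):
        members = np.where(labels == j)[0]
        picked.extend(members[:samples_per_cluster].tolist())
    return picked
```

With the defaults (3 clusters, 2 samples each), at most 6 documents are returned regardless of how many candidates were over-fetched.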
Key features
- Two diversity methods: MMR and clustering-based selection
- Over-fetching with configurable multiplier (default 3x)
- Re-embedding ensures consistent similarity calculations
- Works with all vector databases
- Optional RAG integration
Implementation
Configuration
Required settings
Vector database API authentication
Target index name for search
Diversity configuration
Diversity method: "mmr" or "clustering"
Over-fetch multiplier (retrieves top_k × multiplier candidates)
MMR-specific
Maximum documents to return for MMR method
Relevance-diversity trade-off (0.0-1.0)
- 1.0 = pure relevance
- 0.5 = balanced
- 0.0 = pure diversity
Clustering-specific
Number of clusters for clustering method
Documents to sample from each cluster
Example configurations
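The settings above might be combined as follows. These are illustrative values only; the key names mirror the parameters described in this section, but the exact config schema here is an assumption.

```python
# Hypothetical MMR configuration: balanced relevance/diversity trade-off
mmr_config = {
    "method": "mmr",
    "overfetch_multiplier": 3,   # retrieve top_k x 3 candidates
    "max_documents": 5,
    "lambda_param": 0.5,         # 1.0 = pure relevance, 0.0 = pure diversity
}

# Hypothetical clustering configuration: guaranteed topic coverage
clustering_config = {
    "method": "clustering",
    "overfetch_multiplier": 3,
    "num_clusters": 3,           # distinct topic buckets
    "samples_per_cluster": 2,    # up to 6 documents returned
}
```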
Search parameters
Search query text to embed and match against documents
Number of diverse documents to return. The pipeline retrieves multiplier × this amount (default 3x) as candidates for diversity selection.
Optional metadata filters to apply during retrieval
Use cases
Exploratory search
When users need to see different perspectives.
Multi-document summarization
Provide diverse context to LLMs.
News aggregation
Show articles from different sources.
Research literature review
Cover different research approaches.
Method comparison
| Aspect | MMR | Clustering |
|---|---|---|
| Query awareness | Yes, uses query similarity | No, only inter-doc similarity |
| Speed | Fast (greedy) | Moderate (K-means) |
| Parameters | lambda_param, max_documents | num_clusters, samples_per_cluster |
| Best for | Relevance + diversity balance | Topic coverage |
| Deterministic | Yes | No (K-means random init) |
| Interpretability | High (clear trade-off) | Medium (cluster interpretation) |
Choosing a method
Default: Use MMR
MMR is query-aware and provides explicit relevance-diversity control. Start with lambda_param=0.5.
Topic coverage: Use clustering
When you need guaranteed coverage of N distinct topics, use clustering with num_clusters=N.
Tune parameters
- MMR: Adjust lambda (↑ relevance, ↓ diversity)
- Clustering: Adjust num_clusters and samples_per_cluster
Diversity helpers
The diversity filtering pipeline uses helper methods that can be used independently.
Over-fetching strategy
- Higher multiplier - More diversity options, higher latency
- Lower multiplier - Faster, but limited diversity options
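The over-fetch count itself is a one-line calculation; this hypothetical helper (the name is illustrative) shows how the candidate limit relates to top_k:

```python
def overfetch_limit(top_k: int, multiplier: int = 3) -> int:
    """Candidates to pull from the vector DB before diversity filtering."""
    return top_k * multiplier
```

For example, `overfetch_limit(5)` yields 15 candidates, while raising the multiplier to 5 yields 25 at the cost of extra retrieval and re-embedding latency.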
Performance considerations
Time complexity
- MMR: O(k × n) where k=top_k, n=candidates
- Clustering: O(n × d × iterations) for K-means
Optimization tips
- Cache embeddings - Store document embeddings to avoid recomputation
- Limit over-fetch - Balance diversity quality with latency (3-5x multiplier)
- Use metadata filters - Reduce candidate pool before diversity filtering
- Batch processing - Process multiple queries together for efficiency
Evaluation metrics
Measure diversity effectiveness with the following metrics.
Average pairwise similarity
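Average pairwise similarity is the mean cosine similarity over all pairs of returned documents; lower values indicate a more diverse result set. A minimal sketch, assuming unit-normalized embedding vectors:

```python
import numpy as np
from itertools import combinations

def avg_pairwise_similarity(vecs):
    """Mean dot product over all document pairs; 0.0 for fewer than 2 docs."""
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return float(np.mean([a @ b for a, b in pairs]))
```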
Topic coverage
Count distinct topics/sources in the results.
Related features
MMR
Maximal marginal relevance algorithm details
Semantic search
Initial retrieval before diversity filtering
Hybrid search
Dense + sparse retrieval
Reranking
Cross-encoder second-stage scoring