MMR (Maximal Marginal Relevance) balances relevance with diversity by penalizing documents too similar to those already selected. The result set covers more aspects of a topic instead of repeating similar content.

How it works

MMR iteratively selects documents by balancing query relevance against redundancy with already-selected documents.

MMR algorithm

At each step, every remaining candidate is scored with:
MMR(d) = λ × sim(d, query) - (1-λ) × max_sim(d, selected)
Where:
  • λ (lambda_param) - Trade-off between relevance and diversity (0.0-1.0)
  • sim(d, query) - Cosine similarity between document and query embeddings
  • max_sim(d, selected) - Maximum similarity to any already-selected document
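For example, with λ = 0.5, a candidate with sim(d, query) = 0.9 but max_sim(d, selected) = 0.8 scores 0.5 × 0.9 - 0.5 × 0.8 = 0.05, while a less relevant but novel candidate with sim = 0.7 and max_sim = 0.2 scores 0.5 × 0.7 - 0.5 × 0.2 = 0.25 and wins the slot.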

Selection process

  1. First document - Select most relevant to query
  2. Iterative selection - For remaining slots:
    • Calculate relevance to query for each candidate
    • Calculate redundancy (max similarity to selected docs)
    • Compute MMR score with lambda weighting
    • Select document with highest MMR score
  3. Repeat - Until k documents selected
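
The greedy loop can be sketched in a few lines of NumPy. This is an illustrative standalone implementation, not the library's internals; mmr_select and cosine_sim are hypothetical names:

import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mmr_select(doc_embs: np.ndarray, query_emb: np.ndarray,
               k: int, lambda_param: float = 0.5) -> list[int]:
    """Return the indices of k documents chosen by greedy MMR."""
    relevance = cosine_sim(doc_embs, query_emb[None, :]).ravel()   # sim(d, query)
    selected = [int(np.argmax(relevance))]                         # step 1: most relevant
    max_sim = cosine_sim(doc_embs, doc_embs[selected]).ravel()     # redundancy so far
    while len(selected) < min(k, len(doc_embs)):
        scores = lambda_param * relevance - (1 - lambda_param) * max_sim
        scores[selected] = -np.inf                                 # never re-pick a document
        best = int(np.argmax(scores))
        selected.append(best)
        # keep a running max so each iteration stays O(n)
        max_sim = np.maximum(max_sim, cosine_sim(doc_embs, doc_embs[[best]]).ravel())
    return selected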

Lambda parameter guidelines

lambda_param (float) controls the relevance-diversity trade-off:
  • λ = 1.0 - Pure relevance ranking (no diversity penalty)
  • λ = 0.7-0.8 - Emphasize relevance, mild diversity (recommended for precision)
  • λ = 0.5 - Balanced relevance and diversity (good default)
  • λ = 0.3-0.4 - Emphasize diversity (recommended for exploratory search)
  • λ = 0.0 - Pure diversity (minimum redundancy, ignores relevance)
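
To see the extremes concretely, here is a toy run reusing the hypothetical mmr_select sketch from the algorithm section above:

rng = np.random.default_rng(0)
docs = rng.normal(size=(20, 8))                  # 20 toy document embeddings
docs[1] = docs[0] + 0.01 * rng.normal(size=8)    # doc 1: near-duplicate of doc 0
query = docs[0] + 0.1 * rng.normal(size=8)

print(mmr_select(docs, query, k=3, lambda_param=1.0))  # pure relevance: the near-duplicate typically ranks high
print(mmr_select(docs, query, k=3, lambda_param=0.3))  # diversity-weighted: the near-duplicate is typically skipped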

Key features

  • Tune relevance vs diversity to fit the task
  • Uses cosine similarity for both relevance and diversity scoring
  • Particularly useful for summarization and exploratory search
  • Greedy selection keeps the algorithm fast: O(k × n) overall (see Performance characteristics)

Implementation

from vectordb.langchain.utils import MMRHelper

# Generate embeddings for documents and query
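# (embedder is assumed to be an embeddings object initialized elsewhere,
# e.g., a LangChain-style Embeddings instance)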
doc_embeddings = embedder.embed_documents([doc.page_content for doc in documents])
query_embedding = embedder.embed_query(query)

# Apply MMR reranking
reranked = MMRHelper.mmr_rerank(
    documents=documents,
    embeddings=doc_embeddings,
    query_embedding=query_embedding,
    lambda_param=0.5,
    k=10,
)

# Returns list of (Document, MMR_score) tuples
for doc, score in reranked:
    print(f"MMR Score: {score:.3f} - {doc.page_content[:100]}")

Use cases

Exploratory search

When users need to understand different aspects of a topic:
# Lower lambda for more diversity
results = MMRHelper.mmr_rerank(
    documents=candidates,
    embeddings=embeddings,
    query_embedding=query_emb,
    lambda_param=0.3,  # Emphasize diversity
    k=10,
)

Multi-document summarization

Provide diverse context to LLMs:
# Balanced approach
diverse_docs = MMRHelper.mmr_rerank_simple(
    documents=retrieved_docs,
    embeddings=doc_embeddings,
    query_embedding=query_embedding,
    k=5,
    lambda_param=0.5,
)

# Use diverse docs for summarization
summary = llm.summarize(diverse_docs)

Reducing near-duplicates

When search returns many similar results:
# High diversity to remove redundancy
unique_results = MMRHelper.mmr_rerank_simple(
    documents=search_results,
    embeddings=result_embeddings,
    query_embedding=query_emb,
    k=10,
    lambda_param=0.4,  # Favor diversity
)

Precision-focused search

When relevance is critical:
# High lambda for relevance focus
precise_results = MMRHelper.mmr_rerank(
    documents=candidates,
    embeddings=embeddings,
    query_embedding=query_emb,
    lambda_param=0.8,  # Emphasize relevance
    k=5,
)

Lambda parameter tuning

Task-specific recommendations

Task type           | Recommended λ | Rationale
Q&A systems         | 0.7-0.8       | Prioritize relevant answers
Exploratory search  | 0.3-0.4       | Show diverse perspectives
Summarization       | 0.4-0.6       | Balance coverage and relevance
Deduplication       | 0.2-0.4       | Maximize uniqueness
Fact verification   | 0.6-0.7       | Relevant but diverse sources
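
These recommendations are easy to encode as a lookup; a hypothetical convenience mapping, not part of the library:

LAMBDA_BY_TASK = {
    "qa": 0.75,                 # prioritize relevant answers
    "exploratory": 0.35,        # show diverse perspectives
    "summarization": 0.5,       # balance coverage and relevance
    "deduplication": 0.3,       # maximize uniqueness
    "fact_verification": 0.65,  # relevant but diverse sources
}

lambda_param = LAMBDA_BY_TASK.get("summarization", 0.5)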

Tuning guidelines

  1. Start with default - Begin with lambda_param=0.5 (balanced)
  2. Evaluate results - Check for redundancy or missing relevant docs
  3. Adjust based on metrics (see the sketch after this list):
    • Too much redundancy? Decrease lambda (more diversity)
    • Missing relevant results? Increase lambda (more relevance)
  4. A/B test - Compare user engagement across lambda values
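
One way to make step 3 measurable is mean pairwise similarity of the selected set (higher means more redundant). A minimal sketch, assuming the hypothetical mmr_select / cosine_sim helpers and the toy docs / query from the sketches above:

def mean_pairwise_sim(embs):
    """Average off-diagonal cosine similarity of a set of embeddings."""
    sims = cosine_sim(embs, embs)
    n = len(embs)
    return float((sims.sum() - n) / (n * (n - 1)))

for lam in (0.3, 0.5, 0.8):
    idx = mmr_select(docs, query, k=5, lambda_param=lam)
    print(lam, round(mean_pairwise_sim(docs[idx]), 3))  # redundancy should grow with lambda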

Example with full pipeline

from vectordb.langchain.semantic_search import PineconeSemanticSearchPipeline
from vectordb.langchain.utils import MMRHelper, EmbedderHelper

# Initial retrieval
pipeline = PineconeSemanticSearchPipeline("config.yaml")
candidates = pipeline.search(
    query="climate change mitigation strategies",
    top_k=50,  # Over-fetch for MMR
)

# Generate embeddings
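# (assumes embedder was created earlier, e.g., with the imported EmbedderHelper)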
doc_texts = [doc.page_content for doc in candidates["documents"]]
doc_embeddings = embedder.embed_documents(doc_texts)
query_embedding = embedder.embed_query(candidates["query"])

# Apply MMR for diversity
diverse_results = MMRHelper.mmr_rerank_simple(
    documents=candidates["documents"],
    embeddings=doc_embeddings,
    query_embedding=query_embedding,
    k=10,
    lambda_param=0.5,
)

print(f"Retrieved {len(diverse_results)} diverse documents")
for i, doc in enumerate(diverse_results, 1):
    print(f"{i}. {doc.page_content[:100]}...")

Performance characteristics

Time complexity

  • First selection: O(n) to find the most relevant document
  • Subsequent selections: O((k-1) × n) in total, provided each candidate's maximum similarity to the selected set is kept as a running value
  • Overall: O(k × n) similarity comparisons
For typical values (k=10, n=100), this runs in milliseconds.

Space complexity

  • O(n × d) for storing embeddings (n docs, d dimensions)
  • Cosine similarities are computed on demand, so no n × n similarity matrix is stored

Optimization tips

Pre-compute and cache embeddings for retrieved documents to avoid repeated embedding calls. Only re-embed when document content changes.
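
A minimal sketch of such a cache, keyed by a hash of the document text (a hypothetical helper, not part of the library):

import hashlib

_embedding_cache: dict[str, list[float]] = {}

def cached_embed(text: str, embedder) -> list[float]:
    """Embed text, reusing the cached vector while the content is unchanged."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = embedder.embed_query(text)
    return _embedding_cache[key]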

Comparison with other diversity methods

Method              | Approach                     | Speed    | Use case
MMR                 | Query-aware greedy selection | Fast     | General diversity with relevance
Clustering          | K-means + sampling           | Moderate | Topic coverage
Threshold filtering | Similarity cutoff            | Fastest  | Simple deduplication
Graph-based         | Community detection          | Slow     | Complex relationships
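
For contrast with MMR, threshold filtering (the fastest row above) is just a similarity cutoff; a minimal sketch reusing the hypothetical cosine_sim helper from the algorithm section:

def threshold_filter(doc_embs, cutoff: float = 0.95) -> list[int]:
    """Keep each document only if it stays below cutoff similarity to every kept one."""
    kept: list[int] = []
    for i in range(len(doc_embs)):
        if not kept or cosine_sim(doc_embs[[i]], doc_embs[kept]).max() < cutoff:
            kept.append(i)
    return kept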

Integration with diversity filtering

MMR is one of the diversity methods available in the diversity filtering pipeline:
from vectordb.langchain.diversity_filtering import PineconeDiversityFilteringSearchPipeline

pipeline = PineconeDiversityFilteringSearchPipeline({
    "pinecone": {"api_key": "...", "index_name": "..."},
    "diversity": {
        "method": "mmr",
        "lambda_param": 0.5,
        "max_documents": 10,
        "candidate_multiplier": 3,
    },
})

results = pipeline.search(query="machine learning", top_k=10)
See Diversity filtering for more details.

Related pages

  • Diversity filtering - Complete diversity pipeline with MMR and clustering
  • Semantic search - Initial retrieval before MMR
  • Reranking - Cross-encoder scoring alternative
  • Contextual compression - Reduce retrieved context

Build docs developers (and LLMs) love