Hybrid search runs both dense and sparse retrieval in parallel, then fuses results using Reciprocal Rank Fusion (RRF) or weighted combination. You get the best of both worlds: semantic understanding for concepts plus keyword precision for specific terms.

How it works

Hybrid search combines dense semantic embeddings with sparse lexical embeddings for enhanced retrieval quality.

Search process

  1. Dual embedding - Query is embedded using both dense and sparse embedders
  2. Parallel retrieval - Dense embedding captures semantic query intent, sparse embedding captures exact term matches
  3. Score fusion - Results are fused using RRF or weighted scoring: alpha * dense_score + (1-alpha) * sparse_score
  4. Ranked results - Documents are ranked by the fused score
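The weighted fusion in step 3 can be sketched as follows. The `weighted_fuse` helper and its dictionary inputs are illustrative, not part of the library API:

```python
def weighted_fuse(dense_scores, sparse_scores, alpha=0.5):
    """Fuse per-document scores: alpha * dense_score + (1 - alpha) * sparse_score.

    Documents missing from one source contribute 0 for that component.
    """
    doc_ids = set(dense_scores) | set(sparse_scores)
    fused = {
        doc_id: alpha * dense_scores.get(doc_id, 0.0)
        + (1 - alpha) * sparse_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Rank documents by fused score, highest first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

ranked = weighted_fuse(
    dense_scores={"doc1": 0.9, "doc2": 0.4},
    sparse_scores={"doc2": 0.8, "doc3": 0.6},
    alpha=0.5,
)
```

Note that weighted fusion assumes both sources produce scores on comparable scales; when they do not, RRF (below) is the safer default.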

Fusion mechanisms

Reciprocal Rank Fusion (RRF)

RRF combines rankings from multiple sources using the formula:

score(d) = Σ 1/(k + rank_i(d))

where k is a constant (typically 60) and rank_i(d) is the rank of document d from source i. Because RRF uses ranks rather than raw scores, it is robust to score miscalibration between sources.

Alpha weighting

For databases with native hybrid support (such as Pinecone), alpha controls the balance between dense and sparse:
  • alpha = 1.0 - Pure dense/semantic search
  • alpha = 0.0 - Pure sparse/keyword search
  • alpha = 0.5 - Balanced hybrid (default)
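A minimal implementation of the RRF formula above (the `rrf_fuse` name and input format are illustrative):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over sources of 1 / (k + rank_i(d)).

    `rankings` is a list of ranked document-id lists, best first (rank 1).
    """
    scores = {}
    for ranked_list in rankings:
        for rank, doc_id in enumerate(ranked_list, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

fused = rrf_fuse([
    ["doc_a", "doc_b", "doc_c"],  # dense ranking
    ["doc_b", "doc_d", "doc_a"],  # sparse ranking
])
```

Here doc_b wins because it ranks highly in both sources, even though doc_a tops the dense list; no score normalization is needed.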

Key features

  • Dense + sparse fusion with configurable weights
  • RRF handles score normalization automatically
  • Built-in evaluation metrics: Recall@k, MRR, NDCG, Precision@k
  • No FastEmbed dependency—uses native SentenceTransformers sparse encoders

Implementation

from vectordb.haystack.hybrid_indexing import MilvusHybridSearchPipeline

pipeline = MilvusHybridSearchPipeline(
    "src/vectordb/haystack/hybrid_indexing/configs/milvus_triviaqa.yaml"
)
result = pipeline.run(query="machine learning algorithms", top_k=10)

for doc in result["documents"]:
    print(f"Score: {doc.score:.3f} | {doc.content[:100]}...")

Configuration

Required settings

pinecone.api_key
string
required
Pinecone API authentication key
pinecone.index_name
string
required
Target index name for hybrid search

Optional settings

pinecone.alpha
float
default:0.5
Fusion weight (0.0 = sparse only, 1.0 = dense only, 0.5 = balanced hybrid)
pinecone.namespace
string
Namespace within the index for document isolation
embedder
object
Dense embedder configuration for semantic vector generation
sparse
object
Sparse embedder configuration for lexical vectors

Example configuration

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "hybrid-search"
  namespace: "default"
  alpha: 0.7  # Favor semantic over keyword

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"

sparse:
  enabled: true

Search parameters

query
string
required
Search query text to embed with both dense and sparse embedders
top_k
integer
default:10
Maximum number of results to return
filters
dict
Optional metadata filters for pre-filtering candidates
When to use hybrid search

Hybrid search excels when you need both:

Semantic understanding (dense)
  • Understanding query intent
  • Matching synonyms and paraphrases
  • Conceptual similarity

Exact term matching (sparse)
  • Product SKUs or model numbers
  • Technical specifications
  • Legal or medical terminology
  • Proper nouns and acronyms

Example scenarios

# Technical query benefiting from hybrid approach
results = pipeline.search(
    query="GDPR compliance requirements for ML models",
    top_k=10,
)
# Dense: understands "compliance" and "requirements"
# Sparse: exact match on "GDPR" and "ML"

# Product search with specific identifiers
results = pipeline.search(
    query="lightweight running shoes model XR-2024",
    top_k=5,
)
# Dense: understands "lightweight running shoes"
# Sparse: exact match on "XR-2024"

Sparse vector format

Sparse embeddings use SPLADE models to create sparse vectors that emphasize specific terms, similar to traditional BM25 but with learned term importance. Sparse embeddings are represented as dictionaries mapping token indices to importance weights:
{
  "indices": [42, 156, 892, 1024],
  "values": [0.85, 0.62, 0.43, 0.31]
}
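As a sketch, a token-weight dictionary produced by a sparse encoder can be converted to this indices/values layout like so (the `to_sparse_vector` helper is illustrative):

```python
def to_sparse_vector(token_weights, threshold=0.0):
    """Convert {token_index: weight} to the indices/values format,
    dropping non-positive weights and sorting indices ascending."""
    items = sorted(
        (i, w) for i, w in token_weights.items() if w > threshold
    )
    return {
        "indices": [i for i, _ in items],
        "values": [w for _, w in items],
    }

# Token 7 has zero weight, so it is dropped from the sparse vector
vec = to_sparse_vector({892: 0.43, 42: 0.85, 156: 0.62, 1024: 0.31, 7: 0.0})
```

Keeping indices sorted and pruning zero weights keeps the vector compact, which matters because sparse vectors typically cover a vocabulary of tens of thousands of tokens.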

Database-specific implementations

Pinecone

Pinecone requires separate sparse_values field for sparse embeddings, distinct from the standard values field used for dense vectors. Native fusion with alpha parameter.

Weaviate

Weaviate uses native BM25 without external embeddings for the sparse component. Hybrid search with configurable fusion weights.

Milvus

Milvus supports hybrid search with partition-based isolation and configurable fusion strategies.

Qdrant

Qdrant uses payload-based filtering with optimized indexes for hybrid retrieval.

Chroma

Chroma provides flexible hybrid search with tenant and database scoping.

Fusion strategy comparison

Strategy         Best for                             Parameters
RRF              Default choice, score normalization  k (default: 60)
Alpha weighting  Known source reliability             alpha (0.0-1.0)
Weighted merge   Prior knowledge of source quality    weights per source
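Weighted merge generalizes alpha weighting from two sources to N, with one weight per source. A sketch, assuming score maps are already normalized to a comparable scale (the `weighted_merge` helper is hypothetical):

```python
def weighted_merge(score_maps, weights):
    """Merge normalized per-document score maps using per-source weights."""
    fused = {}
    for scores, weight in zip(score_maps, weights):
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    # Highest combined score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

merged = weighted_merge(
    [{"a": 1.0, "b": 0.5}, {"b": 1.0}],  # e.g. dense and sparse score maps
    [0.6, 0.4],                           # per-source weights
)
```

With weights [alpha, 1 - alpha] and two sources, this reduces to the alpha-weighted formula shown earlier.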

Performance tips

Start with alpha=0.5 (balanced) and tune based on your evaluation metrics. For semantic-heavy queries, increase to 0.7-0.8. For keyword-heavy queries, decrease to 0.3-0.4.
  • Over-fetch candidates (2-3x top_k) before fusion for better quality
  • Use metadata filters to reduce search space before hybrid scoring
  • Cache sparse embeddings for repeated queries to reduce latency
  • Monitor RRF k parameter impact on result diversity

Related

  • Semantic search - Dense vector similarity search
  • Sparse search - Keyword/lexical matching with SPLADE/BM25
  • Reranking - Cross-encoder second-stage scoring
  • Diversity filtering - Post-retrieval redundancy reduction

Build docs developers (and LLMs) love