Hybrid retrieval combines dense semantic embeddings with sparse lexical embeddings to improve robustness across both natural-language queries and keyword-precise queries. When one signal is weak, the other compensates.
How it works
Dual indexing
Each document is embedded twice — once with a dense SentenceTransformers model (SentenceTransformersDocumentEmbedder) to produce a float vector capturing semantic meaning, and once with a sparse SentenceTransformers model (SentenceTransformersSparseDocumentEmbedder, typically a SPLADE model) to produce a token-weight sparse vector capturing lexical features.
Dual retrieval
At query time, the query is embedded with both the dense text embedder and the sparse text embedder to produce two query representations.
Score fusion
Results from the dense retriever and the sparse retriever are merged using ResultMerger (from components/result_merger.py). The default strategy is Reciprocal Rank Fusion (RRF), which combines rankings without requiring score normalization.
Final ranking
The fused, deduplicated result list is returned as the top-k documents.
score(d) = Σ_i 1 / (k + rank_i(d))
where the sum runs over the retrieval sources that return d, rank_i(d) is d's rank in source i, and k (default 60) smooths rank differences.
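As a concrete illustration of the formula, here is a minimal standalone sketch of RRF over two toy ranked id lists (plain Python, no Haystack dependency; the ids and rankings are invented):

```python
# Two ranked lists of document ids, best first.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

def rrf_scores(rankings, k=60):
    """Accumulate 1 / (k + rank) for every source a document appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

scores = rrf_scores([dense_ranking, sparse_ranking])
fused = sorted(scores, key=scores.get, reverse=True)
# doc_a and doc_b appear in both lists, so they outrank single-source hits.
```

Note that no score normalization is needed: only ranks enter the formula, so dense cosine similarities and sparse dot products never have to live on the same scale.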
Pinecone hybrid indexing example
src/vectordb/haystack/hybrid_indexing/indexing/pinecone.py
from typing import Any

from haystack import Document

from vectordb.databases.pinecone import PineconeVectorDB
from vectordb.dataloaders import DataloaderCatalog
from vectordb.haystack.utils import ConfigLoader, EmbedderFactory


class PineconeHybridIndexingPipeline:
    """Pinecone hybrid (dense + sparse) indexing pipeline."""

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        """Initialize indexing pipeline from configuration."""
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "pinecone")

        # Create both dense and sparse embedders
        self.dense_embedder = EmbedderFactory.create_document_embedder(self.config)
        self.sparse_embedder = None
        if "sparse" in self.config:
            self.sparse_embedder = EmbedderFactory.create_sparse_document_embedder(
                self.config
            )

        pinecone_config = self.config["pinecone"]
        self.db = PineconeVectorDB(
            api_key=pinecone_config.get("api_key"),
            index_name=pinecone_config.get("index_name"),
            host=pinecone_config.get("host"),
        )
        self.index_name = pinecone_config.get("index_name")
        self.namespace = pinecone_config.get("namespace", "default")

    def _embed_documents(self, documents: list[Document]) -> list[Document]:
        """Generate dense and sparse embeddings for documents."""
        # Dense embeddings
        dense_result = self.dense_embedder.run(documents=documents)
        embedded_docs = dense_result["documents"]

        # Sparse embeddings (if configured)
        if self.sparse_embedder:
            sparse_result = self.sparse_embedder.run(documents=embedded_docs)
            embedded_docs = sparse_result["documents"]

        return embedded_docs

    def run(self) -> dict[str, Any]:
        """Execute the complete indexing pipeline."""
        # Load documents
        dl_config = self.config.get("dataloader", {})
        loader = DataloaderCatalog.create(
            dl_config.get("type", "triviaqa"),
            split=dl_config.get("split", "test"),
            limit=dl_config.get("limit"),
        )
        dataset = loader.load()
        documents = dataset.to_haystack()

        # Embed with both dense and sparse models
        embedded_docs = self._embed_documents(documents)

        # Upsert to Pinecone
        self.db.upsert(
            documents=embedded_docs,
            index_name=self.index_name,
            namespace=self.namespace,
        )

        return {
            "documents_indexed": len(embedded_docs),
            "db": "pinecone",
            "index_name": self.index_name,
        }
Pinecone hybrid search example
src/vectordb/haystack/hybrid_indexing/search/pinecone.py
from typing import Any

from vectordb.databases.pinecone import PineconeVectorDB
from vectordb.haystack.utils import ConfigLoader, EmbedderFactory


class PineconeHybridSearchPipeline:
    """Pinecone hybrid (dense + sparse) search pipeline.

    Uses Pinecone's native sparse_vector support with alpha weighting:
        final_score = alpha * dense_score + (1 - alpha) * sparse_score
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        """Initialize search pipeline from configuration."""
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "pinecone")

        # Initialize query embedders
        self.dense_embedder = EmbedderFactory.create_text_embedder(self.config)
        self.sparse_embedder = None
        if "sparse" in self.config:
            self.sparse_embedder = EmbedderFactory.create_sparse_text_embedder(
                self.config
            )

        pinecone_config = self.config["pinecone"]
        self.db = PineconeVectorDB(
            api_key=pinecone_config.get("api_key"),
            index_name=pinecone_config.get("index_name"),
            host=pinecone_config.get("host"),
        )
        self.index_name = pinecone_config.get("index_name")
        self.namespace = pinecone_config.get("namespace", "default")
        self.alpha = pinecone_config.get("alpha", 0.5)  # 0.5 = equal weighting

    def _embed_query(self, query: str) -> tuple[list[float], Any | None]:
        """Embed query with dense and sparse embedders."""
        # Dense embedding
        dense_result = self.dense_embedder.run(text=query)
        dense_embedding = dense_result.get("embedding")

        # Sparse embedding (if configured)
        sparse_embedding = None
        if self.sparse_embedder:
            sparse_result = self.sparse_embedder.run(text=query)
            sparse_embedding = sparse_result.get("sparse_embedding")

        return dense_embedding, sparse_embedding

    def run(
        self,
        query: str,
        top_k: int = 10,
        filters: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Execute hybrid search query."""
        # Generate both embeddings
        dense_embedding, sparse_embedding = self._embed_query(query)

        # Execute Pinecone's native hybrid search
        documents = self.db.hybrid_search(
            query_embedding=dense_embedding,
            query_sparse_embedding=sparse_embedding,
            index_name=self.index_name,
            namespace=self.namespace,
            top_k=top_k,
            filter=filters,
            alpha=self.alpha,
        )

        return {
            "documents": documents,
            "query": query,
            "db": "pinecone",
        }
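The alpha weighting described in the docstring above can be made concrete with a small standalone sketch (plain Python; the scores, ids, and helper name are invented for illustration, not part of the repository):

```python
def alpha_blend(dense_scores, sparse_scores, alpha=0.5):
    """Blend per-document scores as alpha * dense + (1 - alpha) * sparse.

    Documents missing from one source contribute 0 from that side.
    """
    doc_ids = set(dense_scores) | set(sparse_scores)
    return {
        doc_id: alpha * dense_scores.get(doc_id, 0.0)
        + (1 - alpha) * sparse_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }

dense = {"doc_a": 0.9, "doc_b": 0.4}
sparse = {"doc_b": 0.8, "doc_c": 0.6}
blended = alpha_blend(dense, sparse, alpha=0.5)
# alpha=1.0 reduces to pure dense scoring, alpha=0.0 to pure sparse.
```

In practice Pinecone applies the weighting server-side when both vectors are sent in one query; the sketch only shows how the blended score behaves as alpha moves between 0 and 1.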
Result fusion with RRF
For backends without native hybrid support, use ResultMerger:
src/vectordb/haystack/components/result_merger.py
from haystack import Document


class ResultMerger:
    """Merge results from multiple retrieval sources."""

    @staticmethod
    def rrf_fusion(
        dense_docs: list[Document],
        sparse_docs: list[Document],
        k: int = 60,
        top_k: int | None = None,
    ) -> list[Document]:
        """Reciprocal Rank Fusion.

        Args:
            dense_docs: Documents from dense retriever (ordered by relevance)
            sparse_docs: Documents from sparse retriever (ordered by relevance)
            k: RRF parameter (constant added to rank, default 60)
            top_k: Return top K documents

        Returns:
            Fused and reranked documents
        """
        rrf_scores: dict[str, float] = {}

        # Score dense results
        for rank, doc in enumerate(dense_docs, 1):
            doc_id = ResultMerger.stable_doc_id(doc)
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

        # Score sparse results
        for rank, doc in enumerate(sparse_docs, 1):
            doc_id = ResultMerger.stable_doc_id(doc)
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

        # Build document map for deduplication
        doc_map = {}
        for doc in dense_docs + sparse_docs:
            doc_id = ResultMerger.stable_doc_id(doc)
            if doc_id not in doc_map:
                doc_map[doc_id] = doc

        # Sort by RRF score
        sorted_docs = [
            doc_map[doc_id]
            for doc_id in sorted(
                rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True
            )
            if doc_id in doc_map
        ]

        if top_k is None:
            top_k = max(len(dense_docs), len(sparse_docs))
        return sorted_docs[:top_k]

    @staticmethod
    def weighted_fusion(
        dense_docs: list[Document],
        sparse_docs: list[Document],
        dense_weight: float = 0.7,
        sparse_weight: float = 0.3,
        top_k: int | None = None,
    ) -> list[Document]:
        """Weighted sum fusion with score normalization."""
        # Implementation handles score normalization and weighted combination
        # See full implementation in result_merger.py
        pass
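The weighted_fusion body is elided above. A typical approach, sketched here under assumptions (plain score dicts rather than Haystack Documents, and min-max normalization, which may differ from the repository's actual implementation), normalizes each source to [0, 1] before taking the weighted sum so that dense and sparse scores become comparable:

```python
def min_max_normalize(scores):
    """Rescale a score dict to [0, 1]; a constant list maps to all 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_fusion(dense_scores, sparse_scores,
                    dense_weight=0.7, sparse_weight=0.3):
    """Normalize each source independently, then take a weighted sum."""
    dense_norm = min_max_normalize(dense_scores)
    sparse_norm = min_max_normalize(sparse_scores)
    doc_ids = set(dense_norm) | set(sparse_norm)
    fused = {
        doc_id: dense_weight * dense_norm.get(doc_id, 0.0)
        + sparse_weight * sparse_norm.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused, key=fused.get, reverse=True)
```

Unlike RRF, this strategy is sensitive to how raw scores are distributed, which is why normalization happens per source before the weights are applied.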
Configuration
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "hybrid-search"
  alpha: 0.5  # 0.5 = equal dense/sparse weighting

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

sparse:
  model: "naver/splade-cocondenser-ensembledistil"

fusion:
  strategy: "rrf"  # or "weighted"
  dense_weight: 0.7  # Only for weighted fusion
  sparse_weight: 0.3

dataloader:
  type: "triviaqa"
  limit: 500

search:
  top_k: 10
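The `${PINECONE_API_KEY}` placeholder is expanded from the environment before the config is used; ConfigLoader is assumed to handle this. A minimal sketch of equivalent expansion using only the standard library (the helper name is invented, and the environment variable is set here purely for demonstration):

```python
import os

def expand_env_placeholders(raw: str) -> str:
    """Expand ${VAR}-style placeholders in raw config text."""
    return os.path.expandvars(raw)

os.environ["PINECONE_API_KEY"] = "pk-demo"  # for illustration only
expanded = expand_env_placeholders('api_key: "${PINECONE_API_KEY}"')
```

Keeping secrets in the environment rather than in the YAML file lets the same config be committed to version control safely.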
When to use it
Corpora with mixed query styles: some users ask in natural language, others search with domain keywords or acronyms
Enterprise knowledge bases where exact product names, codes, or identifiers matter alongside conceptual questions
Any workload where pure semantic search misses highly relevant documents that contain exact query terms
When not to use it
Small datasets where the added complexity of dual indexing and fusion has negligible quality impact
Prototypes or early experiments where you have not yet validated whether the semantic baseline falls short
Settings to tune first
Setting | Why it matters
--- | ---
fusion.strategy | "rrf" requires no tuning; "weighted" lets you favor the dense or sparse signal
fusion.dense_weight / fusion.sparse_weight | Only for weighted fusion; start at 0.7/0.3 and adjust based on query type distribution
sparse.model | SPLADE model quality directly affects lexical matching behavior
search.top_k | Final merged result count; set larger than semantic-only top_k to preserve fusion coverage
Common pitfalls
Unbalanced fusion: Setting one weight to near-zero effectively reverts to single-signal retrieval. Measure both retrieval paths independently before fusing.
Missing sparse vectors at query time: If the indexing config uses sparse embeddings but the search config does not, the sparse retrieval path returns nothing. Keep configs consistent.
Not validating per-query-class behavior: Hybrid usually helps keyword queries most and natural-language queries least. If your evaluation set is exclusively natural-language questions, the improvement over semantic search may be small.
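The config-consistency pitfall can be caught with a cheap pre-flight check. This helper is hypothetical (not part of the repository) and relies only on the `"sparse"` key convention shown in the pipelines above:

```python
def check_sparse_consistency(indexing_config: dict, search_config: dict) -> None:
    """Raise if one config enables sparse embeddings and the other does not."""
    indexed_sparse = "sparse" in indexing_config
    searched_sparse = "sparse" in search_config
    if indexed_sparse != searched_sparse:
        raise ValueError(
            "Sparse embedding config mismatch: indexing uses sparse="
            f"{indexed_sparse}, search uses sparse={searched_sparse}"
        )
```

Running a check like this at pipeline construction time turns a silent quality regression into an immediate, explainable failure.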
Supported backends
Chroma, Milvus, Pinecone, Qdrant, Weaviate.
Dataset configs provided
ARC, Earnings Calls, FActScore, PopQA, TriviaQA.
Next steps
Components: Add reranking after fusion for further precision improvement
Pipelines: Learn about advanced pipeline composition patterns