Hybrid retrieval combines dense semantic embeddings with sparse lexical embeddings to improve robustness across both natural-language and keyword-precise queries.
How it works

1. Dual indexing: Each document is embedded twice, once with HuggingFaceEmbeddings for the dense semantic vector and once with a sparse embedding model for the token-weight lexical vector. Both vectors are stored in the backend.
2. Dual retrieval: At query time, the same query is embedded with both the dense and sparse models.
3. Score fusion: Dense and sparse retrieval results are merged using ResultMerger from utils/fusion.py. The default strategy is Reciprocal Rank Fusion (RRF).
4. Final ranking: The fused, deduplicated list (up to top_k) is returned.
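The four steps above can be sketched end to end with toy stand-ins. The `sparse_rank` and `dense_rank` functions below are illustrative substitutes for real sparse and dense embedding models (token overlap and character-bigram overlap respectively), not the library's embedders; only the fusion step mirrors the actual RRF formula:

```python
from collections import defaultdict

# Toy corpus; a real pipeline embeds these with dense + sparse models.
docs = {
    "d1": "error code E404 page not found",
    "d2": "the page could not be located",
    "d3": "billing invoice E404 charge dispute",
}

def sparse_rank(query: str) -> list[str]:
    # Lexical stand-in: rank documents by exact token overlap.
    q = set(query.split())
    scores = {d: len(q & set(t.split())) for d, t in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def dense_rank(query: str) -> list[str]:
    # Semantic stand-in: rank by shared character bigrams.
    def bigrams(s: str) -> set[str]:
        return {s[i:i + 2] for i in range(len(s) - 1)}
    q = bigrams(query)
    scores = {d: len(q & bigrams(t)) for d, t in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Step 3: fuse the two ranked lists with Reciprocal Rank Fusion.
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

query = "E404 page not found"
fused = rrf([dense_rank(query), sparse_rank(query)])  # step 4: final ranking
```

Here both signals agree that `d1` matches best, so it leads the fused list; the interesting cases are queries where the lexical and semantic rankings disagree.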
Reciprocal Rank Fusion
ResultMerger.reciprocal_rank_fusion() combines rankings using:
score(d) = Σ_r 1 / (k + rank_r(d))

where k is a constant (typically 60) and rank_r(d) is the position of document d in the ranking from retrieval source r. Because RRF operates on ranks rather than raw scores, it is robust to the differing score scales of dense and sparse retrievers.
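A small worked example: with k = 60, a document ranked 2nd by dense and 1st by sparse ("b") edges out one ranked 1st by dense but 3rd by sparse ("a"), since 1/61 + 1/62 ≈ 0.0325 beats 1/61 + 1/63 ≈ 0.0323:

```python
k = 60
dense = ["a", "b", "c"]   # dense retriever's ranking
sparse = ["b", "c", "a"]  # sparse retriever's ranking

scores: dict[str, float] = {}
for ranking in (dense, sparse):
    for rank, doc in enumerate(ranking, 1):
        scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)

best = max(scores, key=scores.get)  # "b"
```

Because the increments shrink slowly with rank, consistent mid-list agreement across sources can outweigh a single first-place finish.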
Pinecone hybrid indexing
Pinecone natively supports hybrid search with both dense and sparse vectors:
src/vectordb/langchain/hybrid_indexing/indexing/pinecone.py
```python
from typing import Any

from vectordb.databases.pinecone import PineconeVectorDB
from vectordb.dataloaders import DataloaderCatalog
from vectordb.langchain.utils import ConfigLoader, EmbedderHelper, SparseEmbedder


class PineconeHybridIndexingPipeline:
    """Pinecone hybrid (dense + sparse) indexing pipeline.

    Indexes documents with both dense semantic embeddings and sparse lexical
    embeddings to enable Pinecone's native hybrid search functionality.
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "pinecone")

        # Create both dense and sparse embedders
        self.dense_embedder = EmbedderHelper.create_embedder(self.config)
        self.sparse_embedder = SparseEmbedder()

        pinecone_config = self.config["pinecone"]
        self.db = PineconeVectorDB(
            api_key=pinecone_config["api_key"],
            index_name=pinecone_config.get("index_name"),
        )
        self.index_name = pinecone_config.get("index_name")
        self.namespace = pinecone_config.get("namespace", "")
        self.dimension = pinecone_config.get("dimension", 384)

    def run(self) -> dict[str, Any]:
        """Execute hybrid indexing pipeline."""
        # Load documents
        dl_config = self.config.get("dataloader", {})
        loader = DataloaderCatalog.create(
            dl_config.get("type", "triviaqa"),
            split=dl_config.get("split", "test"),
            limit=dl_config.get("limit"),
        )
        dataset = loader.load()
        documents = dataset.to_langchain()

        # Generate both dense and sparse embeddings
        docs, dense_embeddings = EmbedderHelper.embed_documents(
            self.dense_embedder, documents
        )
        # Embed the same docs returned by the dense step so the three
        # lists stay aligned in the zip below
        texts = [doc.page_content for doc in docs]
        sparse_embeddings = self.sparse_embedder.embed_documents(texts)

        # Create index with dense dimension
        pinecone_config = self.config.get("pinecone", {})
        self.db.create_index(
            index_name=self.index_name,
            dimension=self.dimension,
            metric=pinecone_config.get("metric", "cosine"),
            recreate=pinecone_config.get("recreate", False),
        )

        # Prepare upsert data with both dense and sparse vectors
        upsert_data = []
        for i, (doc, dense_emb, sparse_emb) in enumerate(
            zip(docs, dense_embeddings, sparse_embeddings)
        ):
            upsert_data.append(
                {
                    "id": f"{self.index_name}_{i}",
                    "values": dense_emb,
                    "sparse_values": sparse_emb,
                    "metadata": {
                        "text": doc.page_content,
                        **(doc.metadata or {}),
                    },
                }
            )

        num_indexed = self.db.upsert(
            data=upsert_data,
            namespace=self.namespace,
        )
        return {
            "documents_indexed": num_indexed,
            "db": "pinecone",
            "index_name": self.index_name,
        }
```
Result merger for fusion
The ResultMerger provides multiple fusion strategies:
src/vectordb/langchain/utils/fusion.py
```python
from langchain_core.documents import Document


class ResultMerger:
    """Helper for merging and fusing multiple retrieval result sets."""

    @staticmethod
    def reciprocal_rank_fusion(
        results_list: list[list[Document]],
        k: int = 60,
        weights: list[float] | None = None,
        dedup_key: str | None = None,
    ) -> list[Document]:
        """Merge results using Reciprocal Rank Fusion (RRF).

        Args:
            results_list: List of result sets from multiple searches.
            k: RRF parameter (default 60).
            weights: Optional weights for each result set (default equal weights).
            dedup_key: Optional metadata key for deduplication.

        Returns:
            Merged list of documents sorted by RRF score.
        """
        if not results_list:
            return []
        if weights is None:
            weights = [1.0 / len(results_list)] * len(results_list)

        # Normalize weights
        total_weight = sum(weights)
        weights = [w / total_weight for w in weights]

        # Calculate RRF scores
        rrf_scores = {}
        doc_map = {}
        for result_set, weight in zip(results_list, weights):
            for rank, doc in enumerate(result_set, 1):
                # Use metadata key for uniqueness if provided
                if dedup_key:
                    key = doc.metadata.get(dedup_key)
                    if key is None:
                        key = doc.page_content
                else:
                    key = doc.page_content
                doc_map[key] = doc
                rrf_score = (weight * 1.0) / (k + rank)
                rrf_scores[key] = rrf_scores.get(key, 0) + rrf_score

        sorted_keys = sorted(
            rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True
        )
        return [doc_map[key] for key in sorted_keys]

    @staticmethod
    def weighted_merge(
        results_list: list[list[Document]],
        weights: list[float] | None = None,
        dedup_key: str | None = None,
    ) -> list[Document]:
        """Merge results with weighted scoring.

        Args:
            results_list: List of result sets from multiple searches.
            weights: Weights for each result set (default equal weights).
            dedup_key: Optional metadata key for deduplication.

        Returns:
            Merged list of documents sorted by weighted score.
        """
        if not results_list:
            return []
        if weights is None:
            weights = [1.0 / len(results_list)] * len(results_list)

        # Normalize weights
        total_weight = sum(weights)
        weights = [w / total_weight for w in weights]

        # Calculate weighted scores
        weighted_scores = {}
        doc_map = {}
        for result_set, weight in zip(results_list, weights):
            for rank, doc in enumerate(result_set):
                if dedup_key:
                    key = doc.metadata.get(dedup_key)
                    if key is None:
                        key = doc.page_content
                else:
                    key = doc.page_content
                doc_map[key] = doc
                # Score decreases linearly with rank
                score = weight * max(0, 1.0 - (rank / max(len(result_set), 1)))
                weighted_scores[key] = weighted_scores.get(key, 0) + score

        sorted_keys = sorted(
            weighted_scores.keys(),
            key=lambda x: weighted_scores[x],
            reverse=True,
        )
        return [doc_map[key] for key in sorted_keys]
```
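The weighted strategy's rank-decay term, `weight * max(0, 1 - rank / len(result_set))`, can be checked in isolation. This is a stdlib sketch that mirrors the scoring logic of `weighted_merge` above on plain string IDs (no Document objects), handy for reasoning about dense_weight/sparse_weight choices:

```python
def weighted_scores(
    results_list: list[list[str]], weights: list[float]
) -> dict[str, float]:
    # Normalize weights, as ResultMerger does
    total = sum(weights)
    weights = [w / total for w in weights]
    scores: dict[str, float] = {}
    for result_set, weight in zip(results_list, weights):
        n = max(len(result_set), 1)
        for rank, doc_id in enumerate(result_set):
            # Rank 0 contributes the full weight; later ranks decay toward 0
            scores[doc_id] = scores.get(doc_id, 0.0) + weight * max(
                0.0, 1.0 - rank / n
            )
    return scores

dense = ["a", "b", "c"]
sparse = ["b", "c", "a"]
s = weighted_scores([dense, sparse], weights=[0.7, 0.3])
# "a": 0.7 * 1.0 + 0.3 * (1/3) = 0.8, so the dense-favored doc wins at 0.7/0.3
```

With the weights flipped toward sparse, the same rankings would instead favor "b"; this explicit sensitivity to the weight split is what "weighted" offers over RRF.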
Configuration
```yaml
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "hybrid-search"
  namespace: "default"
  dimension: 384
  metric: "cosine"
  recreate: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"

sparse:
  model: "naver/splade-cocondenser-ensembledistil"

fusion:
  strategy: "rrf"      # "rrf" or "weighted"
  dense_weight: 0.7    # Used only when strategy is "weighted"
  sparse_weight: 0.3

search:
  top_k: 10
  fetch_k: 30          # Candidate pool per retriever before fusion
```
When to use it
- Mixed query styles, where some users phrase queries naturally and others search with exact domain terms
- Enterprise knowledge bases where exact product names, codes, or identifiers appear alongside conceptual questions
- Any workload where pure semantic search misses documents containing exact query terms
When not to use it
- Small datasets, where the added complexity of dual indexing has negligible quality impact
- Prototypes where the semantic-only baseline has not yet been validated
- Backends that do not natively support sparse vectors
Tradeoffs
| Dimension | What to expect |
| --- | --- |
| Quality | Usually improves recall robustness by covering both semantic and lexical intent |
| Latency | Moderate increase from running two embedding models and two retrieval paths |
| Cost | Higher indexing and query cost from dual embeddings and more complex search |
Settings to tune first
"rrf" requires no tuning; "weighted" gives explicit control over dense vs sparse contribution.
SPLADE model quality directly affects lexical matching coverage. Recommended: naver/splade-cocondenser-ensembledistil
Each retriever fetches this many candidates before fusion; larger pools improve fusion quality.
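The fetch_k pool matters because fusion can promote documents that neither retriever ranks highly on its own. A quick check with the RRF formula (k = 60): a document ranked 11th by both retrievers outscores one ranked 5th by just one of them, yet with a pool of only 10 candidates per retriever it would never reach the fusion step:

```python
k = 60

# Document just outside a top-10 cutoff in BOTH retrievers
both_at_11 = 1 / (k + 11) + 1 / (k + 11)   # ≈ 0.0282

# Document ranked 5th, but seen by only ONE retriever
one_at_5 = 1 / (k + 5)                     # ≈ 0.0154

# With fetch_k = 10, the first document is dropped before fusion
# even though its fused score is higher.
```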
Common pitfalls
- Unbalanced fusion: a near-zero weight on either side effectively reverts to single-signal retrieval. Measure both retrieval paths independently first.
- Missing sparse model at query time: ensure the dense and sparse embedding configs are consistent between the indexing and search scripts.
- Not validating per-query-class behavior: hybrid helps keyword-heavy queries most. If your evaluation set contains only natural-language questions, the improvement over semantic search may be modest.
Backends supported
Chroma, Milvus, Pinecone, Qdrant, Weaviate.
Next steps
- Add reranking: add reranking after fusion for a further precision improvement
- Sparse-only indexing: use sparse indexing alone if keyword precision is the dominant need
- Measure improvement: measure against semantic search to quantify the hybrid gain
- Components: explore other reusable LangChain components