Hybrid retrieval combines dense semantic embeddings with sparse lexical embeddings to improve robustness across both natural-language and keyword-precise queries.

How it works

1. Dual indexing: each document is embedded twice, once with HuggingFaceEmbeddings for the dense semantic vector and once with a sparse embedding model for the token-weight lexical vector. Both vectors are stored in the backend.
2. Dual retrieval: at query time, the same query is embedded with both the dense and sparse models.
3. Score fusion: dense and sparse retrieval results are merged using ResultMerger from utils/fusion.py. The default strategy is Reciprocal Rank Fusion (RRF).
4. Final ranking: the fused, deduplicated list (up to top_k) is returned.
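The four steps above can be sketched end-to-end with toy stand-ins for the two embedders, a term-count cosine for the dense signal and exact token overlap for the sparse one (the real pipeline swaps in HuggingFaceEmbeddings and a SPLADE model):

```python
from collections import Counter
import math

def dense_score(query: str, doc: str) -> float:
    # Toy "semantic" similarity: cosine over term counts (stand-in for real embeddings).
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def sparse_score(query: str, doc: str) -> int:
    # Toy lexical signal: number of exact query tokens present in the doc.
    tokens = set(doc.lower().split())
    return sum(1 for t in query.lower().split() if t in tokens)

def hybrid_search(query: str, docs: list[str], k: int = 60, top_k: int = 2) -> list[str]:
    # Steps 2-4: rank with each signal, fuse the two rankings with RRF, truncate to top_k.
    dense = sorted(docs, key=lambda d: dense_score(query, d), reverse=True)
    sparse = sorted(docs, key=lambda d: sparse_score(query, d), reverse=True)
    scores: dict[str, float] = {}
    for ranking in (dense, sparse):
        for rank, doc in enumerate(ranking, 1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

docs = [
    "error code E42 in the billing module",
    "how invoices are generated",
    "general troubleshooting guide",
]
result = hybrid_search("error code E42", docs)
# The E42 document wins on both signals and ranks first.
```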

Reciprocal Rank Fusion

ResultMerger.reciprocal_rank_fusion() combines rankings using:
score(d) = Σ_r 1 / (k + rank_r(d))
where the sum runs over each retrieval source r, k is a constant (typically 60), and rank_r(d) is the 1-based position of d in source r's ranking. This is robust to different score scales because it operates on ranks, not raw scores.
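A quick worked example of the formula, with two ranked lists of document IDs and the default k = 60:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # score(d) = sum over sources of 1 / (k + rank of d in that source)
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["A", "B", "C"]   # dense retriever's ranking
sparse = ["C", "A", "D"]  # sparse retriever's ranking
fused = rrf([dense, sparse])
# "A" wins: rank 1 dense + rank 2 sparse gives 1/61 + 1/62,
# edging out "C" (1/63 + 1/61); "B" and "D" each appear in only one list.
```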

Pinecone hybrid indexing

Pinecone natively supports hybrid search with both dense and sparse vectors:
src/vectordb/langchain/hybrid_indexing/indexing/pinecone.py
from typing import Any

from vectordb.databases.pinecone import PineconeVectorDB
from vectordb.dataloaders import DataloaderCatalog
from vectordb.langchain.utils import ConfigLoader, EmbedderHelper, SparseEmbedder

class PineconeHybridIndexingPipeline:
    """Pinecone hybrid (dense + sparse) indexing pipeline.
    
    Indexes documents with both dense semantic embeddings and sparse lexical
    embeddings to enable Pinecone's native hybrid search functionality.
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "pinecone")

        # Create both dense and sparse embedders
        self.dense_embedder = EmbedderHelper.create_embedder(self.config)
        self.sparse_embedder = SparseEmbedder()

        pinecone_config = self.config["pinecone"]
        self.db = PineconeVectorDB(
            api_key=pinecone_config["api_key"],
            index_name=pinecone_config.get("index_name"),
        )

        self.index_name = pinecone_config.get("index_name")
        self.namespace = pinecone_config.get("namespace", "")
        self.dimension = pinecone_config.get("dimension", 384)

    def run(self) -> dict[str, Any]:
        """Execute hybrid indexing pipeline."""
        # Load documents
        dl_config = self.config.get("dataloader", {})
        loader = DataloaderCatalog.create(
            dl_config.get("type", "triviaqa"),
            split=dl_config.get("split", "test"),
            limit=dl_config.get("limit"),
        )
        )
        dataset = loader.load()
        documents = dataset.to_langchain()

        # Generate both dense and sparse embeddings
        docs, dense_embeddings = EmbedderHelper.embed_documents(
            self.dense_embedder, documents
        )

        # Sparse-embed the same (possibly filtered) docs that produced the dense
        # embeddings, so the zip below pairs vectors with the right documents
        texts = [doc.page_content for doc in docs]
        sparse_embeddings = self.sparse_embedder.embed_documents(texts)

        # Create index with dense dimension
        recreate = self.config.get("pinecone", {}).get("recreate", False)
        self.db.create_index(
            index_name=self.index_name,
            dimension=self.dimension,
            metric=self.config.get("pinecone", {}).get("metric", "cosine"),
            recreate=recreate,
        )

        # Prepare upsert data with both dense and sparse vectors
        upsert_data = []
        for i, (doc, dense_emb, sparse_emb) in enumerate(
            zip(docs, dense_embeddings, sparse_embeddings)
        ):
            upsert_data.append(
                {
                    "id": f"{self.index_name}_{i}",
                    "values": dense_emb,
                    "sparse_values": sparse_emb,
                    "metadata": {
                        "text": doc.page_content,
                        **(doc.metadata or {}),
                    },
                }
            )

        num_indexed = self.db.upsert(
            data=upsert_data,
            namespace=self.namespace,
        )

        return {
            "documents_indexed": num_indexed,
            "db": "pinecone",
            "index_name": self.index_name,
        }

Result merger for fusion

The ResultMerger provides multiple fusion strategies:
src/vectordb/langchain/utils/fusion.py
from langchain_core.documents import Document

class ResultMerger:
    """Helper for merging and fusing multiple retrieval result sets."""

    @staticmethod
    def reciprocal_rank_fusion(
        results_list: list[list[Document]],
        k: int = 60,
        weights: list[float] | None = None,
        dedup_key: str | None = None,
    ) -> list[Document]:
        """Merge results using Reciprocal Rank Fusion (RRF).
        
        Args:
            results_list: List of result sets from multiple searches.
            k: RRF parameter (default 60).
            weights: Optional weights for each result set (default equal weights).
            dedup_key: Optional metadata key for deduplication.
        
        Returns:
            Merged list of documents sorted by RRF score.
        """
        if not results_list:
            return []

        if weights is None:
            weights = [1.0 / len(results_list)] * len(results_list)

        # Normalize weights
        total_weight = sum(weights)
        weights = [w / total_weight for w in weights]

        # Calculate RRF scores
        rrf_scores = {}
        doc_map = {}

        for result_set, weight in zip(results_list, weights):
            for rank, doc in enumerate(result_set, 1):
                # Use metadata key for uniqueness if provided
                if dedup_key:
                    key = doc.metadata.get(dedup_key)
                    if key is None:
                        key = doc.page_content
                else:
                    key = doc.page_content

                doc_map[key] = doc

                rrf_score = (weight * 1.0) / (k + rank)
                rrf_scores[key] = rrf_scores.get(key, 0) + rrf_score

        sorted_keys = sorted(
            rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True
        )

        return [doc_map[key] for key in sorted_keys]

    @staticmethod
    def weighted_merge(
        results_list: list[list[Document]],
        weights: list[float] | None = None,
        dedup_key: str | None = None,
    ) -> list[Document]:
        """Merge results with weighted scoring.
        
        Args:
            results_list: List of result sets from multiple searches.
            weights: Weights for each result set (default equal weights).
            dedup_key: Optional metadata key for deduplication.
        
        Returns:
            Merged list of documents sorted by weighted score.
        """
        if not results_list:
            return []

        if weights is None:
            weights = [1.0 / len(results_list)] * len(results_list)

        # Normalize weights
        total_weight = sum(weights)
        weights = [w / total_weight for w in weights]

        # Calculate weighted scores
        weighted_scores = {}
        doc_map = {}

        for result_set, weight in zip(results_list, weights):
            for rank, doc in enumerate(result_set):
                if dedup_key:
                    key = doc.metadata.get(dedup_key)
                    if key is None:
                        key = doc.page_content
                else:
                    key = doc.page_content

                doc_map[key] = doc

                # Score decreases with rank
                score = weight * max(0, 1.0 - (rank / max(len(result_set), 1)))
                weighted_scores[key] = weighted_scores.get(key, 0) + score

        sorted_keys = sorted(
            weighted_scores.keys(),
            key=lambda x: weighted_scores[x],
            reverse=True,
        )

        return [doc_map[key] for key in sorted_keys]
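To see how the two strategies differ, here is the weighted_merge scoring replayed over plain strings instead of Document objects (an illustration only; real calls pass Document lists): with weights 0.7/0.3, a rank-0 hit in the dense list alone contributes 0.7, so the dense side dominates unless the sparse list agrees.

```python
def weighted_merge(results_list: list[list[str]], weights: list[float]) -> list[str]:
    # Mirrors ResultMerger.weighted_merge's scoring, but over plain strings.
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1
    scores: dict[str, float] = {}
    for result_set, weight in zip(results_list, weights):
        n = max(len(result_set), 1)
        for rank, doc in enumerate(result_set):  # 0-based rank
            # Score decays linearly with rank within each result set.
            scores[doc] = scores.get(doc, 0.0) + weight * max(0.0, 1.0 - rank / n)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["A", "B", "C"]
sparse = ["C", "A"]
merged = weighted_merge([dense, sparse], weights=[0.7, 0.3])
# "A" first (0.7 from dense rank 0 + 0.15 from sparse rank 1),
# then "C" (0.233 + 0.3), then "B" (0.467).
```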

Configuration

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "hybrid-search"
  namespace: "default"
  dimension: 384
  metric: "cosine"
  recreate: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"

sparse:
  model: "naver/splade-cocondenser-ensembledistil"

fusion:
  strategy: "rrf"    # "rrf" or "weighted"
  dense_weight: 0.7  # Used only when strategy is "weighted"
  sparse_weight: 0.3

search:
  top_k: 10
  fetch_k: 30        # Candidate pool per retriever before fusion
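The ${PINECONE_API_KEY} placeholder implies environment-variable substitution when the config is loaded (exactly how ConfigLoader performs it is an assumption here); with the standard library, the same expansion looks like:

```python
import os

os.environ["PINECONE_API_KEY"] = "pk-demo-key"  # demo value only, never hardcode real keys

raw_line = 'api_key: "${PINECONE_API_KEY}"'
expanded = os.path.expandvars(raw_line)
# expanded == 'api_key: "pk-demo-key"'
```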

When to use it

  • Mixed query styles where some users phrase naturally and others search with domain terms
  • Enterprise knowledge bases with exact product names, codes, or identifiers alongside conceptual questions
  • Any workload where pure semantic search misses documents containing exact query terms

When not to use it

  • Small datasets where dual indexing complexity has negligible quality impact
  • Prototypes where the semantic baseline has not yet been validated
  • Backends that do not natively support sparse vectors

Tradeoffs

Dimension   What to expect
Quality     Usually improves recall robustness by covering both semantic and lexical intent
Latency     Moderate increase from two embedding models and two retrieval paths
Cost        Higher indexing and query cost from dual embeddings and more complex search

Settings to tune first

fusion.strategy (string, default "rrf")
  "rrf" requires no tuning; "weighted" gives explicit control over the dense vs. sparse contribution.

sparse.model (string)
  SPLADE model quality directly affects lexical matching coverage. Recommended: naver/splade-cocondenser-ensembledistil

search.fetch_k (integer, default 30)
  Each retriever fetches this many candidates before fusion; larger pools improve fusion quality.

Common pitfalls

Unbalanced fusion: Weight near-zero on either side effectively reverts to single-signal retrieval. Measure both retrieval paths independently first.
Missing sparse model at query time: Ensure both dense and sparse embedding configs are consistent between indexing and search scripts.
Not validating per-query-class behavior: Hybrid helps keyword-heavy queries most. If your evaluation set is all natural-language questions, the improvement over semantic search may be modest.

Backends supported

Chroma, Milvus, Pinecone, Qdrant, Weaviate.

Next steps

  • Add reranking: add a reranking stage after fusion for a further precision improvement
  • Sparse-only indexing: use sparse indexing alone if keyword precision is the dominant need
  • Measure improvement: benchmark against semantic-only search to quantify the hybrid gain
  • Components: explore other reusable LangChain components
