Hybrid retrieval combines dense semantic embeddings with sparse lexical embeddings to improve robustness across both natural-language queries and keyword-precise queries. When one signal is weak, the other compensates.

How it works

1. Dual indexing: each document is embedded twice, once with a dense SentenceTransformers model (SentenceTransformersDocumentEmbedder) to produce a float vector capturing semantic meaning, and once with a sparse model (SentenceTransformersSparseDocumentEmbedder, typically a SPLADE model) to produce a token-weight sparse vector capturing lexical features.

2. Dual retrieval: at query time, the query is embedded with both the dense text embedder and the sparse text embedder, producing two query representations.

3. Score fusion: results from the dense retriever and the sparse retriever are merged with ResultMerger (src/vectordb/haystack/components/result_merger.py). The default strategy is Reciprocal Rank Fusion (RRF), which combines rankings without requiring score normalization.

4. Final ranking: the fused, deduplicated result list is returned as the top-k documents.
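To make the two representations in steps 1 and 2 concrete, here is a small self-contained sketch (plain Python with made-up toy numbers, not actual embedder output) contrasting a dense float vector with a token-weight sparse vector and scoring a query against each:

```python
# Toy illustration of the two document representations used in hybrid retrieval.
# Real embedders produce these; the values below are invented for clarity.

# Dense: a fixed-length float vector capturing semantic meaning.
doc_dense = [0.12, -0.48, 0.33, 0.91]
query_dense = [0.10, -0.50, 0.30, 0.88]

# Sparse: a token -> weight mapping capturing lexical features (SPLADE-style).
doc_sparse = {"pinecone": 1.8, "index": 1.1, "vector": 0.9}
query_sparse = {"pinecone": 2.0, "search": 0.7}

def dense_score(a: list[float], b: list[float]) -> float:
    """Dot-product similarity between two dense vectors."""
    return sum(x * y for x, y in zip(a, b))

def sparse_score(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product over the tokens that overlap between two sparse vectors."""
    return sum(w * b[t] for t, w in a.items() if t in b)

print(round(dense_score(doc_dense, query_dense), 3))
print(round(sparse_score(doc_sparse, query_sparse), 3))  # only "pinecone" overlaps
```

The sparse score is nonzero only when query and document share tokens, which is exactly why it rescues keyword-precise queries that dense similarity can miss.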

RRF formula

score(d) = Σ_i 1 / (k + rank_i(d))

where the sum runs over the retrieval sources i, rank_i(d) is the rank of document d in source i, and k (default 60) smooths rank differences.
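A small worked example of the formula (plain Python, toy document ids):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: score(d) = sum over sources of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

dense_ranking = ["doc_a", "doc_b", "doc_c"]   # order from the dense retriever
sparse_ranking = ["doc_b", "doc_d", "doc_a"]  # order from the sparse retriever

scores = rrf([dense_ranking, sparse_ranking])
fused = sorted(scores, key=scores.get, reverse=True)
# doc_b (rank 2 dense + rank 1 sparse) edges out doc_a (rank 1 dense + rank 3 sparse)
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Note how a document that ranks well in both sources beats one that ranks first in only one: RRF rewards agreement between the signals without ever comparing raw scores.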

Pinecone hybrid indexing example

src/vectordb/haystack/hybrid_indexing/indexing/pinecone.py
from typing import Any

from haystack import Document

from vectordb.databases.pinecone import PineconeVectorDB
from vectordb.dataloaders import DataloaderCatalog
from vectordb.haystack.utils import ConfigLoader, EmbedderFactory

class PineconeHybridIndexingPipeline:
    """Pinecone hybrid (dense + sparse) indexing pipeline."""

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        """Initialize indexing pipeline from configuration."""
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "pinecone")

        # Create both dense and sparse embedders
        self.dense_embedder = EmbedderFactory.create_document_embedder(self.config)
        self.sparse_embedder = None
        if "sparse" in self.config:
            self.sparse_embedder = EmbedderFactory.create_sparse_document_embedder(
                self.config
            )

        pinecone_config = self.config["pinecone"]
        self.db = PineconeVectorDB(
            api_key=pinecone_config.get("api_key"),
            index_name=pinecone_config.get("index_name"),
            host=pinecone_config.get("host"),
        )

        self.index_name = pinecone_config.get("index_name")
        self.namespace = pinecone_config.get("namespace", "default")

    def _embed_documents(self, documents: list[Document]) -> list[Document]:
        """Generate dense and sparse embeddings for documents."""
        # Dense embeddings
        dense_result = self.dense_embedder.run(documents=documents)
        embedded_docs = dense_result["documents"]

        # Sparse embeddings (if configured)
        if self.sparse_embedder:
            sparse_result = self.sparse_embedder.run(documents=embedded_docs)
            embedded_docs = sparse_result["documents"]

        return embedded_docs

    def run(self) -> dict[str, Any]:
        """Execute the complete indexing pipeline."""
        # Load documents
        dl_config = self.config.get("dataloader", {})
        loader = DataloaderCatalog.create(
            dl_config.get("type", "triviaqa"),
            split=dl_config.get("split", "test"),
            limit=dl_config.get("limit"),
        )
        dataset = loader.load()
        documents = dataset.to_haystack()

        # Embed with both dense and sparse models
        embedded_docs = self._embed_documents(documents)

        # Upsert to Pinecone
        self.db.upsert(
            documents=embedded_docs,
            index_name=self.index_name,
            namespace=self.namespace,
        )

        return {
            "documents_indexed": len(embedded_docs),
            "db": "pinecone",
            "index_name": self.index_name,
        }

Pinecone hybrid search example

src/vectordb/haystack/hybrid_indexing/search/pinecone.py
from typing import Any

from vectordb.databases.pinecone import PineconeVectorDB
from vectordb.haystack.utils import ConfigLoader, EmbedderFactory

class PineconeHybridSearchPipeline:
    """Pinecone hybrid (dense + sparse) search pipeline.
    
    Uses Pinecone's native sparse_vector support with alpha weighting:
    final_score = alpha * dense_score + (1 - alpha) * sparse_score
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        """Initialize search pipeline from configuration."""
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "pinecone")

        # Initialize query embedders
        self.dense_embedder = EmbedderFactory.create_text_embedder(self.config)
        self.sparse_embedder = None
        if "sparse" in self.config:
            self.sparse_embedder = EmbedderFactory.create_sparse_text_embedder(
                self.config
            )

        pinecone_config = self.config["pinecone"]
        self.db = PineconeVectorDB(
            api_key=pinecone_config.get("api_key"),
            index_name=pinecone_config.get("index_name"),
            host=pinecone_config.get("host"),
        )

        self.index_name = pinecone_config.get("index_name")
        self.namespace = pinecone_config.get("namespace", "default")
        self.alpha = pinecone_config.get("alpha", 0.5)  # 0.5 = equal weighting

    def _embed_query(self, query: str) -> tuple[list[float], Any | None]:
        """Embed query with dense and sparse embedders."""
        # Dense embedding
        dense_result = self.dense_embedder.run(text=query)
        dense_embedding = dense_result.get("embedding")

        # Sparse embedding (if configured)
        sparse_embedding = None
        if self.sparse_embedder:
            sparse_result = self.sparse_embedder.run(text=query)
            sparse_embedding = sparse_result.get("sparse_embedding")

        return dense_embedding, sparse_embedding

    def run(
        self,
        query: str,
        top_k: int = 10,
        filters: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Execute hybrid search query."""
        # Generate both embeddings
        dense_embedding, sparse_embedding = self._embed_query(query)

        # Execute Pinecone's native hybrid search
        documents = self.db.hybrid_search(
            query_embedding=dense_embedding,
            query_sparse_embedding=sparse_embedding,
            index_name=self.index_name,
            namespace=self.namespace,
            top_k=top_k,
            filter=filters,
            alpha=self.alpha,
        )

        return {
            "documents": documents,
            "query": query,
            "db": "pinecone",
        }
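The alpha weighting described in the docstring above is easy to verify by hand. A toy sketch (plain Python; it assumes the dense and sparse scores are already on comparable scales, which Pinecone handles internally):

```python
def alpha_blend(dense_score: float, sparse_score: float, alpha: float = 0.5) -> float:
    """final_score = alpha * dense_score + (1 - alpha) * sparse_score."""
    return alpha * dense_score + (1 - alpha) * sparse_score

# alpha = 0.5 weights both signals equally; 1.0 is pure dense, 0.0 is pure sparse.
print(alpha_blend(0.8, 0.4, alpha=0.5))  # ≈ 0.6
print(alpha_blend(0.8, 0.4, alpha=1.0))  # 0.8 (sparse score ignored)
```

Sliding alpha toward 1.0 favors natural-language queries; sliding it toward 0.0 favors exact keyword matches.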

Result fusion with RRF

For backends without native hybrid support, use ResultMerger:
src/vectordb/haystack/components/result_merger.py
import hashlib

from haystack import Document

class ResultMerger:
    """Merge results from multiple retrieval sources."""

    @staticmethod
    def stable_doc_id(doc: Document) -> str:
        """Stable identifier for deduplication: doc.id if set, else a content hash."""
        return doc.id or hashlib.sha256((doc.content or "").encode("utf-8")).hexdigest()

    @staticmethod
    def rrf_fusion(
        dense_docs: list[Document],
        sparse_docs: list[Document],
        k: int = 60,
        top_k: int | None = None,
    ) -> list[Document]:
        """Reciprocal Rank Fusion.
        
        Args:
            dense_docs: Documents from dense retriever (ordered by relevance)
            sparse_docs: Documents from sparse retriever (ordered by relevance)
            k: RRF parameter (constant added to rank, default 60)
            top_k: Return top K documents
        
        Returns:
            Fused and reranked documents
        """
        rrf_scores: dict[str, float] = {}

        # Score dense results
        for rank, doc in enumerate(dense_docs, 1):
            doc_id = ResultMerger.stable_doc_id(doc)
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

        # Score sparse results
        for rank, doc in enumerate(sparse_docs, 1):
            doc_id = ResultMerger.stable_doc_id(doc)
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

        # Build document map for deduplication
        doc_map = {}
        for doc in dense_docs + sparse_docs:
            doc_id = ResultMerger.stable_doc_id(doc)
            if doc_id not in doc_map:
                doc_map[doc_id] = doc

        # Sort by RRF score
        sorted_docs = [
            doc_map[doc_id]
            for doc_id in sorted(
                rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True
            )
            if doc_id in doc_map
        ]

        if top_k is None:
            top_k = max(len(dense_docs), len(sparse_docs))

        return sorted_docs[:top_k]

    @staticmethod
    def weighted_fusion(
        dense_docs: list[Document],
        sparse_docs: list[Document],
        dense_weight: float = 0.7,
        sparse_weight: float = 0.3,
        top_k: int | None = None,
    ) -> list[Document]:
        """Weighted sum fusion with score normalization."""
        # Implementation handles score normalization and weighted combination
        # See full implementation in result_merger.py
        pass
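The elided weighted_fusion body normalizes each source's scores before combining them. A standalone sketch of that idea, assuming min-max normalization (the actual implementation in result_merger.py may differ in detail):

```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Scale scores into [0, 1] so dense and sparse scales become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: avoid division by zero
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def weighted_fuse(
    dense: dict[str, float],
    sparse: dict[str, float],
    dense_weight: float = 0.7,
    sparse_weight: float = 0.3,
) -> dict[str, float]:
    """Combine per-document scores from two sources after normalizing each."""
    d_norm = dict(zip(dense, min_max_normalize(list(dense.values()))))
    s_norm = dict(zip(sparse, min_max_normalize(list(sparse.values()))))
    fused: dict[str, float] = {}
    for doc_id in set(d_norm) | set(s_norm):
        fused[doc_id] = (
            dense_weight * d_norm.get(doc_id, 0.0)
            + sparse_weight * s_norm.get(doc_id, 0.0)
        )
    return fused

# Raw scales differ wildly (cosine-like vs SPLADE dot products); normalization fixes that.
fused = weighted_fuse({"a": 0.9, "b": 0.5}, {"b": 12.0, "c": 3.0})
```

Without normalization, the source with the larger raw score range would dominate regardless of the configured weights, which is why RRF (rank-based, scale-free) is the safer default.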

Configuration

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "hybrid-search"
  alpha: 0.5  # 0.5 = equal dense/sparse weighting

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

sparse:
  model: "naver/splade-cocondenser-ensembledistil"

fusion:
  strategy: "rrf"  # or "weighted"
  dense_weight: 0.7  # Only for weighted fusion
  sparse_weight: 0.3

dataloader:
  type: "triviaqa"  # key matches dl_config.get("type", ...) in the pipeline
  split: "test"
  limit: 500

search:
  top_k: 10

When to use it

Corpora with mixed query styles: some users ask in natural language, others search with domain keywords or acronyms
Enterprise knowledge bases where exact product names, codes, or identifiers matter alongside conceptual questions
Any workload where pure semantic search misses highly relevant documents that contain exact query terms

When not to use it

Small datasets where the added complexity of dual indexing and fusion has negligible quality impact
Prototypes or early experiments where you have not yet validated whether the semantic baseline falls short

Settings to tune first

fusion.strategy: "rrf" requires no tuning; "weighted" lets you favor the dense or sparse signal
fusion.dense_weight / fusion.sparse_weight: only for weighted fusion; start at 0.7/0.3 and adjust based on your query type distribution
sparse.model: SPLADE model quality directly affects lexical matching behavior
search.top_k: final merged result count; set it larger than a semantic-only top_k to preserve fusion coverage

Common pitfalls

Unbalanced fusion: Setting one weight to near-zero effectively reverts to single-signal retrieval. Measure both retrieval paths independently before fusing.
Missing sparse vectors at query time: If the indexing config uses sparse embeddings but the search config does not, the sparse retrieval path returns nothing. Keep configs consistent.
Not validating per-query-class behavior: Hybrid usually helps keyword queries most and natural-language queries least. If your evaluation set is exclusively natural-language questions, the improvement over semantic search may be small.

Supported backends

Chroma, Milvus, Pinecone, Qdrant, Weaviate.

Dataset configs provided

ARC, Earnings Calls, FActScore, PopQA, TriviaQA.

Next steps

Components: add reranking after fusion for a further precision improvement.
Pipelines: learn about advanced pipeline composition patterns.
