Semantic search retrieves documents by meaning rather than exact keyword overlap. Documents and queries are converted into dense vector embeddings by the same model, and similarity is measured by cosine similarity in the shared embedding space.

How it works

1. Indexing: Each document's text is passed through a SentenceTransformers model (SentenceTransformersDocumentEmbedder) to produce a dense float vector. The vector and document metadata are stored in the target vector database.
2. Query embedding: At search time, the query string is embedded with the same model (SentenceTransformersTextEmbedder) to produce a query vector.
3. Nearest-neighbor retrieval: The database performs approximate nearest-neighbor (ANN) search over the indexed embeddings and returns the top-k most similar documents, ranked by cosine similarity score.
4. Optional filtering: Metadata filters can be applied to restrict the candidate set before similarity scoring, using the backend's native filter syntax.
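The retrieval loop in the steps above can be sketched with exact cosine scoring over a tiny in-memory index. This is a minimal illustration only: real backends use ANN indexes rather than a linear scan, and real vectors come from an embedding model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_search(query_vec: list[float], index: list[tuple[str, list[float]]], k: int):
    """Rank indexed (doc_id, vector) pairs by similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; real models emit hundreds of dimensions.
index = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.7, 0.7, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]
results = top_k_search([0.9, 0.1, 0.0], index, k=2)  # doc-a ranks first
```

ANN search trades this exact scan for an approximate index structure, which is why large collections stay fast at query time.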
The EmbedderFactory (from utils/embeddings.py) creates and warms up both the document and text embedders from the config file. Warm-up pre-loads the model weights so the first real call does not incur cold-start latency.
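The warm-up behavior can be illustrated with a minimal sketch. LazyEmbedder is a hypothetical stand-in for a factory-managed embedder, not the actual EmbedderFactory API; the point is only that warm_up() moves the model-load cost out of the first request.

```python
class LazyEmbedder:
    """Illustrative embedder: model weights load lazily, and warm_up()
    forces the load before the first real request."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self._model = None  # weights not loaded yet

    def _load_model(self) -> object:
        # Stand-in for downloading/initializing SentenceTransformers weights.
        return object()

    def warm_up(self) -> None:
        """Eagerly load weights so the first embed call avoids cold-start latency."""
        if self._model is None:
            self._model = self._load_model()

    def embed(self, text: str) -> list[float]:
        if self._model is None:  # cold start happens here if warm_up was skipped
            self.warm_up()
        return [float(len(text))]  # placeholder vector, not a real embedding
```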

Indexing pipeline example

src/vectordb/haystack/semantic_search/indexing/chroma.py
import logging
from typing import Any

from vectordb.databases.chroma import ChromaVectorDB
from vectordb.dataloaders import DataloaderCatalog
from vectordb.haystack.utils import ConfigLoader, EmbedderFactory

logger = logging.getLogger(__name__)

class ChromaSemanticIndexingPipeline:
    """Chroma indexing pipeline for semantic search."""

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        """Initialize indexing pipeline from configuration."""
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "chroma")

        self.embedder = EmbedderFactory.create_document_embedder(self.config)

        chroma_config = self.config["chroma"]
        self.db = ChromaVectorDB(
            host=chroma_config.get("host", "localhost"),
            port=chroma_config.get("port", 8000),
        )

        self.collection_name = chroma_config["collection_name"]
        logger.info("Initialized Chroma indexing pipeline")

    def run(self) -> dict[str, Any]:
        """Execute the full indexing pipeline."""
        # Load documents
        dl_config = self.config.get("dataloader", {})
        loader = DataloaderCatalog.create(
            dl_config.get("type", "triviaqa"),
            split=dl_config.get("split", "test"),
            limit=dl_config.get("limit"),
        )
        dataset = loader.load()
        documents = dataset.to_haystack()
        logger.info("Loaded %d documents", len(documents))

        # Generate embeddings
        embedded_docs = self.embedder.run(documents=documents)["documents"]
        logger.info("Generated embeddings for %d documents", len(embedded_docs))

        # Create collection and insert
        self.db.create_collection(
            collection_name=self.collection_name,
            recreate=self.config.get("chroma", {}).get("recreate", False),
        )

        self.db.insert_documents(
            documents=embedded_docs,
            collection_name=self.collection_name,
        )
        logger.info("Indexed %d documents to Chroma", len(embedded_docs))

        return {"documents_indexed": len(embedded_docs)}
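The configuration keys this pipeline reads could be supplied by a config like the following. Field names are taken from the code above; the values are illustrative.

```yaml
chroma:
  host: "localhost"
  port: 8000
  collection_name: "semantic-search"
  recreate: true

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"

dataloader:
  type: "triviaqa"   # the run() method reads dataloader.type
  split: "test"
  limit: 500
```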

Search pipeline example

src/vectordb/haystack/semantic_search/search/chroma.py
import logging
from typing import Any

from vectordb.databases.chroma import ChromaVectorDB
from vectordb.haystack.utils import (
    ConfigLoader,
    DiversificationHelper,
    DocumentFilter,
    EmbedderFactory,
    RAGHelper,
)

logger = logging.getLogger(__name__)

class ChromaSemanticSearchPipeline:
    """Chroma semantic search pipeline with RAG support."""

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        """Initialize search pipeline from configuration."""
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "chroma")

        self.embedder = EmbedderFactory.create_text_embedder(self.config)

        chroma_config = self.config["chroma"]
        self.db = ChromaVectorDB(
            host=chroma_config.get("host", "localhost"),
            port=chroma_config.get("port", 8000),
        )
        self.collection_name = chroma_config["collection_name"]

        # Optional RAG generator
        self.rag_enabled = self.config.get("rag", {}).get("enabled", False)
        self.generator = (
            RAGHelper.create_generator(self.config) if self.rag_enabled else None
        )

        logger.info("Initialized Chroma search pipeline")

    def search(
        self,
        query: str,
        top_k: int = 10,
        filters: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Execute semantic search."""
        # Embed query
        query_result = self.embedder.run(text=query)
        query_embedding = query_result["embedding"]

        # Search Chroma
        filters = DocumentFilter.normalize(filters)
        documents = self.db.search(
            query_embedding=query_embedding,
            top_k=top_k * 2,
            collection_name=self.collection_name,
            where=filters if filters else None,
        )
        logger.info("Retrieved %d documents", len(documents))

        # Apply filters and diversification
        documents = DocumentFilter.apply(documents, filters)
        documents = DiversificationHelper.apply(documents, self.config)
        documents = documents[:top_k]

        result: dict[str, Any] = {
            "documents": documents,
            "query": query,
        }

        # Optional RAG
        if self.rag_enabled and self.generator and documents:
            prompt = RAGHelper.format_prompt(query, documents)
            gen_result = self.generator.run(prompt=prompt)
            result["answer"] = gen_result.get("replies", [""])[0]

        return result
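Note that search requests top_k * 2 candidates from Chroma so that post-retrieval filtering and diversification still leave enough results after trimming. A minimal sketch of that over-fetch pattern, with a fake backend standing in for the database:

```python
def overfetch_then_trim(search_fn, query_vec, top_k, keep):
    """Fetch 2x candidates, drop ones failing a metadata predicate, then trim.
    Mirrors the pipeline's top_k * 2 request followed by filtering and slicing."""
    candidates = search_fn(query_vec, top_k * 2)
    survivors = [doc for doc in candidates if keep(doc)]
    return survivors[:top_k]

# Toy backend: returns pre-ranked (id, lang) pairs, ignoring the query vector.
ranked = [("d1", "en"), ("d2", "fr"), ("d3", "en"),
          ("d4", "en"), ("d5", "fr"), ("d6", "en")]
fake_search = lambda q, k: ranked[:k]

hits = overfetch_then_trim(fake_search, [0.0], top_k=2, keep=lambda d: d[1] == "en")
```

Without the over-fetch, a filter that rejects half the candidates would leave fewer than top_k results.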

Configuration

Each backend has config files under configs/. A typical config:
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "semantic-search"

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

dataloader:
  dataset: "triviaqa"
  split: "test"
  limit: 500

search:
  top_k: 10

rag:
  enabled: false
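Entries like "${PINECONE_API_KEY}" are placeholders for environment variables. A sketch of how a config loader might resolve them is below; the real ConfigLoader may behave differently (for example, by raising an error on unset variables).

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace ${VAR} placeholders with values from the environment.
    Unset variables expand to an empty string in this sketch."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)
```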

When to use it

Natural-language questions where phrasing in the question may differ from phrasing in documents
General-purpose RAG starting points before specializing with advanced features
Any corpus where exact keyword overlap between query and documents is unreliable

When not to use it

Strict compliance or legal workflows where specific terms must appear verbatim
Very small corpora (fewer than a few hundred documents) where BM25 already saturates quality
Keyword-heavy technical workloads with domain acronyms and jargon where semantic generalization is unhelpful

Settings to tune first

embeddings.model: The single largest quality lever. Better models produce more meaningful similarity scores.
search.top_k: Controls how many candidates are returned. Too small misses evidence; too large increases downstream cost.
dataloader.limit: Controls corpus size for experiments. Start small to validate the pipeline, then scale up.

Common pitfalls

Mismatched embedding models: Using a different model for indexing and querying produces meaningless similarity scores. Always use the same model value in both indexing and search configs.
Oversized chunks: Large text chunks blur the embedding signal, making the vector represent too many topics at once. Shorter, focused chunks usually produce better retrieval.
Too small top_k: If relevant evidence is rarely in the top 3 results, increasing top_k to 10 or 20 and then applying reranking usually helps more than tuning the embedding model.
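The mismatched-model pitfall above can be caught early with a small config check. This helper is illustrative, not part of the library:

```python
def check_same_embedder(index_cfg: dict, search_cfg: dict) -> None:
    """Guard against the mismatched-model pitfall: indexing and search must
    embed with the same model, or similarity scores are meaningless."""
    index_model = index_cfg.get("embeddings", {}).get("model")
    search_model = search_cfg.get("embeddings", {}).get("model")
    if index_model != search_model:
        raise ValueError(
            f"Embedding model mismatch: indexed with {index_model!r}, "
            f"querying with {search_model!r}"
        )
```

Running it once at search-pipeline startup turns a silent quality bug into a loud configuration error.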

Supported backends

Chroma, Milvus, Pinecone, Qdrant, Weaviate. Each backend has an indexing script in indexing/ and a search script in search/.

Dataset configs provided

ARC, Earnings Calls, FActScore, PopQA, TriviaQA. Config files are named {backend}_{dataset}.yaml inside configs/.

Next steps

Hybrid search: Add sparse embeddings for keyword precision.
Components: Add reranking for better final-result precision.
Pipelines: Learn about advanced pipeline patterns.
