Semantic search retrieves documents by meaning rather than exact keyword overlap. Documents and queries are converted into dense vector embeddings by the same model, and similarity is scored by cosine similarity between vectors in the embedding space.
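The similarity computation itself is simple. A minimal pure-Python sketch of cosine similarity between two embedding vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0; orthogonal vectors score 0.0.
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))                      # → 0.0
```

Production systems delegate this to the vector database's approximate nearest-neighbor index rather than computing it pairwise.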

How it works

1. Indexing: Each document’s text is embedded using HuggingFaceEmbeddings (created via EmbedderHelper.create_embedder(config)). The resulting float vector and document metadata are stored in the target backend through the backend’s LangChain integration.

2. Query embedding: At search time, the same embedder model embeds the query string via EmbedderHelper.embed_query(embedder, query).

3. Nearest-neighbor retrieval: The LangChain retriever performs approximate nearest-neighbor search and returns the top-k most similar documents.

4. Optional generation: If rag.enabled: true, retrieved documents are formatted into a prompt using RAGHelper.format_prompt() and passed to a ChatGroq LLM for answer generation.
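The four steps above can be sketched end to end with a toy character-frequency embedder standing in for the real model (illustrative only; the actual pipelines use the helper classes described below):

```python
import math
from collections import Counter

def toy_embed(text: str) -> list[float]:
    """Stand-in embedder: lowercase letter counts over a-z (not a real model)."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

# Step 1: index documents as (text, embedding) pairs
corpus = ["the capital of France is Paris", "photosynthesis occurs in plants"]
index = [(doc, toy_embed(doc)) for doc in corpus]

# Step 2: embed the query with the SAME embedder
query_vec = toy_embed("Which city is France's capital?")

# Step 3: rank by similarity and keep the top-k candidates
results = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)[:1]
print(results[0][0])  # → the capital of France is Paris
```

Step 4 (generation) would then format the top documents into a prompt for the LLM.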

Pipeline implementation

The semantic search pipeline is implemented as two classes per backend: one for indexing and one for search.

Indexing pipeline

src/vectordb/langchain/semantic_search/indexing/chroma.py
from typing import Any

from vectordb.databases.chroma import ChromaVectorDB
from vectordb.dataloaders import DataloaderCatalog
from vectordb.langchain.utils import ConfigLoader, EmbedderHelper

class ChromaSemanticIndexingPipeline:
    """Chroma indexing pipeline for semantic search (LangChain).
    
    Loads documents from configured data source, generates dense embeddings,
    and indexes them in a local Chroma collection for similarity retrieval.
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "chroma")

        # Create embedder from config
        self.embedder = EmbedderHelper.create_embedder(self.config)

        # Initialize Chroma database
        chroma_config = self.config["chroma"]
        self.db = ChromaVectorDB(
            path=chroma_config.get("path", "./chroma_data"),
        )

        self.collection_name = chroma_config.get("collection_name", "semantic_search")

    def run(self) -> dict[str, Any]:
        """Execute indexing pipeline."""
        # Load documents with optional limit
        dl_config = self.config.get("dataloader", {})
        loader = DataloaderCatalog.create(
            dl_config.get("type", "triviaqa"),
            split=dl_config.get("split", "test"),
            limit=dl_config.get("limit"),
        )
        dataset = loader.load()
        documents = dataset.to_langchain()

        # Generate embeddings for all documents
        docs, embeddings = EmbedderHelper.embed_documents(self.embedder, documents)

        # Create or recreate collection
        recreate = self.config.get("chroma", {}).get("recreate", False)
        self.db.create_collection(
            name=self.collection_name,
            recreate=recreate,
        )

        # Upsert documents with embeddings to Chroma
        num_indexed = self.db.upsert(
            documents=docs,
            embeddings=embeddings,
            collection_name=self.collection_name,
        )

        return {"documents_indexed": num_indexed}

Search pipeline

src/vectordb/langchain/semantic_search/search/chroma.py
from typing import Any

from vectordb.databases.chroma import ChromaVectorDB
from vectordb.langchain.utils import ConfigLoader, EmbedderHelper, RAGHelper
from vectordb.utils.chroma_document_converter import ChromaDocumentConverter

class ChromaSemanticSearchPipeline:
    """Chroma semantic search pipeline (LangChain).
    
    Implements dense vector similarity search on Chroma collections.
    Queries are embedded and matched against stored document embeddings
    to find semantically similar documents.
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "chroma")

        self.embedder = EmbedderHelper.create_embedder(self.config)

        chroma_config = self.config["chroma"]
        self.db = ChromaVectorDB(
            path=chroma_config.get("path", "./chroma_data"),
        )

        self.collection_name = chroma_config.get("collection_name", "semantic_search")
        self.llm = RAGHelper.create_llm(self.config)

    def search(
        self,
        query: str,
        top_k: int = 10,
        filters: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Execute semantic search against Chroma collection."""
        # Embed query for similarity search
        query_embedding = EmbedderHelper.embed_query(self.embedder, query)

        # Ensure the target collection is loaded before querying
        self.db._get_collection(self.collection_name)
        results_dict = self.db.query(
            query_embedding=query_embedding,
            n_results=top_k,
            where=filters,
        )
        documents = (
            ChromaDocumentConverter.convert_query_results_to_langchain_documents(
                results_dict
            )
        )

        result = {
            "documents": documents,
            "query": query,
        }

        # Generate RAG answer if LLM is configured
        if self.llm is not None:
            answer = RAGHelper.generate(self.llm, query, documents)
            result["answer"] = answer

        return result

Configuration

chroma:
  path: "./chroma_data"  # Directory for local storage
  collection_name: "documents"  # Collection name
  recreate: false  # Whether to recreate collection

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"  # or "cuda" for GPU
  batch_size: 32

dataloader:
  type: "triviaqa"
  split: "test"
  limit: 500

search:
  top_k: 10

rag:
  enabled: false
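The pipelines read this configuration as a nested dict and fall back to defaults with .get, so every section is optional. A sketch of the lookup pattern, assuming ConfigLoader returns a plain dict mirroring the YAML above:

```python
# Hypothetical parsed config; keys mirror the YAML above.
config = {
    "chroma": {"path": "./chroma_data", "collection_name": "documents"},
    "embeddings": {"model": "sentence-transformers/all-MiniLM-L6-v2"},
}

# Missing keys fall back to defaults, as in the pipeline code.
chroma_config = config.get("chroma", {})
path = chroma_config.get("path", "./chroma_data")
recreate = chroma_config.get("recreate", False)   # key absent → False
top_k = config.get("search", {}).get("top_k", 10)  # whole section absent → 10

print(path, recreate, top_k)  # → ./chroma_data False 10
```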

Embedding helper

The EmbedderHelper provides classmethod helpers for creating and using HuggingFace embedding models:
src/vectordb/langchain/utils/embeddings.py
from typing import Any

from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

class EmbedderHelper:
    """Helper class for HuggingFace embedding model operations."""

    @classmethod
    def create_embedder(cls, config: dict[str, Any]) -> HuggingFaceEmbeddings:
        """Create HuggingFaceEmbeddings from config."""
        embeddings_config = config.get("embeddings", {})
        model = embeddings_config.get("model", "sentence-transformers/all-MiniLM-L6-v2")
        device = embeddings_config.get("device", "cpu")
        batch_size = embeddings_config.get("batch_size", 32)

        return HuggingFaceEmbeddings(
            model_name=model,
            model_kwargs={"device": device},
            encode_kwargs={"batch_size": batch_size},
        )

    @classmethod
    def embed_documents(
        cls, embedder: HuggingFaceEmbeddings, documents: list[Document]
    ) -> tuple[list[Document], list[list[float]]]:
        """Embed documents and return with embeddings."""
        texts = [doc.page_content for doc in documents]
        embeddings = embedder.embed_documents(texts)
        return documents, embeddings

    @classmethod
    def embed_query(cls, embedder: HuggingFaceEmbeddings, query: str) -> list[float]:
        """Embed a single query."""
        return embedder.embed_query(query)
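Because EmbedderHelper only calls embed_documents and embed_query, any object exposing those two methods can stand in for HuggingFaceEmbeddings when testing pipeline wiring without downloading a model. A hypothetical stub:

```python
class StubEmbedder:
    """Deterministic stand-in exposing the same two-method interface
    as HuggingFaceEmbeddings (for wiring tests, not real retrieval)."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        # Toy 2-d embedding: text length and vowel count.
        return [float(len(text)), float(sum(c in "aeiou" for c in text.lower()))]

embedder = StubEmbedder()
docs_vecs = embedder.embed_documents(["hello", "world"])
query_vec = embedder.embed_query("hello")
print(docs_vecs[0] == query_vec)  # → True: same model, same text, same vector
```

This symmetry is exactly what the pipelines rely on: indexing and querying must go through the same embedder, or similarity scores are meaningless.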

RAG helper

The RAGHelper creates LLMs and formats prompts for answer generation:
src/vectordb/langchain/utils/rag.py
import os
from typing import Any

from langchain_core.documents import Document
from langchain_groq import ChatGroq

class RAGHelper:
    """Helper for RAG-related operations."""

    DEFAULT_PROMPT_TEMPLATE = """{context}

Question: {query}

Answer:"""

    @classmethod
    def format_prompt(
        cls,
        query: str,
        documents: list[Document],
        template: str | None = None,
    ) -> str:
        """Format retrieved documents and the query into a single prompt."""
        context = "\n\n".join(doc.page_content for doc in documents)
        return (template or cls.DEFAULT_PROMPT_TEMPLATE).format(
            context=context, query=query
        )

    @classmethod
    def create_llm(cls, config: dict[str, Any]) -> ChatGroq | None:
        """Create ChatGroq LLM from config."""
        rag_config = config.get("rag", {})
        if not rag_config.get("enabled", False):
            return None

        model = rag_config.get("model", "llama-3.3-70b-versatile")
        api_key = rag_config.get("api_key") or os.environ.get("GROQ_API_KEY")
        temperature = rag_config.get("temperature", 0.7)
        max_tokens = rag_config.get("max_tokens", 2048)

        return ChatGroq(
            model=model,
            api_key=api_key,
            temperature=temperature,
            max_tokens=max_tokens,
        )

    @classmethod
    def generate(
        cls,
        llm: ChatGroq,
        query: str,
        documents: list[Document],
        template: str | None = None,
    ) -> str:
        """Generate RAG answer using LLM."""
        prompt = cls.format_prompt(query, documents, template)
        response = llm.invoke(prompt)
        return response.content
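Prompt assembly is plain string formatting. Filling the DEFAULT_PROMPT_TEMPLATE above with two retrieved snippets looks like this (joining snippets with blank lines is an assumption; see RAGHelper.format_prompt for the actual convention):

```python
# Same shape as RAGHelper.DEFAULT_PROMPT_TEMPLATE above.
TEMPLATE = """{context}

Question: {query}

Answer:"""

snippets = ["Paris is the capital of France.", "France is in Western Europe."]
context = "\n\n".join(snippets)  # assumed joining convention
prompt = TEMPLATE.format(context=context, query="What is the capital of France?")
print(prompt)
```

The resulting string is what llm.invoke(prompt) receives in RAGHelper.generate.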

When to use it

  • Natural-language questions where query phrasing differs from document vocabulary
  • General-purpose RAG baseline before specializing with advanced features
  • Any corpus where exact keyword overlap between query and documents is unreliable

When not to use it

  • Strict compliance or legal workflows where exact terms must appear verbatim
  • Very small corpora where BM25 already saturates quality
  • Keyword-heavy technical workloads where semantic generalization is unhelpful

Tradeoffs

  • Quality: Strong semantic recall; may miss exact terminology
  • Latency: Low to moderate; dominated by embedding inference
  • Cost: Embedding compute plus vector search cost per query

Settings to tune first

  • embeddings.model (string): The primary quality lever; the model determines how semantically meaningful similarity scores are. Common choices:
      • sentence-transformers/all-MiniLM-L6-v2: Fast, 384-dimensional
      • sentence-transformers/all-mpnet-base-v2: Higher quality, 768-dimensional
      • BAAI/bge-small-en-v1.5: Strong retrieval performance
  • search.top_k (integer, default 10): Controls the number of returned candidates; too small misses evidence, too large increases downstream cost.
  • dataloader.limit (integer): Corpus size for experiments; start small to validate the pipeline, then scale up.

Common pitfalls

  • Mismatched embedding models: Using a different model for indexing and querying produces meaningless similarity scores.
  • Oversized chunks: Large text chunks blur the embedding signal. Shorter, focused chunks typically produce better retrieval.
  • Too-small top_k: If relevant evidence is rarely in the top 3 results, increase top_k and apply reranking rather than only tuning the embedding model.
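A minimal fixed-size chunker illustrates the chunk-size lever from the oversized-chunks pitfall (a sketch; production pipelines typically use a sentence- or token-aware splitter):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character windows of at most chunk_size."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 500
chunks = chunk_text(doc, chunk_size=200, overlap=40)
print(len(chunks), [len(c) for c in chunks])  # → 3 [200, 200, 180]
```

Each chunk is embedded as its own document, so smaller chunk_size trades broader context per vector for a sharper, more focused embedding signal.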

Backends supported

Chroma, Milvus, Pinecone, Qdrant, Weaviate.

Next steps

  • Add reranking: Add two-stage retrieval for better final-result precision
  • Hybrid search: Switch to hybrid indexing if queries mix natural language with domain keywords
  • Metadata filtering: Add metadata filtering if the corpus has reliable structured attributes
  • Components: Explore reusable components for query enhancement and compression
