Semantic search retrieves documents by meaning rather than exact keyword overlap. Documents and queries are converted into dense vector embeddings by the same model, and similarity is scored by cosine similarity between vectors in the embedding space.
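The similarity computation itself is simple. A minimal pure-Python sketch of cosine similarity between two embedding vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0; orthogonal vectors score 0.0.
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))                      # → 0.0
```

Production systems delegate this to the vector database's approximate nearest-neighbor index rather than computing it pairwise.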

How it works

1. Indexing: Each document’s text is embedded using HuggingFaceEmbeddings (created via EmbedderHelper.create_embedder(config)). The resulting float vector and document metadata are stored in the target backend through the backend’s LangChain integration.

2. Query embedding: At search time, the same embedder model embeds the query string via EmbedderHelper.embed_query(embedder, query).

3. Nearest-neighbor retrieval: The LangChain retriever performs approximate nearest-neighbor search and returns the top-k most similar documents.

4. Optional generation: If rag.enabled: true, retrieved documents are formatted into a prompt using RAGHelper.format_prompt() and passed to a ChatGroq LLM for answer generation.
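The four steps above can be sketched end to end with a toy character-frequency embedder standing in for the real model (illustrative only; the actual pipelines use the helper classes described below):

```python
import math
from collections import Counter

def toy_embed(text: str) -> list[float]:
    """Stand-in embedder: lowercase letter counts over a-z (not a real model)."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

# Step 1: index documents as (text, embedding) pairs
corpus = ["the capital of France is Paris", "photosynthesis occurs in plants"]
index = [(doc, toy_embed(doc)) for doc in corpus]

# Step 2: embed the query with the SAME embedder
query_vec = toy_embed("Which city is France's capital?")

# Step 3: rank by similarity and keep the top-k candidates
results = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)[:1]
print(results[0][0])  # → the capital of France is Paris
```

Step 4 (generation) would then format the top documents into a prompt for the LLM.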

Pipeline implementation

The semantic search pipeline is implemented as two classes per backend: one for indexing and one for search.

Indexing pipeline

src/vectordb/langchain/semantic_search/indexing/chroma.py
from typing import Any

from vectordb.databases.chroma import ChromaVectorDB
from vectordb.dataloaders import DataloaderCatalog
from vectordb.langchain.utils import ConfigLoader, EmbedderHelper

class ChromaSemanticIndexingPipeline:
    """Chroma indexing pipeline for semantic search (LangChain).
    
    Loads documents from configured data source, generates dense embeddings,
    and indexes them in a local Chroma collection for similarity retrieval.
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "chroma")

        # Create embedder from config
        self.embedder = EmbedderHelper.create_embedder(self.config)

        # Initialize Chroma database
        chroma_config = self.config["chroma"]
        self.db = ChromaVectorDB(
            path=chroma_config.get("path", "./chroma_data"),
        )

        self.collection_name = chroma_config.get("collection_name", "semantic_search")

    def run(self) -> dict[str, Any]:
        """Execute indexing pipeline."""
        # Load documents with optional limit
        dl_config = self.config.get("dataloader", {})
        loader = DataloaderCatalog.create(
            dl_config.get("type", "triviaqa"),
            split=dl_config.get("split", "test"),
            limit=dl_config.get("limit"),
        )
        dataset = loader.load()
        documents = dataset.to_langchain()

        # Generate embeddings for all documents
        docs, embeddings = EmbedderHelper.embed_documents(self.embedder, documents)

        # Create or recreate collection
        recreate = self.config.get("chroma", {}).get("recreate", False)
        self.db.create_collection(
            name=self.collection_name,
            recreate=recreate,
        )

        # Upsert documents with embeddings to Chroma
        num_indexed = self.db.upsert(
            documents=docs,
            embeddings=embeddings,
            collection_name=self.collection_name,
        )

        return {"documents_indexed": num_indexed}

Search pipeline

src/vectordb/langchain/semantic_search/search/chroma.py
from typing import Any

from vectordb.databases.chroma import ChromaVectorDB
from vectordb.langchain.utils import ConfigLoader, EmbedderHelper, RAGHelper
from vectordb.utils.chroma_document_converter import ChromaDocumentConverter

class ChromaSemanticSearchPipeline:
    """Chroma semantic search pipeline (LangChain).
    
    Implements dense vector similarity search on Chroma collections.
    Queries are embedded and matched against stored document embeddings
    to find semantically similar documents.
    """

    def __init__(self, config_or_path: dict[str, Any] | str) -> None:
        self.config = ConfigLoader.load(config_or_path)
        ConfigLoader.validate(self.config, "chroma")

        self.embedder = EmbedderHelper.create_embedder(self.config)

        chroma_config = self.config["chroma"]
        self.db = ChromaVectorDB(
            path=chroma_config.get("path", "./chroma_data"),
        )

        self.collection_name = chroma_config.get("collection_name", "semantic_search")
        self.llm = RAGHelper.create_llm(self.config)

    def search(
        self,
        query: str,
        top_k: int = 10,
        filters: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Execute semantic search against Chroma collection."""
        # Embed query for similarity search
        query_embedding = EmbedderHelper.embed_query(self.embedder, query)

        # Ensure the target collection is loaded before querying
        self.db._get_collection(self.collection_name)
        results_dict = self.db.query(
            query_embedding=query_embedding,
            n_results=top_k,
            where=filters,
        )
        documents = (
            ChromaDocumentConverter.convert_query_results_to_langchain_documents(
                results_dict
            )
        )

        result = {
            "documents": documents,
            "query": query,
        }

        # Generate RAG answer if LLM is configured
        if self.llm is not None:
            answer = RAGHelper.generate(self.llm, query, documents)
            result["answer"] = answer

        return result

Configuration

chroma:
  path: "./chroma_data"  # Directory for local storage
  collection_name: "documents"  # Collection name
  recreate: false  # Whether to recreate collection

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"  # or "cuda" for GPU
  batch_size: 32

dataloader:
  type: "triviaqa"
  split: "test"
  limit: 500

search:
  top_k: 10

rag:
  enabled: false
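The pipelines read this configuration as a nested dict and fall back to defaults with .get, so every section is optional. A sketch of the lookup pattern, assuming ConfigLoader returns a plain dict mirroring the YAML above:

```python
# Hypothetical parsed config; keys mirror the YAML above.
config = {
    "chroma": {"path": "./chroma_data", "collection_name": "documents"},
    "embeddings": {"model": "sentence-transformers/all-MiniLM-L6-v2"},
}

# Missing keys fall back to defaults, as in the pipeline code.
chroma_config = config.get("chroma", {})
path = chroma_config.get("path", "./chroma_data")
recreate = chroma_config.get("recreate", False)   # key absent → False
top_k = config.get("search", {}).get("top_k", 10)  # whole section absent → 10

print(path, recreate, top_k)  # → ./chroma_data False 10
```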

Embedding helper

The EmbedderHelper provides classmethod helpers for creating and using HuggingFace embedding models:
src/vectordb/langchain/utils/embeddings.py
from typing import Any

from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

class EmbedderHelper:
    """Helper class for HuggingFace embedding model operations."""

    @classmethod
    def create_embedder(cls, config: dict[str, Any]) -> HuggingFaceEmbeddings:
        """Create HuggingFaceEmbeddings from config."""
        embeddings_config = config.get("embeddings", {})
        model = embeddings_config.get("model", "sentence-transformers/all-MiniLM-L6-v2")
        device = embeddings_config.get("device", "cpu")
        batch_size = embeddings_config.get("batch_size", 32)

        return HuggingFaceEmbeddings(
            model_name=model,
            model_kwargs={"device": device},
            encode_kwargs={"batch_size": batch_size},
        )

    @classmethod
    def embed_documents(
        cls, embedder: HuggingFaceEmbeddings, documents: list[Document]
    ) -> tuple[list[Document], list[list[float]]]:
        """Embed documents and return with embeddings."""
        texts = [doc.page_content for doc in documents]
        embeddings = embedder.embed_documents(texts)
        return documents, embeddings

    @classmethod
    def embed_query(cls, embedder: HuggingFaceEmbeddings, query: str) -> list[float]:
        """Embed a single query."""
        return embedder.embed_query(query)
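Because EmbedderHelper only calls embed_documents and embed_query, any object exposing those two methods can stand in for HuggingFaceEmbeddings when testing pipeline wiring without downloading a model. A hypothetical stub:

```python
class StubEmbedder:
    """Deterministic stand-in exposing the same two-method interface
    as HuggingFaceEmbeddings (for wiring tests, not real retrieval)."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        # Toy 2-d embedding: text length and vowel count.
        return [float(len(text)), float(sum(c in "aeiou" for c in text.lower()))]

embedder = StubEmbedder()
docs_vecs = embedder.embed_documents(["hello", "world"])
query_vec = embedder.embed_query("hello")
print(docs_vecs[0] == query_vec)  # → True: same model, same text, same vector
```

This symmetry is exactly what the pipelines rely on: indexing and querying must go through the same embedder, or similarity scores are meaningless.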

RAG helper

The RAGHelper creates LLMs and formats prompts for answer generation:
src/vectordb/langchain/utils/rag.py
import os
from typing import Any

from langchain_core.documents import Document
from langchain_groq import ChatGroq

class RAGHelper:
    """Helper for RAG-related operations."""

    DEFAULT_PROMPT_TEMPLATE = """{context}

Question: {query}

Answer:"""

    @classmethod
    def format_prompt(
        cls,
        query: str,
        documents: list[Document],
        template: str | None = None,
    ) -> str:
        """Format retrieved documents and the query into a single prompt."""
        context = "\n\n".join(doc.page_content for doc in documents)
        return (template or cls.DEFAULT_PROMPT_TEMPLATE).format(
            context=context, query=query
        )

    @classmethod
    def create_llm(cls, config: dict[str, Any]) -> ChatGroq | None:
        """Create ChatGroq LLM from config."""
        rag_config = config.get("rag", {})
        if not rag_config.get("enabled", False):
            return None

        model = rag_config.get("model", "llama-3.3-70b-versatile")
        api_key = rag_config.get("api_key") or os.environ.get("GROQ_API_KEY")
        temperature = rag_config.get("temperature", 0.7)
        max_tokens = rag_config.get("max_tokens", 2048)

        return ChatGroq(
            model=model,
            api_key=api_key,
            temperature=temperature,
            max_tokens=max_tokens,
        )

    @classmethod
    def generate(
        cls,
        llm: ChatGroq,
        query: str,
        documents: list[Document],
        template: str | None = None,
    ) -> str:
        """Generate RAG answer using LLM."""
        prompt = cls.format_prompt(query, documents, template)
        response = llm.invoke(prompt)
        return response.content
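Prompt assembly is plain string formatting. Filling the DEFAULT_PROMPT_TEMPLATE above with two retrieved snippets looks like this (joining snippets with blank lines is an assumption; see RAGHelper.format_prompt for the actual convention):

```python
# Same shape as RAGHelper.DEFAULT_PROMPT_TEMPLATE above.
TEMPLATE = """{context}

Question: {query}

Answer:"""

snippets = ["Paris is the capital of France.", "France is in Western Europe."]
context = "\n\n".join(snippets)  # assumed joining convention
prompt = TEMPLATE.format(context=context, query="What is the capital of France?")
print(prompt)
```

The resulting string is what llm.invoke(prompt) receives in RAGHelper.generate.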

When to use it

  • Natural-language questions where query phrasing differs from document vocabulary
  • General-purpose RAG baseline before specializing with advanced features
  • Any corpus where exact keyword overlap between query and documents is unreliable

When not to use it

  • Strict compliance or legal workflows where exact terms must appear verbatim
  • Very small corpora where BM25 already saturates quality
  • Keyword-heavy technical workloads where semantic generalization is unhelpful

Tradeoffs

  • Quality: Strong semantic recall; may miss exact terminology
  • Latency: Low to moderate; dominated by embedding inference
  • Cost: Embedding compute plus vector search cost per query

Settings to tune first

  • embeddings.model (string): The primary quality lever; the model determines how semantically meaningful similarity scores are. Common choices:
      • sentence-transformers/all-MiniLM-L6-v2: Fast, 384-dimensional
      • sentence-transformers/all-mpnet-base-v2: Higher quality, 768-dimensional
      • BAAI/bge-small-en-v1.5: Strong retrieval performance
  • search.top_k (integer, default 10): Controls the number of returned candidates; too small misses evidence, too large increases downstream cost.
  • dataloader.limit (integer): Corpus size for experiments; start small to validate the pipeline, then scale up.

Common pitfalls

  • Mismatched embedding models: Using a different model for indexing and querying produces meaningless similarity scores.
  • Oversized chunks: Large text chunks blur the embedding signal. Shorter, focused chunks typically produce better retrieval.
  • Too-small top_k: If relevant evidence is rarely in the top 3 results, increase top_k and apply reranking rather than only tuning the embedding model.
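A minimal fixed-size chunker illustrates the chunk-size lever from the oversized-chunks pitfall (a sketch; production pipelines typically use a sentence- or token-aware splitter):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character windows of at most chunk_size."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 500
chunks = chunk_text(doc, chunk_size=200, overlap=40)
print(len(chunks), [len(c) for c in chunks])  # → 3 [200, 200, 180]
```

Each chunk is embedded as its own document, so smaller chunk_size trades broader context per vector for a sharper, more focused embedding signal.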

Backends supported

Chroma, Milvus, Pinecone, Qdrant, Weaviate.

Next steps

  • Add reranking: Add two-stage retrieval for better final-result precision
  • Hybrid search: Switch to hybrid indexing if queries mix natural language with domain keywords
  • Metadata filtering: Add metadata filtering if the corpus has reliable structured attributes
  • Components: Explore reusable components for query enhancement and compression
