This module provides Haystack-based retrieval and RAG pipeline implementations across all five supported vector database backends. Every feature is organized as a self-contained directory with configuration files, indexing scripts, and search scripts for each backend.

What you get

  • Seventeen retrieval and RAG patterns implemented using Haystack’s pipeline and component abstractions
  • Full portability across Pinecone, Weaviate, Chroma, Milvus, and Qdrant (with feature-specific notes on backend support)
  • YAML-driven configuration with environment variable substitution so credentials stay out of code
  • Evaluation support via shared utils/evaluation.py metrics
  • Shared reusable components and helper factories that all feature pipelines draw from

Module structure

Each feature directory follows the same layout:
```
feature_name/
├── configs/
│   ├── chroma_triviaqa.yaml
│   ├── milvus_triviaqa.yaml
│   ├── pinecone_triviaqa.yaml
│   ├── qdrant_triviaqa.yaml
│   ├── weaviate_triviaqa.yaml
│   └── (one config per backend × dataset combination)
├── indexing/
│   ├── chroma.py
│   ├── milvus.py
│   ├── pinecone.py
│   ├── qdrant.py
│   └── weaviate.py
├── search/
│   ├── chroma.py
│   ├── milvus.py
│   ├── pinecone.py
│   ├── qdrant.py
│   └── weaviate.py
└── README.md
```

Feature catalog

  • Semantic search: dense vector similarity search, the baseline pattern
  • Hybrid search: combines dense and sparse embeddings with RRF fusion
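To make the fusion step concrete, here is a minimal pure-Python sketch of reciprocal rank fusion (RRF), independent of Haystack's built-in components; the function name and example ids are illustrative:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with RRF.

    Each document scores sum(1 / (k + rank)) over every list it
    appears in; k=60 is the commonly used constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]   # ranking from the dense retriever
sparse = ["d3", "d1", "d4"]  # ranking from the sparse retriever
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents appearing high in both lists (here d1 and d3) rise to the top even though neither retriever agrees on an exact ordering.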

  • Components: reusable pipeline components for routing, compression, and query enhancement
  • Pipelines: pipeline architecture patterns and composition

Core retrieval patterns

| Feature | Description | Use when |
|---|---|---|
| semantic_search | Dense vector similarity search | Starting point and baseline |
| hybrid_indexing | Dense + sparse embeddings with RRF | Both semantic and keyword precision needed |
| sparse_indexing | Sparse-only retrieval | Pure keyword/lexical precision |
| reranking | Two-stage retrieval with cross-encoder | Better final ranking needed |
| mmr | Maximal Marginal Relevance | Relevant + diverse results |
| diversity_filtering | Similarity count filtering | Less redundant context |
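The idea behind the mmr feature can be sketched in plain Python. This is a simplified greedy Maximal Marginal Relevance selection over raw vectors, not the module's actual implementation:

```python
def mmr(query_vec, doc_vecs, lambda_=0.5, top_k=3):
    """Greedily select top_k documents, trading relevance to the
    query against similarity to already-selected documents."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cos(query_vec, doc_vecs[i])
            # Penalize similarity to anything we already picked
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Lowering lambda_ weights diversity more heavily, so a near-duplicate of an already-selected document loses to a less similar but novel one.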

Advanced retrieval

| Feature | Description | Use when |
|---|---|---|
| metadata_filtering | Structured constraints | Need to filter by attributes |
| json_indexing | JSON-native documents | Structured fields + semantic content |
| query_enhancement | Multi-query, HyDE, step-back | Better query recall |
| contextual_compression | Abstractive, extractive, relevance filtering | Shorter, cleaner context |
| parent_document_retrieval | Index children, retrieve parents | Long docs with fragment search |
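The "index children, retrieve parents" pattern can be illustrated with an in-memory sketch. The chunker and term-overlap scorer below are stand-ins (a real pipeline would embed the child chunks and use vector similarity), and all names are hypothetical:

```python
def chunk(text, size=40):
    """Split a parent document into fixed-size child chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(parents):
    """Index child chunks, each mapped back to its parent id."""
    index = []  # list of (child_text, parent_id)
    for pid, text in parents.items():
        for child in chunk(text):
            index.append((child, pid))
    return index

def retrieve_parents(index, parents, query, top_k=2):
    """Score children, but return the enclosing parent documents."""
    terms = set(query.lower().split())
    scored = sorted(index,
                    key=lambda c: -len(terms & set(c[0].lower().split())))
    seen, results = set(), []
    for child, pid in scored:
        if pid not in seen:
            seen.add(pid)
            results.append(parents[pid])
        if len(results) == top_k:
            break
    return results
```

Small chunks match precise query fragments, while the returned parents give the generator enough surrounding context.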

Production features

| Feature | Description | Use when |
|---|---|---|
| cost_optimized_rag | Token/compute budget control | Cost reduction needed |
| agentic_rag | Multi-step iterative RAG with self-reflection | Complex multi-hop questions |
| multi_tenancy | Tenant-scoped isolation | Per-customer data isolation |
| namespaces | Logical segmentation | Environment separation |

Embedding configuration

All Haystack feature pipelines read embedding configuration from the YAML config:
```yaml
embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"  # Required
  device: "cpu"                                    # Optional
  batch_size: 32                                   # Optional
```
For hybrid and sparse features:
```yaml
sparse:
  model: "naver/splade-cocondenser-ensembledistil"
```

RAG configuration

Generation is controlled by the rag section:
```yaml
rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  api_base_url: "https://api.groq.com/openai/v1"
  temperature: 0.7
  max_tokens: 2048
```
Set enabled: false to run retrieval-only pipelines without generation.
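The ${GROQ_API_KEY} placeholder above is resolved from the environment at load time. A minimal sketch of that substitution step, assuming a strict loader that rejects unset variables (the module's actual loader may behave differently):

```python
import os
import re

def substitute_env(raw):
    """Replace ${VAR} placeholders in raw config text with values
    from the environment; raise on unset variables rather than
    silently inserting empty strings."""
    def repl(match):
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return re.sub(r"\$\{(\w+)\}", repl, raw)

os.environ["GROQ_API_KEY"] = "demo-key"  # for illustration only
print(substitute_env('api_key: "${GROQ_API_KEY}"'))  # api_key: "demo-key"
```

Substituting before YAML parsing keeps credentials out of the config files themselves, as noted above.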
Recommended workflow

1. Run semantic search baseline. Start with semantic_search on your target backend with a small dataset limit (100-200 records) and verify that the pipeline loads, indexes, and retrieves successfully.
2. Measure retrieval quality. Use evaluation_queries() and evaluate_retrieval() from utils/evaluation.py to establish baseline metrics.
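The exact signatures in utils/evaluation.py aren't reproduced here, but retrieval metrics of this kind typically reduce to recall@k and MRR. A self-contained sketch of what such an evaluation computes:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant ids found in the top-k retrieved ids."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant hit (0.0 if none)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```

Recording these numbers for the baseline makes step 3 a measurable comparison rather than a guess.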
3. Add improvements incrementally. Add one improvement feature at a time (e.g., reranking or hybrid_indexing) and measure whether quality improves on your evaluation set.
4. Add production features. Once the retrieval baseline is strong, adopt multi_tenancy or namespaces for data isolation, and cost_optimized_rag for budget controls.
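Tenant-scoped isolation boils down to tagging every document with a tenant id and scoping every query to exactly one tenant. A toy sketch of that boundary (the real features apply the equivalent filter inside the vector store, and all names here are illustrative):

```python
# Every document carries a tenant_id in its metadata.
docs = [
    {"id": "d1", "tenant_id": "acme",   "text": "acme contract"},
    {"id": "d2", "tenant_id": "globex", "text": "globex invoice"},
    {"id": "d3", "tenant_id": "acme",   "text": "acme invoice"},
]

def search(docs, tenant_id, term):
    # Hard isolation boundary: filter by tenant before any matching.
    scoped = [d for d in docs if d["tenant_id"] == tenant_id]
    return [d["id"] for d in scoped if term in d["text"]]
```

Because the tenant filter is applied before retrieval, one tenant's query can never surface another tenant's documents.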
5. Handle complex queries. Use agentic_rag or query_enhancement for hard multi-hop questions where single-pass retrieval falls short.

Supported backends

  • Chroma: Local embedded vector database with SQLite persistence
  • Milvus: High-performance distributed vector database
  • Pinecone: Managed vector database with native sparse vector support
  • Qdrant: Vector database with advanced filtering and multitenancy
  • Weaviate: GraphQL-based vector database with semantic search
Each backend has consistent API patterns through Haystack’s abstraction layer.

Next steps

  • Semantic search: start with the baseline semantic search pattern
  • Hybrid search: learn about dense + sparse hybrid retrieval
  • Components: explore reusable pipeline components
  • Pipelines: understand pipeline architecture patterns
