The Haystack integration provides pipeline components for building retrieval-augmented generation (RAG) applications on top of vector databases.

Pipeline Types

VectorDB provides several pre-built pipeline types for Haystack:
  • Semantic Search: Dense vector retrieval using embedding models
  • Hybrid Indexing: Combined dense and sparse vector indexing
  • Sparse Indexing: BM25-style keyword-based retrieval
  • MMR (Maximal Marginal Relevance): Diversity-optimized retrieval
  • Parent Document Retrieval: Hierarchical chunking with parent-child relationships
  • Query Enhancement: Multi-query and query expansion techniques
  • Reranking: Cross-encoder reranking of retrieved results
  • Contextual Compression: Token optimization through context compression
  • Agentic RAG: Self-reflective retrieval with routing
  • Multi-tenancy: Namespace-based data isolation
  • Metadata Filtering: Advanced filtering on document metadata
  • JSON Indexing: Indexing and filtering on nested JSON fields
  • Cost-Optimized RAG: Token-efficient retrieval strategies
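
To make the MMR entry above more concrete, here is a minimal, library-independent sketch of Maximal Marginal Relevance selection in plain Python. The names `cosine`, `mmr_select`, and `lambda_mult`, as well as the toy vectors, are illustrative assumptions and not part of the VectorDB API; the actual MMR pipelines may be implemented differently.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    top_k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Hypothetical helper: pick candidate indices by Maximal Marginal Relevance.

    Each step picks the candidate that maximizes
    lambda_mult * relevance - (1 - lambda_mult) * redundancy,
    where redundancy is the highest similarity to anything already picked.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < top_k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: candidates 0 and 1 are identical, candidate 2 is orthogonal.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
picked = mmr_select(query, candidates, top_k=2, lambda_mult=0.3)
```

With a low `lambda_mult`, the second pick skips the duplicate of the first result in favor of the dissimilar candidate, which is the diversity effect MMR is designed to produce.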

Supported Vector Databases

All Haystack pipelines are available for the following vector databases. Class names are prefixed with the database name:
  • Chroma: ChromaSemanticSearchPipeline, ChromaMmrSearchPipeline, etc.
  • Milvus: MilvusSemanticSearchPipeline, MilvusHybridSearchPipeline, etc.
  • Pinecone: PineconeSemanticSearchPipeline, PineconeHybridSearchPipeline, etc.
  • Qdrant: QdrantSemanticSearchPipeline, QdrantMmrSearchPipeline, etc.
  • Weaviate: WeaviateSemanticSearchPipeline, WeaviateHybridSearchPipeline, etc.

Common Pipeline Methods

All Haystack pipelines share common initialization patterns and methods:

Constructor Pattern

Pipeline(
    config_path: str,
    collection_name: Optional[str] = None,
    embedding_model: Optional[str] = None,
    **kwargs
)
  • config_path (str, required): Path to the YAML configuration file containing database credentials and settings
  • collection_name (str, optional): Overrides the collection name from the config
  • embedding_model (str, optional): Overrides the embedding model from the config (e.g., "sentence-transformers/all-MiniLM-L6-v2")
  • **kwargs (Any): Additional pipeline-specific parameters
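
The override behavior of `collection_name` and `embedding_model` can be pictured with the small sketch below. It assumes the YAML file has already been loaded into a dictionary with `collection_name` and `embedding_model` keys, and that an explicit constructor argument wins whenever it is not `None`. The `resolve_settings` helper and the config keys are hypothetical; the real pipelines may structure their configuration differently.

```python
from typing import Any, Dict, Optional


def resolve_settings(
    config: Dict[str, Any],
    collection_name: Optional[str] = None,
    embedding_model: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: explicit arguments override values from the config."""
    overrides = {
        "collection_name": collection_name,
        "embedding_model": embedding_model,
    }
    resolved = dict(config)  # copy so the loaded config is left untouched
    resolved.update({k: v for k, v in overrides.items() if v is not None})
    return resolved


# Assumed shape of an already-parsed config file.
config = {
    "collection_name": "default_docs",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
}
settings = resolve_settings(config, collection_name="my_docs")
```

Here only `collection_name` is overridden; the embedding model still comes from the config.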

search

Perform a retrieval search.
search(
    query: str,
    top_k: int = 10,
    filters: Optional[Dict[str, Any]] = None,
    **kwargs
) -> List[Document]
  • query (str, required): Query text to search for
  • top_k (int, default: 10): Number of results to return
  • filters (Dict[str, Any], optional): Metadata filters to apply
  • **kwargs (Any): Pipeline-specific search parameters

Returns:
  • documents (List[Document]): Retrieved Haystack Document objects, ordered by relevance
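
As a rough mental model of how `top_k` and `filters` interact, the toy function below filters an in-memory list by metadata and then returns the highest-scoring matches. It is only a sketch: `ScoredDoc` and `toy_search` are hypothetical names, the filter is assumed to be a flat dictionary of exact-match conditions, and real pipelines delegate filtering and ranking to the vector database, whose filter syntax may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a retrieved document with a relevance score."""
    content: str
    score: float
    meta: Dict[str, Any] = field(default_factory=dict)


def toy_search(
    docs: List[ScoredDoc],
    top_k: int = 10,
    filters: Optional[Dict[str, Any]] = None,
) -> List[ScoredDoc]:
    """Keep docs whose metadata matches every filter key, then rank by score."""
    filters = filters or {}
    matching = [
        d for d in docs
        if all(d.meta.get(key) == value for key, value in filters.items())
    ]
    # Highest score first, truncated to top_k results.
    return sorted(matching, key=lambda d: d.score, reverse=True)[:top_k]


docs = [
    ScoredDoc("intro to ML", 0.91, {"lang": "en"}),
    ScoredDoc("introduction au ML", 0.88, {"lang": "fr"}),
    ScoredDoc("deep learning notes", 0.75, {"lang": "en"}),
]
hits = toy_search(docs, top_k=1, filters={"lang": "en"})
```

The filter narrows the candidate set first, so `top_k` counts only documents that satisfy it.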

index

Index documents into the vector database.
index(
    documents: List[Document],
    namespace: Optional[str] = None,
    **kwargs
) -> None
  • documents (List[Document], required): Haystack Document objects to index
  • namespace (str, optional): Namespace for multi-tenant isolation
  • **kwargs (Any): Pipeline-specific indexing parameters

Example Usage

Semantic Search

from haystack import Document
from vectordb.haystack.semantic_search import ChromaSemanticSearchPipeline

# Initialize the pipeline; the explicit arguments override values in config.yaml
pipeline = ChromaSemanticSearchPipeline(
    config_path="config.yaml",
    collection_name="my_docs",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"
)

# Index documents
documents = [
    Document(content="Machine learning is a subset of AI"),
    Document(content="Deep learning uses neural networks")
]
pipeline.index(documents)

# Search
results = pipeline.search(
    query="What is machine learning?",
    top_k=5
)

Hybrid Search

from vectordb.haystack.hybrid_indexing import MilvusHybridSearchPipeline

pipeline = MilvusHybridSearchPipeline(
    config_path="config.yaml",
    collection_name="hybrid_docs"
)

# Hybrid search combines dense and sparse vectors
results = pipeline.search(
    query="quantum computing applications",
    top_k=10,
    ranker_type="rrf"  # Reciprocal Rank Fusion
)
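
For readers unfamiliar with the `"rrf"` ranker option, the sketch below shows the general Reciprocal Rank Fusion formula in plain Python: each document's fused score is the sum of `1 / (k + rank)` over the result lists it appears in. The `rrf_fuse` function, the document IDs, and the choice of `k = 60` (a commonly used constant) are illustrative assumptions; how the pipeline or the underlying database actually applies RRF is not specified here.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Hypothetical helper: fuse ranked ID lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense = ["a", "b", "c"]   # toy ranking from dense retrieval
sparse = ["b", "d", "a"]  # toy ranking from sparse retrieval
fused = rrf_fuse([dense, sparse])
```

Documents that rank well in both lists ("b" and "a" here) rise above documents that appear in only one.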

Multi-tenancy

from vectordb.haystack.multi_tenancy import PineconeMultiTenancyPipeline

pipeline = PineconeMultiTenancyPipeline(config_path="config.yaml")

# Index documents for tenant A (`documents` is a list of Haystack Document
# objects, as in the first example)
pipeline.index(documents, namespace="tenant_a")

# Search within tenant A only
results = pipeline.search(
    query="financial reports",
    namespace="tenant_a",
    top_k=5
)
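
The isolation guarantee behind namespaces can be illustrated with a toy in-memory store: documents indexed under one namespace are never visible to searches in another. `ToyNamespaceStore` and its naive substring matching are hypothetical stand-ins for illustration only; the real pipelines rely on the vector database's own namespace or tenant mechanism.

```python
from collections import defaultdict
from typing import Dict, List


class ToyNamespaceStore:
    """Hypothetical in-memory store that keeps each namespace's documents apart."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = defaultdict(list)

    def index(self, documents: List[str], namespace: str = "default") -> None:
        self._data[namespace].extend(documents)

    def search(self, query: str, namespace: str = "default", top_k: int = 10) -> List[str]:
        # Only the requested namespace is consulted; other tenants are never read.
        pool = self._data.get(namespace, [])
        return [doc for doc in pool if query.lower() in doc.lower()][:top_k]


store = ToyNamespaceStore()
store.index(["Q3 financial reports", "Hiring plan"], namespace="tenant_a")
store.index(["Annual financial reports"], namespace="tenant_b")

tenant_a_hits = store.search("financial reports", namespace="tenant_a", top_k=5)
```

Both tenants hold a matching document, yet the search for `tenant_a` returns only that tenant's data.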
