ChromaVectorDB
Comprehensive Chroma Vector Database interface for Haystack integration with support for cloud deployments, local persistent storage, and ephemeral in-memory storage.
Constructor
Chroma server hostname for HttpClient connections
Chroma server port number (default: 8000)
API key for authenticated Chroma instances
Tenant name for multi-tenant mode (default: "default_tenant")
Database name within tenant (default: "default_database")
Local storage path for PersistentClient (default: "./chroma")
Use persistent storage when no host provided
Default collection name for operations
Direct configuration dictionary to load settings from
Path to YAML configuration file
Additional parameters including ssl, tracing_project_name
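The constructor descriptions above imply three storage modes: a remote server when a host is given, local persistent storage otherwise, and ephemeral in-memory storage as the fallback. A minimal sketch of that selection logic follows. The function name `select_client_mode` and the argument names `host` and `persistent` are assumptions for illustration, not the class's confirmed signature.

```python
from typing import Optional


def select_client_mode(host: Optional[str], persistent: bool) -> str:
    """Return which kind of Chroma client the given arguments imply.

    Hypothetical helper: argument names are assumed, not taken from the class.
    """
    if host:
        # A hostname points at a remote server, i.e. chromadb.HttpClient(host=..., port=...)
        return "http"
    if persistent:
        # No host, persistence requested, i.e. chromadb.PersistentClient(path=...)
        return "persistent"
    # Neither: chromadb.EphemeralClient(), data lives only in memory
    return "ephemeral"
```

A host takes precedence over the persistence flag in this sketch, which matches the description "Use persistent storage when no host provided".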
Methods
create_collection
Create a new collection in the Chroma database.
Unique identifier for the collection
Collection-specific configuration object
Metadata to associate with the collection
Function to generate embeddings. Uses DefaultEmbeddingFunction if not provided
If True, returns the existing collection when one with the same name already exists. If False, raises an error in that case
upsert
Insert or update documents in the current collection.
Either a list of Haystack Documents or a dictionary with keys: ids, documents/texts, metadatas, embeddings
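The two accepted input shapes can be reduced to one dictionary before writing to the collection. The sketch below shows one way to do that. The helper name `normalize_upsert_input` is hypothetical, and the attribute names `id`, `content`, `meta`, and `embedding` are those of Haystack's `Document`, assumed here without importing Haystack.

```python
from typing import Any, Dict, List, Union


def normalize_upsert_input(data: Union[List[Any], Dict[str, Any]]) -> Dict[str, Any]:
    """Reduce either accepted input shape to a dict of parallel lists.

    Hypothetical helper, not the class's actual implementation.
    """
    if isinstance(data, dict):
        # "texts" is treated as an alias for "documents"
        documents = data.get("documents", data.get("texts"))
        return {
            "ids": data["ids"],
            "documents": documents,
            "metadatas": data.get("metadatas"),
            "embeddings": data.get("embeddings"),
        }
    # Otherwise assume a list of Document-like objects
    return {
        "ids": [doc.id for doc in data],
        "documents": [doc.content for doc in data],
        "metadatas": [doc.meta for doc in data],
        "embeddings": [doc.embedding for doc in data],
    }
```

With the dictionary form, `{"ids": ["1"], "texts": ["hello"]}` yields the same `documents` list as `{"ids": ["1"], "documents": ["hello"]}`.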
query
Query the collection for similar vectors.
Vector embedding to search for
Raw text to embed and search for
Maximum number of results to return
Metadata filter conditions (Chroma filter syntax)
Document content filter conditions
List of fields to include in results. Defaults to ["metadatas", "documents", "distances"]
Whether to include embeddings in results
List of document IDs
List of document contents
List of metadata dictionaries
List of distance scores
List of vectors (if include_vectors=True)
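The returned fields are parallel lists, so a caller often wants one record per hit. The sketch below zips them together. It assumes the raw layout of chromadb's own `Collection.query`, where each field holds one inner list per query. Whether this wrapper returns that nested layout or already-flattened lists is not stated here, and `first_query_hits` is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Maps result keys to the per-hit field name used in the output records
_FIELDS = (
    ("documents", "document"),
    ("metadatas", "metadata"),
    ("distances", "distance"),
    ("embeddings", "embedding"),
)


def first_query_hits(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Zip the parallel lists of the first query into one dict per hit.

    Assumes the nested layout {"ids": [[...]], "documents": [[...]], ...}.
    """
    hits: List[Dict[str, Any]] = []
    for i, doc_id in enumerate(raw["ids"][0]):
        hit: Dict[str, Any] = {"id": doc_id}
        for key, name in _FIELDS:
            values = raw.get(key)
            if values is not None:  # fields not requested are absent or None
                hit[name] = values[0][i]
        hits.append(hit)
    return hits
```

Fields that were not requested are skipped, so a record carries an `embedding` entry only when vectors were included in the query.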
delete_collection
Delete a collection from the Chroma database.
Name of collection to delete. Uses default collection_name if not provided
flatten_metadata
Recursively flatten nested metadata for Chroma compatibility.
Dictionary potentially containing nested dictionaries, lists, or complex types
Flattened dictionary compatible with Chroma metadata storage. Nested keys use dot notation (e.g., "parent.child.key")
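A minimal sketch of dot-notation flattening is shown below. Only the key-joining rule comes from the description above. The treatment of lists (JSON-encoded strings), of `None` (dropped), and of other types (converted with `str`) is an assumption, and `flatten_for_chroma` is a hypothetical name rather than the method itself.

```python
import json
from typing import Any, Dict


def flatten_for_chroma(metadata: Dict[str, Any], parent_key: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dot-separated keys with scalar values."""
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        full_key = f"{parent_key}.{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the dotted prefix down
            flat.update(flatten_for_chroma(value, full_key))
        elif isinstance(value, (str, int, float, bool)):
            flat[full_key] = value
        elif isinstance(value, (list, tuple)):
            # Assumption: lists are stored as JSON strings
            flat[full_key] = json.dumps(list(value), default=str)
        elif value is not None:
            # Assumption: any other type falls back to its string form
            flat[full_key] = str(value)
        # Assumption: None values are dropped
    return flat
```

For example, `{"parent": {"child": {"key": 1}}}` becomes `{"parent.child.key": 1}`.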
MilvusVectorDB
Interface for interacting with Milvus/Zilliz vector databases with support for dense vectors, sparse vectors, and hybrid search.
Constructor
Milvus server URI. Use "http://localhost:19530" for local Milvus or "https://…" for Zilliz Cloud endpoints
API token for Zilliz Cloud authentication. Leave empty for local Milvus or when using no authentication
Deprecated. Use uri instead. Maintained for backward compatibility
Deprecated. Use uri instead. Maintained for backward compatibility
Default collection name for operations. Can be overridden per-method
Methods
create_collection
Create a Milvus collection with comprehensive schema for document storage.
Unique name for the collection
Dimensionality of dense embedding vectors. Must match the embedding model used (e.g., 768 for most transformer models)
Human-readable description of the collection’s purpose
Whether to include a sparse vector field for hybrid search. Adds storage overhead but enables keyword matching alongside semantic search
Whether to enable physical data partitioning. Enables efficient multi-tenancy by isolating data at the partition level
Name of the partition key field. Documents with the same partition key value are routed to the same physical partition
If True, drops any existing collection with the same name before creating the new one. Use with caution in production
search
Perform semantic search with support for dense, sparse, or hybrid retrieval.
Dense query vector for semantic search. Typically from a text embedding model (e.g., 768-dim)
Sparse query vector for keyword search. Can be Dict[int, float] mapping term IDs to weights, or Haystack SparseEmbedding
Maximum number of results to return
Target collection. Uses default from constructor if None
Metadata filter conditions as nested dict. Supports operators such as $gt, $in (list membership), and $contains (JSON contains)
Unified tenant/partition identifier. Preferred over ‘namespace’ as it clearly conveys the isolation concept
Legacy alias for scope. Partition key value for data isolation. Only used if scope is not provided
Name of the partition key field in schema. Used to construct partition filter expressions
Reranking strategy for hybrid search. "rrf" uses Reciprocal Rank Fusion (k=60 constant). "weighted" uses explicit weights
Two-element list [dense_weight, sparse_weight] for weighted ranker. Only used when ranker_type="weighted". Defaults to [0.5, 0.5]
If True, includes embedding vectors in returned Documents. Increases response size but useful for downstream processing
List of Haystack Document objects ordered by relevance score (descending). Documents include content, metadata, score, and optionally embeddings
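The two reranking strategies described above can be illustrated in plain Python. In Milvus the fusion runs server-side (pymilvus provides `RRFRanker` and `WeightedRanker` for this), and whether this class uses those rankers is an assumption. The functions below are a client-side sketch of the arithmetic only, with hypothetical names, and the weighted variant assumes both score sets are already on comparable scales.

```python
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def weighted_fuse(
    dense_scores: Dict[str, float],
    sparse_scores: Dict[str, float],
    weights: Sequence[float] = (0.5, 0.5),
) -> List[str]:
    """Weighted sum of dense and sparse scores, using [dense_weight, sparse_weight]."""
    dense_w, sparse_w = weights
    combined: Dict[str, float] = {}
    for doc_id, score in dense_scores.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + dense_w * score
    for doc_id, score in sparse_scores.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + sparse_w * score
    return sorted(combined, key=lambda d: combined[d], reverse=True)
```

RRF needs only rank positions, so it works when dense and sparse scores are not directly comparable. The weighted variant lets one side dominate, for instance `[0.8, 0.2]` to favor semantic matches.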
PineconeVectorDB
Production-ready interface for Pinecone vector database operations with support for hybrid search and multi-tenancy.
Constructor
Pinecone API key for authentication
Name of the Pinecone index to operate on
Direct configuration dictionary
Path to YAML configuration file
Additional connection parameters (host, proxy_url, ssl_verify, pool_threads)
Methods
create_index
Create a new Pinecone index or ensure an existing one is ready.
Dimensionality of the vectors. Required for new indexes
Distance metric for similarity calculations. Options: "cosine", "euclidean", "dotproduct"
ServerlessSpec or dict defining cloud provider and region. Defaults to AWS us-east-1 serverless if not provided
If True, deletes any existing index before creating the new one
Overrides the index name set in the constructor for this operation
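The parameter rules above (a required dimension, three allowed metrics, and a default serverless spec on AWS us-east-1) can be sketched as a small validation step. The helper name `resolve_index_settings` is hypothetical, and the dict layout for the spec is an assumption modeled on the dict form the Pinecone client accepts in place of `ServerlessSpec`.

```python
from typing import Any, Dict, Optional

SUPPORTED_METRICS = ("cosine", "euclidean", "dotproduct")


def resolve_index_settings(
    dimension: int,
    metric: str,
    spec: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate index settings and fill in the default serverless spec.

    Hypothetical helper, not the class's actual implementation.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"metric must be one of {SUPPORTED_METRICS}")
    if spec is None:
        # Assumed dict form of the documented default: AWS us-east-1 serverless
        spec = {"serverless": {"cloud": "aws", "region": "us-east-1"}}
    return {"dimension": dimension, "metric": metric, "spec": spec}
```

Validating before any network call means a misspelled metric fails locally rather than as a remote API error.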
query
Query the index for similar vectors.
Query dense embedding vector
Number of results to return
Metadata filters to apply
Namespace to search in (legacy parameter)
Unified scope/namespace parameter (preferred over namespace)
Whether to include metadata in results
Whether to include vector values in results (legacy parameter)
Whether to include vector values in results (preferred parameter)
List of Haystack Documents with content, metadata, and optional embeddings
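Because the query accepts both legacy and preferred parameters, some precedence rule is needed when both are supplied. The sketch below lets the preferred name win. The helper names are hypothetical, the parameter names `include_vectors` and `include_values` are assumed spellings of the preferred and legacy flags, and the default of `False` is also an assumption.

```python
from typing import Optional


def resolve_scope(scope: Optional[str] = None, namespace: Optional[str] = None) -> Optional[str]:
    """Prefer `scope`; fall back to the legacy `namespace` only when scope is unset."""
    return scope if scope is not None else namespace


def resolve_include_vectors(
    include_vectors: Optional[bool] = None,
    include_values: Optional[bool] = None,
) -> bool:
    """Prefer the newer flag; fall back to the legacy one, then to False (assumed default)."""
    if include_vectors is not None:
        return include_vectors
    if include_values is not None:
        return include_values
    return False
```

Using `None` as the "not provided" marker keeps an explicit `False` on the preferred flag from being overridden by a legacy `True`.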
flatten_metadata
Transform nested metadata into Pinecone-compatible flat structure.
Potentially nested dictionary with arbitrary values
Flat dictionary with only Pinecone-supported types. Nested dictionaries use underscore notation (e.g., user.id becomes user_id)
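A minimal sketch of the underscore flattening follows. Only the key-joining rule (user.id becomes user_id) comes from the description above. Pinecone metadata accepts strings, numbers, booleans, and lists of strings, so this sketch converts list items to strings, drops `None`, and stringifies anything else. Those choices are assumptions, and `flatten_for_pinecone` is a hypothetical name.

```python
from typing import Any, Dict


def flatten_for_pinecone(metadata: Dict[str, Any], parent_key: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into underscore-joined keys with supported value types."""
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        full_key = f"{parent_key}_{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the underscore prefix down
            flat.update(flatten_for_pinecone(value, full_key))
        elif isinstance(value, (str, int, float, bool)):
            flat[full_key] = value
        elif isinstance(value, (list, tuple)):
            # Assumption: list items are coerced to strings
            flat[full_key] = [str(item) for item in value]
        elif value is not None:
            # Assumption: other types fall back to their string form
            flat[full_key] = str(value)
        # Assumption: None values are dropped
    return flat
```

Underscore joining can collide with keys that already contain underscores: `{"user": {"id": 1}, "user_id": 2}` maps both entries to `user_id`, and the later one overwrites the earlier.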