Core database wrapper classes that provide unified interfaces to vector databases. These wrappers are used by both Haystack and LangChain integrations.

ChromaVectorDB

Comprehensive Chroma Vector Database interface for Haystack integration with support for cloud deployments, local persistent storage, and ephemeral in-memory storage.

Constructor

ChromaVectorDB(
    host: Optional[str] = None,
    port: Optional[int] = None,
    api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    path: Optional[str] = None,
    persistent: bool = True,
    collection_name: Optional[str] = None,
    config: Optional[Dict[str, Any]] = None,
    config_path: Optional[str] = None,
    **kwargs: Any
)
host
str
Chroma server hostname for HttpClient connections
port
int
Chroma server port number (default: 8000)
api_key
str
API key for authenticated Chroma instances
tenant
str
Tenant name for multi-tenant mode (default: “default_tenant”)
database
str
Database name within tenant (default: “default_database”)
path
str
Local storage path for PersistentClient (default: “./chroma”)
persistent
bool
default:"True"
Use persistent storage when no host provided
collection_name
str
Default collection name for operations
config
Dict[str, Any]
Direct configuration dictionary to load settings from
config_path
str
Path to YAML configuration file
**kwargs
Any
Additional parameters including ssl, tracing_project_name
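The parameter descriptions above imply three connection modes: a remote server when host is given, on-disk storage when persistent is True, and in-memory storage otherwise. The following is a minimal sketch of that decision, not the wrapper's actual code. The helper name select_client_mode is hypothetical, and the real precedence rules (for example, how api_key or config values influence the choice) may differ.

```python
from typing import Optional


def select_client_mode(host: Optional[str] = None, persistent: bool = True) -> str:
    """Return the connection mode implied by the constructor arguments.

    Hypothetical helper for illustration; not part of ChromaVectorDB.
    """
    if host:
        # A hostname means a remote Chroma server reached over HTTP.
        return "http"
    if persistent:
        # No host and persistent=True: on-disk storage under `path`.
        return "persistent"
    # No host and persistent=False: in-memory storage, lost on exit.
    return "ephemeral"


print(select_client_mode(host="chroma.internal"))
print(select_client_mode())
print(select_client_mode(persistent=False))
```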

Methods

create_collection

Create a new collection in the Chroma database.
create_collection(
    name: str,
    configuration: Optional[CollectionConfiguration] = None,
    metadata: Optional[CollectionMetadata] = None,
    embedding_function: Optional[EmbeddingFunction[Embeddable]] = None,
    get_or_create: bool = True,
    **kwargs: Any
) -> None
name
str
required
Unique identifier for the collection
configuration
CollectionConfiguration
Collection-specific configuration object
metadata
CollectionMetadata
Metadata to associate with the collection
embedding_function
EmbeddingFunction
Function to generate embeddings. Uses DefaultEmbeddingFunction if not provided
get_or_create
bool
default:"True"
If True, reuses the existing collection when one with this name already exists. If False, raises an error in that case

upsert

Insert or update documents in the current collection.
upsert(
    data: Union[List[Any], Dict[str, Any]],
    **kwargs: Any
) -> None
data
Union[List[Any], Dict[str, Any]]
required
Either a list of Haystack Documents or a dictionary with keys: ids, documents/texts, metadatas, embeddings
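For the dictionary form of data, every list is expected to line up with ids position by position. The sketch below shows one plausible payload and a pre-flight length check. The helper check_upsert_payload is hypothetical and is not part of the wrapper; whether the wrapper performs a similar check itself is not stated here.

```python
from typing import Any, Dict, List


def check_upsert_payload(data: Dict[str, Any]) -> int:
    """Validate a dictionary payload and return the number of records.

    Hypothetical helper for illustration only.
    """
    ids: List[str] = data["ids"]
    # The text may arrive under either "documents" or "texts".
    texts = data.get("documents", data.get("texts"))
    for key, values in (
        ("documents/texts", texts),
        ("metadatas", data.get("metadatas")),
        ("embeddings", data.get("embeddings")),
    ):
        if values is not None and len(values) != len(ids):
            raise ValueError(f"{key} has {len(values)} items, expected {len(ids)}")
    return len(ids)


payload = {
    "ids": ["doc-1", "doc-2"],
    "documents": ["first text", "second text"],
    "metadatas": [{"source": "a"}, {"source": "b"}],
    "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
}
print(check_upsert_payload(payload))

# With a configured wrapper instance `db` (not constructed here), the call
# would be: db.upsert(payload)
```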

query

Query the collection for similar vectors.
query(
    query_embedding: Optional[list[float]] = None,
    query_text: Optional[str] = None,
    n_results: int = 10,
    where: Optional[dict[str, Any]] = None,
    where_document: Optional[dict[str, Any]] = None,
    include: Optional[List[str]] = None,
    include_vectors: bool = False,
    **kwargs: Any
) -> dict[str, Any]
query_embedding
list[float]
Vector embedding to search for
query_text
str
Raw text to embed and search for
n_results
int
default:"10"
Maximum number of results to return
where
dict[str, Any]
Metadata filter conditions (Chroma filter syntax)
where_document
dict[str, Any]
Document content filter conditions
include
List[str]
List of fields to include in results. Defaults to [“metadatas”, “documents”, “distances”]
include_vectors
bool
default:"False"
Whether to include embeddings in results
ids
List[str]
Return field. List of document IDs
documents
List[str]
Return field. List of document contents
metadatas
List[dict]
Return field. List of metadata dictionaries
distances
List[float]
Return field. List of distance scores
embeddings
List[List[float]]
Return field. List of vectors (if include_vectors=True)
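The where and where_document arguments take Chroma's dictionary filter syntax. The sketch below assembles a plausible set of query arguments using operators that Chroma defines ($and, $eq, $gte, $contains). The field names category and year are made up for illustration, and the wrapper instance is not constructed because its import path is not given here.

```python
from typing import Any, Dict

# Metadata filter: category equals "news" AND year is 2020 or later.
where: Dict[str, Any] = {
    "$and": [
        {"category": {"$eq": "news"}},
        {"year": {"$gte": 2020}},
    ]
}

# Document filter: the stored text must contain this substring.
where_document: Dict[str, Any] = {"$contains": "vector database"}

query_kwargs: Dict[str, Any] = {
    "query_text": "how do vector databases index embeddings?",
    "n_results": 5,
    "where": where,
    "where_document": where_document,
    "include_vectors": False,
}

# With a configured wrapper instance `db` (not constructed here), the call
# would be: results = db.query(**query_kwargs)
print(sorted(query_kwargs))
```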

delete_collection

Delete a collection from the Chroma database.
delete_collection(name: Optional[str] = None) -> None
name
str
Name of collection to delete. Uses default collection_name if not provided

flatten_metadata

Recursively flatten nested metadata for Chroma compatibility.
flatten_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]
metadata
Dict[str, Any]
required
Dictionary potentially containing nested dictionaries, lists, or complex types
flattened
Dict[str, Any]
Flattened dictionary compatible with Chroma metadata storage. Nested keys use dot notation (e.g., “parent.child.key”)
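A minimal sketch of recursive flattening with dot notation follows. It is an illustration of the technique rather than the wrapper's implementation. The function name flatten_dot is hypothetical, and serializing lists and other non-scalar values to JSON strings is an assumption; the wrapper may treat them differently.

```python
import json
from typing import Any, Dict


def flatten_dot(metadata: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into dot-separated keys (illustrative)."""
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key path as the prefix.
            flat.update(flatten_dot(value, full_key))
        elif isinstance(value, (str, int, float, bool)):
            # Scalars are stored unchanged.
            flat[full_key] = value
        else:
            # Assumption: lists and other types become JSON strings.
            flat[full_key] = json.dumps(value, default=str)
    return flat


nested = {"parent": {"child": {"key": 1}}, "tags": ["a", "b"], "title": "Intro"}
print(flatten_dot(nested))
```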

MilvusVectorDB

Interface for interacting with Milvus/Zilliz vector databases with support for dense vectors, sparse vectors, and hybrid search.

Constructor

MilvusVectorDB(
    uri: str = "http://localhost:19530",
    token: str = "",
    host: Optional[str] = None,
    port: Optional[str] = None,
    collection_name: Optional[str] = None,
    **kwargs
)
uri
str
default:"http://localhost:19530"
Milvus server URI. Use “http://localhost:19530” for local Milvus or “https://…” for Zilliz Cloud endpoints
token
str
default:""
API token for Zilliz Cloud authentication. Leave empty for local Milvus or when using no authentication
host
str
Deprecated. Use uri instead. Maintained for backward compatibility
port
str
Deprecated. Use uri instead. Maintained for backward compatibility
collection_name
str
Default collection name for operations. Can be overridden per-method
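Because host and port are deprecated, callers migrating older code can build the uri value themselves. The sketch below shows one way to do that. The helper uri_from_host_port and its secure flag are hypothetical, and the choice of the http scheme for plain host/port pairs is an assumption.

```python
def uri_from_host_port(host: str, port: str = "19530", secure: bool = False) -> str:
    """Build a Milvus URI from legacy host/port values (illustrative)."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}"


# Legacy arguments as they might appear in older calling code.
legacy = {"host": "localhost", "port": "19530"}

# Arguments in the preferred form, ready to pass to the constructor.
constructor_kwargs = {"uri": uri_from_host_port(**legacy), "token": ""}
print(constructor_kwargs["uri"])
```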

Methods

create_collection

Create a Milvus collection with comprehensive schema for document storage.
create_collection(
    collection_name: str,
    dimension: int,
    description: str = "",
    use_sparse: bool = False,
    use_partition_key: bool = False,
    partition_key_field: str = "namespace",
    recreate: bool = False
)
collection_name
str
required
Unique name for the collection
dimension
int
required
Dimensionality of dense embedding vectors. Must match the output size of the embedding model used (e.g., 768 for many BERT-base-sized models)
description
str
default:""
Human-readable description of the collection’s purpose
use_sparse
bool
default:"False"
Whether to include a sparse vector field for hybrid search. Adds storage overhead but enables keyword matching alongside semantic search
use_partition_key
bool
default:"False"
Whether to enable physical data partitioning. Enables efficient multi-tenancy by isolating data at the partition level
partition_key_field
str
default:"namespace"
Name of the partition key field. Documents with the same partition key value are routed to the same physical partition
recreate
bool
default:"False"
If True, drops existing collection with the same name before creating new one. Use with caution in production

search

Perform semantic search with support for dense, sparse, or hybrid retrieval.
search(
    query_embedding: Optional[List[float]] = None,
    query_sparse_embedding: Optional[Union[Dict[int, float], SparseEmbedding]] = None,
    top_k: int = 10,
    collection_name: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    scope: Optional[str] = None,
    namespace: Optional[str] = None,
    partition_key_field: str = "namespace",
    ranker_type: str = "rrf",
    weights: Optional[List[float]] = None,
    include_vectors: bool = False
) -> List[Document]
query_embedding
List[float]
Dense query vector for semantic search. Typically from a text embedding model (e.g., 768-dim)
query_sparse_embedding
Union[Dict[int, float], SparseEmbedding]
Sparse query vector for keyword search. Can be Dict[int, float] mapping term IDs to weights, or Haystack SparseEmbedding
top_k
int
default:"10"
Maximum number of results to return
collection_name
str
Target collection. Uses default from constructor if None
filters
Dict[str, Any]
Metadata filter conditions as nested dict. Supports operators: $eq (equality), $gt/$lt (range), $in (list membership), $contains (JSON contains)
scope
str
Unified tenant/partition identifier. Preferred over ‘namespace’ as it clearly conveys the isolation concept
namespace
str
Legacy alias for scope. Partition key value for data isolation. Only used if scope is not provided
partition_key_field
str
default:"namespace"
Name of the partition key field in schema. Used to construct partition filter expressions
ranker_type
str
default:"rrf"
Reranking strategy for hybrid search. “rrf” uses Reciprocal Rank Fusion (k=60 constant). “weighted” uses explicit weights
weights
List[float]
Two-element list [dense_weight, sparse_weight] for weighted ranker. Only used when ranker_type=“weighted”. Defaults to [0.5, 0.5]
include_vectors
bool
default:"False"
If True, includes embedding vectors in returned Documents. Increases response size but useful for downstream processing
documents
List[Document]
List of Haystack Document objects ordered by relevance score (descending). Documents include content, metadata, score, and optionally embeddings
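The two ranker_type options can be illustrated with a short, self-contained sketch. In hybrid search the fusion happens inside Milvus; the code below only shows the arithmetic. rrf_fuse applies the standard Reciprocal Rank Fusion formula, summing 1 / (k + rank) across result lists with k=60. weighted_fuse is a simplified weighted sum that assumes scores are already on a comparable scale, which skips any score normalization a real ranker applies. Both function names are hypothetical.

```python
from typing import Dict, List, Optional


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def weighted_fuse(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    weights: Optional[List[float]] = None,
) -> List[str]:
    """Combine per-document scores as dense_w * dense + sparse_w * sparse."""
    dense_w, sparse_w = weights if weights is not None else [0.5, 0.5]
    ids = set(dense) | set(sparse)
    scores = {
        i: dense_w * dense.get(i, 0.0) + sparse_w * sparse.get(i, 0.0) for i in ids
    }
    return sorted(scores, key=lambda d: scores[d], reverse=True)


dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "d", "a"]
print(rrf_fuse([dense_ranking, sparse_ranking]))

dense_scores = {"a": 0.9, "b": 0.8}
sparse_scores = {"b": 0.7, "c": 0.6}
print(weighted_fuse(dense_scores, sparse_scores))
print(weighted_fuse(dense_scores, sparse_scores, weights=[0.9, 0.1]))
```

With equal weights, document b ranks first because both retrievers return it; shifting weight toward the dense side moves a ahead.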

PineconeVectorDB

Production-ready interface for Pinecone vector database operations with support for hybrid search and multi-tenancy.

Constructor

PineconeVectorDB(
    api_key: Optional[str] = None,
    index_name: Optional[str] = None,
    config: Optional[Dict[str, Any]] = None,
    config_path: Optional[str] = None,
    **kwargs
)
api_key
str
Pinecone API key for authentication
index_name
str
Name of the Pinecone index to operate on
config
Dict[str, Any]
Direct configuration dictionary
config_path
str
Path to YAML configuration file
**kwargs
Any
Additional connection parameters (host, proxy_url, ssl_verify, pool_threads)

Methods

create_index

Create a new Pinecone index or ensure an existing one is ready.
create_index(
    dimension: Optional[int] = None,
    metric: str = "cosine",
    spec: Optional[Union[Dict[str, Any], ServerlessSpec]] = None,
    recreate: bool = False,
    index_name: Optional[str] = None,
    **kwargs
) -> None
dimension
int
Dimensionality of the vectors. Required only when creating a new index
metric
str
default:"cosine"
Distance metric for similarity calculations. Options: “cosine”, “euclidean”, “dotproduct”
spec
Union[Dict[str, Any], ServerlessSpec]
ServerlessSpec or dict defining cloud provider and region. Defaults to AWS us-east-1 serverless if not provided
recreate
bool
default:"False"
If True, delete existing index before creating new one
index_name
str
Overrides the index name set in the constructor for this operation
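As a rough sketch of the defaults described above, the helper below assembles index-creation arguments, validating the metric and falling back to an AWS us-east-1 serverless spec in Pinecone's dictionary form. The helper name build_create_index_kwargs is hypothetical, and the wrapper's own validation and defaulting logic may differ.

```python
from typing import Any, Dict, Optional

ALLOWED_METRICS = {"cosine", "euclidean", "dotproduct"}


def build_create_index_kwargs(
    dimension: int,
    metric: str = "cosine",
    spec: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble create_index arguments with the documented defaults (illustrative)."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"metric must be one of {sorted(ALLOWED_METRICS)}")
    if spec is None:
        # Documented default: AWS us-east-1 serverless.
        spec = {"serverless": {"cloud": "aws", "region": "us-east-1"}}
    return {"dimension": dimension, "metric": metric, "spec": spec}


kwargs = build_create_index_kwargs(dimension=768)
print(kwargs)

# With a configured wrapper instance `db` (not constructed here), the call
# would be: db.create_index(**kwargs)
```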

query

Query the index for similar vectors.
query(
    vector: List[float],
    top_k: int = 10,
    filter: Optional[Dict[str, Any]] = None,
    namespace: str = "",
    scope: Optional[str] = None,
    include_metadata: bool = True,
    include_values: bool = False,
    include_vectors: bool = False
) -> List[Any]
vector
List[float]
required
Query dense embedding vector
top_k
int
default:"10"
Number of results to return
filter
Dict[str, Any]
Metadata filters to apply
namespace
str
default:""
Namespace to search in (legacy parameter)
scope
str
Unified scope/namespace parameter (preferred over namespace)
include_metadata
bool
default:"True"
Whether to include metadata in results
include_values
bool
default:"False"
Whether to include vector values in results (legacy parameter)
include_vectors
bool
default:"False"
Whether to include vector values in results (preferred parameter)
results
List[Document]
List of Haystack Documents with content, metadata, and optional embeddings
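The query signature carries two pairs of overlapping parameters: scope and namespace, and include_vectors and include_values. One plausible way to reconcile them is sketched below. The helper resolve_query_options is hypothetical, and both rules it applies (scope wins when provided; either flag enables vector output) are assumptions rather than documented behavior.

```python
from typing import Optional, Tuple


def resolve_query_options(
    namespace: str = "",
    scope: Optional[str] = None,
    include_values: bool = False,
    include_vectors: bool = False,
) -> Tuple[str, bool]:
    """Return (effective namespace, whether to return vectors). Illustrative."""
    # Assumption: the preferred `scope` overrides the legacy `namespace`.
    effective_namespace = scope if scope is not None else namespace
    # Assumption: either the legacy or the preferred flag turns vectors on.
    return effective_namespace, include_values or include_vectors


print(resolve_query_options(namespace="tenant-a"))
print(resolve_query_options(namespace="tenant-a", scope="tenant-b", include_vectors=True))
```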

flatten_metadata

Transform nested metadata into Pinecone-compatible flat structure.
flatten_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]
metadata
Dict[str, Any]
required
Potentially nested dictionary with arbitrary values
flattened
Dict[str, Any]
Flat dictionary with only Pinecone-supported types. Nested dictionaries use underscore notation (e.g., user.id becomes user_id)
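A minimal sketch of underscore flattening follows. Pinecone metadata values are limited to strings, numbers, booleans, and lists of strings, and null values are not accepted. The function name flatten_underscore is hypothetical, and the handling of unsupported values (dropping None, stringifying everything else) is an assumption, not necessarily what the wrapper does.

```python
from typing import Any, Dict


def flatten_underscore(metadata: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into underscore-joined keys (illustrative)."""
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        full_key = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key path as the prefix.
            flat.update(flatten_underscore(value, full_key))
        elif isinstance(value, (str, int, float, bool)):
            flat[full_key] = value
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            # Lists of strings are a supported Pinecone metadata type.
            flat[full_key] = value
        elif value is None:
            # Assumption: drop nulls, since Pinecone rejects null values.
            continue
        else:
            # Assumption: stringify anything else.
            flat[full_key] = str(value)
    return flat


nested = {"user": {"id": 7, "name": "Ada"}, "tags": ["x", "y"], "scores": [1, 2], "note": None}
print(flatten_underscore(nested))
```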
