Database wrappers

Core database wrapper classes that provide unified interfaces to vector databases. These wrappers are used by both Haystack and LangChain integrations.

ChromaVectorDB

Chroma vector database interface for the Haystack integration, with support for cloud deployments, local persistent storage, and ephemeral in-memory storage.

Constructor

ChromaVectorDB(
    host: Optional[str] = None,
    port: Optional[int] = None,
    api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    path: Optional[str] = None,
    persistent: bool = True,
    collection_name: Optional[str] = None,
    config: Optional[Dict[str, Any]] = None,
    config_path: Optional[str] = None,
    **kwargs: Any
)

host

str

Chroma server hostname for HttpClient connections

port

int

Chroma server port number (default: 8000)

api_key

str

API key for authenticated Chroma instances

tenant

str

Tenant name for multi-tenant mode (default: “default_tenant”)

database

str

Database name within tenant (default: “default_database”)

path

str

Local storage path for PersistentClient (default: “./chroma”)

persistent

bool

default:"True"

Use persistent storage when no host is provided

collection_name

str

Default collection name for operations

config

Dict[str, Any]

Direct configuration dictionary to load settings from

config_path

str

Path to YAML configuration file

**kwargs

Any

Additional parameters, such as ssl and tracing_project_name
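The three storage modes above suggest a simple selection rule. The following is a minimal sketch, not the wrapper's confirmed logic: the helper `choose_client_mode` is hypothetical, and its precedence (a `host` selects a server connection, otherwise `persistent` selects on-disk storage, otherwise data stays in memory) is an assumption inferred from the parameter descriptions. Cloud credentials (`api_key`, `tenant`, `database`) are left out for brevity.

```python
from typing import Optional


def choose_client_mode(
    host: Optional[str] = None,
    port: Optional[int] = None,
    path: Optional[str] = None,
    persistent: bool = True,
) -> tuple[str, dict]:
    """Hypothetical helper: map ChromaVectorDB-style arguments to a client mode.

    Assumed precedence: host -> remote server, else persistent -> on-disk,
    else in-memory. This is an illustration, not the wrapper's actual code.
    """
    if host:
        # Remote Chroma server; 8000 is the documented default port.
        return "http", {"host": host, "port": port or 8000}
    if persistent:
        # Local on-disk storage; "./chroma" is the documented default path.
        return "persistent", {"path": path or "./chroma"}
    # No server and no persistence requested: keep data in memory only.
    return "ephemeral", {}


print(choose_client_mode(host="chroma.internal"))  # ('http', {'host': 'chroma.internal', 'port': 8000})
print(choose_client_mode())                        # ('persistent', {'path': './chroma'})
print(choose_client_mode(persistent=False))        # ('ephemeral', {})
```

Each mode would map onto a real chromadb client: chromadb.HttpClient(host=..., port=...), chromadb.PersistentClient(path=...), or chromadb.EphemeralClient().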

Methods

create_collection

Create a new collection in the Chroma database.

create_collection(
    name: str,
    configuration: Optional[CollectionConfiguration] = None,
    metadata: Optional[CollectionMetadata] = None,
    embedding_function: Optional[EmbeddingFunction[Embeddable]] = None,
    get_or_create: bool = True,
    **kwargs: Any
) -> None

name

str

required

Unique identifier for the collection

configuration

CollectionConfiguration

Collection-specific configuration object

metadata

CollectionMetadata

Metadata to associate with the collection

embedding_function

EmbeddingFunction

Function to generate embeddings. Uses DefaultEmbeddingFunction if not provided

get_or_create

bool

default:"True"

If True, returns the existing collection when one with this name already exists. If False, raises an error if the collection already exists
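To make the `get_or_create` semantics concrete, here is a minimal in-memory stand-in. `InMemoryCollections` is hypothetical and unrelated to Chroma's actual storage; unlike the documented method, it returns the collection so the behavior can be checked, and `ValueError` is a placeholder for whatever error Chroma actually raises.

```python
class InMemoryCollections:
    """Hypothetical stand-in that mimics the documented get_or_create behavior."""

    def __init__(self) -> None:
        self._collections: dict[str, dict] = {}

    def create_collection(self, name: str, get_or_create: bool = True) -> dict:
        if name in self._collections:
            if get_or_create:
                # Reuse the existing collection instead of failing.
                return self._collections[name]
            # Placeholder error type; the real Chroma exception may differ.
            raise ValueError(f"Collection {name!r} already exists")
        self._collections[name] = {"name": name, "items": {}}
        return self._collections[name]


store = InMemoryCollections()
first = store.create_collection("docs")
second = store.create_collection("docs")   # get_or_create=True: same object returned
print(first is second)                      # True
try:
    store.create_collection("docs", get_or_create=False)
except ValueError as err:
    print(err)                              # Collection 'docs' already exists
```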

upsert

Insert or update documents in the current collection.

upsert(
    data: Union[List[Any], Dict[str, Any]],
    **kwargs: Any
) -> None

data

Union[List[Any], Dict[str, Any]]

required

Either a list of Haystack Documents or a dictionary with the keys ids, documents (or texts), metadatas, and embeddings
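In the dictionary form of `data`, the keys hold parallel lists aligned by position. A sketch of building that dictionary from documents, assuming a hypothetical `Doc` dataclass in place of Haystack's `Document` (which exposes similar `id`, `content`, `meta`, and `embedding` attributes) and using the `documents` key rather than `texts`:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document."""
    id: str
    content: str
    meta: dict[str, Any] = field(default_factory=dict)
    embedding: Optional[list[float]] = None


def documents_to_upsert_dict(docs: list[Doc]) -> dict[str, Any]:
    """Build the dictionary form of upsert data: parallel lists keyed by field."""
    payload: dict[str, Any] = {
        "ids": [d.id for d in docs],
        "documents": [d.content for d in docs],
        "metadatas": [d.meta for d in docs],
    }
    # Assumption: include embeddings only when every document has one;
    # otherwise leave them out so the collection's embedding function can be used.
    if all(d.embedding is not None for d in docs):
        payload["embeddings"] = [d.embedding for d in docs]
    return payload


docs = [
    Doc("a", "Chroma stores vectors.", {"source": "faq"}, [0.1, 0.2]),
    Doc("b", "Milvus supports hybrid search.", {"source": "blog"}, [0.3, 0.4]),
]
payload = documents_to_upsert_dict(docs)
print(payload["ids"])  # ['a', 'b']
```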

query

Query the collection for similar vectors.

query(
    query_embedding: Optional[list[float]] = None,
    query_text: Optional[str] = None,
    n_results: int = 10,
    where: Optional[dict[str, Any]] = None,
    where_document: Optional[dict[str, Any]] = None,
    include: List[str] = None,
    include_vectors: bool = False,
    **kwargs: Any
) -> dict[str, Any]

query_embedding

list[float]

Vector embedding to search for

query_text

str

Raw text to embed and search for

n_results

int

default:"10"

Maximum number of results to return

where

dict[str, Any]

Metadata filter conditions (Chroma filter syntax)

where_document

dict[str, Any]

Document content filter conditions

include

List[str]

List of fields to include in results. Defaults to [“metadatas”, “documents”, “distances”]

include_vectors

bool

default:"False"

Whether to include embeddings in results

Returns

A dictionary with the following keys:

ids

List[str]

List of document IDs

documents

List[str]

List of document contents

metadatas

List[dict]

List of metadata dictionaries

distances

List[float]

List of distance scores

embeddings

List[List[float]]

List of vectors (if include_vectors=True)
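The result fields above are parallel lists: position i in each list describes the same match. A minimal sketch that regroups them into one record per match, assuming the flat list shape given by the field types (Chroma's native client nests each field one level deeper, one inner list per query). `to_records` is a hypothetical helper.

```python
from typing import Any

# Map each plural result key to a singular record key.
FIELDS = {
    "ids": "id",
    "documents": "document",
    "metadatas": "metadata",
    "distances": "distance",
    "embeddings": "embedding",
}


def to_records(results: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: zip parallel result lists into one dict per match."""
    # Only keep fields that were actually returned (e.g. embeddings are optional).
    present = {p: s for p, s in FIELDS.items() if results.get(p) is not None}
    return [
        {singular: results[plural][i] for plural, singular in present.items()}
        for i in range(len(results["ids"]))
    ]


results = {
    "ids": ["a", "b"],
    "documents": ["first text", "second text"],
    "metadatas": [{"source": "faq"}, {"source": "blog"}],
    "distances": [0.12, 0.34],
}
for record in to_records(results):
    print(record["id"], record["distance"])
```

Chroma reports distances, so smaller values indicate closer matches.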

delete_collection

Delete a collection from the Chroma database.

delete_collection(name: Optional[str] = None) -> None

name

str

Name of collection to delete. Uses default collection_name if not provided

flatten_metadata

Recursively flatten nested metadata for Chroma compatibility.

flatten_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]

metadata

Dict[str, Any]

required

Dictionary potentially containing nested dictionaries, lists, or complex types

Returns

flattened

Dict[str, Any]

Flattened dictionary compatible with Chroma metadata storage. Nested keys use dot notation (e.g., “parent.child.key”)
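A sketch of recursive flattening with dot-joined keys. How ChromaVectorDB treats lists, None, and other non-scalar values is not documented here, so the choices below (drop None, JSON-encode lists and tuples, stringify anything else) are assumptions; only the dot-notation keys come from the description above. `flatten_metadata_sketch` is hypothetical.

```python
import json
from typing import Any

# Scalar types kept as-is (bool is covered by int, but listed for clarity).
SCALARS = (str, int, float, bool)


def flatten_metadata_sketch(
    metadata: dict[str, Any], parent_key: str = "", sep: str = "."
) -> dict[str, Any]:
    """Flatten nested metadata into scalar values with dot-joined keys.

    Assumptions: None values are dropped, lists/tuples are JSON-encoded,
    and any other non-scalar value is converted with str().
    """
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted key path.
            flat.update(flatten_metadata_sketch(value, full_key, sep))
        elif value is None:
            continue
        elif isinstance(value, SCALARS):
            flat[full_key] = value
        elif isinstance(value, (list, tuple)):
            flat[full_key] = json.dumps(list(value), default=str)
        else:
            flat[full_key] = str(value)
    return flat


print(flatten_metadata_sketch({"parent": {"child": {"key": 1}}, "tags": ["a", "b"], "note": None}))
# {'parent.child.key': 1, 'tags': '["a", "b"]'}
```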

MilvusVectorDB

Interface for interacting with Milvus/Zilliz vector databases with support for dense vectors, sparse vectors, and hybrid search.

Constructor

MilvusVectorDB(
    uri: str = "http://localhost:19530",
    token: str = "",
    host: Optional[str] = None,
    port: Optional[str] = None,
    collection_name: Optional[str] = None,
    **kwargs
)

uri

str

default:"http://localhost:19530"

Milvus server URI. Use “http://localhost:19530” for local Milvus or “https://…” for Zilliz Cloud endpoints

token

str

default:""

API token for Zilliz Cloud authentication. Leave empty for local Milvus or for deployments without authentication

host

str

Deprecated. Use uri instead. Maintained for backward compatibility

port

str

Deprecated. Use uri instead. Maintained for backward compatibility

collection_name

str

Default collection name for operations. Can be overridden per-method
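Because `host` and `port` are deprecated in favor of `uri`, a wrapper like this typically folds the legacy pair into a URI. The helper below is hypothetical, and its choices (an `http` scheme, port 19530 when `port` is omitted, `host` taking precedence over `uri`) are assumptions rather than the wrapper's confirmed behavior.

```python
import warnings
from typing import Optional


def resolve_milvus_uri(
    uri: str = "http://localhost:19530",
    host: Optional[str] = None,
    port: Optional[str] = None,
) -> str:
    """Hypothetical: turn the deprecated host/port pair into a connection URI."""
    if host:
        # Assumed behavior: legacy arguments win, with a deprecation warning.
        warnings.warn("host/port are deprecated; pass uri instead", DeprecationWarning, stacklevel=2)
        return f"http://{host}:{port or '19530'}"
    return uri


print(resolve_milvus_uri())                             # http://localhost:19530
print(resolve_milvus_uri(host="milvus", port="19531"))  # http://milvus:19531
```

The resulting uri, together with token, corresponds to the two connection values that pymilvus's MilvusClient(uri=..., token=...) accepts.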

Methods

create_collection

Create a Milvus collection with a comprehensive schema for document storage.

create_collection(
    collection_name: str,
    dimension: int,
    description: str = "",
    use_sparse: bool = False,
    use_partition_key: bool = False,
    partition_key_field: str = "namespace",
    recreate: bool = False
)

collection_name

str

required

Unique name for the collection

dimension

int

required

Dimensionality of dense embedding vectors. Must match the embedding model used (e.g., 768 for BERT-base-sized models)

description

str

default:""

Human-readable description of the collection’s purpose

use_sparse

bool

default:"False"

Whether to include a sparse vector field for hybrid search. Adds storage overhead but enables keyword matching alongside semantic search

use_partition_key

bool

default:"False"

Whether to enable physical data partitioning. Enables efficient multi-tenancy by isolating data at the partition level

partition_key_field

str

default:"namespace"

Name of the partition key field. Documents with the same partition key value are routed to the same physical partition

recreate

bool

default:"False"

If True, drops any existing collection with the same name before creating a new one. Use with caution in production

search

Perform semantic search with support for dense, sparse, or hybrid retrieval.

search(
    query_embedding: Optional[List[float]] = None,
    query_sparse_embedding: Optional[Union[Dict[int, float], SparseEmbedding]] = None,
    top_k: int = 10,
    collection_name: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    scope: Optional[str] = None,
    namespace: Optional[str] = None,
    partition_key_field: str = "namespace",
    ranker_type: str = "rrf",
    weights: Optional[List[float]] = None,
    include_vectors: bool = False
) -> List[Document]

query_embedding

List[float]

Dense query vector for semantic search. Typically from a text embedding model (e.g., 768-dim)

query_sparse_embedding

Union[Dict[int, float], SparseEmbedding]

Sparse query vector for keyword search. Can be Dict[int, float] mapping term IDs to weights, or Haystack SparseEmbedding

top_k

int

default:"10"

Maximum number of results to return

collection_name

str

Target collection. Uses default from constructor if None

filters

Dict[str, Any]

Metadata filter conditions as a nested dict. Supported operators: $eq (equality), $gt/$lt (range), $in (list membership), and $contains (JSON contains)

scope

str

Unified tenant/partition identifier. Preferred over ‘namespace’ as it clearly conveys the isolation concept

namespace

str

Legacy alias for scope. Partition key value for data isolation. Only used if scope is not provided

partition_key_field

str

default:"namespace"

Name of the partition key field in schema. Used to construct partition filter expressions
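The filter, scope, and partition-key parameters combine into a single Milvus boolean expression. The following sketch assumes the filter shape {"field": {"$op": value}}, operators written with a leading $ (as with $contains), and metadata stored as top-level fields; `build_filter_expr` is hypothetical. The output uses Milvus expression syntax (==, >, <, in, json_contains) joined with and.

```python
import json
from typing import Any, Optional

# Hypothetical mapping from the documented comparison operators to Milvus syntax.
_COMPARISONS = {"$eq": "==", "$gt": ">", "$lt": "<"}


def _literal(value: Any) -> str:
    # json.dumps yields double-quoted strings and bracketed lists.
    return json.dumps(value)


def build_filter_expr(
    filters: Optional[dict[str, Any]] = None,
    scope: Optional[str] = None,
    namespace: Optional[str] = None,
    partition_key_field: str = "namespace",
) -> str:
    """Sketch: merge a partition scope and {"field": {"$op": value}} filters into one expression."""
    clauses: list[str] = []
    # scope is preferred; namespace is only the legacy fallback.
    partition_value = scope if scope is not None else namespace
    if partition_value is not None:
        clauses.append(f"{partition_key_field} == {_literal(partition_value)}")
    for field_name, condition in (filters or {}).items():
        for op, value in condition.items():
            if op in _COMPARISONS:
                clauses.append(f"{field_name} {_COMPARISONS[op]} {_literal(value)}")
            elif op == "$in":
                clauses.append(f"{field_name} in {_literal(list(value))}")
            elif op == "$contains":
                clauses.append(f"json_contains({field_name}, {_literal(value)})")
            else:
                raise ValueError(f"Unsupported operator: {op}")
    return " and ".join(clauses)


print(build_filter_expr({"year": {"$gt": 2020}, "lang": {"$in": ["en", "de"]}}, scope="tenant_a"))
# namespace == "tenant_a" and year > 2020 and lang in ["en", "de"]
```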

ranker_type

str

default:"rrf"

Reranking strategy for hybrid search. “rrf” uses Reciprocal Rank Fusion (k=60 constant). “weighted” uses explicit weights

weights

List[float]

Two-element list [dense_weight, sparse_weight] for weighted ranker. Only used when ranker_type=“weighted”. Defaults to [0.5, 0.5]
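For hybrid search with ranker_type="rrf", each document's fused score is the sum of 1 / (k + rank) over the dense and sparse result lists, with k = 60 as documented above. A minimal sketch of that standard Reciprocal Rank Fusion formula, with ranks starting at 1; `rrf_fuse` is hypothetical, and Milvus's internal ranker may differ in details such as tie-breaking. The weighted ranker is not shown.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_k: int = 10) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


dense = ["a", "b", "c"]   # ids ordered by dense similarity
sparse = ["c", "a", "d"]  # ids ordered by sparse (keyword) score
print([doc_id for doc_id, _ in rrf_fuse([dense, sparse])])  # ['a', 'c', 'b', 'd']
```

Here "a" and "c" each top one list, but "a" ranks higher in the other list (2nd vs 3rd), so it wins. RRF uses only rank positions, so dense and sparse scores on different scales never need to be compared directly.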

include_vectors

bool

default:"False"

If True, includes embedding vectors in returned Documents. Increases response size but useful for downstream processing

Returns

documents

List[Document]

List of Haystack Document objects ordered by relevance score (descending). Documents include content, metadata, score, and optionally embeddings

PineconeVectorDB

Production-ready interface for Pinecone vector database operations with support for hybrid search and multi-tenancy.

Constructor

PineconeVectorDB(
    api_key: Optional[str] = None,
    index_name: Optional[str] = None,
    config: Optional[Dict[str, Any]] = None,
    config_path: Optional[str] = None,
    **kwargs
)

api_key

str

Pinecone API key for authentication

index_name

str

Name of the Pinecone index to operate on

config

Dict[str, Any]

Direct configuration dictionary

config_path

str

Path to YAML configuration file

**kwargs

Any

Additional connection parameters (host, proxy_url, ssl_verify, pool_threads)

Methods

create_index

Create a new Pinecone index or ensure an existing one is ready.

create_index(
    dimension: Optional[int] = None,
    metric: str = "cosine",
    spec: Optional[Union[Dict[str, Any], ServerlessSpec]] = None,
    recreate: bool = False,
    index_name: Optional[str] = None,
    **kwargs
) -> None

dimension

int

Dimensionality of the vectors. Required when creating a new index; can be omitted when the index already exists

metric

str

default:"cosine"

Distance metric for similarity calculations. Options: “cosine”, “euclidean”, “dotproduct”

spec

Union[Dict[str, Any], ServerlessSpec]

ServerlessSpec or dict defining cloud provider and region. Defaults to AWS us-east-1 serverless if not provided

recreate

bool

default:"False"

If True, deletes the existing index before creating a new one

index_name

str

Overrides the index name set in the constructor for this call
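The three metric options correspond to standard vector comparisons. The sketch below uses the textbook definitions; how Pinecone converts each into the score it returns is not covered here. For unit-length vectors, cosine and dotproduct produce the same value.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product ("dotproduct"): larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line (L2) distance: smaller means more similar."""
    return math.dist(a, b)


a, b = [1.0, 0.0], [1.0, 1.0]
print(dot(a, b), euclidean(a, b))  # 1.0 1.0
print(round(cosine(a, b), 4))      # 0.7071
```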

query

Query the index for similar vectors.

query(
    vector: List[float],
    top_k: int = 10,
    filter: Optional[Dict[str, Any]] = None,
    namespace: str = "",
    scope: Optional[str] = None,
    include_metadata: bool = True,
    include_values: bool = False,
    include_vectors: bool = False
) -> List[Any]

vector

List[float]

required

Query dense embedding vector

top_k

int

default:"10"

Number of results to return

filter

Dict[str, Any]

Metadata filters to apply

namespace

str

default:""

Namespace to search in (legacy parameter)

scope

str

Unified scope/namespace parameter (preferred over namespace)

include_metadata

bool

default:"True"

Whether to include metadata in results

include_values

bool

default:"False"

Whether to include vector values in results (legacy parameter)

include_vectors

bool

default:"False"

Whether to include vector values in results (preferred parameter)

Returns

results

List[Document]

List of Haystack Documents with content, metadata, and optional embeddings
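query() accepts both preferred and legacy spellings of two options. A minimal sketch of one plausible way to reconcile them, assuming scope overrides namespace and either include flag requests vector values; `resolve_query_options` is hypothetical. The output keys mirror the namespace and include_values arguments of the Pinecone client's Index.query, where "" denotes the default namespace.

```python
from typing import Optional


def resolve_query_options(
    namespace: str = "",
    scope: Optional[str] = None,
    include_values: bool = False,
    include_vectors: bool = False,
) -> dict:
    """Hypothetical: reconcile preferred (scope, include_vectors) and legacy parameters."""
    return {
        # Assumed: scope wins; otherwise fall back to the legacy namespace ("" = default).
        "namespace": scope if scope is not None else namespace,
        # Assumed: either flag requests vector values in the response.
        "include_values": include_vectors or include_values,
    }


print(resolve_query_options(namespace="old", scope="tenant_a"))
# {'namespace': 'tenant_a', 'include_values': False}
```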

flatten_metadata

Transform nested metadata into Pinecone-compatible flat structure.

flatten_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]

metadata

Dict[str, Any]

required

Potentially nested dictionary with arbitrary values

Returns

flattened

Dict[str, Any]

Flat dictionary containing only Pinecone-supported types. Nested dictionaries are flattened into underscore-joined keys (e.g., {"user": {"id": 1}} becomes {"user_id": 1})
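Pinecone metadata values are limited to strings, numbers, booleans, and lists of strings, so nested structures must be flattened before upsert. A sketch using underscore-joined keys as described above; dropping None values and converting non-string list items with json.dumps are assumptions, and `flatten_for_pinecone` is hypothetical.

```python
import json
from typing import Any


def flatten_for_pinecone(metadata: dict[str, Any], parent_key: str = "") -> dict[str, Any]:
    """Flatten nested metadata into Pinecone-compatible values with underscore-joined keys."""
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        full_key = f"{parent_key}_{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # {"user": {"id": 1}} -> {"user_id": 1}
            flat.update(flatten_for_pinecone(value, full_key))
        elif value is None:
            # Assumption: drop nulls rather than storing them.
            continue
        elif isinstance(value, (str, int, float, bool)):
            flat[full_key] = value
        elif isinstance(value, (list, tuple)):
            # Pinecone lists hold strings, so convert any non-string items.
            flat[full_key] = [v if isinstance(v, str) else json.dumps(v, default=str) for v in value]
        else:
            flat[full_key] = str(value)
    return flat


print(flatten_for_pinecone({"user": {"id": 7, "name": "ana"}, "tags": ["x", 2], "note": None}))
# {'user_id': 7, 'user_name': 'ana', 'tags': ['x', '2']}
```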
