Utility modules providing shared functionality across all vector database integrations.

Evaluation Metrics

compute_recall_at_k

Compute Recall@k for a single query.
compute_recall_at_k(
    retrieved_ids: list[str],
    relevant_ids: set[str],
    k: int
) -> float
retrieved_ids
list[str]
required
List of retrieved document IDs in ranked order
relevant_ids
set[str]
required
Set of ground truth relevant document IDs
k
int
required
Number of top results to consider
recall
float
Recall score between 0 and 1. Formula: (relevant docs in top-k) / (total relevant docs)
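
The formula above can be sketched in a few lines. This is a standalone illustration, not the library's implementation; the name `recall_at_k` and the choice to return 0.0 when there are no relevant documents are assumptions.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0  # assumption: an empty ground-truth set scores 0.0
    # A set intersection avoids double-counting if an ID is retrieved twice.
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(recall_at_k(retrieved, relevant, k=3))  # one of three relevant docs is in the top 3
```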

compute_precision_at_k

Compute Precision@k for a single query.
compute_precision_at_k(
    retrieved_ids: list[str],
    relevant_ids: set[str],
    k: int
) -> float
retrieved_ids
list[str]
required
List of retrieved document IDs in ranked order
relevant_ids
set[str]
required
Set of ground truth relevant document IDs
k
int
required
Number of top results to consider
precision
float
Precision score between 0 and 1. Formula: (relevant docs in top-k) / k
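
A minimal sketch of the formula above, assuming the denominator stays at k even when fewer than k documents are retrieved. The name `precision_at_k` and the handling of non-positive k are assumptions, not the library's behavior.

```python
def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k slots that hold a relevant document."""
    if k <= 0:
        return 0.0  # assumption: a non-positive cutoff scores 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(precision_at_k(retrieved, relevant, k=4))  # 0.5
```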

compute_mrr

Compute the reciprocal rank for a single query. Averaging this value across queries gives the Mean Reciprocal Rank (MRR).
compute_mrr(
    retrieved_ids: list[str],
    relevant_ids: set[str]
) -> float
retrieved_ids
list[str]
required
List of retrieved document IDs in ranked order
relevant_ids
set[str]
required
Set of ground truth relevant document IDs
mrr
float
Reciprocal rank score between 0 and 1. Formula: 1 / (rank of first relevant document), with ranks starting at 1
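
The reciprocal-rank formula can be sketched as follows. The name `reciprocal_rank` is hypothetical, and returning 0.0 when no relevant document is retrieved is an assumption consistent with the stated 0 to 1 range.

```python
def reciprocal_rank(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Return 1 / rank of the first relevant document, using 1-based ranks."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0  # assumption: no relevant document retrieved


print(reciprocal_rank(["d3", "d1", "d7"], {"d1", "d7"}))  # 0.5, first hit at rank 2
```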

compute_ndcg_at_k

Compute Normalized Discounted Cumulative Gain at k.
compute_ndcg_at_k(
    retrieved_ids: list[str],
    relevant_ids: set[str],
    k: int
) -> float
retrieved_ids
list[str]
required
List of retrieved document IDs in ranked order
relevant_ids
set[str]
required
Set of ground truth relevant document IDs
k
int
required
Number of top results to consider
ndcg
float
NDCG score between 0 and 1. Formula: DCG@k / IDCG@k (Ideal DCG)
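
A sketch of DCG@k / IDCG@k under binary relevance (a document is either in `relevant_ids` or not), which is what the signature suggests. The log2 discount, the name `ndcg_at_k`, and the assumption that retrieved IDs are unique are not confirmed by this page.

```python
import math


def ndcg_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the actual ranking over DCG of an ideal one."""
    # DCG: each relevant document at 1-based position i contributes 1 / log2(i + 1).
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, doc_id in enumerate(retrieved_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # IDCG: the best case places all relevant documents at the top positions.
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


print(ndcg_at_k(["d3", "d1", "d7", "d2"], {"d1", "d2"}, k=4))
```

Relevant documents here sit at ranks 2 and 4, so the score falls below 1.0; moving them to ranks 1 and 2 would give exactly 1.0.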

compute_hit_rate

Compute hit rate (binary success) for a single query.
compute_hit_rate(
    retrieved_ids: list[str],
    relevant_ids: set[str],
    k: int
) -> float
retrieved_ids
list[str]
required
List of retrieved document IDs in ranked order
relevant_ids
set[str]
required
Set of ground truth relevant document IDs
k
int
required
Number of top results to consider
hit_rate
float
1.0 if any relevant doc is in top-k, else 0.0
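
The binary check described above fits in one expression. The function name `hit_rate_at_k` is hypothetical and the code is only an illustration of the rule.

```python
def hit_rate_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Return 1.0 if at least one relevant document is in the top-k, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in retrieved_ids[:k]) else 0.0


print(hit_rate_at_k(["d3", "d1"], {"d1"}, k=1))  # 0.0, the only hit is at rank 2
print(hit_rate_at_k(["d3", "d1"], {"d1"}, k=2))  # 1.0
```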

evaluate_retrieval

Evaluate retrieval quality over multiple queries.
evaluate_retrieval(
    query_results: list[QueryResult],
    k: int = 5
) -> RetrievalMetrics
query_results
list[QueryResult]
required
List of QueryResult objects with retrieved and relevant IDs
k
int
default: 5
Cutoff for top-k metrics
metrics
RetrievalMetrics
RetrievalMetrics object with averaged scores across all queries
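
Averaging per-query scores might look like the sketch below. The field names on `QueryResult` and `RetrievalMetrics` are not documented on this page, so the dataclasses here are hypothetical stand-ins, and only two of the metrics are averaged to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:  # hypothetical stand-in; real field names may differ
    retrieved_ids: list[str]
    relevant_ids: set[str]


@dataclass
class RetrievalMetrics:  # hypothetical stand-in; real field names may differ
    recall_at_k: float
    hit_rate: float
    num_queries: int


def evaluate_sketch(query_results: list[QueryResult], k: int = 5) -> RetrievalMetrics:
    """Average Recall@k and hit rate over all queries."""
    n = len(query_results)
    if n == 0:
        return RetrievalMetrics(0.0, 0.0, 0)
    recall_sum = 0.0
    hit_sum = 0.0
    for qr in query_results:
        found = set(qr.retrieved_ids[:k]) & qr.relevant_ids
        if qr.relevant_ids:
            recall_sum += len(found) / len(qr.relevant_ids)
        hit_sum += 1.0 if found else 0.0
    return RetrievalMetrics(recall_sum / n, hit_sum / n, n)


results = [
    QueryResult(["a", "b"], {"a"}),       # recall 1.0, hit
    QueryResult(["c"], {"x", "y"}),       # recall 0.0, miss
]
print(evaluate_sketch(results, k=2))
```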

Sparse Embeddings

normalize_sparse

Normalize any sparse format to Haystack SparseEmbedding.
normalize_sparse(
    sparse: Union[SparseEmbedding, Dict[int, float], Dict[str, List], None]
) -> Optional[SparseEmbedding]
sparse
Union[SparseEmbedding, Dict[int, float], Dict[str, List], None]
required
Sparse embedding in any supported format:
  • SparseEmbedding object (passthrough)
  • Dict with int keys and float values (Milvus format)
  • Dict with indices and values lists (Pinecone format)
  • None (passthrough)
sparse
Optional[SparseEmbedding]
Normalized SparseEmbedding or None
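
The dispatch over the four accepted inputs could be sketched like this. To keep the example dependency-free, `SparseEmbedding` is a local stand-in mirroring the `indices` and `values` fields of Haystack's class; sorting Milvus-style entries by index is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SparseEmbedding:  # stand-in for haystack.dataclasses.SparseEmbedding
    indices: list[int]
    values: list[float]


def normalize_sparse_sketch(sparse: Any) -> Optional[SparseEmbedding]:
    """Map any of the supported inputs to a SparseEmbedding (or None)."""
    if sparse is None or isinstance(sparse, SparseEmbedding):
        return sparse  # passthrough cases
    if isinstance(sparse, dict):
        if "indices" in sparse and "values" in sparse:
            # Pinecone-style: two parallel lists
            return SparseEmbedding(list(sparse["indices"]), list(sparse["values"]))
        # Milvus-style: {index: value}; sorted by index (assumption)
        items = sorted(sparse.items())
        return SparseEmbedding([int(i) for i, _ in items], [float(v) for _, v in items])
    raise TypeError(f"Unsupported sparse format: {type(sparse).__name__}")


print(normalize_sparse_sketch({17: 1.25, 3: 0.5}))
print(normalize_sparse_sketch({"indices": [3, 17], "values": [0.5, 1.25]}))
```

Both calls produce the same embedding, which is the point of normalizing before converting to a backend-specific format.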

to_milvus_sparse

Convert SparseEmbedding to Milvus format.
to_milvus_sparse(sparse: SparseEmbedding) -> Dict[int, float]
sparse
SparseEmbedding
required
Haystack SparseEmbedding object
milvus_sparse
Dict[int, float]
Dictionary mapping indices to values in Milvus format
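
As an illustration of the target shape, the conversion amounts to zipping indices with values. `SparseEmbedding` below is a local stand-in for Haystack's class, and `to_milvus_sketch` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class SparseEmbedding:  # stand-in for haystack.dataclasses.SparseEmbedding
    indices: list[int]
    values: list[float]


def to_milvus_sketch(sparse: SparseEmbedding) -> dict[int, float]:
    """Pair each index with its value in a single {index: value} dict."""
    return {int(i): float(v) for i, v in zip(sparse.indices, sparse.values)}


print(to_milvus_sketch(SparseEmbedding([3, 17], [0.5, 1.25])))  # {3: 0.5, 17: 1.25}
```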

to_pinecone_sparse

Convert SparseEmbedding to Pinecone sparse_values format.
to_pinecone_sparse(sparse: SparseEmbedding) -> Dict[str, List]
sparse
SparseEmbedding
required
Haystack SparseEmbedding object
pinecone_sparse
Dict[str, List]
Dictionary with 'indices' and 'values' keys in Pinecone format
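
The Pinecone shape keeps indices and values as two parallel lists. In this sketch `SparseEmbedding` is a local stand-in for Haystack's class and `to_pinecone_sketch` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class SparseEmbedding:  # stand-in for haystack.dataclasses.SparseEmbedding
    indices: list[int]
    values: list[float]


def to_pinecone_sketch(sparse: SparseEmbedding) -> dict[str, list]:
    """Return the two parallel lists under 'indices' and 'values'."""
    return {"indices": list(sparse.indices), "values": list(sparse.values)}


print(to_pinecone_sketch(SparseEmbedding([3, 17], [0.5, 1.25])))
# {'indices': [3, 17], 'values': [0.5, 1.25]}
```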

to_qdrant_sparse

Convert SparseEmbedding to Qdrant SparseVector format.
to_qdrant_sparse(sparse: SparseEmbedding) -> SparseVector
sparse
SparseEmbedding
required
Haystack SparseEmbedding object
qdrant_sparse
SparseVector
Qdrant SparseVector object

get_doc_sparse_embedding

Extract sparse embedding from Document, checking standard and legacy locations.
get_doc_sparse_embedding(
    doc: Any,
    fallback_meta_key: str = "sparse_embedding"
) -> Optional[SparseEmbedding]
doc
Any
required
Haystack Document object
fallback_meta_key
str
default: "sparse_embedding"
Legacy meta key to check if doc.sparse_embedding is None
sparse
Optional[SparseEmbedding]
SparseEmbedding from doc.sparse_embedding or doc.meta[fallback_meta_key], or None
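
The lookup order described above (attribute first, then the legacy meta key) can be sketched with plain attribute access. The documents here are `SimpleNamespace` stand-ins rather than Haystack `Document` objects, and the sketch returns whatever is stored under the meta key without converting it, which may differ from the real function.

```python
from types import SimpleNamespace
from typing import Any


def get_sparse_sketch(doc: Any, fallback_meta_key: str = "sparse_embedding") -> Any:
    """Prefer doc.sparse_embedding; fall back to doc.meta[fallback_meta_key]."""
    sparse = getattr(doc, "sparse_embedding", None)
    if sparse is not None:
        return sparse
    meta = getattr(doc, "meta", None) or {}
    return meta.get(fallback_meta_key)  # None if the key is absent


modern = SimpleNamespace(sparse_embedding="standard", meta={})
legacy = SimpleNamespace(sparse_embedding=None, meta={"sparse_embedding": "legacy"})
print(get_sparse_sketch(modern))  # standard
print(get_sparse_sketch(legacy))  # legacy
```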

Document Converters

ChromaDocumentConverter

Utility class for converting between Haystack Documents and Chroma format.

prepare_haystack_documents_for_upsert

ChromaDocumentConverter.prepare_haystack_documents_for_upsert(
    documents: List[Document]
) -> Dict[str, Any]
documents
List[Document]
required
List of Haystack Document objects
data
Dict[str, Any]
Dictionary with keys: ids, texts, metadatas, embeddings formatted for Chroma upsert
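
The returned dictionary holds one parallel list per key. A rough sketch of that shape follows, using a local `Document` stand-in with the `id`, `content`, `meta`, and `embedding` fields of a Haystack Document; substituting an empty string for missing content is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:  # stand-in for haystack.Document
    id: str
    content: Optional[str] = None
    meta: dict[str, Any] = field(default_factory=dict)
    embedding: Optional[list[float]] = None


def prepare_for_chroma_sketch(documents: list[Document]) -> dict[str, Any]:
    """Build parallel lists; position i in every list describes document i."""
    return {
        "ids": [d.id for d in documents],
        "texts": [d.content or "" for d in documents],  # assumption: "" for None
        "metadatas": [dict(d.meta) for d in documents],
        "embeddings": [d.embedding for d in documents],
    }


docs = [Document("a", "hello", {"lang": "en"}, [0.1, 0.2])]
print(prepare_for_chroma_sketch(docs))
```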

PineconeDocumentConverter

Utility class for converting between Haystack Documents and Pinecone format.

prepare_haystack_documents_for_upsert

PineconeDocumentConverter.prepare_haystack_documents_for_upsert(
    documents: List[Document]
) -> List[Dict[str, Any]]
documents
List[Document]
required
List of Haystack Document objects
vectors
List[Dict[str, Any]]
List of Pinecone vector dictionaries with id, values, metadata keys
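
Each output entry is one vector record. The sketch below shows that shape with a local `Document` stand-in; storing the text under a `"content"` metadata key and skipping documents without an embedding are both assumptions, since this page does not say how the converter handles either.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:  # stand-in for haystack.Document
    id: str
    content: Optional[str] = None
    meta: dict[str, Any] = field(default_factory=dict)
    embedding: Optional[list[float]] = None


def prepare_for_pinecone_sketch(documents: list[Document]) -> list[dict[str, Any]]:
    """Build one {'id', 'values', 'metadata'} record per embedded document."""
    vectors = []
    for d in documents:
        if d.embedding is None:
            continue  # assumption: nothing to upsert without a dense vector
        metadata = dict(d.meta)
        metadata["content"] = d.content or ""  # hypothetical key for the text
        vectors.append({"id": d.id, "values": d.embedding, "metadata": metadata})
    return vectors


docs = [Document("a", "hello", {"lang": "en"}, [0.1, 0.2]), Document("b", "no vector")]
print(prepare_for_pinecone_sketch(docs))
```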

convert_query_results_to_haystack_documents

PineconeDocumentConverter.convert_query_results_to_haystack_documents(
    results: Dict[str, Any],
    include_embeddings: bool = False
) -> List[Document]
results
Dict[str, Any]
required
Pinecone query results dictionary
include_embeddings
bool
default: False
Whether to include vector embeddings in Documents
documents
List[Document]
List of Haystack Document objects
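
Going the other way, a Pinecone query response carries a `matches` list whose entries have `id`, `score`, and optionally `values` and `metadata`. The sketch below maps those onto a local `Document` stand-in; reading the text back from a `"content"` metadata key is an assumption about how the documents were stored.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:  # stand-in for haystack.Document
    id: str
    content: Optional[str] = None
    meta: dict[str, Any] = field(default_factory=dict)
    embedding: Optional[list[float]] = None
    score: Optional[float] = None


def matches_to_documents_sketch(
    results: dict[str, Any], include_embeddings: bool = False
) -> list[Document]:
    """Convert each match in a query response into a Document."""
    documents = []
    for match in results.get("matches", []):
        meta = dict(match.get("metadata") or {})
        content = meta.pop("content", None)  # hypothetical key for the text
        documents.append(
            Document(
                id=match["id"],
                content=content,
                meta=meta,
                embedding=match.get("values") if include_embeddings else None,
                score=match.get("score"),
            )
        )
    return documents


response = {
    "matches": [
        {"id": "a", "score": 0.9, "values": [0.1, 0.2],
         "metadata": {"content": "hello", "lang": "en"}}
    ]
}
print(matches_to_documents_sketch(response))
```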

Configuration

load_config

Load configuration from YAML file.
load_config(config_path: str) -> Dict[str, Any]
config_path
str
required
Path to YAML configuration file
config
Dict[str, Any]
Loaded configuration dictionary with environment variables resolved

resolve_env_vars

Resolve environment variable references in configuration.
resolve_env_vars(config: Dict[str, Any]) -> Dict[str, Any]
config
Dict[str, Any]
required
Configuration dictionary that may contain $-prefixed environment variable references
resolved
Dict[str, Any]
Configuration with all environment variables resolved
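
One way to resolve references throughout a nested configuration is a recursive walk. The exact reference syntax is not shown on this page, so this sketch assumes the shell-style `$VAR` and `${VAR}` forms handled by the standard library's `os.path.expandvars`, which leaves unset variables unchanged; the variable name `DEMO_API_KEY` is made up for the example.

```python
import os
from typing import Any


def resolve_env_vars_sketch(value: Any) -> Any:
    """Recursively expand $VAR / ${VAR} in every string of a nested config."""
    if isinstance(value, dict):
        return {key: resolve_env_vars_sketch(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_vars_sketch(item) for item in value]
    if isinstance(value, str):
        return os.path.expandvars(value)
    return value  # numbers, booleans, None pass through untouched


os.environ["DEMO_API_KEY"] = "secret"  # hypothetical variable for the demo
config = {"pinecone": {"api_key": "${DEMO_API_KEY}", "top_k": 5}, "tags": ["$DEMO_API_KEY"]}
print(resolve_env_vars_sketch(config))
```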
