VectorDB uses YAML configuration files to define pipeline behavior. This reference documents all available configuration sections and options.

Configuration file structure

A complete configuration file includes these sections:
# Data loading
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100

# Embedding models
embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

# Vector database connection
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "my-index"
  dimension: 384
  metric: "cosine"

# Search settings
search:
  top_k: 10

# RAG generation
rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

# Logging
logging:
  name: "my_pipeline"
  level: "INFO"

Environment variable substitution

VectorDB supports two environment variable syntaxes:
api_key: "${PINECONE_API_KEY}"                # Resolves to an empty string if PINECONE_API_KEY is not set
uri: "${MILVUS_URI:-http://localhost:19530}"  # Falls back to the value after :- if MILVUS_URI is not set
See environment variables for the complete list of supported variables.
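Conceptually, the loader scans each string value for these patterns and substitutes from the process environment. A minimal sketch of that behavior in Python (illustrative only, not VectorDB's actual implementation):

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}; a sketch of the substitution rules above.
_ENV_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)

def resolve_env(value: str) -> str:
    """Replace ${VAR} with the env value (empty if unset) and ${VAR:-default} with the fallback."""
    def _sub(match: re.Match) -> str:
        name, default = match.group("name"), match.group("default")
        return os.environ.get(name, default if default is not None else "")
    return _ENV_PATTERN.sub(_sub, value)
```

For example, with MILVUS_URI unset, resolve_env("${MILVUS_URI:-http://localhost:19530}") yields the fallback URL.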

Dataloader configuration

Controls dataset loading and preprocessing.
dataloader:
  type: "triviaqa"           # Dataset type
  dataset_name: "trivia_qa"  # HuggingFace dataset name (optional)
  config: "rc"               # Dataset config variant (optional)
  split: "test"              # Dataset split: train, test, validation
  limit: 100                 # Max documents to load (null for all)
  use_text_splitter: false   # Enable chunking for long documents

Supported datasets

TriviaQA: an open-domain question-answering dataset.
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
ARC: the AI2 Reasoning Challenge, a science-question dataset.
dataloader:
  type: "arc"
  split: "test"
  limit: 200
PopQA: factoid questions about popular entities.
dataloader:
  type: "popqa"
  split: "test"
  limit: 100
FActScore: atomic facts for factuality verification.
dataloader:
  type: "factscore"
  split: "test"
  limit: 100
Earnings calls: question answering over financial earnings-call transcripts.
dataloader:
  type: "earnings_calls"
  split: "test"
  limit: 50

Embeddings configuration

Defines the embedding models for vector generation.

Dense embeddings

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"  # HuggingFace model
  device: "cpu"                                     # cpu or cuda
  batch_size: 32                                    # Embedding batch size

Model aliases

VectorDB provides convenient aliases for common models:
embeddings:
  model: "qwen3"     # Alias for Qwen/Qwen3-Embedding-0.6B
  # model: "minilm" # Alias for sentence-transformers/all-MiniLM-L6-v2
  # model: "mpnet"  # Alias for sentence-transformers/all-mpnet-base-v2

Hybrid embeddings (dense + sparse)

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"          # Dense model
  sparse_model: "prithivida/Splade_PP_en_v2"  # Sparse model
  device: "cpu"
  batch_size: 32

Vector database configuration

Each database has specific connection and indexing settings.

Pinecone

pinecone:
  api_key: "${PINECONE_API_KEY}"  # API key (required)
  index_name: "my-index"          # Index name (required)
  namespace: ""                   # Namespace for isolation (optional)
  dimension: 384                  # Vector dimension (required)
  metric: "cosine"                # Distance metric: cosine, euclidean, dotproduct
  recreate: false                 # Recreate index if exists (default: false)
Namespaces for multi-tenancy:
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "production"
  namespace: "tenant-123"  # Tenant-specific namespace

Weaviate

weaviate:
  cluster_url: "${WEAVIATE_URL}"      # Cluster URL (required)
  api_key: "${WEAVIATE_API_KEY}"      # API key (required)
  collection_name: "Documents"         # Collection name (required)
  timeout: 30                          # Request timeout in seconds
  connection_pool_size: 10             # Connection pool size
Multi-tenancy:
weaviate:
  cluster_url: "${WEAVIATE_URL}"
  api_key: "${WEAVIATE_API_KEY}"
  collection_name: "Documents"
  tenant: "customer-123"  # Native tenant isolation

Milvus

milvus:
  uri: "${MILVUS_URI:-http://localhost:19530}"  # Connection URI
  token: "${MILVUS_TOKEN:-}"                    # Auth token (optional)
  collection_name: "documents"                  # Collection name (required)
  dimension: 384                                # Vector dimension (required)
  recreate: false                               # Recreate collection if exists
  batch_size: 100                               # Insert batch size
Partition-based multi-tenancy:
milvus:
  uri: "${MILVUS_URI}"
  collection_name: "documents"
  partition_key: "tenant_id"    # Field for partitioning
  num_partitions: 1000          # Max partitions

Qdrant

qdrant:
  url: "${QDRANT_URL:-http://localhost:6333}"  # Server URL
  api_key: "${QDRANT_API_KEY:-}"               # API key (optional)
  collection_name: "documents"                 # Collection name (required)
  timeout: 30                                  # Request timeout
  prefer_grpc: true                            # Use gRPC (faster)
With quantization:
qdrant:
  url: "${QDRANT_URL}"
  collection_name: "documents"
  quantization:
    enabled: true
    method: "scalar"        # scalar or binary
    compression_ratio: 4.0  # Target compression
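Scalar quantization maps float vector components to 8-bit codes, which is roughly a 4x memory reduction for float32 vectors (the compression_ratio above). A minimal sketch of the idea, not Qdrant's implementation:

```python
def scalar_quantize(vector: list[float]) -> tuple[list[int], float, float]:
    """Map float components to int8 codes in [-128, 127] using the vector's min/max range."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255.0 or 1.0  # avoid divide-by-zero for constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale

def dequantize(codes: list[int], lo: float, scale: float) -> list[float]:
    """Approximately reconstruct the original floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]
```

The reconstruction error is bounded by the quantization step, which is why quantized search typically rescores a small candidate set with the original vectors.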

Chroma

chroma:
  path: "./chroma_data"                      # Local persistence path
  # OR for client/server mode:
  host: "${CHROMA_HOST:-localhost}"
  port: ${CHROMA_PORT:-8000}
  tenant: "default"                          # Tenant name
  database: "default"                        # Database name

Search configuration

Controls retrieval behavior.
search:
  top_k: 10                      # Number of results to return
  candidate_pool_size: 50        # Initial retrieval pool (before reranking)
  rrf_k: 60                      # RRF parameter for hybrid search
  retrieval_mode: "with_parents" # Parent doc retrieval mode
  max_parent_docs: 3             # Max unique parent documents
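For hybrid search, dense and sparse result lists are merged with Reciprocal Rank Fusion, where rrf_k is the constant k in score(d) = sum over lists of 1 / (k + rank). A sketch of the fusion step (illustrative, assuming ranked lists of document IDs):

```python
def rrf_fuse(result_lists: list[list[str]], k: int = 60, top_k: int = 10) -> list[str]:
    """Merge ranked ID lists with Reciprocal Rank Fusion: score(d) = sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A larger rrf_k flattens the rank contribution, so agreement between the two lists matters more than position in either one.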

Metadata filtering

search:
  top_k: 10
  filters:
    must:                        # All conditions must match
      - key: "category"
        match:
          value: "science"
      - key: "year"
        range:
          gte: 2020
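Conceptually, a document passes the filter only if every clause under must holds against its metadata. A simplified sketch of that evaluation (assumed semantics; the actual filtering happens inside the vector database):

```python
def matches(metadata: dict, filters: dict) -> bool:
    """Check a document's metadata against a `must` filter block (all clauses must hold)."""
    for cond in filters.get("must", []):
        value = metadata.get(cond["key"])
        if "match" in cond and value != cond["match"]["value"]:
            return False
        if "range" in cond:
            rng = cond["range"]
            if "gte" in rng and (value is None or value < rng["gte"]):
                return False
            if "lte" in rng and (value is None or value > rng["lte"]):
                return False
    return True
```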

RAG configuration

Controls answer generation with LLMs.
rag:
  enabled: true                          # Enable answer generation
  model: "llama-3.3-70b-versatile"       # LLM model name
  api_key: "${GROQ_API_KEY}"             # API key
  api_base_url: "https://api.groq.com/openai/v1"  # API endpoint
  temperature: 0.7                       # Sampling temperature (0.0-1.0)
  max_tokens: 2048                       # Max tokens in response
  provider: "groq"                       # Provider: groq or openai

Groq configuration

rag:
  enabled: true
  provider: "groq"
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

OpenAI configuration

rag:
  enabled: true
  provider: "openai"
  model: "gpt-4-turbo-preview"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

Reranking configuration

Improves precision with cross-encoder models.
reranker:
  type: "cross_encoder"                  # Reranker type
  model: "BAAI/bge-reranker-v2-m3"       # Model name
  top_k: 5                               # Final result count after reranking

Cohere reranking

reranker:
  type: "cohere"
  cohere_api_key: "${COHERE_API_KEY}"
  model: "rerank-english-v3.0"
  top_k: 5

Evaluation metrics

Scores retrieval and generation quality; typically configured alongside a reranker.

reranker:
  type: "cross_encoder"
  model: "BAAI/bge-reranker-v2-m3"
  top_k: 5

evaluation:
  enabled: true
  metrics:
    - contextual_recall
    - contextual_precision
    - answer_relevancy
    - faithfulness

Advanced features

Query enhancement

Generate multiple query variations for better recall.
query_enhancement:
  enabled: true
  method: "multi_query"  # multi_query, hyde, or step_back
  num_queries: 3         # Number of variations to generate
  llm_model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
Methods:
  • multi_query: Generate N paraphrases of the query
  • hyde: Generate hypothetical answer, then search for similar documents
  • step_back: Generate broader conceptual query
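The three methods differ mainly in the prompt sent to the enhancement LLM. A rough sketch of the templates (hypothetical wording; VectorDB's actual prompts may differ):

```python
# Hypothetical prompt templates, one per query_enhancement method.
PROMPTS = {
    "multi_query": "Generate {n} different phrasings of this search query:\n{query}",
    "hyde": "Write a short passage that would plausibly answer this question:\n{query}",
    "step_back": "Rewrite this question as a broader, more general question:\n{query}",
}

def build_enhancement_prompt(method: str, query: str, num_queries: int = 3) -> str:
    """Return the LLM prompt for the configured query_enhancement method."""
    return PROMPTS[method].format(n=num_queries, query=query)
```

Each generated variation is then searched independently and the result lists are merged.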

Parent document retrieval

Index small chunks, return large context.
indexing:
  parent_chunk_size: 512     # Size of parent chunks (tokens)
  child_chunk_size: 128      # Size of child chunks (tokens)
  chunk_overlap: 20          # Overlap between chunks

search:
  top_k: 5                   # Child chunks to retrieve
  retrieval_mode: "with_parents"  # Return mode
  max_parent_docs: 3         # Max unique parents
Retrieval modes:
  • children_only: Return only child chunks
  • with_parents: Return full parent documents
  • context_window: Return parent with surrounding context
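The with_parents mode can be pictured as: retrieve child chunks, deduplicate their parents in rank order, and stop at max_parent_docs. A simplified sketch (assumed data shapes, shown for illustration):

```python
def resolve_parents(child_hits: list[dict], max_parent_docs: int = 3) -> list[str]:
    """Collapse retrieved child chunks to their unique parent IDs, preserving rank order."""
    parents: list[str] = []
    for hit in child_hits:  # hits are assumed sorted best-first
        parent_id = hit["parent_id"]
        if parent_id not in parents:
            parents.append(parent_id)
        if len(parents) == max_parent_docs:
            break
    return parents
```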

Contextual compression

Reduce retrieved context to save LLM tokens.
compression:
  enabled: true
  strategy: "extractive"     # extractive or llm_extraction
  num_sentences: 5           # Max sentences per document
  reranker_model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
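Extractive compression keeps only the sentences most relevant to the query. A toy sketch that scores sentences by word overlap; the real strategy would score each (query, sentence) pair with the reranker_model above:

```python
def compress_extractive(query: str, document: str, num_sentences: int = 5) -> str:
    """Keep the num_sentences sentences that best match the query, in original order.

    Word overlap stands in for the cross-encoder relevance score used in practice.
    """
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    query_words = set(query.lower().split())
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: len(query_words & set(sentences[i].lower().split())),
        reverse=True,
    )
    keep = sorted(ranked[:num_sentences])  # restore document order
    return ". ".join(sentences[i] for i in keep) + "."
```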

Agentic RAG

Iterative retrieval with self-reflection.
agentic:
  max_iterations: 3                      # Max refinement iterations
  quality_threshold: 75                  # Quality score threshold (0-100)
  router_model: "llama-3.3-70b-versatile"
  compression_mode: "reranking"          # reranking or llm

Cost optimization

Balance quality and cost.
cost_optimization:
  context_budget: 2000                   # Max tokens for LLM context
  model_tiering:
    routing: "llama-3.1-8b-instant"      # Cheaper model for routing
    generation: "llama-3.3-70b-versatile" # Capable model for answers
  compression:
    enabled: true
    strategy: "extractive"
    num_sentences: 5
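The context_budget caps how much retrieved text reaches the LLM. A sketch of budget-aware packing, assuming a rough 4-characters-per-token estimate (a real pipeline would count with the model's tokenizer):

```python
def pack_context(documents: list[str], context_budget: int = 2000) -> list[str]:
    """Greedily keep best-ranked documents until the token budget is spent.

    Tokens are estimated at ~4 characters each; this is an approximation.
    """
    packed, used = [], 0
    for doc in documents:  # assumed sorted best-first
        tokens = max(1, len(doc) // 4)
        if used + tokens > context_budget:
            break
        packed.append(doc)
        used += tokens
    return packed
```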

Chunking configuration

chunking:
  chunk_size: 1000           # Max chunk size (characters)
  chunk_overlap: 200         # Overlap between chunks
  separators:                # Split on these separators (in order)
    - "\n\n"
    - "\n"
    - " "
    - ""

Logging configuration

Control logging output.
logging:
  name: "vectordb_pipeline"   # Logger name
  level: "INFO"               # Log level: DEBUG, INFO, WARNING, ERROR
  format: "text"              # Format: text or json

Log levels by environment

logging:
  name: "vectordb_production"
  level: "${LOG_LEVEL:-WARNING}"  # Use env var with fallback

Collection configuration

Defines collection metadata (used by some features).
collection:
  name: "documents"           # Collection/index name
  description: "Product documentation corpus"

Complete configuration examples

Semantic search with Pinecone

dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-semantic-search-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10

rag:
  enabled: false
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

logging:
  name: "lc_semantic_search_pinecone"
  level: "INFO"

Hybrid search with Milvus

dataloader:
  type: "triviaqa"
  dataset_name: "trivia_qa"
  config: "rc"
  split: "test"
  limit: null

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"
  sparse_model: "prithivida/Splade_PP_en_v2"
  device: "cpu"
  batch_size: 32

milvus:
  uri: "${MILVUS_URI:-http://localhost:19530}"
  token: "${MILVUS_TOKEN:-}"
  collection_name: "triviaqa_hybrid"
  dimension: 384
  recreate: false
  batch_size: 100

logging:
  name: "milvus_hybrid"
  level: "INFO"

Reranking and evaluation with Pinecone

pinecone:
  api_key: "${PINECONE_API_KEY:-}"

collection:
  name: "triviaqa_reranking"

dataloader:
  type: "triviaqa"
  dataset_name: "trivia_qa"
  config: "rc"
  split: "test"
  limit: null

generator:
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY:-}"

embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"
  batch_size: 32

reranker:
  type: "cross_encoder"
  model: "BAAI/bge-reranker-v2-m3"
  top_k: 5

evaluation:
  enabled: true
  metrics:
    - contextual_recall
    - contextual_precision
    - answer_relevancy
    - faithfulness

logging:
  name: "pinecone_reranking"
  level: "INFO"

Agentic RAG with Pinecone

dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-agentic-rag-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10

rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

reranker:
  model: "cross-encoder/ms-marco-MiniLM-L-6-v2"

agentic:
  router_model: "llama-3.3-70b-versatile"
  max_iterations: 3
  compression_mode: "reranking"

logging:
  name: "lc_agentic_rag_pinecone_triviaqa"
  level: "INFO"

Cost-optimized RAG with Pinecone

dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

chunking:
  chunk_size: 1000
  chunk_overlap: 200
  separators:
    - "\n\n"
    - "\n"
    - " "
    - ""

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "lc-cost-optimized-rag-triviaqa"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10
  rrf_k: 60

rag:
  enabled: false
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

logging:
  name: "lc_cost_optimized_rag_pinecone"
  level: "INFO"

Loading configurations in code

VectorDB provides multiple ways to load configurations:

From YAML file

from vectordb.utils.config_loader import ConfigLoader

config = ConfigLoader.load("configs/production.yaml")
ConfigLoader.validate(config, "pinecone")

From dictionary

import os

from vectordb.utils.config_loader import ConfigLoader

config = {
    "dataloader": {"type": "triviaqa", "split": "test"},
    "embeddings": {"model": "minilm", "batch_size": 32},
    "pinecone": {
        "api_key": os.getenv("PINECONE_API_KEY"),
        "index_name": "my-index",
        "dimension": 384
    }
}

resolved_config = ConfigLoader.load(config)

With pipeline classes

from vectordb.langchain.semantic_search import PineconeSemanticSearchPipeline

pipeline = PineconeSemanticSearchPipeline("configs/production.yaml")

Next steps

  • Environment variables: reference for all environment variables
  • Building RAG pipelines: step-by-step tutorial using these configurations
  • Benchmarking: evaluate different configurations
  • Production deployment: deploy your configured pipelines
