Configuration file structure
A complete configuration file includes these sections:
Environment variable substitution
VectorDB supports two environment variable syntaxes:
Dataloader configuration
Controls dataset loading and preprocessing.
Supported datasets
TriviaQA
Open-domain question-answering dataset.
ARC
AI2 Reasoning Challenge for science questions.
PopQA
Popular entity factoid questions.
FactScore
Atomic facts for verification.
Earnings Calls
Financial transcript Q&A.
Embeddings configuration
Defines the embedding models for vector generation.
Dense embeddings
Model aliases
VectorDB provides convenient aliases for common models:
Hybrid embeddings (dense + sparse)
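As a rough illustration of what combining dense and sparse signals involves, the sketch below blends two sets of retrieval scores with a weight. The helper names `min_max` and `fuse_scores`, the `alpha` parameter, and the choice of min-max normalization are assumptions made for this example; they are not taken from VectorDB's API and may differ from the fusion method it actually uses.

```python
def min_max(scores):
    """Scale scores into [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(dense, sparse, alpha=0.5):
    """Hypothetical helper: weighted sum of normalized dense and sparse scores.

    alpha=1.0 uses only dense scores, alpha=0.0 only sparse scores.
    A document missing from one result set contributes 0 for that side.
    """
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    doc_ids = set(dense_n) | set(sparse_n)
    fused = {
        d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
        for d in doc_ids
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: cosine similarities (dense) and term-weight scores (sparse).
dense_hits = {"doc1": 0.92, "doc2": 0.85, "doc3": 0.40}
sparse_hits = {"doc2": 12.0, "doc4": 9.0, "doc3": 3.0}
ranking = fuse_scores(dense_hits, sparse_hits, alpha=0.5)
```

Normalizing before mixing matters because dense and sparse scores usually live on different scales; without it the larger-valued side would dominate regardless of `alpha`.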
Vector database configuration
Each database has specific connection and indexing settings.
Pinecone
Weaviate
Milvus
Qdrant
Chroma
Search configuration
Controls retrieval behavior.
Metadata filtering
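To make the idea of metadata filtering concrete, here is a minimal in-memory sketch that keeps only documents whose metadata satisfies a set of conditions. The operator names (`$eq`, `$gte`, `$in`, and so on) and the `matches` helper are assumptions, loosely modeled on the Mongo-style filters that several vector databases accept; the filter syntax VectorDB itself expects may differ.

```python
import operator

# Hypothetical operator names for illustration; not VectorDB's confirmed syntax.
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(metadata, filters):
    """Return True if the metadata satisfies every condition in filters."""
    for field, condition in filters.items():
        if field not in metadata:
            return False
        # A bare value is shorthand for an equality check.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op_name, expected in condition.items():
            if not OPS[op_name](metadata[field], expected):
                return False
    return True


docs = [
    {"id": "a", "metadata": {"source": "wiki", "year": 2021}},
    {"id": "b", "metadata": {"source": "news", "year": 2023}},
    {"id": "c", "metadata": {"source": "wiki", "year": 2024}},
]
filters = {"source": "wiki", "year": {"$gte": 2022}}
kept = [d["id"] for d in docs if matches(d["metadata"], filters)]
```

In a real deployment the filter is normally pushed down to the database so that only matching vectors are scored; the loop above only shows the semantics.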
RAG configuration
Controls answer generation with LLMs.
Groq configuration
OpenAI configuration
Reranking configuration
Improves precision with cross-encoder models.
Cohere reranking
Evaluation metrics
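The metrics VectorDB supports are not listed in this section, so the sketch below shows two metrics commonly used to evaluate retrieval, recall@k and reciprocal rank, purely as an illustration. The function names and the sample document IDs are assumptions for this example.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Made-up ranking for one query; d1 and d2 are the relevant documents.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = ["d1", "d2"]
r3 = recall_at_k(retrieved, relevant, k=3)
rr = reciprocal_rank(retrieved, relevant)
```

Averaging `reciprocal_rank` over all evaluation queries gives mean reciprocal rank (MRR).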
Advanced features
Query enhancement
Generate multiple query variations for better recall.
multi_query: Generate N paraphrases of the query
hyde: Generate hypothetical answer, then search for similar documents
step_back: Generate broader conceptual query
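The three strategies above can be sketched as a single function that turns one user query into the list of texts to search with. Everything here is illustrative: the `enhance_query` name, the prompt wording, the `llm` callable, and the choice to keep the original query alongside the `multi_query` and `step_back` outputs are assumptions, not VectorDB's implementation.

```python
from typing import Callable, List

# Prompt wording is illustrative only.
PROMPTS = {
    "multi_query": "Rewrite the question in {n} different ways, one per line:\n{query}",
    "hyde": "Write a short passage that answers the question:\n{query}",
    "step_back": "Rewrite the question as a broader, more general question:\n{query}",
}


def enhance_query(query: str, strategy: str, llm: Callable[[str], str], n: int = 3) -> List[str]:
    """Return the texts to search with for the given strategy."""
    prompt = PROMPTS[strategy].format(query=query, n=n)
    output = llm(prompt)
    if strategy == "multi_query":
        # Search with the original query plus up to n paraphrases.
        variants = [line.strip() for line in output.splitlines() if line.strip()]
        return [query] + variants[:n]
    if strategy == "hyde":
        # Search with the hypothetical answer instead of the query.
        return [output.strip()]
    # step_back: search with both the original and the broader question.
    return [query, output.strip()]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "What is the capital of France?\nWhich city is France's capital?"


queries = enhance_query("capital of France?", "multi_query", fake_llm, n=2)
```

Each returned text would then be embedded and searched separately, with the result lists merged and deduplicated before generation.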
Parent document retrieval
Index small chunks, return large context.
children_only: Return only child chunks
with_parents: Return full parent documents
context_window: Return parent with surrounding context
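The three return modes above can be illustrated with a small in-memory example. The data layout and the `expand` helper are hypothetical, and `context_window` is interpreted here as "the matched chunk plus its neighbors within the parent", which is one plausible reading of the description rather than confirmed behavior.

```python
# Toy data: one parent document split into ordered child chunks.
PARENTS = {
    "p1": [
        "Intro to vectors.",
        "Dense embeddings map text to vectors.",
        "Sparse embeddings use term weights.",
        "Summary.",
    ],
}
# Each child chunk ID maps to (parent ID, position within the parent).
CHILDREN = {f"c{i}": ("p1", i) for i in range(4)}


def expand(child_ids, mode, window=1):
    """Turn matched child chunk IDs into the text handed to the LLM."""
    results, seen = [], set()
    for cid in child_ids:
        parent_id, idx = CHILDREN[cid]
        chunks = PARENTS[parent_id]
        if mode == "children_only":
            key, text = cid, chunks[idx]
        elif mode == "with_parents":
            key, text = parent_id, " ".join(chunks)
        elif mode == "context_window":
            start = max(0, idx - window)
            end = min(len(chunks), idx + window + 1)
            key, text = (parent_id, start, end), " ".join(chunks[start:end])
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Skip duplicates, e.g. two children that share one parent.
        if key not in seen:
            seen.add(key)
            results.append(text)
    return results


child_hit = expand(["c1"], "children_only")
parent_hit = expand(["c1", "c2"], "with_parents")
window_hit = expand(["c1"], "context_window", window=1)
```

The trade-off is between precision of the match (small chunks embed cleanly) and completeness of the context passed to the model (parents and windows carry more surrounding text but cost more tokens).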
Contextual compression
Reduce retrieved context to save LLM tokens.
Agentic RAG
Iterative retrieval with self-reflection.
Cost optimization
Balance quality and cost.
Chunking configuration
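Chunking settings typically come down to a chunk size and an overlap between consecutive chunks. The sketch below shows what those two values do, using fixed-size character windows; the `chunk_text` function and the parameter names `chunk_size` and `chunk_overlap` are assumptions for illustration and may not match VectorDB's configuration keys or its splitting strategy.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```

A larger overlap reduces the chance that a sentence is cut across a chunk boundary, at the price of more chunks to embed and store.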
Logging configuration
Control logging output.
Log levels by environment
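One way to pick a log level per environment is a simple lookup keyed on an environment name, shown here with Python's standard `logging` module. The environment names, the level assigned to each, and the `APP_ENV` variable are assumptions for this sketch; VectorDB's own settings may use different names and defaults.

```python
import logging
import os

# Hypothetical mapping; environment names and levels are assumptions.
LEVELS_BY_ENV = {
    "development": logging.DEBUG,
    "staging": logging.INFO,
    "production": logging.WARNING,
}


def configure_logging(env=None):
    """Configure the root logger for the given (or detected) environment."""
    # APP_ENV is a made-up variable name used only in this example.
    env = env or os.environ.get("APP_ENV", "development")
    level = LEVELS_BY_ENV.get(env, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return level


level = configure_logging("production")
logging.getLogger(__name__).info("suppressed in production")
logging.getLogger(__name__).warning("shown in production")
```

Unknown environment names fall back to `INFO` rather than raising, which keeps a misspelled value from silencing or flooding the logs.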
Collection configuration
Defines collection metadata (used by some features).
Complete configuration examples
Semantic search (Pinecone)
Hybrid search (Milvus)
Reranking pipeline (Haystack)
Agentic RAG (LangChain)
Cost-optimized RAG
Loading configurations in code
VectorDB provides multiple ways to load configurations:
From YAML file
From dictionary
With pipeline classes
Next steps
Environment variables
Reference for all environment variables
Building RAG pipelines
Step-by-step tutorial using these configurations
Benchmarking
Evaluate different configurations
Production deployment
Deploy your configured pipelines