What you’ll build
By the end of this tutorial, you’ll have a production-ready RAG pipeline that:
- Loads and indexes a dataset into a vector database
- Performs semantic search with configurable retrieval
- Reranks results for precision
- Generates answers using an LLM
- Evaluates retrieval quality with standard metrics
Prerequisites
Before starting, ensure you have:
- Python 3.10 or later installed
- API keys for your chosen services (Pinecone, Groq, etc.)
- Basic familiarity with YAML configuration files
Install dependencies
VectorDB uses uv for dependency management:
Set up environment variables
Create a .env file or export environment variables for your API keys:
These credentials are used in configuration files via the ${VAR} syntax for secure credential management. See the environment variables reference for all supported variables.
Create your configuration file
Create a YAML configuration file that defines your pipeline. Here’s a complete example for semantic search with RAG:
config/my_rag_pipeline.yaml
See the configuration reference for detailed documentation of all options.
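To illustrate how `${VAR}` references in a configuration file can be resolved against the environment, here is a minimal Python sketch. It is not VectorDB's own loader: the function name `expand_placeholders` and the variable name `PINECONE_API_KEY` are assumptions made for illustration.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${NAME} in text with env[NAME]; fail loudly if NAME is unset."""
    values = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"environment variable {name} is not set")
        return values[name]

    return _PLACEHOLDER.sub(substitute, text)


# Hypothetical config line; the key and variable names are illustrative only.
line = "api_key: ${PINECONE_API_KEY}"
print(expand_placeholders(line, {"PINECONE_API_KEY": "example-key"}))
```

Raising on a missing variable, rather than silently leaving the placeholder in place, surfaces a forgotten credential at load time instead of at the first API call.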
Index your documents
Load the dataset and index documents into your vector database:
The indexing process:
- Loads documents from the specified dataset
- Generates embeddings using the configured model
- Stores vectors and metadata in Pinecone
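The three indexing steps above can be sketched in plain Python. This is a toy model under loud assumptions: `toy_embed` is a hashed bag-of-words stand-in for the configured embedding model, and an in-memory list stands in for Pinecone. None of these names come from VectorDB.

```python
import math
import zlib


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hash each token into one of `dim` buckets, then L2-normalize the counts."""
    vector = [0.0] * dim
    for token in text.lower().split():
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def build_index(documents: dict[str, str]) -> list[dict]:
    """Take loaded documents, embed each one, and store vector plus metadata."""
    return [
        {"id": doc_id, "vector": toy_embed(text), "metadata": {"text": text}}
        for doc_id, text in documents.items()
    ]


def search(index: list[dict], query: str, top_k: int = 3) -> list[str]:
    """Return ids of the top_k records by cosine similarity to the query."""
    query_vector = toy_embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [
        (sum(q * v for q, v in zip(query_vector, record["vector"])), record["id"])
        for record in index
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical two-document dataset.
documents = {
    "doc-1": "vector databases store embeddings",
    "doc-2": "bananas are yellow",
}
index = build_index(documents)
print(search(index, "bananas are yellow", top_k=1))
```

A real embedding model captures meaning rather than token identity, but the data flow (text in, vector plus metadata stored, nearest vectors returned) has the same shape.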
Add reranking for precision
Improve result quality by adding cross-encoder reranking. Update your configuration:
Reranking applies a more expensive cross-encoder model to the top candidates retrieved by the initial vector search, improving precision at the cost of increased latency.
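The retrieve-then-rerank pattern can be sketched as a second scoring pass over a short candidate list. In this sketch the scorer `token_overlap` is a deliberately simple stand-in; a real cross-encoder would score each (query, passage) pair with a model. Both function names are assumptions, not VectorDB APIs.

```python
from typing import Callable


def token_overlap(query: str, text: str) -> float:
    """Stand-in relevance score: Jaccard overlap of lowercase tokens."""
    q_tokens = set(query.lower().split())
    t_tokens = set(text.lower().split())
    union = q_tokens | t_tokens
    return len(q_tokens & t_tokens) / len(union) if union else 0.0


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Score every candidate against the query and keep the best top_n."""
    scored = [(score_fn(query, text), position, text)
              for position, text in enumerate(candidates)]
    # Highest score first; ties keep the original retrieval order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, _, text in scored[:top_n]]


# Candidates as they might come back from the initial vector search.
candidates = [
    "cats sleep a lot",
    "how to reset a password",
    "password reset steps",
]
print(rerank("reset password", candidates, token_overlap, top_n=2))
```

Because the scorer runs once per candidate, latency grows with the size of the candidate list, which is why reranking is applied only to the top results of the first stage.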
Generate answers with RAG
With RAG enabled in your configuration, the pipeline automatically generates answers:
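Conceptually, the generation step places the retrieved passages into a prompt and sends that prompt to the LLM. The sketch below covers only the prompt assembly, with no model call. The template wording and the name `build_prompt` are assumptions; the prompt VectorDB actually uses is not shown in this tutorial.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Number the retrieved passages and place them ahead of the question."""
    context = "\n".join(
        f"[{number}] {passage}" for number, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved passages.
passages = [
    "RAG combines retrieval with generation.",
    "Reranking reorders retrieved candidates.",
]
print(build_prompt("What is RAG?", passages))
```

Numbering the passages makes it possible to ask the model to cite which passage supports each claim.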
Evaluate retrieval quality
Measure your pipeline’s performance using built-in evaluation metrics:
See the benchmarking guide for comprehensive evaluation strategies.
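This tutorial does not list which metrics are built in. As an assumption, the sketch below implements three that are commonly used for retrieval: recall@k, reciprocal rank, and NDCG@k with binary relevance. The function names are illustrative and not VectorDB's API.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query result and ground-truth labels.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, k=3))   # 0.5: only d1 is in the top 3
print(reciprocal_rank(retrieved, relevant))    # 0.5: first hit is at rank 2
print(ndcg_at_k(retrieved, relevant, k=4))
```

Averaging reciprocal rank over a set of queries gives mean reciprocal rank (MRR).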
Hybrid search for better recall
Hybrid search combines dense (semantic) and sparse (keyword) retrieval for improved recall:
Hybrid search is most useful when:
- Queries contain specific terminology or product names
- You need both semantic understanding and exact keyword matching
- Recall is more important than latency
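One common way to merge a dense ranking with a sparse ranking is reciprocal rank fusion (RRF). The sketch below shows that technique as an assumption; this tutorial does not state which fusion method VectorDB or the underlying databases apply. The constant `k = 60` is the value conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for the same query.
dense_results = ["a", "b", "c"]   # semantic (vector) search
sparse_results = ["b", "c", "d"]  # keyword search
print(reciprocal_rank_fusion([dense_results, sparse_results]))
```

Documents found by both retrievers accumulate score from each list, so they rise above documents found by only one. The union of the two lists is what improves recall.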
Advanced features
Query enhancement
Generate multiple query variations to improve retrieval:
Parent document retrieval
Index small chunks but return larger parent context:
Contextual compression
Reduce token costs by compressing retrieved context:
Database-specific examples
Pinecone
Weaviate
Milvus
Qdrant
Chroma
Next steps
- Configuration reference: Complete guide to all configuration options
- Benchmarking: Evaluate and compare retrieval quality across databases
- Production deployment: Best practices for deploying RAG pipelines to production
- Environment variables: Reference for all supported environment variables