This guide walks you through building a complete Retrieval-Augmented Generation (RAG) pipeline using VectorDB. You’ll learn how to index documents, configure search, add reranking, and generate answers with an LLM.

What you’ll build

By the end of this tutorial, you’ll have a production-ready RAG pipeline that:
  • Loads and indexes a dataset into a vector database
  • Performs semantic search with configurable retrieval
  • Reranks results for precision
  • Generates answers using an LLM
  • Evaluates retrieval quality with standard metrics

Prerequisites

Before starting, ensure you have:
  • Python 3.10 or later installed
  • API keys for your chosen services (Pinecone, Groq, etc.)
  • Basic familiarity with YAML configuration files
1. Install dependencies

VectorDB uses uv for dependency management:
# Install uv
pip install uv

# Install project dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate
2. Set up environment variables

Create a .env file or export environment variables for your API keys:
export PINECONE_API_KEY="your-pinecone-api-key"
export GROQ_API_KEY="your-groq-api-key"
Configuration files reference these credentials through the ${VAR} syntax, so secrets stay out of your YAML and version control. See the environment variables reference for all supported variables.
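For intuition, the ${VAR} substitution behaves like standard shell-style expansion; a minimal sketch using os.path.expandvars (an illustration, not necessarily VectorDB's exact mechanism):

```python
import os

# Stand-in for your real key, set here so the example is self-contained
os.environ["PINECONE_API_KEY"] = "pc-demo-123"

# A raw config line containing a ${VAR} placeholder
raw = 'api_key: "${PINECONE_API_KEY}"'

# os.path.expandvars replaces ${VAR} (and $VAR) with the environment value
expanded = os.path.expandvars(raw)
print(expanded)  # api_key: "pc-demo-123"
```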
3. Create your configuration file

Create a YAML configuration file that defines your pipeline. Here’s a complete example for semantic search with RAG:
config/my_rag_pipeline.yaml
dataloader:
  type: "triviaqa"
  split: "test"
  limit: 100
  use_text_splitter: false

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu"
  batch_size: 32

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "my-rag-pipeline"
  namespace: ""
  dimension: 384
  metric: "cosine"
  recreate: false

search:
  top_k: 10

rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048

logging:
  name: "my_rag_pipeline"
  level: "INFO"
See the configuration reference for detailed documentation of all options.
4. Index your documents

Load the dataset and index documents into your vector database:
from vectordb.langchain.semantic_search import PineconeSemanticSearchPipeline

# Initialize pipeline with your config
pipeline = PineconeSemanticSearchPipeline(
    "config/my_rag_pipeline.yaml"
)

# Index documents
pipeline.index()
The indexing process:
  1. Loads documents from the specified dataset
  2. Generates embeddings using the configured model
  3. Stores vectors and metadata in Pinecone
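Conceptually, step 3 boils down to pairing each document id with its embedding and metadata. A minimal sketch of the record shape an upsert expects, with a hypothetical embed function standing in for the configured sentence-transformers model:

```python
# Hypothetical stand-in for the embedding model (384-dim output, matching the config)
def embed(text: str) -> list[float]:
    return [0.0] * 384

docs = [
    {"id": "doc-1", "text": "Photosynthesis converts light into chemical energy."},
    {"id": "doc-2", "text": "Neural networks learn by adjusting weights."},
]

# Each record pairs an id with its vector and any metadata to store alongside it
vectors = [
    {"id": d["id"], "values": embed(d["text"]), "metadata": {"text": d["text"]}}
    for d in docs
]
print(len(vectors), len(vectors[0]["values"]))  # 2 384
```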
5. Perform semantic search

Search your indexed documents using natural language queries:
# Search for relevant documents
result = pipeline.search(
    "What is photosynthesis?",
    top_k=5
)

# Access retrieved documents
for doc in result["documents"]:
    print(f"Score: {doc.score}")
    print(f"Content: {doc.content}")
    print("---")
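Under the hood, semantic search with the cosine metric is similarity between the query embedding and each stored vector, followed by a top-k sort. A self-contained sketch with toy 3-dimensional vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vector store: id -> embedding
store = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.9, 0.1, 0.0],
    "doc-3": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

# Rank all stored vectors by similarity to the query, keep the top_k best
top_k = 2
ranked = sorted(store, key=lambda d: cosine(query, store[d]), reverse=True)[:top_k]
print(ranked)  # ['doc-1', 'doc-2']
```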
6. Add reranking for precision

Improve result quality by adding cross-encoder reranking. Update your configuration:
reranker:
  type: "cross_encoder"
  model: "BAAI/bge-reranker-v2-m3"
  top_k: 5

evaluation:
  enabled: true
  metrics:
    - contextual_recall
    - contextual_precision
    - answer_relevancy
    - faithfulness
Reranking applies a more expensive cross-encoder model to the top candidates retrieved by the initial vector search, improving precision at the cost of increased latency.
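The rerank step itself is just "score each candidate against the query, sort, truncate to top_k". A sketch with a hypothetical word-overlap scorer standing in for the cross-encoder:

```python
def score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: fraction of query words in the doc
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

candidates = [
    "plants absorb sunlight",
    "photosynthesis is how plants convert sunlight into energy",
    "the stock market closed higher today",
]
query = "how do plants convert sunlight"

# Re-score the retriever's candidates and keep the top_k best
top_k = 2
reranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)[:top_k]
print(reranked[0])  # photosynthesis is how plants convert sunlight into energy
```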
7. Generate answers with RAG

With RAG enabled in your configuration, the pipeline automatically generates answers:
result = pipeline.search(
    "Explain how neural networks learn",
    top_k=10
)

# The answer is generated using retrieved context
print(result["answer"])

# Access the source documents
for doc in result["source_documents"]:
    print(f"Source: {doc.metadata['source']}")
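Before the LLM call, RAG pipelines typically stuff the retrieved passages into a prompt template. A minimal sketch of that assembly step (the template wording is illustrative, not VectorDB's actual prompt):

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    # Number each retrieved passage so the model can ground its answer in them
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Explain how neural networks learn",
    ["Backpropagation computes gradients.", "Weights are updated by gradient descent."],
)
print(prompt)
```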
8. Evaluate retrieval quality

Measure your pipeline’s performance using built-in evaluation metrics:
from vectordb.utils.evaluation import evaluate_retrieval, QueryResult

# Run evaluation on test queries
query_results = []
for query_data in test_queries:
    result = pipeline.search(query_data["query"], top_k=10)
    query_results.append(
        QueryResult(
            query=query_data["query"],
            retrieved_ids=[doc.id for doc in result["documents"]],
            relevant_ids=set(query_data["relevant_ids"])
        )
    )

# Compute metrics
metrics = evaluate_retrieval(query_results, k=10)
print(f"Recall@10: {metrics.recall_at_k:.3f}")
print(f"Precision@10: {metrics.precision_at_k:.3f}")
print(f"MRR: {metrics.mrr:.3f}")
print(f"NDCG@10: {metrics.ndcg_at_k:.3f}")
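For intuition, these metrics reduce to simple set and rank arithmetic. A sketch of Recall@k, Precision@k, and MRR for a single toy query (not the library's implementation):

```python
retrieved = ["d3", "d1", "d7", "d2"]  # ranked ids from the pipeline
relevant = {"d1", "d2", "d5"}         # gold ids for this query
k = 4

hits = [d for d in retrieved[:k] if d in relevant]
recall_at_k = len(hits) / len(relevant)  # relevant docs found / total relevant
precision_at_k = len(hits) / k           # relevant docs found / docs returned

# MRR: reciprocal rank of the first relevant hit (d1 at rank 2 here)
mrr = next((1 / (i + 1) for i, d in enumerate(retrieved) if d in relevant), 0.0)

print(round(recall_at_k, 3), round(precision_at_k, 3), round(mrr, 3))  # 0.667 0.5 0.5
```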
See the benchmarking guide for comprehensive evaluation strategies.

Hybrid search for better recall

Hybrid search combines dense (semantic) and sparse (keyword) retrieval for improved recall:
embeddings:
  model: "Qwen/Qwen3-Embedding-0.6B"
  sparse_model: "prithivida/Splade_PP_en_v2"
  device: "cpu"
  batch_size: 32

search:
  top_k: 10
  rrf_k: 60  # Reciprocal Rank Fusion parameter
from vectordb.haystack.hybrid_indexing import MilvusHybridSearchPipeline

pipeline = MilvusHybridSearchPipeline(
    "config/hybrid_config.yaml"
)
result = pipeline.run(query="machine learning algorithms", top_k=10)
Hybrid search excels when:
  • Queries contain specific terminology or product names
  • You need both semantic understanding and exact keyword matching
  • Recall is more important than latency
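The rrf_k parameter comes from the Reciprocal Rank Fusion formula, where each document scores the sum of 1/(rrf_k + rank) across the rankings it appears in. A self-contained sketch of fusing a dense and a sparse ranking:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document earns 1/(k + rank) per ranking it appears in (ranks start at 1)
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]   # semantic ranking
sparse = ["d3", "d1", "d4"]  # keyword ranking
print(rrf([dense, sparse]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents ranked well by both retrievers (d1, d3) float to the top, which is why fusion helps recall without requiring score calibration between the two retrievers.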

Advanced features

Query enhancement

Generate multiple query variations to improve retrieval:
query_enhancement:
  enabled: true
  method: "multi_query"  # or "hyde" or "step_back"
  num_queries: 3
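With multi_query, each variation is searched independently and the result sets are merged with deduplication. A sketch of the merge step (the variations themselves would come from an LLM; here the per-variation results are hard-coded):

```python
# Hypothetical per-variation results: (doc_id, score) pairs, best first
results_per_query = [
    [("d1", 0.9), ("d2", 0.7)],   # original query
    [("d2", 0.8), ("d3", 0.6)],   # variation 1
    [("d1", 0.85), ("d4", 0.5)],  # variation 2
]

# Keep each document once, at its best score across all variations
best: dict[str, float] = {}
for results in results_per_query:
    for doc_id, score in results:
        best[doc_id] = max(best.get(doc_id, 0.0), score)

merged = sorted(best, key=best.get, reverse=True)
print(merged)  # ['d1', 'd2', 'd3', 'd4']
```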

Parent document retrieval

Index small chunks but return larger parent context:
indexing:
  parent_chunk_size: 512
  child_chunk_size: 128
  chunk_overlap: 20

search:
  retrieval_mode: "with_parents"
  max_parent_docs: 3
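The idea: embed small child chunks for precise matching, but map each hit back to its larger parent chunk for context. A sketch of the chunking and mapping, using character counts for simplicity (the real splitter may well work on tokens):

```python
def chunk(text: str, size: int, overlap: int = 0) -> list[str]:
    # Slide a window of `size` characters, overlapping by `overlap` each step
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

text = "x" * 1200
parents = chunk(text, 512)  # coarse chunks to return as context

# Index small children, remembering which parent each one came from
child_to_parent: dict[int, int] = {}
children: list[str] = []
for p_idx, parent in enumerate(parents):
    for child in chunk(parent, 128, overlap=20):
        child_to_parent[len(children)] = p_idx
        children.append(child)

# A hit on any child resolves to its parent chunk
hit_child = 5
print(len(parents), child_to_parent[hit_child])  # 3 1
```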

Contextual compression

Reduce token costs by compressing retrieved context:
compression:
  enabled: true
  strategy: "extractive"
  num_sentences: 5
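Extractive compression keeps only the sentences most relevant to the query, so less text reaches the LLM. A sketch using word-overlap scoring as a stand-in for whatever scorer the strategy actually uses:

```python
def compress(query: str, text: str, num_sentences: int = 2) -> str:
    q = set(query.lower().split())
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Score each sentence by query-word overlap, keep the best ones in original order
    scored = sorted(sentences, key=lambda s: len(q & set(s.lower().split())), reverse=True)
    keep = set(scored[:num_sentences])
    return ". ".join(s for s in sentences if s in keep) + "."

doc = ("Photosynthesis occurs in chloroplasts. The stock market is volatile. "
       "Plants convert light into chemical energy.")
print(compress("how do plants convert light", doc, num_sentences=1))
# Plants convert light into chemical energy.
```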

Database-specific examples

Pinecone

from vectordb.langchain.semantic_search import PineconeSemanticSearchPipeline

pipeline = PineconeSemanticSearchPipeline(
    "configs/pinecone_triviaqa.yaml"
)
result = pipeline.search("Who invented the telephone?", top_k=5)

Weaviate

from vectordb.langchain.semantic_search import WeaviateSemanticSearchPipeline

pipeline = WeaviateSemanticSearchPipeline(
    "configs/weaviate_popqa.yaml"
)
result = pipeline.search("Who invented the telephone?", top_k=5)

Milvus

from vectordb.haystack.hybrid_indexing import MilvusHybridSearchPipeline

pipeline = MilvusHybridSearchPipeline(
    "configs/milvus_triviaqa.yaml"
)
result = pipeline.run(query="machine learning algorithms", top_k=10)

Qdrant

from vectordb.langchain.semantic_search import QdrantSemanticSearchPipeline

pipeline = QdrantSemanticSearchPipeline(
    "configs/qdrant_triviaqa.yaml"
)
result = pipeline.search("What is quantum computing?", top_k=5)

Chroma

from vectordb.langchain.semantic_search import ChromaSemanticSearchPipeline

pipeline = ChromaSemanticSearchPipeline(
    "configs/chroma_triviaqa.yaml"
)
result = pipeline.search("Explain photosynthesis", top_k=5)

Next steps

Configuration reference

Complete guide to all configuration options

Benchmarking

Evaluate and compare retrieval quality across databases

Production deployment

Best practices for deploying RAG pipelines to production

Environment variables

Reference for all supported environment variables
