Key Results
HotpotQA Multi-Hop
Optimized query generation for second-hop retrieval with detailed strategies
Healthcare RAG
Multi-agent system for diabetes and COPD with specialized retrievers
Vector Store Agnostic
Works with ChromaDB, Weaviate, Qdrant, Pinecone, and more
Weaviate Integration
Official tutorial for reranker optimization in RAG pipelines
What Can Be Optimized?
In a RAG pipeline, GEPA can optimize:
- Query Reformulation: Transform user queries for better retrieval
- Retrieval Prompts: Instructions for what to retrieve and why
- Reranking Strategies: How to prioritize retrieved documents
- Context Synthesis: How to combine multiple documents
- Answer Generation: Prompts for generating final answers
- Multi-Hop Logic: Strategies for iterative retrieval
RAG Optimization with DSPy
The most powerful way to optimize RAG pipelines is with DSPy.
What Gets Optimized?
DSPy+GEPA automatically optimizes:
- Retrieval query formulation
- Number of passages to retrieve (adaptive k)
- Instructions for context usage in generation
- Chain-of-thought reasoning strategies
Generic RAG Adapter
For non-DSPy RAG systems, use the Generic RAG Adapter.
Supported Vector Stores
The Generic RAG Adapter works with:
- ChromaDB: Local and client-server
- Weaviate: Open-source vector database
- Qdrant: High-performance vector search
- Pinecone: Managed vector database
- Custom: Implement the vector store interface
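For the custom option, the adapter expects your store to expose a small search interface. The exact interface is defined by the adapter; the `BaseVectorStore` name and method signature below are hypothetical stand-ins, with a toy in-memory store for illustration:

```python
from abc import ABC, abstractmethod

class BaseVectorStore(ABC):
    """Hypothetical interface a custom store would implement for the adapter."""

    @abstractmethod
    def search(self, query: str, k: int) -> list[dict]:
        """Return the top-k documents as {'text': ..., 'score': ...} dicts."""

class InMemoryStore(BaseVectorStore):
    """Toy keyword-overlap store, standing in for a real vector database."""

    def __init__(self, docs: list[str]):
        self.docs = docs

    def search(self, query: str, k: int) -> list[dict]:
        terms = set(query.lower().split())
        scored = [
            {"text": d, "score": len(terms & set(d.lower().split()))}
            for d in self.docs
        ]
        scored.sort(key=lambda d: d["score"], reverse=True)
        return scored[:k]
```

Swapping the toy overlap scoring for real embedding search is all that changes between backends; the optimization loop is unaffected.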
Use Case 1: Multi-Hop Question Answering
Problem: HotpotQA requires retrieving information across multiple documents to answer complex questions.
Example: Second-Hop Query Generation
Given:
- Original question: “What is the population of the archipelago containing Arco da Calheta?”
- First-hop summary: “Arco da Calheta is a civil parish in Madeira with population 3,226 in 2011”
A naive second hop reuses the original question, which just retrieves more documents about Arco da Calheta itself. The evolved strategy instead generates the second-hop query “Madeira archipelago population in 2011”, which retrieves documents about the wider region, not just the parish.
Evolved Strategy (Excerpt)
Implementation
Use Case 2: Healthcare Multi-Agent RAG
Problem: General medical RAG systems struggle with specialized disease knowledge.
Solution: GEPA discovers a multi-agent architecture with disease-specific experts.
Architecture
Results
- Improved retrieval precision for disease-specific queries
- Better answer quality through specialized reasoning
- Graceful fallback for general medical questions
Use Case 3: Reranking Optimization
Reranking retrieved documents is crucial for RAG quality. GEPA optimizes reranking prompts.
Weaviate Tutorial
Official tutorial from Weaviate on optimizing listwise rerankers.
optimize_anything for RAG
For more control, use optimize_anything directly:
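The core of the pattern is an evaluator that runs your pipeline on one example and returns a score plus side information. A minimal, self-contained sketch (the `run_rag` stub, dict keys, and the commented `optimize_anything` call are illustrative assumptions; check the GEPA docs for the real signature):

```python
def run_rag(question: str, prompts: dict) -> tuple[str, list[str]]:
    """Stub standing in for your real pipeline (retrieve + rerank + generate)."""
    retrieved = ["Arco da Calheta is a civil parish in Madeira..."]
    return "3,226", retrieved

def evaluate(candidate_prompts: dict, example: dict) -> tuple[float, dict]:
    """Score one example and attach diagnostics GEPA's reflection can read."""
    answer, retrieved = run_rag(example["question"], candidate_prompts)
    score = 1.0 if answer == example["gold"] else 0.0
    # Side info lets the reflective step see *why* a candidate failed.
    side_info = {"retrieved_docs": retrieved, "answer": answer}
    return score, side_info

# GEPA then mutates the prompt texts to maximize the score, e.g. (assumed API):
# result = optimize_anything(seed_candidate=prompts, evaluator=evaluate, trainset=train)
```

Returning the retrieved documents alongside the score is what makes retrieval failures visible to the optimizer rather than just lowering an opaque number.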
Common RAG Failure Modes and GEPA Solutions
Poor Query Formulation
Problem: User queries don’t match document phrasing.
GEPA Solution: Learns query reformulation strategies that bridge the vocabulary gap.
Example: “How do I fix X?” → “X troubleshooting guide error solutions”
Irrelevant Retrieved Documents
Problem: Top-k retrieval returns off-topic documents.
GEPA Solution: Optimizes both the retrieval prompt and reranking logic to surface relevant content.
Context Overload
Problem: Too many documents overwhelm the LLM’s context window.
GEPA Solution: Discovers strategies for context synthesis and compression.
Multi-Hop Failures
Problem: Complex questions require multiple retrieval steps.
GEPA Solution: Evolves iterative retrieval strategies that know when to do second or third hops.
Domain Mismatch
Problem: General retrieval strategies fail on specialized domains.
GEPA Solution: Learns domain-specific retrieval and reasoning patterns (see Healthcare RAG example).
Best Practices
Start with a working baseline
Even a simple retrieve-then-generate pipeline is enough; GEPA will evolve the sophistication.
Use informative metrics
F1 score, exact match, or ROUGE for QA tasks. For open-ended generation, use LLM-as-judge.
Optimize end-to-end, not components in isolation
Retrieve + rerank + generate should be optimized together so they co-adapt.
Provide retrieval diagnostics in ASI
Return the actual retrieved documents in side info so GEPA can see what was retrieved.
Use validation sets for generalization
RAG systems must generalize to unseen questions. Always use a valset.
Integration Examples
LangChain RAG
LlamaIndex RAG
Production Deployments
OCR Document Understanding
Intrinsic Labs achieved up to 38% OCR error reduction using GEPA-optimized prompts for document extraction across:
- Gemini 2.5 Pro
- Gemini 2.5 Flash
- Gemini 2.0 Flash
Enterprise Document Search
FireBird Technologies optimized their Auto-Analyst platform with GEPA:
- 4 specialized agents: Pre-processing, Statistical Analytics, ML, Visualization
- Optimized 4 primary signatures covering 90% of code runs
- Tested across multiple model providers to avoid overfitting
Metrics for RAG Evaluation
Answer Correctness
F1, Exact Match, ROUGE-L for QA tasks
Retrieval Quality
Precision@k, Recall@k, MRR for retrieved documents
Relevance
LLM-as-judge scoring answer relevance and faithfulness
Latency
End-to-end response time including retrieval and generation
Example Metric Implementation
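The original snippet is not shown here; the following is a hedged sketch of a token-level F1 plus exact-match metric in the shape DSPy metrics usually take (the `example.answer` / `pred.answer` field names are assumptions):

```python
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and tokenize."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def qa_metric(example, pred, trace=None) -> float:
    """DSPy-style metric: exact match, falling back to token F1 for partial credit."""
    em = float(normalize(pred.answer) == normalize(example.answer))
    return max(em, f1_score(pred.answer, example.answer))
```

The `max(em, f1)` blend rewards exact answers fully while still giving the optimizer a gradient of partial credit to climb on near misses.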
Next Steps
DSPy + GEPA Tutorial
Complete RAG optimization walkthrough
Generic RAG Adapter
Use GEPA with any vector store
Prompt Optimization
Optimize individual prompts
Agent Architecture
Discover multi-agent architectures