Overview
The GenericRAGAdapter enables GEPA to optimize RAG (Retrieval-Augmented Generation) systems with any vector store implementation. It evaluates both retrieval and generation quality, optimizing:
- Query reformulation prompts
- Context synthesis prompts
- Answer generation prompts
- Document reranking criteria
Supports ChromaDB, Weaviate, Qdrant, Pinecone, Milvus, LanceDB, and custom vector stores.
Installation
pip install gepa
# Install your vector store
pip install chromadb # or weaviate-client, qdrant-client, etc.
Quick Start
import gepa
from gepa.adapters.generic_rag_adapter import GenericRAGAdapter
from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore
# Create vector store
vector_store = ChromaVectorStore.create_local(
persist_directory='./my_kb',
collection_name='documents'
)
# Add documents
vector_store.add_documents([
{'id': 'doc1', 'content': 'Machine learning is...', 'metadata': {}},
# ... more documents
])
# Create adapter
adapter = GenericRAGAdapter(
vector_store=vector_store,
llm_model='openai/gpt-4o-mini',
rag_config={
'retrieval_strategy': 'similarity',
'top_k': 5
}
)
# Prepare dataset
train_data = [
{
'query': 'What is machine learning?',
'ground_truth_answer': 'Machine learning is a subset of AI...',
'relevant_doc_ids': ['doc1'],
'metadata': {}
},
# ... more examples
]
# Optimize
result = gepa.optimize(
seed_candidate={
'answer_generation': 'Answer based on the context provided.'
},
trainset=train_data[:20],
valset=train_data[20:],
adapter=adapter,
max_metric_calls=50,
reflection_lm='openai/gpt-4'
)
print('Optimized prompts:', result.best_candidate)
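Before optimizing, it can help to sanity-check that every example has the fields the adapter expects. The helper below is a hypothetical utility (not part of gepa) that flags malformed examples:

```python
# Hypothetical helper: flag training examples missing RAGDataInst fields.
REQUIRED_KEYS = {'query', 'ground_truth_answer', 'relevant_doc_ids', 'metadata'}

def validate_examples(examples: list[dict]) -> list[int]:
    """Return indices of examples that are missing fields or mistyped."""
    bad = []
    for i, ex in enumerate(examples):
        if not REQUIRED_KEYS.issubset(ex) or not isinstance(ex.get('relevant_doc_ids'), list):
            bad.append(i)
    return bad

examples = [
    {'query': 'What is machine learning?',
     'ground_truth_answer': 'Machine learning is a subset of AI...',
     'relevant_doc_ids': ['doc1'], 'metadata': {}},
    {'query': 'Missing everything else'},  # malformed
]
validate_examples(examples)  # -> [1]
```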
Class Signature
Defined in src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:106:
class GenericRAGAdapter(GEPAAdapter[RAGDataInst, RAGTrajectory, RAGOutput]):
    def __init__(
        self,
        vector_store: VectorStoreInterface,
        llm_model,
        embedding_model: str = 'text-embedding-3-small',
        embedding_function=None,
        rag_config: dict[str, Any] | None = None,
        failure_score: float = 0.0,
    )
Parameters
vector_store
VectorStoreInterface
required
Vector store implementation. Must implement VectorStoreInterface:
- ChromaDB: ChromaVectorStore
- Weaviate: WeaviateVectorStore
- Qdrant: QdrantVectorStore
- Milvus: MilvusVectorStore
- LanceDB: LanceDBVectorStore
- Custom: Implement VectorStoreInterface
llm_model
str | Callable
required
LLM for text generation. Can be:
- A LiteLLM model string (e.g., 'openai/gpt-4o-mini')
- A custom callable taking messages and returning a response
embedding_model
str
default:"'text-embedding-3-small'"
Model name for text embeddings. Used when embedding_function is not provided.
embedding_function
Callable | None
default:"None"
Custom embedding function (text: str) -> list[float]. If None, uses LiteLLM embeddings.
rag_config
dict[str, Any] | None
default:"None"
RAG pipeline configuration:
- retrieval_strategy: 'similarity', 'hybrid', or 'vector' (default: 'similarity')
- top_k: Number of documents to retrieve (default: 5)
- retrieval_weight: Weight for retrieval in the combined score (default: 0.3)
- generation_weight: Weight for generation in the combined score (default: 0.7)
- hybrid_alpha: Semantic vs. keyword balance for hybrid search (default: 0.5)
- filters: Default metadata filters for retrieval
failure_score
float
default:"0.0"
Score assigned when evaluation fails.
Data Types
RAGDataInst
Input data structure (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:12):
class RAGDataInst(TypedDict):
    query: str                   # User query
    ground_truth_answer: str     # Expected answer
    relevant_doc_ids: list[str]  # Document IDs that should be retrieved
    metadata: dict[str, Any]     # Additional context
RAGTrajectory
Execution trace (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:44):
class RAGTrajectory(TypedDict):
    original_query: str        # Original user query
    reformulated_query: str    # Query after reformulation
    retrieved_docs: list[dict] # Retrieved documents with scores
    synthesized_context: str   # Context after synthesis
    generated_answer: str      # Final answer
    execution_metadata: dict   # Metrics and performance data
RAGOutput
Final output (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:74):
class RAGOutput(TypedDict):
    final_answer: str          # Generated answer
    confidence_score: float    # Confidence (0.0 to 1.0)
    retrieved_docs: list[dict] # Retrieved documents
    total_tokens: int          # Token usage
Optimizable Components
1. Query Reformulation
Improves query understanding and reformulation:
seed_candidate = {
'query_reformulation': 'Rephrase the query to improve retrieval.'
}
# GEPA might evolve this to:
# 'Extract key entities and concepts from the query. Expand abbreviations.
# Add relevant synonyms. Focus on information-seeking intent.'
2. Context Synthesis
Optimizes document combination and summarization:
seed_candidate = {
'context_synthesis': 'Combine the retrieved documents into a coherent context.'
}
# GEPA might evolve this to:
# 'Synthesize information from retrieved documents by:
# 1. Identifying common themes
# 2. Removing redundant information
# 3. Organizing by relevance to query
# 4. Preserving key facts and relationships'
3. Answer Generation
Enhances final answer quality:
seed_candidate = {
'answer_generation': 'Answer the question based on the context.'
}
# GEPA might evolve this to:
# 'Generate a comprehensive answer that:
# 1. Directly addresses the question
# 2. Cites specific information from context
# 3. Acknowledges any uncertainties
# 4. Uses clear, concise language'
4. Reranking Criteria
Improves document relevance ordering:
seed_candidate = {
'reranking_criteria': 'Rank documents by relevance to query.'
}
# GEPA might evolve this to:
# 'Rerank documents by:
# 1. Semantic similarity to query intent
# 2. Presence of query entities
# 3. Information density
# 4. Recency (if metadata available)'
Vector Store Setup
ChromaDB (Local)
from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore
vector_store = ChromaVectorStore.create_local(
persist_directory='./my_knowledge_base',
collection_name='documents'
)
# Add documents
vector_store.add_documents([
{
'id': 'doc1',
'content': 'Your document text here...',
'metadata': {'source': 'manual', 'date': '2024-01-01'}
},
# ... more documents
])
Weaviate (Cloud)
from gepa.adapters.generic_rag_adapter.vector_stores.weaviate_store import WeaviateVectorStore
vector_store = WeaviateVectorStore.create_cloud(
cluster_url='https://your-cluster.weaviate.network',
api_key='your-api-key',
collection_name='Documents'
)
vector_store.add_documents([...])
Qdrant (Self-hosted)
from gepa.adapters.generic_rag_adapter.vector_stores.qdrant_store import QdrantVectorStore
vector_store = QdrantVectorStore(
host='localhost',
port=6333,
collection_name='my_documents'
)
vector_store.add_documents([...])
Custom Vector Store
Implement VectorStoreInterface:
from typing import Any

from gepa.adapters.generic_rag_adapter.vector_store_interface import VectorStoreInterface

class MyVectorStore(VectorStoreInterface):
    def search(
        self,
        query_embedding: list[float],
        top_k: int = 5,
        filters: dict | None = None
    ) -> list[dict[str, Any]]:
        # Your search implementation: return the top_k most similar documents
        raise NotImplementedError

    def add_documents(self, documents: list[dict[str, Any]]) -> None:
        # Your add implementation: index the documents
        raise NotImplementedError

    def get_document(self, doc_id: str) -> dict[str, Any] | None:
        # Your get implementation: return the document, or None if not found
        raise NotImplementedError
Methods
evaluate()
Evaluates RAG system on a batch of queries.
def evaluate(
    self,
    batch: list[RAGDataInst],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[RAGTrajectory, RAGOutput]
Implementation: src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:205
Behavior
- Executes RAG pipeline for each query with candidate prompts
- Evaluates retrieval quality (precision, recall, F1, MRR)
- Evaluates generation quality (token F1, BLEU, faithfulness)
- Computes combined score (weighted)
- Returns EvaluationBatch with outputs, scores, and optional trajectories
make_reflective_dataset()
Generates reflective dataset for prompt improvement.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[RAGTrajectory, RAGOutput],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
Implementation: src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:337
Returns
Component-specific reflective examples:
{
    'answer_generation': [
        {
            'Inputs': {
                'query': 'What is ML?',
                'context': 'Machine learning is...'
            },
            'Generated Outputs': 'ML is a type of AI...',
            'Feedback': 'Good answer generation. Score: 0.85'
        },
        # ... more examples
    ]
}
Evaluation Metrics
Retrieval Metrics
- Precision: % of retrieved docs that are relevant
- Recall: % of relevant docs that were retrieved
- F1 Score: Harmonic mean of precision and recall
- MRR (Mean Reciprocal Rank): Average of 1 / (rank of the first relevant doc) across queries
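As a rough sketch (simplified, not gepa's actual implementation), the retrieval metrics above can be computed from the retrieved and relevant document IDs for a single query:

```python
def retrieval_metrics(retrieved_ids: list[str], relevant_ids: list[str]) -> dict:
    """Precision, recall, F1, and reciprocal rank for one query (sketch)."""
    relevant = set(relevant_ids)
    hits = [d for d in retrieved_ids if d in relevant]
    precision = len(hits) / len(retrieved_ids) if retrieved_ids else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    rr = 0.0
    for rank, d in enumerate(retrieved_ids, start=1):
        if d in relevant:
            rr = 1.0 / rank  # reciprocal rank of the first relevant doc
            break
    return {'precision': precision, 'recall': recall, 'f1': f1, 'reciprocal_rank': rr}

m = retrieval_metrics(['doc3', 'doc1', 'doc7'], ['doc1', 'doc2'])
# precision 1/3, recall 0.5, F1 0.4, reciprocal rank 0.5
```

MRR is then the mean of `reciprocal_rank` over all queries in the batch.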
Generation Metrics
- Token F1: Overlap between generated and ground truth tokens
- BLEU Score: N-gram overlap
- Faithfulness: Answer is supported by retrieved context
- Answer Relevance: Answer addresses the query
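Token F1, for example, can be sketched as bag-of-tokens overlap between the generated answer and the ground truth (a simplified illustration; gepa's scorer may tokenize differently):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """F1 over whitespace tokens, counting repeated tokens once per occurrence."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

token_f1('machine learning is ai', 'machine learning is a subset of ai')
```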
Combined Score
combined_score = (
    retrieval_weight * retrieval_f1 +
    generation_weight * generation_score
)
Default: 30% retrieval, 70% generation.
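In code, the weighted combination above is simply:

```python
def combined_score(retrieval_f1: float, generation_score: float,
                   retrieval_weight: float = 0.3,
                   generation_weight: float = 0.7) -> float:
    # Defaults mirror the documented 30% retrieval / 70% generation split.
    return retrieval_weight * retrieval_f1 + generation_weight * generation_score

combined_score(0.5, 0.8)  # 0.3*0.5 + 0.7*0.8 = 0.71
```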
Advanced Configuration
Hybrid Search
Combine semantic and keyword search:
adapter = GenericRAGAdapter(
vector_store=vector_store,
llm_model='openai/gpt-4o-mini',
rag_config={
'retrieval_strategy': 'hybrid',
'hybrid_alpha': 0.7, # 70% semantic, 30% keyword
'top_k': 10
}
)
Custom Weights
Adjust retrieval vs generation importance:
adapter = GenericRAGAdapter(
vector_store=vector_store,
llm_model='openai/gpt-4o-mini',
rag_config={
'retrieval_weight': 0.5, # 50% retrieval
'generation_weight': 0.5, # 50% generation
'top_k': 5
}
)
Metadata Filters
Filter documents by metadata:
adapter = GenericRAGAdapter(
vector_store=vector_store,
llm_model='openai/gpt-4o-mini',
rag_config={
'filters': {
'source': 'documentation',
'language': 'en'
}
}
)
Custom Embeddings
Use your own embedding function:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
def custom_embed(text: str) -> list[float]:
    return model.encode(text).tolist()
adapter = GenericRAGAdapter(
vector_store=vector_store,
llm_model='openai/gpt-4o-mini',
embedding_function=custom_embed
)
Complete Example
import gepa
from gepa.adapters.generic_rag_adapter import GenericRAGAdapter
from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore
# 1. Setup vector store
vector_store = ChromaVectorStore.create_local(
persist_directory='./kb',
collection_name='tech_docs'
)
# 2. Add documents
documents = [
{
'id': 'doc1',
'content': 'Machine learning is a method of data analysis...',
'metadata': {'category': 'AI', 'date': '2024-01-01'}
},
# ... 100+ documents
]
vector_store.add_documents(documents)
# 3. Create adapter
adapter = GenericRAGAdapter(
vector_store=vector_store,
llm_model='openai/gpt-4o-mini',
rag_config={
'retrieval_strategy': 'hybrid',
'top_k': 5,
'retrieval_weight': 0.3,
'generation_weight': 0.7
}
)
# 4. Prepare dataset
train_data = [
{
'query': 'What is machine learning?',
'ground_truth_answer': 'Machine learning is a method of data analysis that automates analytical model building.',
'relevant_doc_ids': ['doc1'],
'metadata': {}
},
# ... 50+ examples
]
# 5. Optimize all components
result = gepa.optimize(
seed_candidate={
'query_reformulation': 'Rephrase to improve retrieval.',
'context_synthesis': 'Combine documents into context.',
'answer_generation': 'Answer based on context.',
'reranking_criteria': 'Rank by relevance.'
},
trainset=train_data[:30],
valset=train_data[30:],
adapter=adapter,
max_metric_calls=100,
reflection_lm='openai/gpt-4'
)
# 6. Deploy optimized RAG system
optimized_prompts = result.best_candidate
print('Query Reformulation:', optimized_prompts['query_reformulation'])
print('Answer Generation:', optimized_prompts['answer_generation'])
print('Validation Score:', result.best_score)
Best Practices
- Document Quality: Ensure documents are well-formatted and contain relevant information
- Dataset Size: Use 30+ train examples and 20+ validation examples
- Relevant Doc IDs: Provide accurate relevant_doc_ids for retrieval evaluation
- Component Selection: Start with answer_generation, then add others
- Metadata: Use metadata filters when documents have clear categories
- Vector Store Choice: Use cloud providers for production, local for development
- Batch Size: Process 5-10 examples per batch for memory efficiency
- Top-K: Start with 5, increase if answers lack context
- Hybrid Alpha: 0.7 works well for most cases (70% semantic, 30% keyword)
- Caching: Vector stores cache embeddings automatically
Limitations
- Requires pre-populated vector store
- Document embeddings not optimized (only prompts)
- Single-turn queries only (no conversation context)
- Metadata filters are static (not optimized)
See Also