
Overview

The GenericRAGAdapter enables GEPA to optimize RAG (Retrieval-Augmented Generation) systems with any vector store implementation. It evaluates both retrieval and generation quality, optimizing:
  • Query reformulation prompts
  • Context synthesis prompts
  • Answer generation prompts
  • Document reranking criteria
Supports ChromaDB, Weaviate, Qdrant, Pinecone, Milvus, LanceDB and custom vector stores.

Installation

pip install gepa

# Install your vector store
pip install chromadb  # or weaviate-client, qdrant-client, etc.

Quick Start

import gepa
from gepa.adapters.generic_rag_adapter import GenericRAGAdapter
from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore

# Create vector store
vector_store = ChromaVectorStore.create_local(
    persist_directory='./my_kb',
    collection_name='documents'
)

# Add documents
vector_store.add_documents([
    {'id': 'doc1', 'content': 'Machine learning is...', 'metadata': {}},
    # ... more documents
])

# Create adapter
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_strategy': 'similarity',
        'top_k': 5
    }
)

# Prepare dataset
train_data = [
    {
        'query': 'What is machine learning?',
        'ground_truth_answer': 'Machine learning is a subset of AI...',
        'relevant_doc_ids': ['doc1'],
        'metadata': {}
    },
    # ... more examples
]

# Optimize
result = gepa.optimize(
    seed_candidate={
        'answer_generation': 'Answer based on the context provided.'
    },
    trainset=train_data[:20],
    valset=train_data[20:],
    adapter=adapter,
    max_metric_calls=50,
    reflection_lm='openai/gpt-4'
)

print('Optimized prompts:', result.best_candidate)

Class Signature

Defined in src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:106:
class GenericRAGAdapter(GEPAAdapter[RAGDataInst, RAGTrajectory, RAGOutput]):
    def __init__(
        self,
        vector_store: VectorStoreInterface,
        llm_model,
        embedding_model: str = 'text-embedding-3-small',
        embedding_function=None,
        rag_config: dict[str, Any] | None = None,
        failure_score: float = 0.0,
    )

Parameters

vector_store (VectorStoreInterface, required)
Vector store implementation. Must implement VectorStoreInterface:
  • ChromaDB: ChromaVectorStore
  • Weaviate: WeaviateVectorStore
  • Qdrant: QdrantVectorStore
  • Milvus: MilvusVectorStore
  • LanceDB: LanceDBVectorStore
  • Custom: Implement VectorStoreInterface

llm_model (str | Callable, required)
LLM for text generation. Can be:
  • LiteLLM model string (e.g., 'openai/gpt-4o-mini')
  • Custom callable taking messages and returning a response

embedding_model (str, default: 'text-embedding-3-small')
Model name for text embeddings. Used when embedding_function is not provided.

embedding_function (Callable | None, default: None)
Custom embedding function (text: str) -> list[float]. If None, LiteLLM embeddings are used.

rag_config (dict[str, Any] | None, default: None)
RAG pipeline configuration:
  • retrieval_strategy: 'similarity', 'hybrid', or 'vector' (default: 'similarity')
  • top_k: Number of documents to retrieve (default: 5)
  • retrieval_weight: Weight for retrieval in the combined score (default: 0.3)
  • generation_weight: Weight for generation in the combined score (default: 0.7)
  • hybrid_alpha: Semantic vs. keyword balance for hybrid search (default: 0.5)
  • filters: Default metadata filters for retrieval

failure_score (float, default: 0.0)
Score assigned when evaluation fails.

Data Types

RAGDataInst

Input data structure (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:12):
class RAGDataInst(TypedDict):
    query: str                      # User query
    ground_truth_answer: str        # Expected answer
    relevant_doc_ids: list[str]     # Document IDs that should be retrieved
    metadata: dict[str, Any]        # Additional context

RAGTrajectory

Execution trace (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:44):
class RAGTrajectory(TypedDict):
    original_query: str             # Original user query
    reformulated_query: str         # Query after reformulation
    retrieved_docs: list[dict]      # Retrieved documents with scores
    synthesized_context: str        # Context after synthesis
    generated_answer: str           # Final answer
    execution_metadata: dict        # Metrics and performance data

RAGOutput

Final output (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:74):
class RAGOutput(TypedDict):
    final_answer: str               # Generated answer
    confidence_score: float         # Confidence (0.0 to 1.0)
    retrieved_docs: list[dict]      # Retrieved documents
    total_tokens: int               # Token usage

Optimizable Components

1. Query Reformulation

Improves query understanding and reformulation:
seed_candidate = {
    'query_reformulation': 'Rephrase the query to improve retrieval.'
}

# GEPA might evolve this to:
# 'Extract key entities and concepts from the query. Expand abbreviations.
#  Add relevant synonyms. Focus on information-seeking intent.'

2. Context Synthesis

Optimizes document combination and summarization:
seed_candidate = {
    'context_synthesis': 'Combine the retrieved documents into a coherent context.'
}

# GEPA might evolve this to:
# 'Synthesize information from retrieved documents by:
#  1. Identifying common themes
#  2. Removing redundant information
#  3. Organizing by relevance to query
#  4. Preserving key facts and relationships'

3. Answer Generation

Enhances final answer quality:
seed_candidate = {
    'answer_generation': 'Answer the question based on the context.'
}

# GEPA might evolve this to:
# 'Generate a comprehensive answer that:
#  1. Directly addresses the question
#  2. Cites specific information from context
#  3. Acknowledges any uncertainties
#  4. Uses clear, concise language'

4. Reranking Criteria

Improves document relevance ordering:
seed_candidate = {
    'reranking_criteria': 'Rank documents by relevance to query.'
}

# GEPA might evolve this to:
# 'Rerank documents by:
#  1. Semantic similarity to query intent
#  2. Presence of query entities
#  3. Information density
#  4. Recency (if metadata available)'

Vector Store Setup

ChromaDB (Local)

from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore

vector_store = ChromaVectorStore.create_local(
    persist_directory='./my_knowledge_base',
    collection_name='documents'
)

# Add documents
vector_store.add_documents([
    {
        'id': 'doc1',
        'content': 'Your document text here...',
        'metadata': {'source': 'manual', 'date': '2024-01-01'}
    },
    # ... more documents
])

Weaviate (Cloud)

from gepa.adapters.generic_rag_adapter.vector_stores.weaviate_store import WeaviateVectorStore

vector_store = WeaviateVectorStore.create_cloud(
    cluster_url='https://your-cluster.weaviate.network',
    api_key='your-api-key',
    collection_name='Documents'
)

vector_store.add_documents([...])

Qdrant (Self-hosted)

from gepa.adapters.generic_rag_adapter.vector_stores.qdrant_store import QdrantVectorStore

vector_store = QdrantVectorStore(
    host='localhost',
    port=6333,
    collection_name='my_documents'
)

vector_store.add_documents([...])

Custom Vector Store

Implement VectorStoreInterface:
from gepa.adapters.generic_rag_adapter.vector_store_interface import VectorStoreInterface

class MyVectorStore(VectorStoreInterface):
    def search(
        self,
        query_embedding: list[float],
        top_k: int = 5,
        filters: dict | None = None
    ) -> list[dict[str, Any]]:
        # Your search implementation
        return results
    
    def add_documents(self, documents: list[dict[str, Any]]) -> None:
        # Your add implementation
        pass
    
    def get_document(self, doc_id: str) -> dict[str, Any] | None:
        # Your get implementation
        return document
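
For illustration, here is a minimal in-memory store with the same three methods, ranking documents by cosine similarity over caller-supplied embeddings. It is a sketch only: a real implementation would subclass VectorStoreInterface and handle persistence, and the InMemoryVectorStore name and the 'embedding' field on documents are assumptions made for this example.

```python
import math

class InMemoryVectorStore:
    """Toy in-memory store mirroring the interface above (illustrative only)."""

    def __init__(self):
        self._docs = {}  # doc_id -> document dict (expects an 'embedding' key)

    def add_documents(self, documents):
        for doc in documents:
            self._docs[doc["id"]] = doc

    def get_document(self, doc_id):
        return self._docs.get(doc_id)

    def search(self, query_embedding, top_k=5, filters=None):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        candidates = list(self._docs.values())
        if filters:
            # Keep only documents whose metadata matches every filter key
            candidates = [
                d for d in candidates
                if all(d.get("metadata", {}).get(k) == v for k, v in filters.items())
            ]
        candidates.sort(key=lambda d: cosine(query_embedding, d["embedding"]), reverse=True)
        return [dict(d, score=cosine(query_embedding, d["embedding"])) for d in candidates[:top_k]]
```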

Methods

evaluate()

Evaluates the RAG system on a batch of queries.
def evaluate(
    self,
    batch: list[RAGDataInst],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[RAGTrajectory, RAGOutput]
Implementation: src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:205

Behavior

  1. Executes RAG pipeline for each query with candidate prompts
  2. Evaluates retrieval quality (precision, recall, F1, MRR)
  3. Evaluates generation quality (token F1, BLEU, faithfulness)
  4. Computes combined score (weighted)
  5. Returns EvaluationBatch with outputs, scores, and optional trajectories

make_reflective_dataset()

Generates a reflective dataset for prompt improvement.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[RAGTrajectory, RAGOutput],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
Implementation: src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:337

Returns

Component-specific reflective examples:
{
    'answer_generation': [
        {
            'Inputs': {
                'query': 'What is ML?',
                'context': 'Machine learning is...'
            },
            'Generated Outputs': 'ML is a type of AI...',
            'Feedback': 'Good answer generation. Score: 0.85'
        },
        # ... more examples
    ]
}

Evaluation Metrics

Retrieval Metrics

  • Precision: % of retrieved docs that are relevant
  • Recall: % of relevant docs that were retrieved
  • F1 Score: Harmonic mean of precision and recall
  • MRR (Mean Reciprocal Rank): 1 / (rank of first relevant doc), averaged across queries
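
These per-query definitions can be sketched in a few lines of pure Python. This is an illustration of the metrics above, not the adapter's actual implementation; retrieval_metrics is a hypothetical helper name.

```python
def retrieval_metrics(retrieved_ids, relevant_ids):
    """Precision, recall, F1, and reciprocal rank for a single query (sketch)."""
    relevant = set(relevant_ids)
    hits = [doc_id for doc_id in retrieved_ids if doc_id in relevant]

    precision = len(hits) / len(retrieved_ids) if retrieved_ids else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    # Reciprocal rank of the first relevant document (ranks are 1-indexed)
    rr = 0.0
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            rr = 1.0 / rank
            break

    return {"precision": precision, "recall": recall, "f1": f1, "mrr": rr}
```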

Generation Metrics

  • Token F1: Overlap between generated and ground truth tokens
  • BLEU Score: N-gram overlap
  • Faithfulness: Answer is supported by retrieved context
  • Answer Relevance: Answer addresses the query
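
Token F1, for example, can be computed from bag-of-words overlap between the generated and reference answers. Again a sketch of the standard definition, not necessarily the adapter's exact tokenization; token_f1 is a hypothetical helper name.

```python
from collections import Counter

def token_f1(generated: str, ground_truth: str) -> float:
    """Token-level F1 between a generated answer and the reference (sketch)."""
    gen = generated.lower().split()
    ref = ground_truth.lower().split()
    # Multiset intersection counts shared tokens, respecting repetitions
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if not overlap:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```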

Combined Score

combined_score = (
    retrieval_weight * retrieval_f1 +
    generation_weight * generation_score
)
Default: 30% retrieval, 70% generation.
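
As a worked example with the default weights, a query with a retrieval F1 of 0.8 and a generation score of 0.6 yields a combined score of 0.66:

```python
retrieval_weight, generation_weight = 0.3, 0.7  # defaults
retrieval_f1, generation_score = 0.8, 0.6       # example per-query scores

combined_score = retrieval_weight * retrieval_f1 + generation_weight * generation_score
print(round(combined_score, 2))  # 0.66
```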

Advanced Configuration

Hybrid Search

Combine semantic and keyword search:
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_strategy': 'hybrid',
        'hybrid_alpha': 0.7,  # 70% semantic, 30% keyword
        'top_k': 10
    }
)

Custom Weights

Adjust retrieval vs generation importance:
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_weight': 0.5,  # 50% retrieval
        'generation_weight': 0.5,  # 50% generation
        'top_k': 5
    }
)

Metadata Filtering

Filter documents by metadata:
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'filters': {
            'source': 'documentation',
            'language': 'en'
        }
    }
)

Custom Embeddings

Use your own embedding function:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def custom_embed(text: str) -> list[float]:
    return model.encode(text).tolist()

adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    embedding_function=custom_embed
)

Complete Example

import gepa
from gepa.adapters.generic_rag_adapter import GenericRAGAdapter
from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore

# 1. Setup vector store
vector_store = ChromaVectorStore.create_local(
    persist_directory='./kb',
    collection_name='tech_docs'
)

# 2. Add documents
documents = [
    {
        'id': 'doc1',
        'content': 'Machine learning is a method of data analysis...',
        'metadata': {'category': 'AI', 'date': '2024-01-01'}
    },
    # ... 100+ documents
]
vector_store.add_documents(documents)

# 3. Create adapter
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_strategy': 'hybrid',
        'top_k': 5,
        'retrieval_weight': 0.3,
        'generation_weight': 0.7
    }
)

# 4. Prepare dataset
train_data = [
    {
        'query': 'What is machine learning?',
        'ground_truth_answer': 'Machine learning is a method of data analysis that automates analytical model building.',
        'relevant_doc_ids': ['doc1'],
        'metadata': {}
    },
    # ... 50+ examples
]

# 5. Optimize all components
result = gepa.optimize(
    seed_candidate={
        'query_reformulation': 'Rephrase to improve retrieval.',
        'context_synthesis': 'Combine documents into context.',
        'answer_generation': 'Answer based on context.',
        'reranking_criteria': 'Rank by relevance.'
    },
    trainset=train_data[:30],
    valset=train_data[30:],
    adapter=adapter,
    max_metric_calls=100,
    reflection_lm='openai/gpt-4'
)

# 6. Deploy optimized RAG system
optimized_prompts = result.best_candidate
print('Query Reformulation:', optimized_prompts['query_reformulation'])
print('Answer Generation:', optimized_prompts['answer_generation'])
print('Validation Score:', result.best_score)

Best Practices

  1. Document Quality: Ensure documents are well-formatted and contain relevant information
  2. Dataset Size: Use 30+ train examples and 20+ validation examples
  3. Relevant Doc IDs: Provide accurate relevant_doc_ids for retrieval evaluation
  4. Component Selection: Start with answer_generation, then add others
  5. Metadata: Use metadata filters when documents have clear categories
  6. Vector Store Choice: Use cloud providers for production, local for development

Performance Tips

  • Batch Size: Process 5-10 examples per batch for memory efficiency
  • Top-K: Start with 5, increase if answers lack context
  • Hybrid Alpha: 0.7 works well for most cases (70% semantic, 30% keyword)
  • Caching: Many vector store clients cache embeddings automatically; check your backend's documentation

Limitations

  • Requires pre-populated vector store
  • Document embeddings not optimized (only prompts)
  • Single-turn queries only (no conversation context)
  • Metadata filters are static (not optimized)
