
Overview

The GenericRAGAdapter enables GEPA to optimize RAG (Retrieval-Augmented Generation) systems with any vector store implementation. It evaluates both retrieval and generation quality, optimizing:
  • Query reformulation prompts
  • Context synthesis prompts
  • Answer generation prompts
  • Document reranking criteria
Supports ChromaDB, Weaviate, Qdrant, Pinecone, Milvus, LanceDB and custom vector stores.

Installation

pip install gepa

# Install your vector store
pip install chromadb  # or weaviate-client, qdrant-client, etc.

Quick Start

import gepa
from gepa.adapters.generic_rag_adapter import GenericRAGAdapter
from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore

# Create vector store
vector_store = ChromaVectorStore.create_local(
    persist_directory='./my_kb',
    collection_name='documents'
)

# Add documents
vector_store.add_documents([
    {'id': 'doc1', 'content': 'Machine learning is...', 'metadata': {}},
    # ... more documents
])

# Create adapter
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_strategy': 'similarity',
        'top_k': 5
    }
)

# Prepare dataset
train_data = [
    {
        'query': 'What is machine learning?',
        'ground_truth_answer': 'Machine learning is a subset of AI...',
        'relevant_doc_ids': ['doc1'],
        'metadata': {}
    },
    # ... more examples
]

# Optimize
result = gepa.optimize(
    seed_candidate={
        'answer_generation': 'Answer based on the context provided.'
    },
    trainset=train_data[:20],
    valset=train_data[20:],
    adapter=adapter,
    max_metric_calls=50,
    reflection_lm='openai/gpt-4'
)

print('Optimized prompts:', result.best_candidate)

Class Signature

Defined in src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:106:
class GenericRAGAdapter(GEPAAdapter[RAGDataInst, RAGTrajectory, RAGOutput]):
    def __init__(
        self,
        vector_store: VectorStoreInterface,
        llm_model,
        embedding_model: str = 'text-embedding-3-small',
        embedding_function=None,
        rag_config: dict[str, Any] | None = None,
        failure_score: float = 0.0,
    )

Parameters

vector_store (VectorStoreInterface, required)
Vector store implementation. Must implement VectorStoreInterface:
  • ChromaDB: ChromaVectorStore
  • Weaviate: WeaviateVectorStore
  • Qdrant: QdrantVectorStore
  • Milvus: MilvusVectorStore
  • LanceDB: LanceDBVectorStore
  • Custom: Implement VectorStoreInterface

llm_model (str | Callable, required)
LLM for text generation. Can be:
  • LiteLLM model string (e.g., 'openai/gpt-4o-mini')
  • Custom callable taking messages and returning a response

embedding_model (str, default: 'text-embedding-3-small')
Model name for text embeddings. Used when embedding_function is not provided.

embedding_function (Callable | None, default: None)
Custom embedding function (text: str) -> list[float]. If None, LiteLLM embeddings are used.

rag_config (dict[str, Any] | None, default: None)
RAG pipeline configuration:
  • retrieval_strategy: 'similarity', 'hybrid', or 'vector' (default: 'similarity')
  • top_k: Number of documents to retrieve (default: 5)
  • retrieval_weight: Weight for retrieval in the combined score (default: 0.3)
  • generation_weight: Weight for generation in the combined score (default: 0.7)
  • hybrid_alpha: Semantic vs. keyword balance for hybrid search (default: 0.5)
  • filters: Default metadata filters for retrieval

failure_score (float, default: 0.0)
Score assigned when evaluation fails.

Data Types

RAGDataInst

Input data structure (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:12):
class RAGDataInst(TypedDict):
    query: str                      # User query
    ground_truth_answer: str        # Expected answer
    relevant_doc_ids: list[str]     # Document IDs that should be retrieved
    metadata: dict[str, Any]        # Additional context

RAGTrajectory

Execution trace (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:44):
class RAGTrajectory(TypedDict):
    original_query: str             # Original user query
    reformulated_query: str         # Query after reformulation
    retrieved_docs: list[dict]      # Retrieved documents with scores
    synthesized_context: str        # Context after synthesis
    generated_answer: str           # Final answer
    execution_metadata: dict        # Metrics and performance data

RAGOutput

Final output (src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:74):
class RAGOutput(TypedDict):
    final_answer: str               # Generated answer
    confidence_score: float         # Confidence (0.0 to 1.0)
    retrieved_docs: list[dict]      # Retrieved documents
    total_tokens: int               # Token usage

Optimizable Components

1. Query Reformulation

Improves query understanding and reformulation:
seed_candidate = {
    'query_reformulation': 'Rephrase the query to improve retrieval.'
}

# GEPA might evolve this to:
# 'Extract key entities and concepts from the query. Expand abbreviations.
#  Add relevant synonyms. Focus on information-seeking intent.'

2. Context Synthesis

Optimizes document combination and summarization:
seed_candidate = {
    'context_synthesis': 'Combine the retrieved documents into a coherent context.'
}

# GEPA might evolve this to:
# 'Synthesize information from retrieved documents by:
#  1. Identifying common themes
#  2. Removing redundant information
#  3. Organizing by relevance to query
#  4. Preserving key facts and relationships'

3. Answer Generation

Enhances final answer quality:
seed_candidate = {
    'answer_generation': 'Answer the question based on the context.'
}

# GEPA might evolve this to:
# 'Generate a comprehensive answer that:
#  1. Directly addresses the question
#  2. Cites specific information from context
#  3. Acknowledges any uncertainties
#  4. Uses clear, concise language'

4. Reranking Criteria

Improves document relevance ordering:
seed_candidate = {
    'reranking_criteria': 'Rank documents by relevance to query.'
}

# GEPA might evolve this to:
# 'Rerank documents by:
#  1. Semantic similarity to query intent
#  2. Presence of query entities
#  3. Information density
#  4. Recency (if metadata available)'

Vector Store Setup

ChromaDB (Local)

from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore

vector_store = ChromaVectorStore.create_local(
    persist_directory='./my_knowledge_base',
    collection_name='documents'
)

# Add documents
vector_store.add_documents([
    {
        'id': 'doc1',
        'content': 'Your document text here...',
        'metadata': {'source': 'manual', 'date': '2024-01-01'}
    },
    # ... more documents
])

Weaviate (Cloud)

from gepa.adapters.generic_rag_adapter.vector_stores.weaviate_store import WeaviateVectorStore

vector_store = WeaviateVectorStore.create_cloud(
    cluster_url='https://your-cluster.weaviate.network',
    api_key='your-api-key',
    collection_name='Documents'
)

vector_store.add_documents([...])

Qdrant (Self-hosted)

from gepa.adapters.generic_rag_adapter.vector_stores.qdrant_store import QdrantVectorStore

vector_store = QdrantVectorStore(
    host='localhost',
    port=6333,
    collection_name='my_documents'
)

vector_store.add_documents([...])

Custom Vector Store

Implement VectorStoreInterface:
from gepa.adapters.generic_rag_adapter.vector_store_interface import VectorStoreInterface

class MyVectorStore(VectorStoreInterface):
    def search(
        self,
        query_embedding: list[float],
        top_k: int = 5,
        filters: dict | None = None
    ) -> list[dict[str, Any]]:
        # Your search implementation
        return results
    
    def add_documents(self, documents: list[dict[str, Any]]) -> None:
        # Your add implementation
        pass
    
    def get_document(self, doc_id: str) -> dict[str, Any] | None:
        # Your get implementation
        return document
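
For illustration, here is a minimal in-memory store with the same three methods, ranking documents by cosine similarity over caller-supplied embeddings. It is a sketch only: a real implementation would subclass VectorStoreInterface and handle persistence, and the InMemoryVectorStore name and the 'embedding' field on documents are assumptions made for this example.

```python
import math

class InMemoryVectorStore:
    """Toy in-memory store mirroring the interface above (illustrative only)."""

    def __init__(self):
        self._docs = {}  # doc_id -> document dict (expects an 'embedding' key)

    def add_documents(self, documents):
        for doc in documents:
            self._docs[doc["id"]] = doc

    def get_document(self, doc_id):
        return self._docs.get(doc_id)

    def search(self, query_embedding, top_k=5, filters=None):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        candidates = list(self._docs.values())
        if filters:
            # Keep only documents whose metadata matches every filter key
            candidates = [
                d for d in candidates
                if all(d.get("metadata", {}).get(k) == v for k, v in filters.items())
            ]
        candidates.sort(key=lambda d: cosine(query_embedding, d["embedding"]), reverse=True)
        return [dict(d, score=cosine(query_embedding, d["embedding"])) for d in candidates[:top_k]]
```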

Methods

evaluate()

Evaluates the RAG system on a batch of queries.
def evaluate(
    self,
    batch: list[RAGDataInst],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[RAGTrajectory, RAGOutput]
Implementation: src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:205

Behavior

  1. Executes RAG pipeline for each query with candidate prompts
  2. Evaluates retrieval quality (precision, recall, F1, MRR)
  3. Evaluates generation quality (token F1, BLEU, faithfulness)
  4. Computes combined score (weighted)
  5. Returns EvaluationBatch with outputs, scores, and optional trajectories

make_reflective_dataset()

Generates a reflective dataset for prompt improvement.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[RAGTrajectory, RAGOutput],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
Implementation: src/gepa/adapters/generic_rag_adapter/generic_rag_adapter.py:337

Returns

Component-specific reflective examples:
{
    'answer_generation': [
        {
            'Inputs': {
                'query': 'What is ML?',
                'context': 'Machine learning is...'
            },
            'Generated Outputs': 'ML is a type of AI...',
            'Feedback': 'Good answer generation. Score: 0.85'
        },
        # ... more examples
    ]
}

Evaluation Metrics

Retrieval Metrics

  • Precision: % of retrieved docs that are relevant
  • Recall: % of relevant docs that were retrieved
  • F1 Score: Harmonic mean of precision and recall
  • MRR (Mean Reciprocal Rank): 1 / (rank of first relevant doc), averaged across queries
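
These per-query definitions can be sketched in a few lines of pure Python. This is an illustration of the metrics above, not the adapter's actual implementation; retrieval_metrics is a hypothetical helper name.

```python
def retrieval_metrics(retrieved_ids, relevant_ids):
    """Precision, recall, F1, and reciprocal rank for a single query (sketch)."""
    relevant = set(relevant_ids)
    hits = [doc_id for doc_id in retrieved_ids if doc_id in relevant]

    precision = len(hits) / len(retrieved_ids) if retrieved_ids else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    # Reciprocal rank of the first relevant document (ranks are 1-indexed)
    rr = 0.0
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            rr = 1.0 / rank
            break

    return {"precision": precision, "recall": recall, "f1": f1, "mrr": rr}
```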

Generation Metrics

  • Token F1: Overlap between generated and ground truth tokens
  • BLEU Score: N-gram overlap
  • Faithfulness: Answer is supported by retrieved context
  • Answer Relevance: Answer addresses the query
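
Token F1, for example, can be computed from bag-of-words overlap between the generated and reference answers. Again a sketch of the standard definition, not necessarily the adapter's exact tokenization; token_f1 is a hypothetical helper name.

```python
from collections import Counter

def token_f1(generated: str, ground_truth: str) -> float:
    """Token-level F1 between a generated answer and the reference (sketch)."""
    gen = generated.lower().split()
    ref = ground_truth.lower().split()
    # Multiset intersection counts shared tokens, respecting repetitions
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if not overlap:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```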

Combined Score

combined_score = (
    retrieval_weight * retrieval_f1 +
    generation_weight * generation_score
)
Default: 30% retrieval, 70% generation.
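
As a worked example with the default weights, a query with a retrieval F1 of 0.8 and a generation score of 0.6 yields a combined score of 0.66:

```python
retrieval_weight, generation_weight = 0.3, 0.7  # defaults
retrieval_f1, generation_score = 0.8, 0.6       # example per-query scores

combined_score = retrieval_weight * retrieval_f1 + generation_weight * generation_score
print(round(combined_score, 2))  # 0.66
```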

Advanced Configuration

Hybrid Search

Combine semantic and keyword search:
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_strategy': 'hybrid',
        'hybrid_alpha': 0.7,  # 70% semantic, 30% keyword
        'top_k': 10
    }
)

Custom Weights

Adjust retrieval vs generation importance:
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_weight': 0.5,  # 50% retrieval
        'generation_weight': 0.5,  # 50% generation
        'top_k': 5
    }
)

Metadata Filtering

Filter documents by metadata:
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'filters': {
            'source': 'documentation',
            'language': 'en'
        }
    }
)

Custom Embeddings

Use your own embedding function:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def custom_embed(text: str) -> list[float]:
    return model.encode(text).tolist()

adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    embedding_function=custom_embed
)

Complete Example

import gepa
from gepa.adapters.generic_rag_adapter import GenericRAGAdapter
from gepa.adapters.generic_rag_adapter.vector_stores.chroma_store import ChromaVectorStore

# 1. Setup vector store
vector_store = ChromaVectorStore.create_local(
    persist_directory='./kb',
    collection_name='tech_docs'
)

# 2. Add documents
documents = [
    {
        'id': 'doc1',
        'content': 'Machine learning is a method of data analysis...',
        'metadata': {'category': 'AI', 'date': '2024-01-01'}
    },
    # ... 100+ documents
]
vector_store.add_documents(documents)

# 3. Create adapter
adapter = GenericRAGAdapter(
    vector_store=vector_store,
    llm_model='openai/gpt-4o-mini',
    rag_config={
        'retrieval_strategy': 'hybrid',
        'top_k': 5,
        'retrieval_weight': 0.3,
        'generation_weight': 0.7
    }
)

# 4. Prepare dataset
train_data = [
    {
        'query': 'What is machine learning?',
        'ground_truth_answer': 'Machine learning is a method of data analysis that automates analytical model building.',
        'relevant_doc_ids': ['doc1'],
        'metadata': {}
    },
    # ... 50+ examples
]

# 5. Optimize all components
result = gepa.optimize(
    seed_candidate={
        'query_reformulation': 'Rephrase to improve retrieval.',
        'context_synthesis': 'Combine documents into context.',
        'answer_generation': 'Answer based on context.',
        'reranking_criteria': 'Rank by relevance.'
    },
    trainset=train_data[:30],
    valset=train_data[30:],
    adapter=adapter,
    max_metric_calls=100,
    reflection_lm='openai/gpt-4'
)

# 6. Deploy optimized RAG system
optimized_prompts = result.best_candidate
print('Query Reformulation:', optimized_prompts['query_reformulation'])
print('Answer Generation:', optimized_prompts['answer_generation'])
print('Validation Score:', result.best_score)

Best Practices

  1. Document Quality: Ensure documents are well-formatted and contain relevant information
  2. Dataset Size: Use 30+ train examples and 20+ validation examples
  3. Relevant Doc IDs: Provide accurate relevant_doc_ids for retrieval evaluation
  4. Component Selection: Start with answer_generation, then add others
  5. Metadata: Use metadata filters when documents have clear categories
  6. Vector Store Choice: Use cloud providers for production, local for development

Performance Tips

  • Batch Size: Process 5-10 examples per batch for memory efficiency
  • Top-K: Start with 5, increase if answers lack context
  • Hybrid Alpha: 0.7 works well for most cases (70% semantic, 30% keyword)
  • Caching: Many vector store clients cache embeddings automatically; check your backend's documentation

Limitations

  • Requires pre-populated vector store
  • Document embeddings not optimized (only prompts)
  • Single-turn queries only (no conversation context)
  • Metadata filters are static (not optimized)
