
Overview

The MCPAdapter enables GEPA to optimize Model Context Protocol (MCP) tool usage. It supports:
  • Local servers via stdio (Python, Node.js)
  • Remote servers via SSE or StreamableHTTP
  • Multi-tool optimization (several tool descriptions refined simultaneously)
  • Two-pass workflow for better tool integration
  • Tool description optimization to improve model understanding
  • System prompt optimization for better tool usage guidance

Installation

pip install gepa mcp

Quick Start

Local Server (Ollama - No Cost)

import gepa
from gepa.adapters.mcp_adapter import MCPAdapter
from mcp import StdioServerParameters

# Configure local MCP filesystem server
server_params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
)

# Create adapter with LOCAL Ollama model (FREE)
adapter = MCPAdapter(
    server_params=server_params,
    tool_names=["read_file", "write_file", "list_files"],
    task_model="ollama/llama3.2:1b",  # Local model
    metric_fn=lambda item, output: 1.0 if item["reference_answer"] in output else 0.0
)

# Prepare dataset
dataset = [
    {
        "user_query": "What's in notes.txt?",
        "tool_arguments": {"path": "/tmp/notes.txt"},
        "reference_answer": "Meeting at 3pm",
        "additional_context": {}
    },
    # ... more examples
]

# Optimize with local models - $0.00 cost!
result = gepa.optimize(
    seed_candidate={"tool_description": "Read file contents from disk"},
    trainset=dataset[:20],
    valset=dataset[20:],
    adapter=adapter,
    reflection_lm="ollama/llama3.1:8b",  # Larger local model for reflection
    max_metric_calls=150
)

print("Optimized tool description:", result.best_candidate["tool_description"])

OpenAI API

adapter = MCPAdapter(
    server_params=server_params,
    tool_names="read_file",
    task_model="openai/gpt-4o-mini",  # OpenAI model
    metric_fn=my_metric
)

result = gepa.optimize(
    seed_candidate={"tool_description": "Read file contents"},
    trainset=dataset[:20],
    valset=dataset[20:],
    adapter=adapter,
    reflection_lm="openai/gpt-4",
    max_metric_calls=150
)

Class Signature

Defined in src/gepa/adapters/mcp_adapter/mcp_adapter.py:94:
class MCPAdapter(GEPAAdapter[MCPDataInst, MCPTrajectory, MCPOutput]):
    def __init__(
        self,
        tool_names: str | list[str],
        task_model: str | Callable,
        metric_fn: Callable[[MCPDataInst, str], float],
        # Local server configuration
        server_params: StdioServerParameters | None = None,
        # Remote server configuration
        remote_url: str | None = None,
        remote_transport: str = "sse",
        remote_headers: dict[str, str] | None = None,
        remote_timeout: float = 30,
        # Adapter configuration
        base_system_prompt: str = "You are a helpful assistant with access to tools.",
        enable_two_pass: bool = True,
        failure_score: float = 0.0,
    )

Parameters

tool_names
str | list[str]
required
Name(s) of tool(s) to optimize:
  • Single tool: "read_file"
  • Multiple tools: ["read_file", "write_file", "list_files"]
task_model
str | Callable
required
Model for task execution:
  • LiteLLM string: "openai/gpt-4o-mini", "ollama/llama3.2:1b"
  • Custom callable: (messages: list[dict]) -> str
metric_fn
Callable[[MCPDataInst, str], float]
required
Scoring function:
def metric(item: MCPDataInst, output: str) -> float:
    return 1.0 if item["reference_answer"] in output else 0.0
server_params
StdioServerParameters | None
default:"None"
Local MCP server configuration (stdio transport):
from mcp import StdioServerParameters

server_params = StdioServerParameters(
    command="python",
    args=["my_server.py"]
)
Required if not using remote server.
remote_url
str | None
default:"None"
Remote MCP server URL:
  • SSE: "https://mcp-server.com/sse"
  • HTTP: "https://mcp-server.com/mcp"
Required if not using local server.
remote_transport
str
default:"'sse'"
Remote transport protocol:
  • "sse": Server-Sent Events (streaming)
  • "streamable_http": HTTP with session management
remote_headers
dict[str, str] | None
default:"None"
HTTP headers for remote servers:
remote_headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "X-API-Key": "your-key"
}
remote_timeout
float
default:"30"
Timeout for remote HTTP operations (seconds).
base_system_prompt
str
default:"'You are a helpful assistant with access to tools.'"
Base system prompt template. Can be optimized if included in seed_candidate.
enable_two_pass
bool
default:"True"
Use two-pass workflow:
  • Pass 1: Model calls tool
  • Pass 2: Model uses tool response to generate final answer
Disable for simpler single-pass evaluation.
failure_score
float
default:"0.0"
Score assigned when evaluation fails (tool errors, parsing errors).

Data Types

MCPDataInst

Input data structure (src/gepa/adapters/mcp_adapter/mcp_adapter.py:34):
class MCPDataInst(TypedDict):
    user_query: str                 # User's question or request
    tool_arguments: dict[str, Any]  # Expected tool arguments
    reference_answer: str | None    # Reference answer for scoring
    additional_context: dict[str, str]  # Optional context

MCPTrajectory

Execution trace (src/gepa/adapters/mcp_adapter/mcp_adapter.py:51):
class MCPTrajectory(TypedDict):
    user_query: str                 # Original query
    tool_names: list[str]           # Available tools
    selected_tool: str | None       # Tool selected by model
    tool_called: bool               # Whether tool was called
    tool_arguments: dict | None     # Arguments passed to tool
    tool_response: str | None       # Tool's response
    tool_description_used: str      # Tool description
    system_prompt_used: str         # System prompt
    model_first_pass_output: str    # Model's first response
    model_final_output: str         # Model's final answer
    score: float                    # Evaluation score

MCPOutput

Final output (src/gepa/adapters/mcp_adapter/mcp_adapter.py:72):
class MCPOutput(TypedDict):
    final_answer: str               # Final answer from model
    tool_called: bool               # Whether tool was called
    selected_tool: str | None       # Which tool was selected
    tool_response: str | None       # Tool's response

Multi-Tool Support

Single Tool

adapter = MCPAdapter(
    tool_names="read_file",  # Single tool
    task_model="openai/gpt-4o-mini",
    metric_fn=my_metric,
    server_params=server_params
)

seed_candidate = {
    "tool_description": "Read file contents from disk"
}

Multiple Tools

adapter = MCPAdapter(
    tool_names=["read_file", "write_file", "list_files"],  # Multiple tools
    task_model="openai/gpt-4o-mini",
    metric_fn=my_metric,
    server_params=server_params
)

seed_candidate = {
    "tool_description_read_file": "Read file contents",
    "tool_description_write_file": "Write content to file",
    "tool_description_list_files": "List files in directory"
}

# GEPA will optimize each tool description independently
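For larger tool sets, the per-tool keys can be built programmatically. A minimal sketch, following the `tool_description_<tool_name>` key convention shown above (the descriptions here are placeholders):

```python
# Starting descriptions for each tool (placeholders for illustration).
tool_descriptions = {
    "read_file": "Read file contents",
    "write_file": "Write content to file",
    "list_files": "List files in directory",
}

# Build the seed candidate using the "tool_description_<tool_name>" convention.
seed_candidate = {
    f"tool_description_{name}": description
    for name, description in tool_descriptions.items()
}
```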

Optimizable Components

Tool Description

Optimizes the tool’s description field:
seed_candidate = {
    "tool_description": "Read file contents"
}

# GEPA might evolve this to:
# "Read file contents from the filesystem. Use when user asks to view,
#  show, or display file contents. Returns the full text content of the
#  specified file path. Requires 'path' parameter with absolute or relative
#  file path."

System Prompt

Optimizes guidance for tool usage:
seed_candidate = {
    "tool_description": "Read file contents",
    "system_prompt": "You are a helpful assistant with file access."
}

# GEPA optimizes both jointly

Two-Pass Workflow

The adapter uses a two-pass approach:

Pass 1: Tool Call Decision

  1. Model receives user query and tool information
  2. Model decides whether to call tool
  3. If yes: Model outputs JSON with tool name and arguments
  4. Tool is executed, response captured

Pass 2: Final Answer Generation

  1. Model receives original query + tool response
  2. Model generates final answer incorporating tool results
  3. Final answer is evaluated against reference

Expected JSON Format

The model should respond with one of two JSON actions.
Call a tool:
{"action": "call_tool", "tool": "read_file", "arguments": {"path": "/tmp/notes.txt"}}
Answer directly:
{"action": "answer", "text": "The answer is..."}

Disabling Two-Pass

For simpler single-pass evaluation:
adapter = MCPAdapter(
    tool_names="my_tool",
    task_model="openai/gpt-4o-mini",
    metric_fn=my_metric,
    server_params=server_params,
    enable_two_pass=False  # Single pass only
)

Methods

evaluate()

Evaluates candidate on batch using MCP tools.
def evaluate(
    self,
    batch: list[MCPDataInst],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[MCPTrajectory, MCPOutput]
Implementation: src/gepa/adapters/mcp_adapter/mcp_adapter.py:187

Behavior

  1. Creates MCP client session (local or remote)
  2. Retrieves available tools from server
  3. For each example:
    • First pass: Model calls tool (if needed)
    • Second pass: Model generates final answer (if two-pass enabled)
  4. Scores outputs using metric_fn
  5. Captures trajectories if capture_traces=True
  6. Closes MCP session

make_reflective_dataset()

Builds reflective dataset for instruction refinement.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[MCPTrajectory, MCPOutput],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
Implementation: src/gepa/adapters/mcp_adapter/mcp_adapter.py:602

Returns

{
    "tool_description": [
        {
            "Inputs": {
                "user_query": "What's in config.json?",
                "tool_description": "Read file contents"
            },
            "Generated Outputs": {
                "tool_called": True,
                "selected_tool": "read_file",
                "tool_arguments": {"path": "config.json"},
                "final_answer": "The config file contains..."
            },
            "Feedback": "Good! Tool was used appropriately. Score: 0.85"
        },
        # ... more examples
    ]
}

Local Server Examples

Filesystem Server (Node.js)

from mcp import StdioServerParameters

server_params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
)

adapter = MCPAdapter(
    server_params=server_params,
    tool_names=["read_file", "write_file", "list_files"],
    task_model="ollama/llama3.2:1b",
    metric_fn=my_metric
)

Custom Python Server

Create my_server.py:
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("MyServer")

@mcp.tool()
def search_docs(query: str) -> str:
    """Search documentation."""
    # Your search logic
    return f"Results for: {query}"

if __name__ == "__main__":
    mcp.run()
Use in GEPA:
server_params = StdioServerParameters(
    command="python",
    args=["my_server.py"]
)

adapter = MCPAdapter(
    server_params=server_params,
    tool_names="search_docs",
    task_model="openai/gpt-4o-mini",
    metric_fn=my_metric
)

Remote Server Examples

Public SSE Server

adapter = MCPAdapter(
    tool_names="search_web",
    task_model="openai/gpt-4o-mini",
    metric_fn=my_metric,
    remote_url="https://public-mcp.example.com/sse",
    remote_transport="sse"
)

Authenticated HTTP Server

adapter = MCPAdapter(
    tool_names=["analyze_data", "visualize_data"],
    task_model="openai/gpt-4o-mini",
    metric_fn=my_metric,
    remote_url="https://internal-mcp.company.com/mcp",
    remote_transport="streamable_http",
    remote_headers={
        "Authorization": "Bearer YOUR_API_TOKEN",
        "X-Custom-Header": "value"
    },
    remote_timeout=60
)

Custom Metric Functions

Exact Match

def exact_match(item: MCPDataInst, output: str) -> float:
    return 1.0 if item["reference_answer"] in output else 0.0

Fuzzy Matching

from difflib import SequenceMatcher

def fuzzy_match(item: MCPDataInst, output: str) -> float:
    ratio = SequenceMatcher(
        None,
        item["reference_answer"],
        output
    ).ratio()
    return ratio  # 0.0 to 1.0
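When word order or phrasing varies, token-level overlap can be more forgiving than character-level similarity. A sketch of a token-F1 metric (not part of the adapter; `MCPDataInst` is used here only as a type hint):

```python
def token_f1(item: "MCPDataInst", output: str) -> float:
    """Token-level F1 between the reference answer and the model output."""
    ref_tokens = set(item["reference_answer"].lower().split())
    out_tokens = set(output.lower().split())
    if not ref_tokens or not out_tokens:
        return 0.0
    common = ref_tokens & out_tokens
    if not common:
        return 0.0
    precision = len(common) / len(out_tokens)
    recall = len(common) / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```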

LLM-as-Judge

import litellm

def llm_judge(item: MCPDataInst, output: str) -> float:
    messages = [{
        "role": "user",
        "content": f"Rate this answer (0-1):\n"
                   f"Question: {item['user_query']}\n"
                   f"Reference: {item['reference_answer']}\n"
                   f"Answer: {output}"
    }]
    response = litellm.completion(
        model="openai/gpt-4o",
        messages=messages
    )
    return float(response.choices[0].message.content)

Complete Example

import gepa
from gepa.adapters.mcp_adapter import MCPAdapter
from mcp import StdioServerParameters

# 1. Configure MCP server
server_params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
)

# 2. Prepare dataset
dataset = [
    {
        "user_query": "What's in notes.txt?",
        "tool_arguments": {"path": "/tmp/notes.txt"},
        "reference_answer": "Meeting at 3pm",
        "additional_context": {}
    },
    {
        "user_query": "List files in the config directory",
        "tool_arguments": {"path": "/tmp/config"},
        "reference_answer": "config.json",
        "additional_context": {}
    },
    # ... 30+ examples
]

# 3. Create adapter
adapter = MCPAdapter(
    server_params=server_params,
    tool_names=["read_file", "write_file", "list_files"],
    task_model="ollama/llama3.2:1b",  # FREE local model
    metric_fn=lambda item, output: 1.0 if item["reference_answer"] in output else 0.0,
    enable_two_pass=True
)

# 4. Optimize
result = gepa.optimize(
    seed_candidate={
        "tool_description_read_file": "Read file contents",
        "tool_description_write_file": "Write to file",
        "tool_description_list_files": "List files"
    },
    trainset=dataset[:20],
    valset=dataset[20:],
    adapter=adapter,
    reflection_lm="ollama/llama3.1:8b",  # FREE local reflection
    max_metric_calls=150
)

# 5. Review results
print("\nOptimized Tool Descriptions:")
for tool in ["read_file", "write_file", "list_files"]:
    key = f"tool_description_{tool}"
    print(f"\n{tool}:")
    print(result.best_candidate[key])

print(f"\nValidation Score: {result.best_score:.2f}")

Best Practices

  1. Dataset Quality: Provide 20+ examples covering different tool usage scenarios
  2. Tool Names: Use descriptive tool names that hint at functionality
  3. Reference Answers: Include key information expected in final answers
  4. Multi-Tool: When optimizing multiple tools, ensure examples use different tools
  5. Local Development: Use Ollama for free local development
  6. Production: Use OpenAI/Anthropic for production workloads

Performance Notes

Subprocess Overhead

Each evaluate() call spawns a new MCP server process:
  • Startup time: ~100-500ms per evaluation
  • Total overhead for 150 evaluations: ~15-75 seconds
This is expected behavior in the current implementation. Future improvements planned:
  • Session pooling (reuse processes)
  • Background event loop (persistent session)
  • Async GEPA core (native async support)

Cost Optimization

Free (Ollama):
# Total cost: $0.00
adapter = MCPAdapter(
    tool_names="my_tool",
    task_model="ollama/llama3.2:1b",  # Local
    ...
)
result = gepa.optimize(
    ...,
    reflection_lm="ollama/llama3.1:8b"  # Local
)
Low Cost (OpenAI):
# ~$0.50 for 150 evaluations
adapter = MCPAdapter(
    tool_names="my_tool",
    task_model="openai/gpt-4o-mini",  # $0.15/1M tokens
    ...
)
result = gepa.optimize(
    ...,
    reflection_lm="openai/gpt-4"  # For proposal only
)

Troubleshooting

Tool Not Found

# Error: Tools ['my_tool'] not found
# Solution: Check tool name matches server

# List available tools
import asyncio
from gepa.adapters.mcp_adapter.mcp_client import create_mcp_client

async def list_tools():
    client = create_mcp_client(server_params=server_params)
    await client.start()
    await client.initialize()
    tools = await client.list_tools()
    print([t['name'] for t in tools])
    await client.close()

asyncio.run(list_tools())

Model Not Calling Tool

Check that the system prompt includes clear instructions:
adapter = MCPAdapter(
    tool_names="read_file",
    task_model="openai/gpt-4o-mini",
    metric_fn=my_metric,
    server_params=server_params,
    base_system_prompt="You are a helpful assistant. When asked about file contents, use the read_file tool."
)

JSON Parsing Errors

The model might not follow the JSON format. Add an explicit instruction to the seed candidate:
seed_candidate = {
    "tool_description": "Read file contents",
    "system_prompt": "Always respond with valid JSON. No other text."
}
