The `MCPEnv` integration allows you to connect to MCP (Model Context Protocol) servers and expose their tools to language models in Verifiers environments. MCP provides a standardized way to connect AI models to external data sources and tools via a simple protocol.
## Features

- **Multiple MCP servers** - Connect to multiple servers simultaneously
- **Automatic tool discovery** - Tools from servers are automatically exposed to models
- **stdio transport** - Communicates via standard input/output
- **Type-safe** - Preserves tool schemas and parameter types
- **Built on ToolEnv** - Inherits all `ToolEnv` features
## Installation

MCP support is included in core Verifiers; the MCP SDK is installed automatically as a dependency, so no extra packages are needed.
## Quick Start

### Create an environment

Create a basic MCP environment:

```python
import os

import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv
from datasets import Dataset


def load_environment():
    # Configure MCP servers
    mcp_servers = [
        {
            "name": "fetch",
            "command": "uvx",
            "args": ["mcp-server-fetch"],
            "description": "Fetch web content",
        },
    ]

    # Create dataset
    dataset = Dataset.from_dict({
        "question": [
            "What is the latest news on OpenAI's website?",
        ],
        "answer": ["Recent updates about GPT models"],
    })

    # Create rubric
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")

    async def judge_reward(judge, prompt, completion, answer):
        response = await judge(prompt, completion, answer)
        return 1.0 if "yes" in response.lower() else 0.0

    rubric.add_reward_func(judge_reward)

    # Create environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
    )
```
### Evaluate

Run an evaluation:

```bash
prime eval run my-mcp-env -m openai/gpt-4.1-mini -n 5
```
## MCP Server Configuration

Configure MCP servers using the `MCPServerConfig` format:

```python
mcp_servers = [
    {
        "name": "fetch",
        "command": "uvx",
        "args": ["mcp-server-fetch"],
        "description": "Fetch web content",
    },
    {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        "description": "File system access",
    },
]
```
Configuration fields:

- `name` - Identifier for the server
- `command` - Command to launch the server
- `args` - List of command arguments
- `env` - Environment variables (optional)
- `description` - Human-readable description (optional)
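For reference, the configuration shape can be sketched as a `TypedDict` (a hypothetical rendering of the fields listed above; the actual `MCPServerConfig` class in Verifiers may be defined differently):

```python
from typing import TypedDict


class MCPServerConfig(TypedDict, total=False):
    # Hypothetical sketch of the config shape described above;
    # the real MCPServerConfig in Verifiers may differ.
    name: str            # identifier for the server
    command: str         # command used to launch the server
    args: list[str]      # command-line arguments
    env: dict[str, str]  # environment variables (optional)
    description: str     # human-readable description (optional)


config: MCPServerConfig = {
    "name": "fetch",
    "command": "uvx",
    "args": ["mcp-server-fetch"],
}
```

Only `name`, `command`, and `args` are needed in practice; `env` and `description` can be omitted, as in the `config` example.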
### With Environment Variables

For servers requiring API keys:

```python
import os

import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv


def load_environment():
    vf.ensure_keys(["EXA_API_KEY"])  # Validate key exists

    mcp_servers = [
        {
            "name": "exa",
            "command": "npx",
            "args": ["-y", "exa-mcp-server"],
            "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
            "description": "Exa search",
        },
    ]

    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,  # dataset and rubric defined as in the Quick Start
        rubric=rubric,
    )
```
## Available MCP Servers

Common MCP servers you can use:

### Web & Search

**Fetch** - Retrieve web content

```python
{
    "name": "fetch",
    "command": "uvx",
    "args": ["mcp-server-fetch"],
}
```

**Exa** - AI-powered search

```python
{
    "name": "exa",
    "command": "npx",
    "args": ["-y", "exa-mcp-server"],
    "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
}
```

**Brave Search** - Web search

```python
{
    "name": "brave",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-brave-search"],
    "env": {"BRAVE_API_KEY": os.environ["BRAVE_API_KEY"]},
}
```

### File System

**Filesystem** - Read/write files

```python
{
    "name": "filesystem",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
}
```

### Databases

**PostgreSQL** - Query databases

```python
{
    "name": "postgres",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-postgres"],
    "env": {"POSTGRES_URL": os.environ["POSTGRES_URL"]},
}
```

**SQLite** - Local database access

```python
{
    "name": "sqlite",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-sqlite", "database.db"],
}
```

### Version Control

**Git** - Repository operations

```python
{
    "name": "git",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-git"],
}
```

**GitHub** - GitHub API access

```python
{
    "name": "github",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-github"],
    "env": {"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]},
}
```
See the MCP servers directory for more servers.
## Full Example

Here’s a complete example using multiple MCP servers:

```python
import os

from datasets import Dataset

import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv


def load_environment(
    mcp_servers: list | None = None,
    dataset=None,
    **kwargs,
) -> vf.Environment:
    # Validate API keys
    vf.ensure_keys(["EXA_API_KEY"])

    # Configure MCP servers
    if mcp_servers is None:
        mcp_servers = [
            {
                "name": "exa",
                "command": "npx",
                "args": ["-y", "exa-mcp-server"],
                "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
                "description": "Exa AI search",
            },
            {
                "name": "fetch",
                "command": "uvx",
                "args": ["mcp-server-fetch"],
                "description": "Fetch web content",
            },
        ]

    # Create dataset
    if dataset is None:
        dataset = Dataset.from_dict({
            "question": [
                "Find the latest Prime Intellect announcement",
                "What is the current weather in San Francisco?",
            ],
            "answer": [
                "Information about recent announcements",
                "Current weather conditions",
            ],
        })

    # Create rubric with judge
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")

    async def judge_reward(judge, prompt, completion, answer, state):
        verdict = await judge(prompt, completion, answer, state)
        return 1.0 if "yes" in verdict.lower() else 0.0

    rubric.add_reward_func(judge_reward, weight=1.0)

    # Create MCP environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
        **kwargs,
    )
```
## Error Handling

Configure error handling behavior:

```python
def custom_error_formatter(error: Exception) -> str:
    """Format errors for the model."""
    return f"Tool error: {str(error)[:100]}"


env = MCPEnv(
    mcp_servers=mcp_servers,
    dataset=dataset,
    rubric=rubric,
    error_formatter=custom_error_formatter,
)
```
## Architecture Notes

`MCPEnv` is designed for globally available, read-only MCP servers, where the same toolset can be shared across all rollouts. For servers requiring per-rollout state or mutable task-specific data, consider implementing a custom `StatefulToolEnv` subclass.
### Connection Management

MCP servers are connected once during environment initialization and shared across all rollouts:

1. The environment starts a background event loop
2. It connects to all configured MCP servers
3. It discovers available tools via `tools/list`
4. Tools are exposed to rollouts
5. Connections are cleaned up on environment shutdown
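The connection-sharing pattern above can be sketched with plain asyncio: a background event loop runs in a daemon thread, connections are opened once at startup, and all callers submit work to that shared loop. This is an illustration of the lifecycle, not `MCPEnv`'s actual internals; all names here are hypothetical.

```python
import asyncio
import threading


class SharedConnectionPool:
    """Illustrative sketch: one background loop, connections opened once,
    shared by all callers. Not MCPEnv's real API."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.tools = {}
        # Run the event loop in a daemon thread for the life of the environment
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def connect(self, servers):
        # Connect to every server once, at initialization
        fut = asyncio.run_coroutine_threadsafe(self._discover(servers), self.loop)
        fut.result()

    async def _discover(self, servers):
        # Stand-in for issuing tools/list to each server
        for server in servers:
            self.tools[server["name"]] = [f"{server['name']}_tool"]

    def call(self, coro):
        # Rollouts submit coroutines to the shared loop from any thread
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()

    def shutdown(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


pool = SharedConnectionPool()
pool.connect([{"name": "fetch"}, {"name": "exa"}])
print(sorted(pool.tools))  # ['exa', 'fetch']
pool.shutdown()
```

The key design point is that discovery happens once in `connect`, while each rollout only submits calls to the already-running loop via `call`.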
When a model calls an MCP tool:

1. The tool call is intercepted by `MCPEnv`
2. The request is sent to the appropriate MCP server
3. The response is returned as a tool message
4. Errors are formatted via `error_formatter`
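That dispatch flow can be sketched as follows. The helper names and the in-memory `servers` structure are hypothetical; the real interception happens inside `MCPEnv` over stdio.

```python
def default_error_formatter(error: Exception) -> str:
    # Errors are surfaced to the model as a tool-message string
    return f"Tool error: {error}"


def dispatch_tool_call(tool_name, arguments, servers,
                       error_formatter=default_error_formatter):
    """Route a model's tool call to the server that owns the tool
    (illustrative only)."""
    try:
        for server in servers:
            if tool_name in server["tools"]:
                return server["tools"][tool_name](**arguments)
        raise ValueError(f"Unknown tool: {tool_name}")
    except Exception as e:
        # Step 4: failures go through the formatter instead of crashing the rollout
        return error_formatter(e)


servers = [{"name": "fetch",
            "tools": {"fetch_url": lambda url: f"Contents of {url}"}}]
print(dispatch_tool_call("fetch_url", {"url": "https://example.com"}, servers))
print(dispatch_tool_call("missing_tool", {}, servers))
```

Note that even an unknown tool name produces a formatted error message rather than an exception, which is what lets the model recover and try again.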
## Best Practices

- **Validate API keys** - Use `vf.ensure_keys()` to fail fast if keys are missing
- **Document requirements** - List required environment variables in the README
- **Test servers locally** - Verify MCP servers work before using them in environments
- **Handle errors gracefully** - Provide clear error messages via `error_formatter`
- **Limit tool calls** - Set a reasonable `max_turns` to prevent infinite loops
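The fail-fast key check amounts to something like the following sketch (a stand-in shown only to illustrate the pattern; use `vf.ensure_keys` itself in real environments):

```python
import os


def ensure_keys(keys: list[str]) -> None:
    # Fail fast with a clear message when required variables are missing
    missing = [k for k in keys if not os.environ.get(k)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )


os.environ["EXA_API_KEY"] = "test-key"  # simulate a configured key
ensure_keys(["EXA_API_KEY"])            # passes silently
try:
    ensure_keys(["DOES_NOT_EXIST_123"])
except RuntimeError as e:
    print(e)  # Missing required environment variables: DOES_NOT_EXIST_123
```

Raising at load time, before any rollout starts, gives a much clearer failure than an opaque MCP server crash mid-evaluation.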
## Limitations

- MCP servers must support stdio transport
- Servers are started once per environment, not per rollout
- No support for MCP resources or prompts (tools only)
- Limited to read-only operations (no per-rollout state)
## Examples

See the `mcp-search-env` example in the Verifiers repository for a complete implementation.

## Further Reading