MCPEnv
An environment that exposes MCP server tools to language models using the official MCP SDK.
MCPEnv is experimental and subject to breaking changes. The API may change in future releases.
Overview
MCPEnv connects to MCP servers and exposes their tools to the model as callable functions. It manages:
- MCP server lifecycle (connection, tool discovery, cleanup)
- Persistent background event loops for server processes
- Tool call routing and error handling
- Concurrent multi-server support
MCPEnv is designed for globally available, read-only MCP servers where the same toolset can be shared across all rollouts. For per-rollout, stateful servers with mutable task-specific data, consider using a custom environment.
Inheritance
Environment
└── MultiTurnEnv
└── ToolEnv
└── MCPEnv
Constructor
MCPEnv(
mcp_servers: list[MCPServerConfig | dict] = [],
max_turns: int = 10,
error_formatter: Callable[[Exception], str] = lambda e: f"Error: {str(e)}",
**kwargs
)
mcp_servers
list[MCPServerConfig | dict]
default: []
List of MCP server configurations. Can be MCPServerConfig objects or dicts with keys: name, command, args, env, description.
max_turns
int
default: 10
Maximum turns per rollout. Inherited from ToolEnv.
error_formatter
Callable[[Exception], str]
default: lambda e: f"Error: {str(e)}"
Function to format tool execution errors for the model.
**kwargs
Additional arguments passed to ToolEnv (dataset, rubric, system_prompt, etc.).
MCPServerConfig
@dataclass
class MCPServerConfig:
name: str
command: str
args: list[str] | None = None
env: dict[str, str] | None = None
description: str = ""
name
Unique identifier for the server.
command
Executable command to start the MCP server (e.g., "uvx", "npx", "python").
args
Command-line arguments for the server.
env
Environment variables to pass to the server process.
description
Human-readable description of the server’s purpose.
Example Usage
Basic Setup
import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPServerConfig
def load_environment():
# Configure MCP servers
servers = [
MCPServerConfig(
name="filesystem",
command="npx",
args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
description="File system operations"
),
MCPServerConfig(
name="search",
command="uvx",
args=["mcp-server-brave-search"],
env={"BRAVE_API_KEY": "your-key"},
description="Web search via Brave"
),
]
# Create dataset
dataset = vf.Environment.make_dataset([
{"question": "Search for recent news about AI"},
{"question": "List files in /tmp"},
])
def task_completed(completion: vf.Messages) -> float:
"""Simple completion reward."""
return 1.0 if len(completion) > 0 else 0.0
return vf.MCPEnv(
mcp_servers=servers,
dataset=dataset,
rubric=vf.Rubric(task_completed),
max_turns=5,
)
Using Dict Configs
import verifiers as vf
def load_environment():
servers = [
{
"name": "github",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {"GITHUB_TOKEN": "ghp_..."},
},
]
dataset = vf.Environment.make_dataset([
{"question": "List issues in repository owner/repo"},
])
return vf.MCPEnv(
mcp_servers=servers,
dataset=dataset,
rubric=vf.Rubric(lambda **kw: 1.0),
)
MCPEnv automatically:
- Connects to each server via stdio
- Calls list_tools() to discover available tools
- Wraps each tool in an MCPToolWrapper instance
- Converts MCP tool schemas to vf.Tool format
- Registers tools with the environment
Tools are available to the model immediately after initialization.
Error Handling
Tool execution errors are caught and returned as error messages:
# Custom error formatting
def format_error(e: Exception) -> str:
return f"Tool failed: {type(e).__name__}: {str(e)}"
env = vf.MCPEnv(
mcp_servers=[...],
error_formatter=format_error,
)
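Because the formatter is a plain callable, it can be exercised directly to see exactly what the model will receive in place of a failed tool result:

```python
def format_error(e: Exception) -> str:
    return f"Tool failed: {type(e).__name__}: {str(e)}"

# The formatted string is returned to the model as the tool's output:
print(format_error(FileNotFoundError("no such file: /tmp/missing.txt")))
# → Tool failed: FileNotFoundError: no such file: /tmp/missing.txt
```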
Lifecycle Management
MCPEnv runs MCP servers in a persistent background event loop that starts during __init__ and automatically cleans up on exit.
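The pattern is roughly a daemon thread running an event loop, with coroutines submitted from synchronous code. A simplified stdlib sketch of this idea (illustrative only, not MCPEnv's actual implementation):

```python
import asyncio
import atexit
import threading

class BackgroundLoop:
    """Sketch of a persistent background event loop with atexit cleanup."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()
        atexit.register(self.shutdown)

    def run(self, coro):
        # Submit a coroutine from synchronous code and block for its result
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()

    def shutdown(self):
        if self.loop.is_running():
            self.loop.call_soon_threadsafe(self.loop.stop)
            self.thread.join(timeout=1)

bg = BackgroundLoop()
print(bg.run(asyncio.sleep(0, result="connected")))  # → connected
```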
Server Connection
- Servers connect during environment initialization (blocking)
- Connection failures raise immediately
- Tools are registered once servers are ready
Cleanup
Cleanup is automatic via atexit hooks:
# Servers are disconnected when:
# 1. Python process exits
# 2. Environment is garbage collected
# 3. Manually via await env.cleanup()
env = vf.MCPEnv(mcp_servers=[...])
# ... use environment ...
await env.cleanup() # Optional: explicit cleanup
Multi-Server Example
import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPServerConfig
def load_environment():
servers = [
MCPServerConfig(
name="web",
command="uvx",
args=["mcp-server-fetch"],
description="Web page fetching"
),
MCPServerConfig(
name="memory",
command="npx",
args=["-y", "@modelcontextprotocol/server-memory"],
description="Knowledge graph memory"
),
MCPServerConfig(
name="postgres",
command="npx",
args=["-y", "@modelcontextprotocol/server-postgres"],
env={"POSTGRES_URL": "postgresql://..."},
description="Database queries"
),
]
dataset = vf.Environment.make_dataset([
{"question": "Research topic X and store findings in memory"},
{"question": "Query the database for recent entries"},
])
return vf.MCPEnv(
mcp_servers=servers,
dataset=dataset,
rubric=vf.Rubric(lambda **kw: 1.0),
max_turns=15,
)
Tool Conversion
MCP tools are automatically converted to Verifiers tool format:
# MCP tool schema
{
"name": "read_file",
"description": "Read file contents",
"inputSchema": {
"type": "object",
"properties": {
"path": {"type": "string"}
},
"required": ["path"]
}
}
# Converted to vf.Tool
Tool(
name="read_file",
description="Read file contents",
parameters={
"type": "object",
"properties": {"path": {"type": "string"}},
"required": ["path"]
}
)
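The conversion is a straightforward schema mapping, since MCP's inputSchema is already JSON Schema. An illustrative stand-in targeting a generic OpenAI-style function dict (vf.Tool's real constructor may differ):

```python
def mcp_schema_to_tool(mcp_tool: dict) -> dict:
    """Map an MCP tool schema onto an OpenAI-style function tool dict."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, so it maps directly
            "parameters": mcp_tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }

tool = mcp_schema_to_tool({
    "name": "read_file",
    "description": "Read file contents",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
})
print(tool["function"]["name"])  # → read_file
```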
Debugging
Enable detailed MCP logging:
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("verifiers.envs.experimental.mcp_env")
logger.setLevel(logging.DEBUG)
env = vf.MCPEnv(mcp_servers=[...])
# Logs server connections, tool registrations, and calls
Limitations
- Global servers only: Not designed for per-rollout stateful servers
- Stdio only: Uses stdio transport (not SSE or other protocols)
- No streaming: Tool results are returned as complete strings
- Single event loop: All servers share one background event loop
When to Use
Use MCPEnv when:
- You have existing MCP servers with read-only tools
- Tools can be shared across all rollouts
- You need multi-server tool composition
For stateful, per-rollout tools, use StatefulToolEnv instead.
See Also