
MCPEnv

An environment that exposes MCP server tools to language models using the official MCP SDK.
MCPEnv is experimental: its API may change in breaking ways in future releases.

Overview

MCPEnv connects to MCP servers and exposes their tools to the model as callable functions. It manages:
  • MCP server lifecycle (connection, tool discovery, cleanup)
  • Persistent background event loops for server processes
  • Tool call routing and error handling
  • Concurrent multi-server support
MCPEnv is designed for globally available, read-only MCP servers whose toolset can be shared across all rollouts. For per-rollout, stateful servers with mutable, task-specific data, use a custom environment instead, for example one built on StatefulToolEnv.

Inheritance

Environment
└── MultiTurnEnv
    └── ToolEnv
        └── MCPEnv

Constructor

MCPEnv(
    mcp_servers: list[MCPServerConfig | dict] = [],
    max_turns: int = 10,
    error_formatter: Callable[[Exception], str] = lambda e: f"Error: {str(e)}",
    **kwargs
)
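mcp_servers: list[MCPServerConfig | dict] (default: [])
List of MCP server configurations. Each entry can be an MCPServerConfig object or a dict with the same keys (name, command, args, env, description; only name and command are required). Although the argument defaults to an empty list, an environment without servers exposes no MCP tools, so in practice at least one server should be supplied.
max_turns: int (default: 10)
Maximum number of turns per rollout. Inherited from ToolEnv.
error_formatter: Callable[[Exception], str] (default: lambda e: f"Error: {str(e)}")
Function that converts a tool execution error into the message shown to the model. The default produces "Error: <exception message>".
**kwargs
Additional arguments passed to ToolEnv (dataset, rubric, system_prompt, etc.).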
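Because mcp_servers accepts both MCPServerConfig objects and plain dicts, the list is effectively normalized into a single type before any server starts. The sketch below illustrates that step under stated assumptions: it re-declares a local MCPServerConfig mirroring the fields documented in the next section (rather than importing the library's), and the helper normalize_servers and its duplicate-name check are hypothetical, not part of the verifiers API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MCPServerConfig:
    # Local mirror of the documented fields, not imported from verifiers.
    name: str
    command: str
    args: list[str] | None = None
    env: dict[str, str] | None = None
    description: str = ""


def normalize_servers(servers: list[MCPServerConfig | dict]) -> list[MCPServerConfig]:
    """Hypothetical helper: convert dict entries into MCPServerConfig objects."""
    normalized: list[MCPServerConfig] = []
    seen: set[str] = set()
    for entry in servers:
        # Dicts are unpacked into the dataclass; unknown or missing required
        # keys raise TypeError.
        config = entry if isinstance(entry, MCPServerConfig) else MCPServerConfig(**entry)
        # Assumed check: names are documented as unique identifiers.
        if config.name in seen:
            raise ValueError(f"duplicate MCP server name: {config.name!r}")
        seen.add(config.name)
        normalized.append(config)
    return normalized


servers = normalize_servers([
    MCPServerConfig(name="filesystem", command="npx"),
    {"name": "github", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
])
print([s.name for s in servers])  # ['filesystem', 'github']
```

Normalizing up front means the rest of the code only ever handles one config type, and malformed dicts fail at construction time rather than when a server is launched.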

MCPServerConfig

from dataclasses import dataclass

@dataclass
class MCPServerConfig:
    name: str
    command: str
    args: list[str] | None = None
    env: dict[str, str] | None = None
    description: str = ""
name: str (required)
Unique identifier for the server.
command: str (required)
Executable used to start the MCP server (e.g., "uvx", "npx", "python").
args: list[str] | None (default: None)
Command-line arguments passed to the command.
env: dict[str, str] | None (default: None)
Environment variables to pass to the server process.
description: str (default: "")
Human-readable description of the server's purpose.
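Each MCPServerConfig describes one stdio subprocess: command and args form the argument vector, and env supplies extra environment variables. The sketch below shows that mapping with a hypothetical helper, launch_spec, and a local stand-in for the config class. Layering env on top of the parent process environment is an assumption made for illustration; how the MCP SDK actually builds the child environment may differ.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class MCPServerConfig:
    # Local stand-in with the documented fields.
    name: str
    command: str
    args: list[str] | None = None
    env: dict[str, str] | None = None
    description: str = ""


def launch_spec(config: MCPServerConfig) -> tuple[list[str], dict[str, str]]:
    """Hypothetical helper: build the argv and environment for one server process."""
    # args=None is treated as "no extra arguments".
    argv = [config.command, *(config.args or [])]
    # Assumption: server-specific variables override the inherited environment.
    child_env = {**os.environ, **(config.env or {})}
    return argv, child_env


cfg = MCPServerConfig(
    name="filesystem",
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    env={"EXAMPLE_VAR": "1"},  # hypothetical variable for illustration
)
argv, child_env = launch_spec(cfg)
print(argv)  # ['npx', '-y', '@modelcontextprotocol/server-filesystem', '/tmp']
```

In MCPEnv itself the process is presumably started by the MCP SDK's stdio client, so this only illustrates what the fields mean, not how the library spawns the server.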

Example Usage

Basic Setup

import os

import verifiers as vf
from datasets import Dataset
from verifiers.envs.experimental.mcp_env import MCPServerConfig

def load_environment():
    # Configure MCP servers
    servers = [
        MCPServerConfig(
            name="filesystem",
            command="npx",
            args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
            description="File system operations"
        ),
        MCPServerConfig(
            name="search",
            command="uvx",
            args=["mcp-server-brave-search"],
            # Read the API key from the environment instead of hard-coding it
            env={"BRAVE_API_KEY": os.environ["BRAVE_API_KEY"]},
            description="Web search via Brave"
        ),
    ]

    # Create dataset
    dataset = Dataset.from_list([
        {"question": "Search for recent news about AI"},
        {"question": "List files in /tmp"},
    ])

    def task_completed(completion: vf.Messages) -> float:
        """Simple completion reward: 1.0 if the model produced any output."""
        return 1.0 if len(completion) > 0 else 0.0

    return vf.MCPEnv(
        mcp_servers=servers,
        dataset=dataset,
        rubric=vf.Rubric(funcs=[task_completed]),
        max_turns=5,
    )

Using Dict Configs

import verifiers as vf
from datasets import Dataset

def load_environment():
    servers = [
        {
            "name": "github",
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            # The GitHub server reads its token from GITHUB_PERSONAL_ACCESS_TOKEN
            "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."},
        },
    ]

    dataset = Dataset.from_list([
        {"question": "List issues in repository owner/repo"},
    ])

    return vf.MCPEnv(
        mcp_servers=servers,
        dataset=dataset,
        # Placeholder reward: every rollout scores 1.0
        rubric=vf.Rubric(funcs=[lambda **kwargs: 1.0]),
    )

Tool Discovery

MCPEnv automatically:
  1. Connects to each server via stdio
  2. Calls list_tools() to discover available tools
  3. Wraps each tool in an MCPToolWrapper instance
  4. Converts MCP tool schemas to vf.Tool format
  5. Registers tools with the environment
Tools are available to the model immediately after initialization.
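The discovery steps can be pictured as a loop over servers that collects every listed tool into one name-to-server registry, which is what later lets a tool call be routed to the right server. In this stdlib-only sketch, FakeServer stands in for a connected MCP client session and returns plain dicts instead of SDK objects; the ToolWrapper class and the policy of rejecting a tool name exposed by two servers are hypothetical, not documented MCPEnv behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeServer:
    # Stand-in for a connected MCP session; holds canned tool listings.
    name: str
    tools: list[dict] = field(default_factory=list)

    def list_tools(self) -> list[dict]:
        return self.tools


@dataclass
class ToolWrapper:
    # Hypothetical analogue of MCPToolWrapper: remembers which server owns the tool.
    server_name: str
    name: str
    description: str
    parameters: dict


def discover_tools(servers: list[FakeServer]) -> dict[str, ToolWrapper]:
    registry: dict[str, ToolWrapper] = {}
    for server in servers:
        for tool in server.list_tools():
            if tool["name"] in registry:
                # Assumed policy: refuse ambiguous routing instead of overwriting.
                raise ValueError(f"tool {tool['name']!r} exposed by more than one server")
            registry[tool["name"]] = ToolWrapper(
                server_name=server.name,
                name=tool["name"],
                description=tool.get("description", ""),
                parameters=tool.get("inputSchema", {"type": "object", "properties": {}}),
            )
    return registry


registry = discover_tools([
    FakeServer("filesystem", [{"name": "read_file", "inputSchema": {"type": "object"}}]),
    FakeServer("memory", [{"name": "create_entities", "description": "Add entities"}]),
])
print(sorted(registry))  # ['create_entities', 'read_file']
print(registry["read_file"].server_name)  # filesystem
```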


Error Handling

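Exceptions raised during tool execution are caught and passed to error_formatter, and the resulting string is returned to the model as the tool result instead of ending the rollout. To customize the message, pass your own formatter:
# Custom error formatting
import verifiers as vf

def format_error(e: Exception) -> str:
    return f"Tool failed: {type(e).__name__}: {e}"

env = vf.MCPEnv(
    mcp_servers=[...],
    error_formatter=format_error,
)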
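The catch-and-format behavior amounts to wrapping each tool call in a try/except and sending error_formatter's string back as the tool result. The sketch below reproduces that pattern in isolation; call_with_formatter and flaky_tool are hypothetical names, and catching Exception (rather than BaseException) is an assumption about what counts as a tool error.

```python
from typing import Any, Callable


def format_error(e: Exception) -> str:
    # Same output as the custom formatter in the example above.
    return f"Tool failed: {type(e).__name__}: {e}"


def call_with_formatter(
    tool: Callable[..., Any],
    arguments: dict[str, Any],
    error_formatter: Callable[[Exception], str],
) -> str:
    """Hypothetical helper: run a tool and turn any exception into a message."""
    try:
        return str(tool(**arguments))
    except Exception as e:  # KeyboardInterrupt and SystemExit still propagate
        return error_formatter(e)


def flaky_tool(path: str) -> str:
    raise FileNotFoundError(f"no such file: {path}")


print(call_with_formatter(flaky_tool, {"path": "/tmp/x"}, format_error))
# Tool failed: FileNotFoundError: no such file: /tmp/x
```

Returning the message instead of raising keeps the conversation going, so the model can read the error and retry with different arguments.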

Lifecycle Management

MCPEnv runs MCP servers in a persistent background event loop that starts during __init__ and automatically cleans up on exit.

Server Connection

  • Servers connect during environment initialization (blocking)
  • Connection failures raise immediately
  • Tools are registered once servers are ready
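A persistent background event loop is a common way to drive async clients (such as MCP sessions) from synchronous code: the loop runs forever on a daemon thread, and synchronous callers submit coroutines with asyncio.run_coroutine_threadsafe and block on the returned future. The sketch below shows that general technique with a placeholder connect coroutine; it is not MCPEnv's actual implementation, and BackgroundLoop and fake_connect are hypothetical.

```python
from __future__ import annotations

import asyncio
import threading
from typing import Any, Coroutine


class BackgroundLoop:
    """Hypothetical helper: an event loop running on its own daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def run(self, coro: Coroutine[Any, Any, Any], timeout: float | None = None) -> Any:
        # Blocks the calling thread until the coroutine finishes on the loop;
        # exceptions raised inside the coroutine are re-raised here.
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout)

    def stop(self) -> None:
        # Ask the loop to stop from its own thread, then release resources.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def fake_connect(name: str) -> str:
    await asyncio.sleep(0.01)  # stands in for process startup and the MCP handshake
    return f"{name}: connected"


bg = BackgroundLoop()
print(bg.run(fake_connect("filesystem")))  # filesystem: connected
bg.stop()
```

Because run blocks, a constructor that calls it for each server returns only once every server is connected, and a failed connection surfaces as an exception in the caller.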

Cleanup

Cleanup is automatic via atexit hooks, and can also be triggered explicitly:
# Servers are disconnected when:
# 1. Python process exits
# 2. Environment is garbage collected
# 3. Manually via await env.cleanup()

import verifiers as vf

env = vf.MCPEnv(mcp_servers=[...])
# ... use environment ...
await env.cleanup()  # Optional: explicit cleanup; cleanup() is a coroutine, so call it from async code
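Cleanup that can be triggered from several places (process exit, garbage collection, or an explicit call) should be idempotent. A minimal sketch of that pattern, assuming a hypothetical ServerPool class: atexit.register schedules the same cleanup method for interpreter exit, and a flag makes repeated calls harmless. It is synchronous for brevity, whereas MCPEnv's own cleanup() is a coroutine.

```python
import atexit


class ServerPool:
    """Hypothetical stand-in for an object that owns server connections."""

    def __init__(self, names: list[str]) -> None:
        self.connected = list(names)
        self.closed = False
        self.close_calls = 0
        # Run cleanup automatically when the interpreter exits normally.
        atexit.register(self.cleanup)

    def cleanup(self) -> None:
        # Idempotent: later calls (for example from atexit) do nothing.
        if self.closed:
            return
        self.close_calls += 1
        self.connected.clear()  # a real pool would disconnect each server here
        self.closed = True


pool = ServerPool(["filesystem", "search"])
pool.cleanup()  # explicit cleanup
pool.cleanup()  # safe to call again
print(pool.connected, pool.close_calls)  # [] 1
```

Registering a bound method with atexit keeps the object alive until exit; if cleanup on garbage collection matters, weakref.finalize is an alternative that also runs at exit by default.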

Multi-Server Example

import verifiers as vf
from datasets import Dataset
from verifiers.envs.experimental.mcp_env import MCPServerConfig

def load_environment():
    servers = [
        MCPServerConfig(
            name="web",
            command="uvx",
            args=["mcp-server-fetch"],
            description="Web page fetching"
        ),
        MCPServerConfig(
            name="memory",
            command="npx",
            args=["-y", "@modelcontextprotocol/server-memory"],
            description="Knowledge graph memory"
        ),
        MCPServerConfig(
            name="postgres",
            command="npx",
            # The reference Postgres server takes the connection URL as an argument
            args=["-y", "@modelcontextprotocol/server-postgres", "postgresql://..."],
            description="Database queries"
        ),
    ]

    dataset = Dataset.from_list([
        {"question": "Research topic X and store findings in memory"},
        {"question": "Query the database for recent entries"},
    ])

    return vf.MCPEnv(
        mcp_servers=servers,
        dataset=dataset,
        # Placeholder reward: every rollout scores 1.0
        rubric=vf.Rubric(funcs=[lambda **kwargs: 1.0]),
        max_turns=15,
    )
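With several servers configured, opening the connections concurrently makes startup time track the slowest server rather than the sum of all of them. The sketch below illustrates concurrent startup with asyncio.gather; connect_server and its delays are hypothetical placeholders, and whether MCPEnv connects its servers in exactly this way is an assumption.

```python
import asyncio
import time


async def connect_server(name: str, startup_seconds: float) -> str:
    # Placeholder for launching a process and completing the MCP handshake.
    await asyncio.sleep(startup_seconds)
    return name


async def connect_all(servers: dict[str, float]) -> list[str]:
    # gather runs all connections concurrently and preserves input order;
    # by default the first exception propagates to the caller.
    return await asyncio.gather(
        *(connect_server(name, delay) for name, delay in servers.items())
    )


start = time.perf_counter()
names = asyncio.run(connect_all({"web": 0.2, "memory": 0.1, "postgres": 0.2}))
elapsed = time.perf_counter() - start
print(names)  # ['web', 'memory', 'postgres']
print(f"{elapsed:.2f}s")  # near the slowest delay (0.2 s), not the 0.5 s sum
```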

Tool Schema Conversion

MCP tools are automatically converted to Verifiers tool format:
# MCP tool schema
{
    "name": "read_file",
    "description": "Read file contents",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"}
        },
        "required": ["path"]
    }
}

# Converted to vf.Tool
Tool(
    name="read_file",
    description="Read file contents",
    parameters={
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"]
    }
)
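The conversion above is essentially a field rename: inputSchema becomes parameters, while name and description carry over unchanged. A minimal sketch on plain dicts, assuming a hypothetical mcp_tool_to_dict helper; the fallbacks for a missing description or schema are illustrative choices, and the real conversion produces vf.Tool objects rather than dicts.

```python
from typing import Any


def mcp_tool_to_dict(mcp_tool: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper mirroring the MCP-to-vf.Tool mapping shown above."""
    return {
        "name": mcp_tool["name"],
        # Assumed fallback: tools without a description get an empty string.
        "description": mcp_tool.get("description") or "",
        # inputSchema is already JSON Schema, so it is reused as-is.
        "parameters": mcp_tool.get("inputSchema") or {"type": "object", "properties": {}},
    }


read_file = {
    "name": "read_file",
    "description": "Read file contents",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}
print(mcp_tool_to_dict(read_file)["parameters"]["required"])  # ['path']
```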

Debugging

Enable detailed MCP logging:
import logging

import verifiers as vf

# Keep other libraries at INFO; only the MCPEnv logger emits DEBUG records
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("verifiers.envs.experimental.mcp_env")
logger.setLevel(logging.DEBUG)

env = vf.MCPEnv(mcp_servers=[...])
# Logs server connections, tool registrations, and calls

Limitations

  • Global servers only: Not designed for per-rollout stateful servers
  • Stdio only: Uses stdio transport (not SSE or other protocols)
  • No streaming: Tool results are returned as complete strings
  • Single event loop: All servers share one background event loop
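Since results are returned as complete strings, a tool's structured MCP output has to be flattened before it reaches the model. MCP tool results carry a list of content blocks (text, image, and other types); the sketch below joins the text blocks and substitutes a placeholder for anything else. The helper name, the newline separator, and the placeholder format are assumptions rather than documented MCPEnv behavior, and the blocks are plain dicts instead of MCP SDK objects.

```python
from typing import Any


def flatten_content(blocks: list[dict[str, Any]]) -> str:
    """Hypothetical helper: turn MCP-style content blocks into one string."""
    parts: list[str] = []
    for block in blocks:
        if block.get("type") == "text":
            parts.append(block.get("text", ""))
        else:
            # Non-text blocks (e.g. images) cannot be passed through as plain text.
            parts.append(f"[{block.get('type', 'unknown')} content omitted]")
    return "\n".join(parts)


result = flatten_content([
    {"type": "text", "text": "line one"},
    {"type": "image", "data": "<base64>", "mimeType": "image/png"},
    {"type": "text", "text": "line two"},
])
print(result)
```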

When to Use

Use MCPEnv when:
  • You have existing MCP servers with read-only tools
  • Tools can be shared across all rollouts
  • You need multi-server tool composition
For stateful, per-rollout tools, use StatefulToolEnv instead.
