The MCPEnv integration allows you to connect to MCP (Model Context Protocol) servers and expose their tools to language models in Verifiers environments. MCP provides a standardized way to connect AI models to external data sources and tools via a simple protocol.

Features

  • Multiple MCP servers - Connect to multiple servers simultaneously
  • Automatic tool discovery - Tools from servers are automatically exposed to models
  • stdio transport - Communicates via standard input/output
  • Type-safe - Preserves tool schemas and parameter types
  • Built on ToolEnv - Inherits all ToolEnv features

Installation

MCP support is included in core Verifiers:
uv add verifiers
The MCP SDK is automatically installed as a dependency.

Quick Start

1. Create an environment

Create a basic MCP environment:
import os
import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv
from datasets import Dataset

def load_environment():
    # Configure MCP servers
    mcp_servers = [
        {
            "name": "fetch",
            "command": "uvx",
            "args": ["mcp-server-fetch"],
            "description": "Fetch web content"
        },
    ]

    # Create dataset
    dataset = Dataset.from_dict({
        "question": [
            "What is the latest news on OpenAI's website?",
        ],
        "answer": ["Recent updates about GPT models"]
    })

    # Create rubric
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
    
    async def judge_reward(judge, prompt, completion, answer, state):
        response = await judge(prompt, completion, answer, state)
        return 1.0 if "yes" in response.lower() else 0.0
    
    rubric.add_reward_func(judge_reward)

    # Create environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
    )
2. Evaluate

Run an evaluation:
prime eval run my-mcp-env -m openai/gpt-4.1-mini -n 5

MCP Server Configuration

Configure MCP servers using the MCPServerConfig format:
mcp_servers = [
    {
        "name": "fetch",
        "command": "uvx",
        "args": ["mcp-server-fetch"],
        "description": "Fetch web content",
    },
    {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        "description": "File system access",
    },
]
Configuration fields:
  • name - Identifier for the server
  • command - Command to launch the server
  • args - List of command arguments
  • env - Environment variables (optional)
  • description - Human-readable description (optional)
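The field list above can be sketched as a TypedDict with a small validator. This is illustrative only; `MCPServerConfig` here is a stand-in shape, not the actual Verifiers type, and `validate_config` is a hypothetical helper:

```python
# Illustrative TypedDict mirroring the fields above, plus a small validator.
# This is a sketch, not the actual Verifiers MCPServerConfig type.
from typing import TypedDict

class MCPServerConfig(TypedDict, total=False):
    name: str              # identifier for the server (required)
    command: str           # command to launch the server (required)
    args: list[str]        # command arguments
    env: dict[str, str]    # environment variables (optional)
    description: str       # human-readable description (optional)

def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is usable."""
    problems = []
    for field in ("name", "command"):
        if not cfg.get(field):
            problems.append(f"missing required field: {field}")
    if not isinstance(cfg.get("args", []), list):
        problems.append("args must be a list of strings")
    return problems
```

Checking configs this way before constructing the environment surfaces typos in server definitions before any subprocess is launched.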

With Environment Variables

For servers requiring API keys:
import os
import verifiers as vf

def load_environment():
    vf.ensure_keys(["EXA_API_KEY"])  # Validate key exists
    
    mcp_servers = [
        {
            "name": "exa",
            "command": "npx",
            "args": ["-y", "exa-mcp-server"],
            "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
            "description": "Exa search",
        },
    ]
    
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
    )

Available MCP Servers

Common MCP servers you can use:

Web and Search

Fetch - Retrieve web content
{
    "name": "fetch",
    "command": "uvx",
    "args": ["mcp-server-fetch"],
}
Exa - AI-powered search
{
    "name": "exa",
    "command": "npx",
    "args": ["-y", "exa-mcp-server"],
    "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
}
Brave Search - Web search
{
    "name": "brave",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-brave-search"],
    "env": {"BRAVE_API_KEY": os.environ["BRAVE_API_KEY"]},
}

File System

Filesystem - Read/write files
{
    "name": "filesystem",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
}

Databases

PostgreSQL - Query databases
{
    "name": "postgres",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-postgres"],
    "env": {"POSTGRES_URL": os.environ["POSTGRES_URL"]},
}
SQLite - Local database access
{
    "name": "sqlite",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-sqlite", "database.db"],
}

Development Tools

Git - Repository operations
{
    "name": "git",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-git"],
}
GitHub - GitHub API access
{
    "name": "github",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-github"],
    "env": {"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]},
}
See the MCP servers directory for more servers.

Full Example

Here’s a complete example using multiple MCP servers:
import os
from datasets import Dataset
import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv

def load_environment(
    mcp_servers: list | None = None,
    dataset=None,
    **kwargs
) -> vf.Environment:
    # Validate API keys
    vf.ensure_keys(["EXA_API_KEY"])
    
    # Configure MCP servers
    if mcp_servers is None:
        mcp_servers = [
            {
                "name": "exa",
                "command": "npx",
                "args": ["-y", "exa-mcp-server"],
                "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
                "description": "Exa AI search",
            },
            {
                "name": "fetch",
                "command": "uvx",
                "args": ["mcp-server-fetch"],
                "description": "Fetch web content",
            },
        ]
    
    # Create dataset
    if dataset is None:
        dataset = Dataset.from_dict({
            "question": [
                "Find the latest Prime Intellect announcement",
                "What is the current weather in San Francisco?",
            ],
            "answer": [
                "Information about recent announcements",
                "Current weather conditions",
            ]
        })
    
    # Create rubric with judge
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
    
    async def judge_reward(judge, prompt, completion, answer, state):
        verdict = await judge(prompt, completion, answer, state)
        return 1.0 if "yes" in verdict.lower() else 0.0
    
    rubric.add_reward_func(judge_reward, weight=1.0)
    
    # Create MCP environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
        **kwargs,
    )

Error Handling

Configure error handling behavior:
def custom_error_formatter(error: Exception) -> str:
    """Format errors for the model."""
    return f"Tool error: {str(error)[:100]}"

env = MCPEnv(
    mcp_servers=mcp_servers,
    dataset=dataset,
    rubric=rubric,
    error_formatter=custom_error_formatter,
)

Architecture Notes

MCPEnv is designed for globally available, read-only MCP servers where the same toolset can be shared across all rollouts. For servers requiring per-rollout state or mutable task-specific data, consider implementing a custom StatefulToolEnv subclass.

Connection Management

MCP servers are connected once during environment initialization and shared across all rollouts:
  1. Environment starts background event loop
  2. Connects to all configured MCP servers
  3. Discovers available tools via tools/list
  4. Exposes tools to rollouts
  5. Cleanup on environment shutdown
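The lifecycle above can be sketched as a background event-loop thread that owns the connections and serves all rollouts. Server startup and `tools/list` discovery are stubbed out here, and `BackgroundLoop`/`connect_servers` are illustrative names rather than Verifiers internals:

```python
# Sketch of the connection-management pattern: a background event-loop
# thread owns the MCP sessions, and rollout code submits coroutines to it.
# Server startup and tool discovery are stubbed; names are illustrative.
import asyncio
import threading

class BackgroundLoop:
    """Run an asyncio loop in a daemon thread, shared across all rollouts."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Submit a coroutine to the background loop and block for its result.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()

    def shutdown(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()

async def connect_servers(configs):
    # Placeholder for launching each server process and calling tools/list.
    return {cfg["name"]: ["example_tool"] for cfg in configs}

bg = BackgroundLoop()
tools = bg.run(connect_servers([{"name": "fetch"}]))
bg.shutdown()
```

Running the loop in its own thread is what lets synchronous rollout code call async MCP sessions without each rollout reconnecting to the servers.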

Tool Execution

When a model calls an MCP tool:
  1. Tool call is intercepted by MCPEnv
  2. Request is sent to appropriate MCP server
  3. Response is returned as tool message
  4. Errors are formatted via error_formatter
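A minimal sketch of steps 1–4, with `tool_registry` and `dispatch_tool_call` as hypothetical stand-ins for MCPEnv's internals:

```python
# Sketch of tool-call dispatch: route the call to the server-backed callable
# that owns the tool, and format failures with the configured error_formatter.
# `tool_registry` and `dispatch_tool_call` are hypothetical names.
def default_error_formatter(error: Exception) -> str:
    return f"Tool error: {str(error)[:100]}"

def dispatch_tool_call(name, arguments, tool_registry,
                       error_formatter=default_error_formatter):
    """Return a tool message for the model, even when the call fails."""
    try:
        call = tool_registry[name]  # which server-backed callable owns this tool
        return {"role": "tool", "content": call(**arguments)}
    except Exception as error:
        return {"role": "tool", "content": error_formatter(error)}
```

Wrapping every failure into a tool message, rather than raising, keeps the rollout alive and lets the model recover from a bad call.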

Best Practices

  • Validate API keys - Use vf.ensure_keys() to fail fast if keys are missing
  • Document requirements - List required environment variables in README
  • Test servers locally - Verify MCP servers work before using in environments
  • Handle errors gracefully - Provide clear error messages via error_formatter
  • Limit tool calls - Set reasonable max_turns to prevent infinite loops
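The fail-fast key check can be sketched as follows; this is a stand-in in the spirit of vf.ensure_keys(), not its actual implementation:

```python
# A stand-in for vf.ensure_keys(): fail fast, naming every missing key,
# before any MCP server is launched. Not the Verifiers implementation.
import os

def ensure_keys(names: list[str]) -> None:
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Reporting all missing keys at once beats discovering them one at a time through server startup failures mid-evaluation.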

Limitations

  • MCP servers must support stdio transport
  • Servers are started once per environment, not per rollout
  • No support for resources or prompts (tools only)
  • Limited to read-only operations (no per-rollout state)

Examples

See the mcp-search-env example in the Verifiers repository for a complete implementation.
