
OpenEnvEnv

Drop-in integration for running OpenEnv environments in Verifiers.

Overview

OpenEnvEnv provides seamless integration with OpenEnv environments. It automatically manages sandbox deployment, supports both gym (step/reset) and MCP tool contracts, and uses seeds as the dataset mechanism. Key features:
  • Automatic sandbox deployment using Prime Sandboxes
  • Support for both gym and MCP contracts
  • Seed-based dataset generation
  • Custom prompt rendering for observations
  • Pre-built container image support
  • Automatic retry and error handling

Installation

Install with OpenEnv support:
uv add 'verifiers[openenv]'
uv add prime-sandboxes
See the OpenEnv integration guide for complete setup details.

Inheritance

Environment
└── MultiTurnEnv
    └── OpenEnvEnv

Constructor

OpenEnvEnv(
    openenv_project: str | Path | None = None,
    num_train_examples: int = 100,
    num_eval_examples: int = 50,
    seed: int = 0,
    prompt_renderer: Callable[..., Messages] | None = None,
    max_turns: int = -1,
    rubric: vf.Rubric | None = None,
    startup_timeout_seconds: int = 30,
    startup_poll_interval_seconds: float = 1.0,
    health_request_timeout_seconds: float = 2.0,
    schema_request_timeout_seconds: float = 5.0,
    wait_for_creation_max_attempts: int = 20,
    max_retries: int = 5,
    base_delay: float = 0.5,
    backoff_factor: float = 2.0,
    max_backoff_seconds: float = 30.0,
    jitter: float = 1e-3,
    **kwargs
)

Parameters

openenv_project
str | Path | None
default:"None"
Path to OpenEnv project directory. If None, infers from calling module’s location (looks for proj/ directory adjacent to caller).
num_train_examples
int
default:"100"
Number of training examples to generate.
num_eval_examples
int
default:"50"
Number of evaluation examples to generate.
seed
int
default:"0"
Starting seed for dataset generation. Each example gets seed + index.
prompt_renderer
Callable[..., Messages] | None
default:"None"
Required. Function that converts OpenEnv observations to chat messages. Signature: (observation, context, action_schema, contract, seed) -> Messages.
max_turns
int
default:"-1"
Maximum turns per rollout. -1 for unlimited.
rubric
vf.Rubric | None
default:"None"
Rubric for scoring. If None, uses OpenEnvEpisodicSumRubric() which sums step rewards.
startup_timeout_seconds
int
default:"30"
Timeout waiting for sandbox server to start.
startup_poll_interval_seconds
float
default:"1.0"
Poll interval for health checks during startup.
health_request_timeout_seconds
float
default:"2.0"
Timeout for individual health check requests.
schema_request_timeout_seconds
float
default:"5.0"
Timeout for schema fetch requests.
wait_for_creation_max_attempts
int
default:"20"
Maximum attempts waiting for sandbox creation.
max_retries
int
default:"5"
Maximum retry attempts for transient failures.
base_delay
float
default:"0.5"
Base delay in seconds for exponential backoff.
backoff_factor
float
default:"2.0"
Exponential backoff multiplier.
max_backoff_seconds
float
default:"30.0"
Maximum backoff delay in seconds.
jitter
float
default:"1e-3"
Jitter added to backoff delays.
**kwargs
Any
Additional arguments passed to MultiTurnEnv.
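The retry parameters above combine as capped exponential backoff with jitter. A minimal sketch of how the delay before retry attempt n might be computed from these parameters (illustrative; the library's actual retry logic may differ):

```python
import random

def backoff_delay(attempt: int,
                  base_delay: float = 0.5,
                  backoff_factor: float = 2.0,
                  max_backoff_seconds: float = 30.0,
                  jitter: float = 1e-3) -> float:
    """Delay before retry `attempt` (0-indexed): exponential growth,
    capped at max_backoff_seconds, plus a small random jitter."""
    delay = min(base_delay * (backoff_factor ** attempt), max_backoff_seconds)
    return delay + random.uniform(0.0, jitter)
```

With the defaults, delays grow 0.5 s, 1 s, 2 s, ... until they hit the 30 s cap; the jitter desynchronizes concurrent retries.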

Build Configuration

OpenEnvEnv requires a .build.json file in the project directory with the following fields:
{
  "image": "your-image-name:tag",
  "port": 8000,
  "start_command": "python server.py",
  "contract": "gym"  // or "mcp"
}
Generate this file by running:
vf-build <env-id>
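A hedged sketch of reading and sanity-checking this file, using the field names from the example above (`load_build_config` is illustrative, not part of the library):

```python
import json
from pathlib import Path

def load_build_config(project_dir: str) -> dict:
    """Read .build.json from an OpenEnv project directory and check required fields."""
    config = json.loads((Path(project_dir) / ".build.json").read_text())
    for field in ("image", "port", "start_command", "contract"):
        if field not in config:
            raise ValueError(f"missing required field in .build.json: {field}")
    if config["contract"] not in ("gym", "mcp"):
        raise ValueError(f"unknown contract: {config['contract']}")
    return config
```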

Key Methods

setup_state

async def setup_state(
    state: vf.State
) -> vf.State
Initialize OpenEnv server and reset environment for this rollout. Flow:
  1. Create sandbox and deploy OpenEnv server
  2. Fetch action schema from /schema endpoint
  3. Connect client (gym or MCP)
  4. Reset environment with seed from state["info"]["seed"]
  5. For MCP: list tools and convert to Verifiers tool format
  6. Store server, client, and schema in state
  7. Render initial prompt via prompt_renderer
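The seed used in step 4 comes from the seed-based dataset: each example carries the starting seed plus its index (see the seed parameter above). A rough sketch of that idea, not the exact internal row format:

```python
def make_seed_rows(num_examples: int, seed: int = 0) -> list[dict]:
    """Each generated example carries its own seed (starting seed + index) in its info."""
    return [{"info": {"seed": seed + i}} for i in range(num_examples)]
```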

env_response

async def env_response(
    messages: vf.Messages,
    state: vf.State,
    **kwargs
) -> vf.Messages
Process model response and step environment. Delegates to:
  • _gym_env_response() for gym contract
  • _mcp_env_response() for MCP contract
Gym flow:
  1. Parse action from latest assistant message
  2. Call client.step(action)
  3. Store reward in trajectory
  4. Render observation via prompt_renderer
MCP flow:
  1. Extract tool calls from latest assistant message
  2. For each tool call, invoke via _mcp_step_tool()
  3. Accumulate rewards and done status
  4. Return tool response messages

openenv_done

@vf.stop
async def openenv_done(
    state: vf.State
) -> bool
Stop condition for gym contract. Returns True when state["openenv_done"] is True.

mcp_no_tool_calls

@vf.stop
async def mcp_no_tool_calls(
    state: vf.State
) -> bool
Stop condition for MCP contract. Returns True when:
  • Environment is done (state["openenv_done"]), OR
  • Last message was an assistant message with no tool calls
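A hedged sketch of that check, assuming the rollout messages are reachable from the state as a list of chat-style dicts with an optional tool_calls field (an assumption for illustration; the real stop condition reads the actual rollout state):

```python
def should_stop_mcp(state: dict) -> bool:
    """Stop when the environment is done or the last assistant turn issued no tool calls."""
    if state.get("openenv_done"):
        return True
    messages = state.get("messages", [])
    if not messages:
        return False
    last = messages[-1]
    return last.get("role") == "assistant" and not last.get("tool_calls")
```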

cleanup_openenv

@vf.cleanup
async def cleanup_openenv(
    state: vf.State
) -> None
Clean up OpenEnv resources after rollout:
  • Close client connections
  • Unexpose sandbox port
  • Delete sandbox

teardown_server

@vf.teardown
async def teardown_server() -> None
Clean up all active servers on environment teardown.

Prompt Renderer

The prompt_renderer is required and must convert OpenEnv observations to messages. Signature:
def prompt_renderer(
    observation: Any,
    context: str,           # "reset" or "step"
    action_schema: dict | None = None,
    contract: str | None = None,  # "gym" or "mcp"
    seed: int | None = None,
) -> Messages
Requirements:
  • Must return a non-empty list of messages
  • Each message must have role and content fields
  • Content cannot be None

Rubrics

OpenEnvEpisodicSumRubric

class OpenEnvEpisodicSumRubric(vf.Rubric)
Default rubric that sums step rewards from the trajectory:
async def sum_step_rewards(state: vf.State) -> float:
    return sum(
        float(step.get("reward", 0.0) or 0.0)
        for step in state.get("trajectory", [])
    )
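To make the behavior concrete, here is a standalone run of that summing logic; note that a None reward counts as 0.0 thanks to the `or 0.0` guard:

```python
import asyncio

async def sum_step_rewards(state: dict) -> float:
    # Same logic as the default rubric's reward function above.
    return sum(
        float(step.get("reward", 0.0) or 0.0)
        for step in state.get("trajectory", [])
    )

state = {"trajectory": [{"reward": 0.5}, {"reward": None}, {"reward": 1.0}]}
print(asyncio.run(sum_step_rewards(state)))  # 1.5
```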

Example Usage

Gym Contract Environment

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv
from verifiers.types import Messages

def render_observation(
    observation: dict,
    context: str,
    action_schema: dict | None = None,
    contract: str | None = None,
    seed: int | None = None,
) -> Messages:
    """Convert observation to chat messages."""
    if context == "reset":
        return [
            {
                "role": "user",
                "content": f"Task: {observation['task']}\n\nState: {observation['state']}"
            }
        ]
    return [
        {
            "role": "user",
            "content": f"Observation: {observation['state']}"
        }
    ]

def load_environment():
    return OpenEnvEnv(
        openenv_project="./my_openenv_project/proj",
        num_train_examples=100,
        num_eval_examples=20,
        prompt_renderer=render_observation,
        max_turns=50,
        seed=0,
    )

MCP Contract Environment

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv
from verifiers.types import Messages

def render_mcp_observation(
    observation: dict,
    context: str,
    action_schema: dict | None = None,
    contract: str | None = None,
    seed: int | None = None,
) -> Messages:
    """Render MCP environment observations."""
    if context == "reset":
        return [
            {
                "role": "user",
                "content": (
                    f"Goal: {observation.get('goal', 'Complete the task')}\n\n"
                    f"Use the available tools to accomplish this goal."
                )
            }
        ]
    # Step observations are returned as tool messages
    return []

def load_environment():
    return OpenEnvEnv(
        openenv_project="./mcp_project/proj",
        prompt_renderer=render_mcp_observation,
        num_train_examples=50,
        max_turns=30,
    )

Custom Rubric

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv

def load_environment():
    def success_reward(state: vf.State) -> float:
        """Reward successful completion."""
        if state.get("openenv_done"):
            # Reward based on efficiency (fewer steps = better)
            num_steps = len(state.get("trajectory", []))
            return 1.0 / max(num_steps, 1)  # guard against an empty trajectory
        return 0.0
    
    rubric = vf.Rubric(success_reward)
    
    return OpenEnvEnv(
        openenv_project="./my_project/proj",
        prompt_renderer=my_renderer,  # any renderer matching the signature above
        rubric=rubric,
        num_train_examples=100,
    )

Auto-infer Project Path

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv

# If this file is at: /path/to/my_env.py
# OpenEnvEnv will look for: /path/to/proj/

def load_environment():
    return OpenEnvEnv(
        # openenv_project auto-inferred from caller location
        prompt_renderer=my_renderer,  # any renderer matching the required signature
        num_train_examples=100,
    )

Contracts

Gym Contract

Traditional reinforcement learning interface:
  • Actions parsed from assistant messages (JSON or single-field text)
  • Environment steps with client.step(action)
  • Returns observation, reward, done
  • Observations rendered to user messages

MCP Contract

Tool-based interface:
  • Actions are tool calls
  • Environment exposes tools via MCP protocol
  • Model calls tools, environment returns tool responses
  • Supports structured tool schemas

Action Parsing (Gym)

For gym contract, actions are parsed from the model’s response:
  1. JSON object: Parsed directly
  2. Single string field: If schema has one required string field, uses raw text
  3. Code fence: Strips ```json ... ``` wrappers before parsing

Error Handling

  • Sandbox errors: Raised as vf.SandboxError with logs
  • Startup failures: Includes container logs and local health probe results
  • Contract mismatch: Validates schema matches declared contract
  • Missing renderer: Raises ValueError if prompt_renderer is None
  • Invalid prompts: Validates rendered messages are non-empty with non-null content

Sandbox Management

OpenEnvEnv automatically manages Prime Sandboxes:
  • Creates sandbox from image specified in .build.json
  • Exposes port and waits for health check
  • Retries transient failures with exponential backoff
  • Cleans up sandbox after rollout
  • Provides detailed error messages with logs on failure
