
OpenEnvEnv

Drop-in integration for running OpenEnv environments in Verifiers.

Overview

OpenEnvEnv provides seamless integration with OpenEnv environments. It automatically manages sandbox deployment, supports both gym (step/reset) and MCP tool contracts, and uses seeds as the dataset mechanism. Key features:
  • Automatic sandbox deployment using Prime Sandboxes
  • Support for both gym and MCP contracts
  • Seed-based dataset generation
  • Custom prompt rendering for observations
  • Pre-built container image support
  • Automatic retry and error handling

Installation

Install with OpenEnv support:
uv add 'verifiers[openenv]'
uv add prime-sandboxes
See the OpenEnv integration guide for complete setup details.

Inheritance

Environment
└── MultiTurnEnv
    └── OpenEnvEnv

Constructor

OpenEnvEnv(
    openenv_project: str | Path | None = None,
    num_train_examples: int = 100,
    num_eval_examples: int = 50,
    seed: int = 0,
    prompt_renderer: Callable[..., Messages] | None = None,
    max_turns: int = -1,
    rubric: vf.Rubric | None = None,
    startup_timeout_seconds: int = 30,
    startup_poll_interval_seconds: float = 1.0,
    health_request_timeout_seconds: float = 2.0,
    schema_request_timeout_seconds: float = 5.0,
    wait_for_creation_max_attempts: int = 20,
    max_retries: int = 5,
    base_delay: float = 0.5,
    backoff_factor: float = 2.0,
    max_backoff_seconds: float = 30.0,
    jitter: float = 1e-3,
    **kwargs
)

Parameters

openenv_project
str | Path | None
default:"None"
Path to OpenEnv project directory. If None, infers from calling module’s location (looks for proj/ directory adjacent to caller).
num_train_examples
int
default:"100"
Number of training examples to generate.
num_eval_examples
int
default:"50"
Number of evaluation examples to generate.
seed
int
default:"0"
Starting seed for dataset generation. Each example gets seed + index.
prompt_renderer
Callable[..., Messages] | None
default:"None"
Required. Function that converts OpenEnv observations to chat messages. Signature: (observation, context, action_schema, contract, seed) -> Messages.
max_turns
int
default:"-1"
Maximum turns per rollout. -1 for unlimited.
rubric
vf.Rubric | None
default:"None"
Rubric for scoring. If None, uses OpenEnvEpisodicSumRubric() which sums step rewards.
startup_timeout_seconds
int
default:"30"
Timeout waiting for sandbox server to start.
startup_poll_interval_seconds
float
default:"1.0"
Poll interval for health checks during startup.
health_request_timeout_seconds
float
default:"2.0"
Timeout for individual health check requests.
schema_request_timeout_seconds
float
default:"5.0"
Timeout for schema fetch requests.
wait_for_creation_max_attempts
int
default:"20"
Maximum attempts waiting for sandbox creation.
max_retries
int
default:"5"
Maximum retry attempts for transient failures.
base_delay
float
default:"0.5"
Base delay in seconds for exponential backoff.
backoff_factor
float
default:"2.0"
Exponential backoff multiplier.
max_backoff_seconds
float
default:"30.0"
Maximum backoff delay in seconds.
jitter
float
default:"1e-3"
Jitter added to backoff delays.
**kwargs
Any
Additional arguments passed to MultiTurnEnv.
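The retry parameters above combine as capped exponential backoff with jitter. A minimal sketch of how the delay before retry attempt n might be computed from these parameters (illustrative; the library's actual retry logic may differ):

```python
import random

def backoff_delay(attempt: int,
                  base_delay: float = 0.5,
                  backoff_factor: float = 2.0,
                  max_backoff_seconds: float = 30.0,
                  jitter: float = 1e-3) -> float:
    """Delay before retry `attempt` (0-indexed): exponential growth,
    capped at max_backoff_seconds, plus a small random jitter."""
    delay = min(base_delay * (backoff_factor ** attempt), max_backoff_seconds)
    return delay + random.uniform(0.0, jitter)
```

With the defaults, delays grow 0.5 s, 1 s, 2 s, ... until they hit the 30 s cap; the jitter desynchronizes concurrent retries.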

Build Configuration

OpenEnvEnv requires a .build.json file in the project directory with the following fields:
{
  "image": "your-image-name:tag",
  "port": 8000,
  "start_command": "python server.py",
  "contract": "gym"  // or "mcp"
}
Generate this file by running:
vf-build <env-id>
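A hedged sketch of reading and sanity-checking this file, using the field names from the example above (`load_build_config` is illustrative, not part of the library):

```python
import json
from pathlib import Path

def load_build_config(project_dir: str) -> dict:
    """Read .build.json from an OpenEnv project directory and check required fields."""
    config = json.loads((Path(project_dir) / ".build.json").read_text())
    for field in ("image", "port", "start_command", "contract"):
        if field not in config:
            raise ValueError(f"missing required field in .build.json: {field}")
    if config["contract"] not in ("gym", "mcp"):
        raise ValueError(f"unknown contract: {config['contract']}")
    return config
```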

Key Methods

setup_state

async def setup_state(
    state: vf.State
) -> vf.State
Initialize OpenEnv server and reset environment for this rollout. Flow:
  1. Create sandbox and deploy OpenEnv server
  2. Fetch action schema from /schema endpoint
  3. Connect client (gym or MCP)
  4. Reset environment with seed from state["info"]["seed"]
  5. For MCP: list tools and convert to Verifiers tool format
  6. Store server, client, and schema in state
  7. Render initial prompt via prompt_renderer
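The seed used in step 4 comes from the seed-based dataset: each example carries the starting seed plus its index (see the seed parameter above). A rough sketch of that idea, not the exact internal row format:

```python
def make_seed_rows(num_examples: int, seed: int = 0) -> list[dict]:
    """Each generated example carries its own seed (starting seed + index) in its info."""
    return [{"info": {"seed": seed + i}} for i in range(num_examples)]
```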

env_response

async def env_response(
    messages: vf.Messages,
    state: vf.State,
    **kwargs
) -> vf.Messages
Process model response and step environment. Delegates to:
  • _gym_env_response() for gym contract
  • _mcp_env_response() for MCP contract
Gym flow:
  1. Parse action from latest assistant message
  2. Call client.step(action)
  3. Store reward in trajectory
  4. Render observation via prompt_renderer
MCP flow:
  1. Extract tool calls from latest assistant message
  2. For each tool call, invoke via _mcp_step_tool()
  3. Accumulate rewards and done status
  4. Return tool response messages

openenv_done

@vf.stop
async def openenv_done(
    state: vf.State
) -> bool
Stop condition for gym contract. Returns True when state["openenv_done"] is True.

mcp_no_tool_calls

@vf.stop
async def mcp_no_tool_calls(
    state: vf.State
) -> bool
Stop condition for MCP contract. Returns True when:
  • Environment is done (state["openenv_done"]), OR
  • Last message was an assistant message with no tool calls
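A hedged sketch of that check, assuming the rollout messages are reachable from the state as a list of chat-style dicts with an optional tool_calls field (an assumption for illustration; the real stop condition reads the actual rollout state):

```python
def should_stop_mcp(state: dict) -> bool:
    """Stop when the environment is done or the last assistant turn issued no tool calls."""
    if state.get("openenv_done"):
        return True
    messages = state.get("messages", [])
    if not messages:
        return False
    last = messages[-1]
    return last.get("role") == "assistant" and not last.get("tool_calls")
```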

cleanup_openenv

@vf.cleanup
async def cleanup_openenv(
    state: vf.State
) -> None
Clean up OpenEnv resources after rollout:
  • Close client connections
  • Unexpose sandbox port
  • Delete sandbox

teardown_server

@vf.teardown
async def teardown_server() -> None
Clean up all active servers on environment teardown.

Prompt Renderer

The prompt_renderer is required and must convert OpenEnv observations to messages. Signature:
def prompt_renderer(
    observation: Any,
    context: str,           # "reset" or "step"
    action_schema: dict | None = None,
    contract: str | None = None,  # "gym" or "mcp"
    seed: int | None = None,
) -> Messages
Requirements:
  • Must return a non-empty list of messages
  • Each message must have role and content fields
  • Content cannot be None

Rubrics

OpenEnvEpisodicSumRubric

class OpenEnvEpisodicSumRubric(vf.Rubric)
Default rubric that sums step rewards from the trajectory:
async def sum_step_rewards(state: vf.State) -> float:
    return sum(
        float(step.get("reward", 0.0) or 0.0)
        for step in state.get("trajectory", [])
    )
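To make the behavior concrete, here is a standalone run of that summing logic; note that a None reward counts as 0.0 thanks to the `or 0.0` guard:

```python
import asyncio

async def sum_step_rewards(state: dict) -> float:
    # Same logic as the default rubric's reward function above.
    return sum(
        float(step.get("reward", 0.0) or 0.0)
        for step in state.get("trajectory", [])
    )

state = {"trajectory": [{"reward": 0.5}, {"reward": None}, {"reward": 1.0}]}
print(asyncio.run(sum_step_rewards(state)))  # 1.5
```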

Example Usage

Gym Contract Environment

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv
from verifiers.types import Messages

def render_observation(
    observation: dict,
    context: str,
    action_schema: dict | None = None,
    contract: str | None = None,
    seed: int | None = None,
) -> Messages:
    """Convert observation to chat messages."""
    if context == "reset":
        return [
            {
                "role": "user",
                "content": f"Task: {observation['task']}\n\nState: {observation['state']}"
            }
        ]
    return [
        {
            "role": "user",
            "content": f"Observation: {observation['state']}"
        }
    ]

def load_environment():
    return OpenEnvEnv(
        openenv_project="./my_openenv_project/proj",
        num_train_examples=100,
        num_eval_examples=20,
        prompt_renderer=render_observation,
        max_turns=50,
        seed=0,
    )

MCP Contract Environment

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv
from verifiers.types import Messages

def render_mcp_observation(
    observation: dict,
    context: str,
    action_schema: dict | None = None,
    contract: str | None = None,
    seed: int | None = None,
) -> Messages:
    """Render MCP environment observations."""
    if context == "reset":
        return [
            {
                "role": "user",
                "content": (
                    f"Goal: {observation.get('goal', 'Complete the task')}\n\n"
                    f"Use the available tools to accomplish this goal."
                )
            }
        ]
    # Step observations are returned as tool messages
    return []

def load_environment():
    return OpenEnvEnv(
        openenv_project="./mcp_project/proj",
        prompt_renderer=render_mcp_observation,
        num_train_examples=50,
        max_turns=30,
    )

Custom Rubric

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv

def load_environment():
    def success_reward(state: vf.State) -> float:
        """Reward successful completion."""
        if state.get("openenv_done"):
            # Reward based on efficiency (fewer steps = better)
            num_steps = len(state.get("trajectory", []))
            return 1.0 / max(num_steps, 1)  # guard against an empty trajectory
        return 0.0
    
    rubric = vf.Rubric(success_reward)
    
    return OpenEnvEnv(
        openenv_project="./my_project/proj",
        prompt_renderer=my_renderer,  # any renderer matching the signature above
        rubric=rubric,
        num_train_examples=100,
    )

Auto-infer Project Path

import verifiers as vf
from verifiers.envs.integrations.openenv_env import OpenEnvEnv

# If this file is at: /path/to/my_env.py
# OpenEnvEnv will look for: /path/to/proj/

def load_environment():
    return OpenEnvEnv(
        # openenv_project auto-inferred from caller location
        prompt_renderer=my_renderer,  # any renderer matching the required signature
        num_train_examples=100,
    )

Contracts

Gym Contract

Traditional reinforcement learning interface:
  • Actions parsed from assistant messages (JSON or single-field text)
  • Environment steps with client.step(action)
  • Returns observation, reward, done
  • Observations rendered to user messages

MCP Contract

Tool-based interface:
  • Actions are tool calls
  • Environment exposes tools via MCP protocol
  • Model calls tools, environment returns tool responses
  • Supports structured tool schemas

Action Parsing (Gym)

For gym contract, actions are parsed from the model’s response:
  1. JSON object: Parsed directly
  2. Single string field: If schema has one required string field, uses raw text
  3. Code fence: Strips ```json ... ``` wrappers before parsing

Error Handling

  • Sandbox errors: Raised as vf.SandboxError with logs
  • Startup failures: Includes container logs and local health probe results
  • Contract mismatch: Validates schema matches declared contract
  • Missing renderer: Raises ValueError if prompt_renderer is None
  • Invalid prompts: Validates rendered messages are non-empty with non-null content

Sandbox Management

OpenEnvEnv automatically manages Prime Sandboxes:
  • Creates sandbox from image specified in .build.json
  • Exposes port and waits for health check
  • Retries transient failures with exponential backoff
  • Cleans up sandbox after rollout
  • Provides detailed error messages with logs on failure
