OpenEnvEnv
Drop-in OpenEnv integration for running OpenEnv environments in Verifiers.Overview
OpenEnvEnv provides seamless integration with OpenEnv environments. It automatically manages sandbox deployment, supports both gym (step/reset) and MCP tool contracts, and uses seeds as the dataset mechanism.
Key features:
- Automatic sandbox deployment using Prime Sandboxes
- Support for both gym and MCP contracts
- Seed-based dataset generation
- Custom prompt rendering for observations
- Pre-built container image support
- Automatic retry and error handling
Installation
Install with OpenEnv support:Inheritance
Constructor
Parameters
Path to OpenEnv project directory. If None, infers from calling module’s location (looks for
proj/ directory adjacent to caller).Number of training examples to generate.
Number of evaluation examples to generate.
Starting seed for dataset generation. Each example gets
seed + index.Required. Function that converts OpenEnv observations to chat messages. Signature:
(observation, context, action_schema, contract, seed) -> Messages.Maximum turns per rollout. -1 for unlimited.
Rubric for scoring. If None, uses
OpenEnvEpisodicSumRubric() which sums step rewards.Timeout waiting for sandbox server to start.
Poll interval for health checks during startup.
Timeout for individual health check requests.
Timeout for schema fetch requests.
Maximum attempts waiting for sandbox creation.
Maximum retry attempts for transient failures.
Base delay in seconds for exponential backoff.
Exponential backoff multiplier.
Maximum backoff delay in seconds.
Jitter added to backoff delays.
Additional arguments passed to MultiTurnEnv.
Build Configuration
OpenEnvEnv requires a.build.json file in the project directory with the following fields:
Key Methods
setup_state
- Create sandbox and deploy OpenEnv server
- Fetch action schema from
/schemaendpoint - Connect client (gym or MCP)
- Reset environment with seed from
state["info"]["seed"] - For MCP: list tools and convert to Verifiers tool format
- Store server, client, and schema in state
- Render initial prompt via
prompt_renderer
env_response
_gym_env_response()for gym contract_mcp_env_response()for MCP contract
- Parse action from latest assistant message
- Call
client.step(action) - Store reward in trajectory
- Render observation via
prompt_renderer
- Extract tool calls from latest assistant message
- For each tool call, invoke via
_mcp_step_tool() - Accumulate rewards and done status
- Return tool response messages
openenv_done
state["openenv_done"] is True.
mcp_no_tool_calls
- Environment is done (
state["openenv_done"]), OR - Last message was assistant message with no tool calls
cleanup_openenv
- Close client connections
- Unexpose sandbox port
- Delete sandbox
teardown_server
Prompt Renderer
Theprompt_renderer is required and must convert OpenEnv observations to messages.
Signature:
- Must return a non-empty list of messages
- Each message must have
roleandcontentfields - Content cannot be None
Rubrics
OpenEnvEpisodicSumRubric
Example Usage
Gym Contract Environment
MCP Contract Environment
Custom Rubric
Auto-infer Project Path
Contracts
Gym Contract
Traditional reinforcement learning interface:- Actions parsed from assistant messages (JSON or single-field text)
- Environment steps with
client.step(action) - Returns observation, reward, done
- Observations rendered to user messages
MCP Contract
Tool-based interface:- Actions are tool calls
- Environment exposes tools via MCP protocol
- Model calls tools, environment returns tool responses
- Supports structured tool schemas
Action Parsing (Gym)
For gym contract, actions are parsed from the model’s response:- JSON object: Parsed directly
- Single string field: If schema has one required string field, uses raw text
- Code fence: Strips
json...wrappers
Error Handling
- Sandbox errors: Raised as
vf.SandboxErrorwith logs - Startup failures: Includes container logs and local health probe results
- Contract mismatch: Validates schema matches declared contract
- Missing renderer: Raises
ValueErrorifprompt_rendereris None - Invalid prompts: Validates rendered messages are non-empty with non-null content
Sandbox Management
OpenEnvEnv automatically manages Prime Sandboxes:- Creates sandbox from image specified in
.build.json - Exposes port and waits for health check
- Retries transient failures with exponential backoff
- Cleans up sandbox after rollout
- Provides detailed error messages with logs on failure
See Also
- OpenEnv Integration Guide - Complete setup and configuration
- MultiTurnEnv - Base class documentation
- Rubric - Reward function configuration
- State - State dictionary reference