Overview
Environments are the core abstraction in Verifiers that define how language models interact with tasks. Each environment orchestrates the full lifecycle of a rollout: loading data, managing model interactions, executing tools or game logic, and computing rewards.
import verifiers as vf

def load_environment():
    return vf.SingleTurnEnv(
        dataset=dataset,
        rubric=rubric,
        system_prompt="You are a helpful assistant.",
    )
Environment Hierarchy
All environments inherit from the abstract Environment base class and implement a rollout() method. The class hierarchy provides progressively more specialized interaction patterns:
Environment (abstract base)
├── SingleTurnEnv (single-response Q&A)
└── MultiTurnEnv (multi-turn interactions)
    ├── ToolEnv (stateless tool calling)
    │   ├── StatefulToolEnv (tools with per-rollout state)
    │   │   ├── SandboxEnv (containerized bash execution)
    │   │   │   └── PythonEnv (persistent Python REPL)
    │   │   └── CliAgentEnv (custom agent code in sandboxes)
    │   └── MCPEnv (MCP server integration)
    └── Custom environments (games, simulations, etc.)
Environment Types
SingleTurnEnv
The simplest environment for single-response tasks where the model generates one completion per prompt.
import verifiers as vf
from datasets import Dataset

dataset = Dataset.from_list([
    {"prompt": [{"role": "user", "content": "What is 2+2?"}], "answer": "4"},
    {"prompt": [{"role": "user", "content": "What is 3*5?"}], "answer": "15"},
])

async def correct_answer(completion, answer) -> float:
    response = completion[-1]["content"]
    return 1.0 if answer in response else 0.0

rubric = vf.Rubric(funcs=[correct_answer])
env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
Key characteristics:
- One model response per rollout
- No environment feedback loop
- Perfect for Q&A, classification, or completion tasks
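To try the environment end to end, you can run a quick evaluation against any OpenAI-compatible client (a minimal sketch; the client and model name are placeholders):

from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible client works
results = env.evaluate_sync(client=client, model="gpt-4", num_examples=2)
print(results)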
MultiTurnEnv
Enables multi-turn interactions where the environment responds after each model turn. Subclasses must implement env_response().
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
        """Generate environment feedback after each model turn."""
        parsed = self.parser.parse(messages[-1]["content"])
        action = parsed.action
        result = self.process_action(action, state)
        return [{"role": "user", "content": result}]
Built-in stop conditions:
- has_error - Stops on any vf.Error in state["error"]
- prompt_too_long - Stops if the prompt exceeds the model's context length
- max_turns_reached - Stops after max_turns iterations
- has_final_env_response - Stops when state["final_env_response"] is set
Constructor parameters:
MultiTurnEnv(
    dataset: Dataset,
    rubric: Rubric,
    max_turns: int = -1,  # -1 means unlimited
    **kwargs
)
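Putting these pieces together, a minimal subclass might look like this (an illustrative sketch, not a built-in environment; it assumes the dataset's answer column is exposed as state["answer"]):

class GuessingGameEnv(vf.MultiTurnEnv):
    async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
        guess = messages[-1]["content"].strip()
        # Assumption: the dataset's answer column is available in state["answer"]
        if guess == state["answer"]:
            final_msg = [{"role": "user", "content": "Correct!"}]
            state["final_env_response"] = final_msg  # ends the rollout
            return final_msg
        return [{"role": "user", "content": "Wrong, try again."}]

env = GuessingGameEnv(dataset=dataset, rubric=rubric, max_turns=5)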
ToolEnv
Adds tool-calling capabilities with stateless Python functions. Tools are automatically converted to OpenAI-compatible schemas.
async def calculate(expression: str) -> str:
    """Evaluate a mathematical expression.

    Args:
        expression: A mathematical expression to evaluate (e.g. "2 + 2 * 3")

    Returns:
        The result of the evaluation.
    """
    try:
        # Note: eval() is unsafe on untrusted input; fine for a demo
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"
env = vf.ToolEnv(
    dataset=dataset,
    tools=[calculate],
    rubric=rubric,
    max_turns=10
)
Tool schema extraction:
- Function name → tool name
- Type hints → parameter types
- Docstring → tool description and parameter descriptions
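For the calculate tool above, the extracted schema looks roughly like this (illustrative; the exact JSON layout may differ):

{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A mathematical expression to evaluate (e.g. \"2 + 2 * 3\")",
                },
            },
            "required": ["expression"],
        },
    },
}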
Stop behavior:
- Stops when the model responds without tool calls (built-in no_tools_called condition)
- Configurable error handling via the stop_errors parameter
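For example, to stop a rollout when a tool raises certain errors (a sketch; it assumes stop_errors accepts a list of exception types):

env = vf.ToolEnv(
    dataset=dataset,
    tools=[calculate],
    rubric=rubric,
    max_turns=10,
    stop_errors=[TimeoutError],  # assumption: exception types that end the rollout
)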
StatefulToolEnv
For tools that require per-rollout state (sandbox IDs, database connections, session handles).
class MySandboxEnv(vf.StatefulToolEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Register tool with hidden argument
        self.add_tool(self.run_code, args_to_skip=["session_id"])

    async def setup_state(self, state, **kwargs):
        # Initialize per-rollout resources
        state["session_id"] = await create_session()
        return await super().setup_state(state, **kwargs)

    def update_tool_args(self, tool_name, tool_args, messages, state, **kwargs):
        # Inject state into tool calls
        if tool_name == "run_code":
            tool_args["session_id"] = state["session_id"]
        return tool_args

    async def run_code(self, code: str, session_id: str) -> str:
        """Execute code in the sandbox."""
        return await execute_in_session(session_id, code)
Pattern:
- Add tools with args_to_skip for hidden parameters
- Initialize state in setup_state()
- Inject state values in update_tool_args()
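Instantiation then looks the same as for any other tool environment (a sketch using the class defined above):

env = MySandboxEnv(dataset=dataset, rubric=rubric, max_turns=10)
# The model only sees run_code(code: str); session_id is injected per rollout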
SandboxEnv
Provides containerized bash execution using Prime Intellect’s Sandboxes.
env = vf.SandboxEnv(
    dataset=dataset,
    rubric=rubric,
    sandbox_name="my-sandbox",
    docker_image="python:3.11-slim",
    start_command="tail -f /dev/null",
    cpu_cores=2,
    memory_gb=4,
    disk_size_gb=10,
    timeout_minutes=60,
    timeout_per_command_seconds=30,
    environment_vars={"API_KEY": "..."},
    labels=["experiment-1", "math-tasks"],  # optional categorization
)
Built-in tool:
bash(command: str) - Execute shell commands in the sandbox
Lifecycle:
- Sandboxes are created in setup_state() (per rollout)
- Destroyed in cleanup handlers after each rollout
- All setup logic should be in start_command, which is not awaited until first use
PythonEnv
Extends SandboxEnv with a persistent Python REPL.
env = vf.PythonEnv(
    dataset=dataset,
    rubric=rubric,
    packages=["numpy", "pandas"],  # auto-installed in sandbox
)
Built-in tool:
python(code: str) - Execute Python code in the persistent REPL
MCPEnv
Integrates with MCP (Model Context Protocol) servers.
mcp_servers = [
    {
        "name": "fetch",
        "command": "uvx",
        "args": ["mcp-server-fetch"],
    },
]

env = vf.MCPEnv(
    mcp_servers=mcp_servers,
    dataset=dataset,
    rubric=rubric,
)
Features:
- Automatically discovers and exposes MCP server tools
- Manages server lifecycle
- Supports multiple concurrent MCP servers
Base Environment Class
Constructor Parameters
All environments accept these common parameters:
Environment(
    dataset: Dataset | DatasetBuilder | None = None,
    eval_dataset: Dataset | DatasetBuilder | None = None,
    system_prompt: str | None = None,
    few_shot: Messages | None = None,
    parser: Parser | None = None,
    rubric: Rubric | None = None,
    sampling_args: SamplingArgs | None = None,
    max_workers: int = 512,
    env_id: str | None = None,
    env_args: dict | None = None,
    max_seq_len: int | None = None,
    score_rollouts: bool = True,
    pass_threshold: float = 0.5,
)
Key parameters:
- dataset / eval_dataset - Training and evaluation datasets (can be DatasetBuilder for lazy loading)
- system_prompt - Prepended to all prompts as a system message
- few_shot - Example messages inserted after the system prompt
- parser - For extracting structured output (e.g., vf.XMLParser)
- rubric - Reward functions and scoring logic
- sampling_args - Default generation parameters (temperature, top_p, etc.)
- max_seq_len - Maximum sequence length for tokenization
- score_rollouts - Whether to score rollouts (disable for pure generation)
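A more fully configured environment might combine several of these (a sketch; the XMLParser field list is illustrative):

env = vf.SingleTurnEnv(
    dataset=dataset,
    system_prompt="Answer inside <answer> tags.",
    parser=vf.XMLParser(["answer"]),  # illustrative field list
    rubric=rubric,
    sampling_args={"temperature": 0.7, "top_p": 0.95},
    max_seq_len=4096,
)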
Core Methods
Generation
# Asynchronous generation
results = await env.generate(
    inputs=dataset,
    client=client,
    model="gpt-4",
    sampling_args={"temperature": 0.7},
    max_concurrent=10,
    save_results=True,
    results_path=Path("./results"),
)

# Synchronous wrapper
results = env.generate_sync(inputs=dataset, client=client, model="gpt-4")
Returns: GenerateOutputs with outputs (list of RolloutOutput) and metadata
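A sketch of consuming the results (assuming each RolloutOutput exposes the rollout's completion and reward; the attribute names here are assumptions):

for output in results.outputs:
    print(output.reward, output.completion)  # assumed attribute names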
Evaluation
# Evaluate on eval_dataset
results = await env.evaluate(
    client=client,
    model="gpt-4",
    num_examples=100,
    rollouts_per_example=4,
    save_results=True,
)

# Synchronous wrapper
results = env.evaluate_sync(client=client, model="gpt-4", num_examples=10)
Dataset Access
# Get datasets (triggers lazy loading if using DatasetBuilder)
train_ds = env.get_dataset(n=100, seed=42)
eval_ds = env.get_eval_dataset(n=50)
Environment Groups
EnvGroup combines multiple environments for multi-task training:
math_env = vf.SingleTurnEnv(dataset=math_data, rubric=math_rubric)
code_env = vf.ToolEnv(dataset=code_data, tools=[execute_code], rubric=code_rubric)
reasoning_env = vf.MultiTurnEnv(dataset=reasoning_data, rubric=reasoning_rubric)

combined = vf.EnvGroup(
    envs=[math_env, code_env, reasoning_env],
    env_names=["math", "code", "reasoning"],  # optional
)
Behavior:
- Concatenates all sub-environment datasets
- Routes each rollout to the appropriate environment via the task column
- Aggregates metrics across all environments
Environment groups are particularly useful for curriculum learning and multi-task RL training where you want to train a single model across diverse task types.
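Conceptually, each row in the concatenated dataset carries a task value that EnvGroup uses for routing (an illustrative row):

{"prompt": [{"role": "user", "content": "What is 2+2?"}], "answer": "4", "task": "math"}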
Advanced Customization
Custom Stop Conditions
Define custom termination logic with the @vf.stop decorator:
class MyEnv(vf.MultiTurnEnv):
    @vf.stop(priority=10)  # Higher priority runs first
    async def answer_submitted(self, state: vf.State) -> bool:
        completion = state.get("completion", [])
        if not completion:
            return False
        return "FINAL ANSWER:" in completion[-1].get("content", "")
Resource Management
Use setup_state() for per-rollout initialization and the @vf.cleanup / @vf.teardown decorators for cleanup:
class MyEnv(vf.MultiTurnEnv):
    async def setup_state(self, state: vf.State) -> vf.State:
        """Per-rollout initialization."""
        state["game_id"] = await create_game()
        return await super().setup_state(state)

    @vf.cleanup
    async def save_game_log(self, state: vf.State):
        """Per-rollout cleanup."""
        await save_log(state["game_id"])

    @vf.teardown
    async def close_connections(self):
        """Environment-level teardown."""
        await self.db.close()
Cleanup methods must be idempotent (safe to call multiple times) and handle errors gracefully to ensure cleanup completes even when resources are in unexpected states.
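One common pattern is to pop the resource handle out of state so a second invocation becomes a no-op (a sketch; delete_game is a hypothetical helper):

@vf.cleanup
async def destroy_game(self, state: vf.State):
    """Idempotent: state.pop returns None on repeated calls."""
    game_id = state.pop("game_id", None)
    if game_id is not None:
        try:
            await delete_game(game_id)  # hypothetical helper
        except Exception:
            pass  # swallow errors so cleanup always completes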
Signaling Early Termination
Set state["final_env_response"] to end the rollout with a final environment message, bypassing any further model responses:
async def env_response(self, messages: vf.Messages, state: vf.State) -> vf.Messages:
    if check_game_over(state):
        final_msg = [{"role": "user", "content": f"Game over! Score: {state['score']}"}]
        state["final_env_response"] = final_msg
        return final_msg
    # Normal response logic...
Integration Examples
TextArena Integration
Wrapper for text-based game environments:
env = vf.TextArenaEnv(
    game_name="rock_paper_scissors",
    num_players=2,
    dataset=dataset,
    rubric=rubric,
)
ReasoningGym Integration
Procedural reasoning tasks:
from verifiers.envs.integrations import DatasetSpec

env = vf.ReasoningGymEnv(
    dataset_spec=DatasetSpec(
        name="sorting",
        num_samples=100,
        difficulty="hard",
    ),
    rubric=rubric,
)
Browser Automation
Browserbase integration with DOM or vision-based control:
# DOM mode (natural language browser control)
env = vf.BrowserEnv(
    mode="dom",
    dataset=dataset,
    rubric=rubric,
)

# CUA mode (coordinate-based vision control)
env = vf.BrowserEnv(
    mode="cua",
    use_sandbox=True,  # auto-deploy CUA server in sandbox
    dataset=dataset,
    rubric=rubric,
)