PythonEnv

Sandbox-backed environment exposing a persistent Python REPL for code execution.

Overview

PythonEnv provides a stateful Python interpreter running in a sandboxed container:

Persistent state: Variables and imports persist across code executions
IPython-like behavior: Trailing expressions are automatically printed
Package management: Pre-install packages via pip during setup
Error handling: Full traceback capture for debugging
Execution tracking: Numbered outputs like Jupyter notebooks

Inheritance

Environment
└── MultiTurnEnv
    └── ToolEnv
        └── StatefulToolEnv
            └── SandboxEnv
                └── PythonEnv

Constructor

PythonEnv(
    pip_install_packages: str = "numpy sympy scipy",
    max_startup_wait_seconds: int = 30,
    **kwargs: Any
)

Parameters

pip_install_packages

str

default:"numpy sympy scipy"

Space-separated list of packages to install with pip during sandbox startup. Use an empty string to skip installation.
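To illustrate the space-separated format, the sketch below splits such a string into a pip argument list. The helper name `build_pip_command` is hypothetical and not part of PythonEnv; how PythonEnv actually assembles its install command is an assumption here.

```python
import shlex


def build_pip_command(pip_install_packages: str) -> list[str]:
    """Hypothetical helper: turn a space-separated package string into pip arguments."""
    packages = shlex.split(pip_install_packages)
    if not packages:
        # Empty (or whitespace-only) string: nothing to install.
        return []
    return ["pip", "install", *packages]


print(build_pip_command("numpy sympy scipy"))
print(build_pip_command(""))
```

An empty list stands for "skip installation", which mirrors the empty-string behavior described above.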

max_startup_wait_seconds

int

default:30

Maximum time, in seconds, to wait for the Python worker to become ready.

All other parameters are inherited from SandboxEnv:

sandbox_name, docker_image, cpu_cores, memory_gb, etc.
timeout_per_command_seconds - Timeout for each Python execution

Tools

python

async def python(code: str) -> str

Execute Python code in the persistent REPL.

code

str

Python code to execute. Can be multiple lines.

Returns: str - Formatted output including stdout, stderr, and expression results. Output format:

stdout content (if any)
stderr content prefixed with “stderr:” (if any)
Exception tracebacks (on error)
Expression results as Out[N]: <repr> (for trailing expressions)
(no output) if nothing was produced
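The output rules listed above can be sketched as a small formatting function. This is a minimal illustration, not PythonEnv's own code: the function name `format_output`, its parameters, and the exact spacing and line breaks between parts are assumptions.

```python
from typing import Optional


def format_output(
    stdout: str,
    stderr: str,
    traceback_text: str,
    result_repr: Optional[str],
    execution_count: int,
) -> str:
    """Hypothetical formatter that joins the non-empty parts in the documented order."""
    parts = []
    if stdout:
        parts.append(stdout.rstrip())
    if stderr:
        # The layout after the "stderr:" prefix is an assumption.
        parts.append(f"stderr:\n{stderr.rstrip()}")
    if traceback_text:
        parts.append(traceback_text.rstrip())
    if result_repr is not None:
        parts.append(f"Out[{execution_count}]: {result_repr}")
    return "\n".join(parts) if parts else "(no output)"


print(format_output("", "", "", "30", 1))
print(format_output("Hello\nWorld\n", "", "", None, 2))
print(format_output("", "", "", None, 3))
```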

State Management

PythonEnv adds Python-specific state to the base sandbox state:

state["python_state"] = {
    "ready": bool,              # Whether Python worker is initialized
    "execution_count": int,     # Number of executions so far
    "ready_wait_time": float    # Time spent waiting for worker startup
}
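
How these fields might evolve over a rollout can be sketched with a plain dict. The helper names below are hypothetical, and the assumption that execution_count drives the Out[N] numbering is not confirmed by this page.

```python
import time


def init_python_state(state: dict) -> None:
    """Attach a fresh python_state block to the shared state dict."""
    state["python_state"] = {
        "ready": False,
        "execution_count": 0,
        "ready_wait_time": 0.0,
    }


def mark_ready(state: dict, wait_started_at: float) -> None:
    """Record that the worker is up and how long startup took."""
    python_state = state["python_state"]
    python_state["ready"] = True
    python_state["ready_wait_time"] = time.monotonic() - wait_started_at


def next_execution_count(state: dict) -> int:
    """Increment and return the counter (assumed to feed Out[N] numbering)."""
    state["python_state"]["execution_count"] += 1
    return state["python_state"]["execution_count"]


state: dict = {}
init_python_state(state)
mark_ready(state, time.monotonic())
print(next_execution_count(state), next_execution_count(state))
```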

Built-in Rubric

PythonEnv includes PythonMonitorRubric, which tracks:

python_ready_wait_time: Time spent waiting for Python worker initialization
All metrics from SandboxEnv
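
A metric of this kind can be pictured as a function that reads the value back out of the state. The sketch below is an assumption about the shape of such a function, not the PythonMonitorRubric source; it operates on a plain dict laid out like state["python_state"].

```python
def python_ready_wait_time(state: dict) -> float:
    """Return the recorded startup wait in seconds, or 0.0 if it is missing."""
    python_state = state.get("python_state") or {}
    return float(python_state.get("ready_wait_time", 0.0))


example_state = {
    "python_state": {"ready": True, "execution_count": 3, "ready_wait_time": 1.25}
}
print(python_ready_wait_time(example_state))
```

Falling back to 0.0 keeps the metric defined for rollouts in which the worker was never started.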

Example Usage

Basic Math Environment

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "What is the sum of squares of the first 10 integers?",
                "answer": "385"
            },
            {
                "question": "Calculate factorial of 20",
                "answer": "2432902008176640000"
            },
        ]
    )
    
    def correct_answer(answer: str, completion: vf.Messages) -> float:
        """Check if the answer appears in the completion."""
        return 1.0 if answer in str(completion) else 0.0
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(correct_answer),
        system_prompt="Solve the math problem using Python. Show your work.",
        pip_install_packages="",  # No extra packages needed
        max_turns=5
    )

Scientific Computing

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "Solve the differential equation y'' - 2y' + y = 0",
                "expected": "C1*exp(x) + C2*x*exp(x)"
            },
        ]
    )
    
    def has_solution(expected: str, completion: vf.Messages) -> float:
        """Check if solution contains expected terms."""
        text = str(completion).lower()
        # Check for key terms in the solution
        has_exp = "exp" in text
        has_c1 = "c1" in text or "c_1" in text
        return 1.0 if has_exp and has_c1 else 0.0
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(has_solution),
        system_prompt="Use SymPy to solve differential equations.",
        pip_install_packages="numpy sympy scipy matplotlib",
        max_startup_wait_seconds=60,  # Allow more time for package installation
        max_turns=10
    )

Data Analysis

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "Create a numpy array of 100 random numbers and calculate mean and std",
            },
        ]
    )
    
    def uses_numpy(completion: vf.Messages) -> float:
        """Check if solution uses numpy correctly."""
        text = str(completion)
        has_import = "import numpy" in text or "from numpy" in text
        has_mean = "mean" in text.lower()
        has_std = "std" in text.lower()
        return 1.0 if has_import and has_mean and has_std else 0.0
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(uses_numpy),
        system_prompt="Use NumPy for numerical computations.",
        pip_install_packages="numpy pandas",
        max_turns=5
    )

Code Verification with Tests

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "problem": "Write a function to check if a number is prime",
                "test_code": """
assert is_prime(2) == True
assert is_prime(4) == False
assert is_prime(17) == True
assert is_prime(1) == False
"""
            },
        ]
    )
    
    def all_tests_pass(test_code: str, completion: vf.Messages) -> float:
        """Check if all assertions passed."""
        text = str(completion)
        # Any traceback or "...Error" substring (including AssertionError) counts as a failure
        has_error = "Traceback" in text or "Error" in text
        return 0.0 if has_error else 1.0
    
    # Custom prompt builder that includes tests
    def build_prompt(task: dict) -> str:
        return f"{task['problem']}\n\nYour solution will be tested with:\n{task['test_code']}"
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(all_tests_pass),
        system_prompt="Write the function and then run the test code.",
        prompt_builder=build_prompt,
        max_turns=10
    )

Custom Package Installation

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {"question": "Use requests to fetch https://api.github.com"},
        ]
    )
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(lambda c: 1.0),
        system_prompt="Use the requests library to make HTTP requests.",
        # Install multiple packages
        pip_install_packages="requests beautifulsoup4 lxml",
        max_startup_wait_seconds=90,  # More time for package downloads
        max_turns=5
    )

REPL Behavior

The Python REPL behaves like IPython:

# Code execution
code = """
x = 10
y = 20
x + y
"""
# Output: "Out[1]: 30"

# Print statements
code = """
print("Hello")
print("World")
"""
# Output: "Hello\nWorld"

# Errors with traceback
code = "1 / 0"
# Output includes full traceback ending with "ZeroDivisionError: division by zero"

# Persistent state (in a fresh session: code1 is execution 1, code2 is execution 2)
code1 = "import math; x = 5"
code2 = "math.sqrt(x)"  # Uses previous import and variable
# Output: "Out[2]: 2.23606797749979"

Implementation Details

Worker Process

PythonEnv runs a background Python worker process in the sandbox that:

Creates named pipes (FIFOs) for bidirectional communication
Maintains a persistent namespace across executions
Executes code using exec() for statements and eval() for trailing expressions
Captures stdout/stderr using contextlib.redirect_stdout and contextlib.redirect_stderr
Returns results as JSON over the response pipe
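
The exec()/eval() split and output capture described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the worker script itself: the function name `run_cell`, the JSON field names, and the use of the ast module to detect a trailing expression are all choices made for this example, and the named pipes are left out.

```python
import ast
import contextlib
import io
import json
import traceback


def run_cell(code: str, namespace: dict, execution_count: int) -> str:
    """Run code in a persistent namespace and return a JSON-encoded result."""
    stdout, stderr = io.StringIO(), io.StringIO()
    result = {"execution_count": execution_count, "result": None, "error": None}
    try:
        tree = ast.parse(code, mode="exec")
        last_expr = None
        # Split off a trailing expression so its value can be reported.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(tree.body.pop().value)
        with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
            exec(compile(tree, "<cell>", "exec"), namespace)
            if last_expr is not None:
                value = eval(compile(last_expr, "<cell>", "eval"), namespace)
                if value is not None:
                    result["result"] = repr(value)
    except Exception:
        # Syntax errors and runtime errors both end up as a traceback string.
        result["error"] = traceback.format_exc()
    result["stdout"] = stdout.getvalue()
    result["stderr"] = stderr.getvalue()
    return json.dumps(result)


namespace: dict = {}
run_cell("import math; x = 5", namespace, 1)
print(run_cell("math.sqrt(x)", namespace, 2))
```

Reusing the same namespace dict for every call is what makes the import and the variable from the first cell visible in the second.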

Startup Sequence

Sandbox container starts with Python 3.11
Packages are installed via pip (if specified)
Worker script is uploaded and launched as background process
Worker creates communication pipes and sets ready flag
First python() call waits for ready flag before executing
Subsequent calls execute immediately
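
The "wait for ready flag" step can be pictured as a polling loop with a deadline. The sketch below is synchronous for brevity, whereas the python tool is async; `wait_until_ready`, `WorkerNotReady`, and the `is_ready` callback are hypothetical names standing in for whatever check PythonEnv performs inside the sandbox.

```python
import time
from typing import Callable


class WorkerNotReady(Exception):
    """Stand-in for PythonWorkerNotReadyError in this sketch."""


def wait_until_ready(
    is_ready: Callable[[], bool],
    max_wait_seconds: float,
    poll_interval: float = 0.05,
) -> float:
    """Poll is_ready() until it returns True; return the seconds spent waiting."""
    start = time.monotonic()
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() - start >= max_wait_seconds:
            raise WorkerNotReady(f"worker not ready after {max_wait_seconds} seconds")
        time.sleep(poll_interval)


# Simulated ready flag that flips to True on the third check.
checks = {"count": 0}


def fake_ready_flag() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


waited = wait_until_ready(fake_ready_flag, max_wait_seconds=1.0)
print(f"ready after {checks['count']} checks, waited {waited:.2f}s")
```

The returned duration corresponds to the kind of value stored as ready_wait_time, and the deadline plays the role of max_startup_wait_seconds.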

Error Types

PythonWorkerNotReadyError: Worker failed to start within timeout
PythonWorkerRequestError: Communication error with worker
PythonWorkerDeadError: Worker process died unexpectedly

All inherit from vf.SandboxError.
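
Because the three errors share a base class, a caller can catch the base and branch on the subtype. The classes below are local stand-ins that only mirror the documented names so the sketch runs without verifiers installed; `describe_failure` is a hypothetical helper.

```python
class SandboxError(Exception):
    """Stand-in for vf.SandboxError."""


class PythonWorkerNotReadyError(SandboxError):
    """Stand-in: worker failed to start within the timeout."""


class PythonWorkerRequestError(SandboxError):
    """Stand-in: communication with the worker failed."""


class PythonWorkerDeadError(SandboxError):
    """Stand-in: worker process died unexpectedly."""


def describe_failure(exc: Exception) -> str:
    """Map a worker error to a short, human-readable hint."""
    if isinstance(exc, PythonWorkerNotReadyError):
        return "worker did not start in time; consider raising max_startup_wait_seconds"
    if isinstance(exc, PythonWorkerRequestError):
        return "communication with the worker failed"
    if isinstance(exc, PythonWorkerDeadError):
        return "worker process exited unexpectedly"
    return "unrelated error"


try:
    raise PythonWorkerDeadError("worker exited")
except SandboxError as exc:  # one handler covers all three subclasses
    print(describe_failure(exc))
```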

When to Use

Use PythonEnv for:

Math and scientific computing tasks
Code generation with execution verification
Multi-step computations requiring persistent state
Algorithm development and testing
Data analysis workflows

Use SandboxEnv directly for:

Bash commands and system operations
Multi-language environments
File system operations
Custom execution environments

Use ToolEnv for:

Stateless Python function calls
Tasks not requiring code execution
Pre-defined operations only
