
PythonEnv

Sandbox-backed environment exposing a persistent Python REPL for code execution.

Overview

PythonEnv provides a stateful Python interpreter running in a sandboxed container:
  • Persistent state: Variables and imports persist across code executions
  • IPython-like behavior: Trailing expressions are automatically printed
  • Package management: Pre-install packages via pip during setup
  • Error handling: Full traceback capture for debugging
  • Execution tracking: Numbered outputs like Jupyter notebooks

Inheritance

Environment
└── MultiTurnEnv
    └── ToolEnv
        └── StatefulToolEnv
            └── SandboxEnv
                └── PythonEnv

Constructor

PythonEnv(
    pip_install_packages: str = "numpy sympy scipy",
    max_startup_wait_seconds: int = 30,
    **kwargs: Any
)

Parameters

pip_install_packages
str
default:"numpy sympy scipy"
Space-separated list of packages to install with pip during sandbox startup. Pass an empty string to skip installation.
max_startup_wait_seconds
int
default:30
Maximum time, in seconds, to wait for the Python worker to become ready.
All other parameters are inherited from SandboxEnv:
  • sandbox_name, docker_image, cpu_cores, memory_gb, etc.
  • timeout_per_command_seconds - Timeout for each Python execution
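
As an illustration of how a space-separated package string with an "empty means skip" rule could be handled, here is a minimal sketch. The helper name build_pip_command is hypothetical and is not part of PythonEnv; the actual command run inside the sandbox may differ.

```python
from typing import List, Optional


def build_pip_command(pip_install_packages: str) -> Optional[List[str]]:
    """Turn a space-separated package string into a pip argv list.

    Hypothetical helper for illustration only. Returns None when the
    string is empty or whitespace, meaning installation is skipped.
    """
    packages = pip_install_packages.split()
    if not packages:
        return None
    return ["pip", "install", *packages]
```

With the default value "numpy sympy scipy" this yields a single pip invocation covering all three packages, which is one reason to raise max_startup_wait_seconds when the list grows.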

Tools

python

async def python(code: str) -> str
Execute Python code in the persistent REPL.
code
str
Python code to execute. Can be multiple lines.
Returns: str - Formatted output including stdout, stderr, and expression results.
Output format:
  • stdout content (if any)
  • stderr content prefixed with “stderr:” (if any)
  • Exception tracebacks (on error)
  • Expression results as Out[N]: <repr> (for trailing expressions)
  • (no output) if nothing was produced
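
The output rules above can be sketched as a small formatting function. This is an assumption-laden illustration, not the library's code: the function name format_output, its parameters, the newline separators, and the exact layout of the stderr section are all hypothetical.

```python
from typing import List, Optional


def format_output(
    stdout: str,
    stderr: str,
    traceback_text: Optional[str],
    result_repr: Optional[str],
    execution_count: int,
) -> str:
    """Combine the pieces of one execution into a single string (sketch)."""
    parts: List[str] = []
    if stdout:
        parts.append(stdout.rstrip())
    if stderr:
        # Assumed layout: the "stderr:" prefix on its own line.
        parts.append("stderr:\n" + stderr.rstrip())
    if traceback_text:
        parts.append(traceback_text.rstrip())
    if result_repr is not None:
        parts.append(f"Out[{execution_count}]: {result_repr}")
    # Fall back to the placeholder when nothing was produced.
    return "\n".join(parts) if parts else "(no output)"
```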

State Management

PythonEnv adds Python-specific state to the base sandbox state:
state["python_state"] = {
    "ready": bool,              # Whether Python worker is initialized
    "execution_count": int,     # Number of executions so far
    "ready_wait_time": float    # Time spent waiting for worker startup
}

Built-in Rubric

PythonEnv includes PythonMonitorRubric, which tracks:
  • python_ready_wait_time: Time spent waiting for Python worker initialization
  • All metrics from SandboxEnv

Example Usage

Basic Math Environment

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "What is the sum of squares of the first 10 integers?",
                "answer": "385"
            },
            {
                "question": "Calculate factorial of 20",
                "answer": "2432902008176640000"
            },
        ]
    )
    
    def correct_answer(answer: str, completion: vf.Messages) -> float:
        """Check if the answer appears in the completion."""
        return 1.0 if answer in str(completion) else 0.0
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(correct_answer),
        system_prompt="Solve the math problem using Python. Show your work.",
        pip_install_packages="",  # No extra packages needed
        max_turns=5
    )

Scientific Computing

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "Solve the differential equation y'' - 2y' + y = 0",
                "expected": "C1*exp(x) + C2*x*exp(x)"
            },
        ]
    )
    
    def has_solution(expected: str, completion: vf.Messages) -> float:
        """Check if solution contains expected terms."""
        text = str(completion).lower()
        # Check for key terms in the solution
        has_exp = "exp" in text
        has_c1 = "c1" in text or "c_1" in text
        return 1.0 if has_exp and has_c1 else 0.0
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(has_solution),
        system_prompt="Use SymPy to solve differential equations.",
        pip_install_packages="numpy sympy scipy matplotlib",
        max_startup_wait_seconds=60,  # Allow more time for package installation
        max_turns=10
    )

Data Analysis

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "Create a numpy array of 100 random numbers and calculate mean and std",
            },
        ]
    )
    
    def uses_numpy(completion: vf.Messages) -> float:
        """Check if solution uses numpy correctly."""
        text = str(completion)
        has_import = "import numpy" in text or "from numpy" in text
        has_mean = "mean" in text.lower()
        has_std = "std" in text.lower()
        return 1.0 if has_import and has_mean and has_std else 0.0
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(uses_numpy),
        system_prompt="Use NumPy for numerical computations.",
        pip_install_packages="numpy pandas",
        max_turns=5
    )

Code Verification with Tests

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "problem": "Write a function to check if a number is prime",
                "test_code": """
assert is_prime(2) == True
assert is_prime(4) == False
assert is_prime(17) == True
assert is_prime(1) == False
"""
            },
        ]
    )
    
    def all_tests_pass(test_code: str, completion: vf.Messages) -> float:
        """Check if all assertions passed."""
        text = str(completion)
        # Treat any exception name containing "Error" (AssertionError included) as a failure
        has_error = "Error" in text
        return 0.0 if has_error else 1.0
    
    # Custom prompt builder that includes tests
    def build_prompt(task: dict) -> str:
        return f"{task['problem']}\n\nYour solution will be tested with:\n{task['test_code']}"
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(all_tests_pass),
        system_prompt="Write the function and then run the test code.",
        prompt_builder=build_prompt,
        max_turns=10
    )

Custom Package Installation

import verifiers as vf

def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {"question": "Use requests to fetch https://api.github.com"},
        ]
    )
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(lambda c: 1.0),
        system_prompt="Use the requests library to make HTTP requests.",
        # Install multiple packages
        pip_install_packages="requests beautifulsoup4 lxml",
        max_startup_wait_seconds=90,  # More time for package downloads
        max_turns=5
    )

REPL Behavior

The Python REPL behaves like IPython:
# Code execution
code = """
x = 10
y = 20
x + y
"""
# Output: "Out[1]: 30"

# Print statements
code = """
print("Hello")
print("World")
"""
# Output: "Hello\nWorld"

# Errors with traceback
code = "1 / 0"
# Output includes full traceback ending with "ZeroDivisionError: division by zero"

# Persistent state
code1 = "import math; x = 5"
code2 = "math.sqrt(x)"  # Uses previous import and variable
# Output: "Out[2]: 2.23606797749979"

Implementation Details

Worker Process

PythonEnv runs a background Python worker process in the sandbox that:
  1. Creates named pipes (FIFOs) for bidirectional communication
  2. Maintains a persistent namespace across executions
  3. Executes code using exec() for statements and eval() for trailing expressions
  4. Captures stdout/stderr using contextlib.redirect_stdout and contextlib.redirect_stderr
  5. Returns results as JSON over the response pipe
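
A minimal sketch of steps 2 through 4, using only the standard library, might look like the following. It is not the worker script itself: the function name run_cell and the keys of the result dictionary are hypothetical, and the FIFO transport from steps 1 and 5 is left out. A dictionary shaped like this could then be serialized with json.dumps before being written to the response pipe.

```python
import ast
import contextlib
import io
import traceback
from typing import Any, Dict


def run_cell(code: str, namespace: Dict[str, Any]) -> Dict[str, Any]:
    """Execute code in a shared namespace and capture its output (sketch)."""
    stdout, stderr = io.StringIO(), io.StringIO()
    result: Dict[str, Any] = {"stdout": "", "stderr": "", "result": None, "error": None}
    try:
        tree = ast.parse(code, mode="exec")
        last_expr = None
        # Split off a trailing expression so its value can be reported.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(tree.body.pop().value)
        with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
            # Statements run with exec(); the trailing expression with eval().
            exec(compile(tree, "<cell>", "exec"), namespace)
            if last_expr is not None:
                value = eval(compile(last_expr, "<cell>", "eval"), namespace)
                if value is not None:
                    result["result"] = repr(value)
    except Exception:
        # Keep the full traceback as text instead of crashing the worker.
        result["error"] = traceback.format_exc()
    result["stdout"] = stdout.getvalue()
    result["stderr"] = stderr.getvalue()
    return result
```

Passing the same namespace dictionary to every call is what makes variables and imports persist between executions.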

Startup Sequence

  1. Sandbox container starts with Python 3.11
  2. Packages are installed via pip (if specified)
  3. The worker script is uploaded and launched as a background process
  4. The worker creates its communication pipes and sets a ready flag
  5. The first python() call waits for the ready flag before executing
  6. Subsequent calls execute immediately
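
The wait in step 5 can be pictured as a polling loop bounded by max_startup_wait_seconds. The sketch below is synchronous for brevity and is only an assumption about the general shape: wait_until_ready, the is_ready callback, the polling interval, and the WorkerNotReady exception (a stand-in for PythonWorkerNotReadyError) are all hypothetical names.

```python
import time
from typing import Callable


class WorkerNotReady(Exception):
    """Hypothetical stand-in for PythonWorkerNotReadyError."""


def wait_until_ready(
    is_ready: Callable[[], bool],
    max_wait_seconds: float = 30.0,
    poll_interval: float = 0.1,
) -> float:
    """Poll is_ready() until it returns True; return the seconds waited."""
    start = time.monotonic()
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() - start >= max_wait_seconds:
            raise WorkerNotReady(
                f"worker not ready after {max_wait_seconds} seconds"
            )
        time.sleep(poll_interval)
```

The returned duration corresponds to the kind of value recorded as ready_wait_time in the Python state.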

Error Types

  • PythonWorkerNotReadyError: Worker failed to start within timeout
  • PythonWorkerRequestError: Communication error with worker
  • PythonWorkerDeadError: Worker process died unexpectedly
All inherit from vf.SandboxError.

When to Use

Use PythonEnv for:
  • Math and scientific computing tasks
  • Code generation with execution verification
  • Multi-step computations requiring persistent state
  • Algorithm development and testing
  • Data analysis workflows
Use SandboxEnv directly for:
  • Bash commands and system operations
  • Multi-language environments
  • File system operations
  • Custom execution environments
Use ToolEnv for:
  • Stateless Python function calls
  • Tasks not requiring code execution
  • Pre-defined operations only
