PythonEnv
Sandbox-backed environment exposing a persistent Python REPL for code execution.
Overview
PythonEnv provides a stateful Python interpreter running in a sandboxed container:
- Persistent state: Variables and imports persist across code executions
- IPython-like behavior: Trailing expressions are automatically printed
- Package management: Pre-install packages via pip during setup
- Error handling: Full traceback capture for debugging
- Execution tracking: Numbered outputs like Jupyter notebooks
Inheritance
Environment
└── MultiTurnEnv
    └── ToolEnv
        └── StatefulToolEnv
            └── SandboxEnv
                └── PythonEnv
Constructor
PythonEnv(
    pip_install_packages: str = "numpy sympy scipy",
    max_startup_wait_seconds: int = 30,
    **kwargs: Any
)
Parameters
pip_install_packages
str
default:"numpy sympy scipy"
Space-separated list of packages to install with pip during sandbox startup. Use an empty string to skip installation.
max_startup_wait_seconds
int
default:30
Maximum time, in seconds, to wait for the Python worker to be ready.
All other parameters are inherited from SandboxEnv:
- sandbox_name, docker_image, cpu_cores, memory_gb, etc.
- timeout_per_command_seconds: Timeout for each Python execution
python
async def python(code: str) -> str
Execute Python code in the persistent REPL.
code
str
Python code to execute. Can span multiple lines.
Returns: str - Formatted output including stdout, stderr, and expression results.
Output format:
- stdout content (if any)
- stderr content prefixed with "stderr:" (if any)
- Exception tracebacks (on error)
- Expression results as Out[N]: <repr> (for trailing expressions)
- (no output) if nothing was produced
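The ordering above can be illustrated with a small formatter. This is a minimal sketch, not the library's implementation: `format_output` and its parameters are hypothetical names, and placing the "stderr:" prefix on its own line is an assumption about layout.

```python
from typing import Optional


def format_output(
    stdout: str,
    stderr: str,
    traceback_text: Optional[str],
    result_repr: Optional[str],
    execution_count: int,
) -> str:
    """Assemble one execution's output in the order listed above (hypothetical helper)."""
    parts = []
    if stdout:
        parts.append(stdout.rstrip())
    if stderr:
        # Assumption: the prefix sits on its own line above the stderr text.
        parts.append(f"stderr:\n{stderr.rstrip()}")
    if traceback_text:
        parts.append(traceback_text.rstrip())
    if result_repr is not None:
        parts.append(f"Out[{execution_count}]: {result_repr}")
    # Fall back to a placeholder when nothing at all was produced.
    return "\n".join(parts) if parts else "(no output)"
```

Under these assumptions, a cell with only a trailing expression yields a single `Out[N]: ...` line, and a cell that produces nothing yields `(no output)`.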
State Management
PythonEnv adds Python-specific state to the base sandbox state:
state["python_state"] = {
    "ready": bool,             # Whether Python worker is initialized
    "execution_count": int,    # Number of executions so far
    "ready_wait_time": float   # Time spent waiting for worker startup
}
Built-in Rubric
PythonEnv includes PythonMonitorRubric which tracks:
- python_ready_wait_time: Time spent waiting for Python worker initialization
- All metrics from SandboxEnv
Example Usage
Basic Math Environment
import verifiers as vf
def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "What is the sum of squares of the first 10 integers?",
                "answer": "385"
            },
            {
                "question": "Calculate factorial of 20",
                "answer": "2432902008176640000"
            },
        ]
    )
    def correct_answer(answer: str, completion: vf.Messages) -> float:
        """Check if the answer appears in the completion."""
        return 1.0 if answer in str(completion) else 0.0
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(correct_answer),
        system_prompt="Solve the math problem using Python. Show your work.",
        pip_install_packages="",  # No extra packages needed
        max_turns=5
    )
Scientific Computing
import verifiers as vf
def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "Solve the differential equation y'' - 2y' + y = 0",
                "expected": "C1*exp(x) + C2*x*exp(x)"
            },
        ]
    )
    def has_solution(expected: str, completion: vf.Messages) -> float:
        """Check if solution contains expected terms."""
        text = str(completion).lower()
        # Check for key terms in the solution
        has_exp = "exp" in text
        has_c1 = "c1" in text or "c_1" in text
        return 1.0 if has_exp and has_c1 else 0.0
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(has_solution),
        system_prompt="Use SymPy to solve differential equations.",
        pip_install_packages="numpy sympy scipy matplotlib",
        max_startup_wait_seconds=60,  # Allow more time for package installation
        max_turns=10
    )
Data Analysis
import verifiers as vf
def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "Create a numpy array of 100 random numbers and calculate mean and std",
            },
        ]
    )
    def uses_numpy(completion: vf.Messages) -> float:
        """Check if solution uses numpy correctly."""
        text = str(completion)
        has_import = "import numpy" in text or "from numpy" in text
        has_mean = "mean" in text.lower()
        has_std = "std" in text.lower()
        return 1.0 if has_import and has_mean and has_std else 0.0
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(uses_numpy),
        system_prompt="Use NumPy for numerical computations.",
        pip_install_packages="numpy pandas",
        max_turns=5
    )
Code Verification with Tests
import verifiers as vf
def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {
                "problem": "Write a function to check if a number is prime",
                "test_code": """
assert is_prime(2) == True
assert is_prime(4) == False
assert is_prime(17) == True
assert is_prime(1) == False
"""
            },
        ]
    )
    def all_tests_pass(test_code: str, completion: vf.Messages) -> float:
        """Check if all assertions passed."""
        text = str(completion)
        # Any "Error" substring counts as a failure; this also matches "AssertionError".
        # Note: this is a coarse text check and will flag any mention of "Error".
        has_error = "Error" in text
        return 0.0 if has_error else 1.0
    # Custom prompt builder that includes tests
    def build_prompt(task: dict) -> str:
        return f"{task['problem']}\n\nYour solution will be tested with:\n{task['test_code']}"
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(all_tests_pass),
        system_prompt="Write the function and then run the test code.",
        prompt_builder=build_prompt,
        max_turns=10
    )
Custom Package Installation
import verifiers as vf
def load_environment():
    dataset = vf.Environment.make_dataset(
        [
            {"question": "Use requests to fetch https://api.github.com"},
        ]
    )
    return vf.PythonEnv(
        dataset=dataset,
        rubric=vf.Rubric(lambda c: 1.0),
        system_prompt="Use the requests library to make HTTP requests.",
        # Install multiple packages
        pip_install_packages="requests beautifulsoup4 lxml",
        max_startup_wait_seconds=90,  # More time for package downloads
        max_turns=5
    )
REPL Behavior
The Python REPL behaves like IPython:
# Code execution
code = """
x = 10
y = 20
x + y
"""
# Output: "Out[1]: 30"
# Print statements
code = """
print("Hello")
print("World")
"""
# Output: "Hello\nWorld"
# Errors with traceback
code = "1 / 0"
# Output includes full traceback ending with "ZeroDivisionError: division by zero"
# Persistent state
code1 = "import math; x = 5"
code2 = "math.sqrt(x)" # Uses previous import and variable
# Output: "Out[2]: 2.23606797749979"
Implementation Details
Worker Process
PythonEnv runs a background Python worker process in the sandbox that:
- Creates named pipes (FIFOs) for bidirectional communication
- Maintains a persistent namespace across executions
- Executes code using exec() for statements and eval() for trailing expressions
- Captures stdout/stderr using contextlib.redirect_stdout/stderr
- Returns results as JSON over the response pipe
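The exec()/eval() split and the output capture described above can be sketched in a few lines. This is an illustrative approximation, not the worker's actual source: `run_cell`, the `"<cell>"` filename, and the result dictionary keys are hypothetical, and the named-pipe transport is omitted.

```python
import ast
import contextlib
import io
import traceback


def run_cell(code: str, namespace: dict) -> dict:
    """Run one cell in `namespace`; return captured streams and the trailing value's repr."""
    stdout, stderr = io.StringIO(), io.StringIO()
    result = {"stdout": "", "stderr": "", "result": None, "error": None}
    try:
        tree = ast.parse(code, mode="exec")
        last_expr = None
        # Split off a trailing expression so its value can be reported separately.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(tree.body.pop().value)
        with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
            # Statements run with exec(); the trailing expression runs with eval().
            exec(compile(tree, "<cell>", "exec"), namespace)
            if last_expr is not None:
                value = eval(compile(last_expr, "<cell>", "eval"), namespace)
                if value is not None:
                    result["result"] = repr(value)
    except Exception:
        # Keep the full traceback text so the caller can show it.
        result["error"] = traceback.format_exc()
    result["stdout"] = stdout.getvalue()
    result["stderr"] = stderr.getvalue()
    return result
```

Reusing the same `namespace` dictionary across calls is what makes variables and imports persist. A dictionary of plain strings like this one can be passed to `json.dumps` before being written to a response pipe.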
Startup Sequence
- Sandbox container starts with Python 3.11
- Packages are installed via pip (if specified)
- Worker script is uploaded and launched as background process
- Worker creates communication pipes and sets ready flag
- First python() call waits for ready flag before executing
- Subsequent calls execute immediately
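The wait on the first call can be pictured as a bounded polling loop. The sketch below is synchronous for brevity (the real python() method is async), and every name in it is hypothetical: `wait_until_ready`, the `is_ready` callback, and the local `WorkerNotReady` exception are stand-ins, not library APIs.

```python
import time
from typing import Callable


class WorkerNotReady(Exception):
    """Local stand-in for illustration; not the library's PythonWorkerNotReadyError."""


def wait_until_ready(
    is_ready: Callable[[], bool],
    max_wait_seconds: float = 30.0,
    poll_interval_seconds: float = 0.05,
) -> float:
    """Poll `is_ready` until it returns True; return the seconds spent waiting."""
    start = time.monotonic()
    while not is_ready():
        # Give up once the startup budget is exhausted.
        if time.monotonic() - start >= max_wait_seconds:
            raise WorkerNotReady(f"worker not ready after {max_wait_seconds}s")
        time.sleep(poll_interval_seconds)
    return time.monotonic() - start
```

The returned duration corresponds to the kind of value recorded as ready_wait_time, and the timeout plays the role of max_startup_wait_seconds.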
Error Types
- PythonWorkerNotReadyError: Worker failed to start within timeout
- PythonWorkerRequestError: Communication error with worker
- PythonWorkerDeadError: Worker process died unexpectedly
All inherit from vf.SandboxError.
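Because the three errors share a base class, callers can handle one case specifically and fall back to the base for the rest. The sketch below defines local stand-in classes with the same names so that it runs without verifiers installed; the `classify` function and its return labels are a hypothetical handling policy, not library behavior.

```python
class SandboxError(Exception):
    """Stand-in for vf.SandboxError (assumption: only the inheritance matters here)."""


class PythonWorkerNotReadyError(SandboxError):
    """Stand-in: worker failed to start within the timeout."""


class PythonWorkerRequestError(SandboxError):
    """Stand-in: communication error with the worker."""


class PythonWorkerDeadError(SandboxError):
    """Stand-in: worker process died unexpectedly."""


def classify(exc: Exception) -> str:
    """Map an exception to a coarse handling decision (hypothetical policy)."""
    # Check the most specific class first, then fall back to the shared base.
    if isinstance(exc, PythonWorkerNotReadyError):
        return "retry-startup"
    if isinstance(exc, SandboxError):
        return "abort-rollout"
    return "unexpected"
```

The same ordering applies to `except` clauses: the specific error must come before the base class, or the base clause will catch it first.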
When to Use
Use PythonEnv for:
- Math and scientific computing tasks
- Code generation with execution verification
- Multi-step computations requiring persistent state
- Algorithm development and testing
- Data analysis workflows
Use SandboxEnv directly for:
- Bash commands and system operations
- Multi-language environments
- File system operations
- Custom execution environments
Use ToolEnv for:
- Stateless Python function calls
- Tasks not requiring code execution
- Pre-defined operations only
See Also