This guide will walk you through setting up a Verifiers workspace, creating your first environment, and running evaluations.

Prerequisites

Before starting, ensure you have Python 3.10 or later installed.

Set Up Your Workspace

1. Install uv and the Prime CLI

First, install uv (Python package manager) and the prime CLI tool:
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install the Prime CLI
uv tool install prime

# Log in to Prime Intellect
prime login

2. Initialize Your Workspace

Set up a new workspace for developing environments:
# Navigate to your development directory
cd ~/dev/my-lab

# Set up the workspace
prime lab setup
This command:
  • Creates a Python project (if needed)
  • Installs verifiers
  • Creates the recommended workspace structure
  • Downloads starter configuration files
Your workspace structure will look like:
configs/
├── endpoints.toml      # API endpoint configuration
├── rl/                 # Training configs
├── eval/               # Evaluation configs
└── gepa/               # Prompt optimization configs
environments/
└── AGENTS.md           # AI agent documentation

3. Add to Existing Project (Optional)

If you already have a Python project, add Verifiers without reinitializing:
uv add verifiers && prime lab setup --skip-install

Create Your First Environment

1. Initialize Environment Template

Create a new environment from the template:
prime env init my-env
This creates a new module in ./environments/my_env/ with:
environments/my_env/
├── my_env.py           # Main implementation
├── pyproject.toml      # Dependencies and metadata
└── README.md           # Documentation

2. Implement Your Environment

Edit environments/my_env/my_env.py with your environment logic:
import verifiers as vf
from datasets import Dataset

def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
    # Load or create your dataset
    dataset = vf.load_example_dataset(dataset_name)
    
    # Define reward function
    async def correct_answer(completion, answer) -> float:
        # Exact match against the target answer (strip trailing whitespace)
        completion_ans = completion[-1]['content'].strip()
        return 1.0 if completion_ans == answer else 0.0
    
    # Create rubric with reward functions
    rubric = vf.Rubric(funcs=[correct_answer])
    
    # Return environment instance
    env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
    return env
The load_environment function is the entry point for your environment. It must return an Environment instance and can accept custom arguments.
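
Reward functions receive the completion in chat format. Below is a stdlib-only sketch of the data shape a function like correct_answer sees; the exact fields Verifiers passes may vary, and the real hook is async (shown sync here for brevity):

```python
# Hypothetical chat-format completion, newest message last (illustrative only)
completion = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
answer = "4"

def correct_answer(completion, answer):
    # The final message holds the model's response
    return 1.0 if completion[-1]["content"].strip() == answer else 0.0

print(correct_answer(completion, answer))  # 1.0
```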

3. Install Your Environment

Install the environment module into your project:
prime env install my-env
This makes your environment importable and runnable.

Run Your First Evaluation

1. Run Local Evaluation

Evaluate your environment with any OpenAI-compatible model:
prime eval run my-env -m gpt-5-nano
This will:
  • Load your environment
  • Run rollouts with the specified model
  • Calculate rewards and metrics
  • Save results locally
By default, evaluations use Prime Inference. Configure custom API endpoints in ./configs/endpoints.toml.

2. View Results

Open the terminal UI to explore your evaluation results:
prime eval tui
Navigate through:
  • Rollout samples
  • Reward distributions
  • Model completions
  • Metrics and statistics

Working with Existing Environments

1. Install from Environments Hub

Install any environment from the community hub:
prime env install primeintellect/math-python

2. Run Hub Environment

Evaluate it directly:
prime eval run primeintellect/math-python -m gpt-4.1-mini

Environment Types

Verifiers supports multiple environment patterns:

SingleTurnEnv

Simple Q&A tasks with a single model response
vf.SingleTurnEnv(dataset=dataset, rubric=rubric)

ToolEnv

Environments with stateless Python function tools
vf.ToolEnv(
    dataset=dataset,
    tools=[calculator, search],
    rubric=rubric
)

StatefulToolEnv

Tools requiring per-rollout state (sandboxes, sessions)
vf.StatefulToolEnv(
    dataset=dataset,
    tools=[file_ops],
    rubric=rubric
)

MultiTurnEnv

Custom multi-turn interactions, games, agents
class GameEnv(vf.MultiTurnEnv):
    async def env_response(self, messages, state):
        # Custom game logic
        pass
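
The env_response hook above is where the environment replies between model turns. A stdlib-only sketch of the alternating loop this implies (the names and the (reply, done) return shape are illustrative, not the Verifiers internals):

```python
# Stdlib sketch of the loop a multi-turn environment drives: the model and
# the environment exchange messages until the environment ends the episode.
def run_episode(env_response, model_reply, max_turns=5):
    messages = []
    state = {"turns": 0}
    for _ in range(max_turns):
        messages.append({"role": "assistant", "content": model_reply(messages)})
        reply, done = env_response(messages, state)
        messages.append({"role": "user", "content": reply})
        if done:
            break
    return messages

# Toy environment: the episode ends once the model says "stop"
def env_response(messages, state):
    state["turns"] += 1
    return "ok", messages[-1]["content"] == "stop"

episode = run_episode(env_response, lambda msgs: "stop")
print(len(episode))  # 2
```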

Building Complex Environments

Adding Tools

Create tool-enabled environments for agent tasks:
import verifiers as vf

def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    try:
        result = eval(expression)  # Demo only: avoid eval() on untrusted input
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def load_environment():
    dataset = vf.load_example_dataset('gsm8k')
    
    async def correct_answer(completion, answer) -> float:
        response = completion[-1]['content']
        return 1.0 if answer in response else 0.0
    
    rubric = vf.Rubric(funcs=[correct_answer])
    
    return vf.ToolEnv(
        dataset=dataset,
        tools=[calculator],  # Pass Python functions
        rubric=rubric
    )
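
Since eval on model-supplied strings is risky outside a sandbox, a restricted arithmetic evaluator can stand in for the demo calculator. This is a stdlib-only sketch (not part of Verifiers) that walks the expression AST and allows only arithmetic nodes:

```python
import ast
import operator

# Map supported AST operator types to their implementations
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def _eval_node(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("unsupported expression")

def safe_calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression without eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except Exception as e:
        return f"Error: {e}"

print(safe_calculator("2 * (3 + 4)"))  # 14
```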

Using Sandboxes

For code execution tasks, use sandboxed environments:
import verifiers as vf

def load_environment():
    dataset = vf.load_example_dataset('codegen')
    
    async def code_passes_tests(state, info) -> float:
        # Check if code execution succeeded
        return 1.0 if state.get('tests_passed') else 0.0
    
    rubric = vf.Rubric(funcs=[code_passes_tests])
    
    return vf.PythonEnv(
        dataset=dataset,
        rubric=rubric
    )
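
The reward function above reads from rollout state rather than from the completion. A stdlib sketch of that pattern with a mocked state dict (the tests_passed and stdout keys are assumptions for illustration; real state contents depend on the environment):

```python
import asyncio

# Mocked rollout states (in practice, populated by the environment)
passing_state = {"tests_passed": True, "stdout": "3 passed"}
failing_state = {"tests_passed": False, "stdout": "1 failed"}

async def code_passes_tests(state, info):
    # Read the outcome the environment recorded during execution
    return 1.0 if state.get("tests_passed") else 0.0

print(asyncio.run(code_passes_tests(passing_state, {})))  # 1.0
print(asyncio.run(code_passes_tests(failing_state, {})))  # 0.0
```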

Publishing Your Environment

1. Test Locally

Ensure your environment works correctly:
prime eval run my-env -n 10 -m gpt-4.1-mini

2. Push to Hub

Publish to the Environments Hub:
prime env push --path ./environments/my_env
Your environment is now available to the community!

Next Steps

Environments Guide

Learn about datasets, rubrics, and custom protocols

Evaluation Guide

Deep dive into evaluation configurations

Training Guide

Train models with reinforcement learning

API Reference

Explore the complete API documentation

Common Patterns

Loading a custom dataset from the Hugging Face Hub:
from datasets import load_dataset

def load_environment():
    dataset = load_dataset('gsm8k', 'main', split='train')
    # ... configure environment

Combining multiple reward functions with a system prompt:
import verifiers as vf

async def accuracy(completion, answer) -> float:
    return 1.0 if answer in completion[-1]['content'] else 0.0

async def length_penalty(completion) -> float:
    length = len(completion[-1]['content'])
    return -0.01 * length  # Penalize longer responses

def load_environment():
    dataset = vf.load_example_dataset('gsm8k')
    rubric = vf.Rubric(funcs=[accuracy, length_penalty])
    return vf.SingleTurnEnv(
        dataset=dataset,
        system_prompt="You are a helpful math tutor. Show your work.",
        rubric=rubric
    )
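
A stdlib sketch of what each reward function above returns for one completion (how vf.Rubric weights and aggregates the scores is up to the library; this only shows the per-function values):

```python
import asyncio

async def accuracy(completion, answer) -> float:
    return 1.0 if answer in completion[-1]["content"] else 0.0

async def length_penalty(completion) -> float:
    return -0.01 * len(completion[-1]["content"])

# One illustrative completion in chat format
completion = [{"role": "assistant", "content": "The answer is 42."}]

async def score():
    return await accuracy(completion, "42"), await length_penalty(completion)

acc, pen = asyncio.run(score())
print(acc)            # 1.0
print(round(pen, 2))  # -0.17 (17 characters at -0.01 each)
```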

Validating required API keys before setup:
import verifiers as vf

def load_environment():
    # Validate required API keys
    vf.ensure_keys(['OPENAI_API_KEY', 'ANTHROPIC_API_KEY'])
    
    # ... rest of environment setup
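
A stdlib sketch of the same fail-fast idea, for readers who want to see the behavior (vf.ensure_keys is the helper shown above; the key name here is hypothetical):

```python
import os

def ensure_keys(names):
    # Raise early if any required environment variable is unset or empty
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")

os.environ["DEMO_API_KEY"] = "sk-demo"  # hypothetical key, for illustration
ensure_keys(["DEMO_API_KEY"])           # passes silently
```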

Troubleshooting

  • Environment not found: Make sure you ran prime env install <env-name> and that the environment has a valid load_environment function.
  • Endpoint or authentication errors: Configure your endpoints in ./configs/endpoints.toml. See the evaluation guide for details.
  • Import errors: Ensure all dependencies are listed in your environment’s pyproject.toml and installed.
