When standard environment types don’t fit your use case, MultiTurnEnv provides full control over the rollout loop. This guide covers advanced patterns for building custom environments with complex interaction logic.

When to Customize

Use custom multi-turn environments when you need:
  • Complex game logic — Board games, simulations, strategy games
  • Non-linear conversations — State-dependent message assembly
  • Custom feedback loops — Environment responses based on intermediate state
  • Specialized stop conditions — Domain-specific termination logic
  • Advanced state management — Complex per-rollout initialization and cleanup

The Rollout Loop

Understanding the rollout loop is essential for customization:
class MultiTurnEnv(vf.Environment):
    @final
    async def rollout(self, input, client, model, sampling_args) -> State:
        state = await self.init_state(input, client, model, sampling_args)
        
        try:
            try:
                state = await self.setup_state(state)  # 1. Initialize
            except vf.Error as e:
                state["error"] = e
            
            # 2. Main loop
            while not await self.is_completed(state):
                try:
                    prompt_messages = await self.get_prompt_messages(state)
                    if state.get("final_env_response") is not None:
                        continue
                    response = await self.get_model_response(state, prompt_messages)
                    await self.add_model_response(state, prompt_messages, response)
                except vf.Error as e:
                    state["error"] = e
            
            # 3. Finalize
            await self.render_completion(state)
            return state
        finally:
            await self._cleanup(state)  # 4. Cleanup
Never override rollout(). It is marked @final because the loop owns error capture, finalization, and cleanup. Override specific methods instead:
  • setup_state() — Per-rollout initialization
  • env_response() — Environment feedback after each turn
  • get_prompt_messages() — Custom message assembly
  • render_completion() — Final conversation rendering
  • add_trajectory_step() — Trajectory metadata
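
To make the order of the hooks concrete, here is a minimal stdlib-only sketch of the same control flow. MiniEnv and all of its methods are hypothetical stand-ins, not the verifiers API. For readability the toy calls env_response() directly after the model turn, and the "model" is a stub that returns a fixed string.

```python
import asyncio


class MiniEnv:
    """Toy stand-in for a multi-turn environment (not the verifiers API)."""

    max_turns = 3

    async def setup_state(self, state: dict) -> dict:
        state["calls"].append("setup_state")
        return state

    async def is_completed(self, state: dict) -> bool:
        # Stop on the turn limit or as soon as an error has been recorded.
        return state["turn"] >= self.max_turns or "error" in state

    async def get_model_response(self, state: dict) -> str:
        # A real environment would query the model here.
        state["calls"].append("model")
        return f"move {state['turn']}"

    async def env_response(self, response: str, state: dict) -> list:
        state["calls"].append("env_response")
        return [{"role": "user", "content": f"ack {response}"}]

    async def cleanup(self, state: dict) -> None:
        state["calls"].append("cleanup")

    async def rollout(self) -> dict:
        state = {"turn": 0, "calls": [], "messages": []}
        try:
            state = await self.setup_state(state)  # 1. Initialize
            while not await self.is_completed(state):  # 2. Main loop
                response = await self.get_model_response(state)
                state["messages"].append({"role": "assistant", "content": response})
                state["messages"].extend(await self.env_response(response, state))
                state["turn"] += 1
            return state
        finally:
            await self.cleanup(state)  # 3. Cleanup runs even if the loop raises


final_state = asyncio.run(MiniEnv().rollout())
print(final_state["calls"])
```

Because cleanup sits in a finally block, it runs whether the loop finishes normally or an unexpected exception escapes, which is the property the real loop relies on.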

Core Methods to Override

env_response(): Required

Defines how the environment responds after each model turn:
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        """Generate environment response after each model turn."""
        # Parse the model's action
        parsed = self.parser.parse(messages)
        action = parsed.action
        
        # Update game state
        state["board"] = apply_action(state["board"], action)
        
        # Check win condition
        if check_win(state["board"]):
            state["won"] = True
            return [{"role": "user", "content": "You won!"}]
        
        # Generate feedback
        feedback = generate_feedback(state["board"])
        return [{"role": "user", "content": feedback}]
Return value: List of new messages to append (don’t mutate existing messages).
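
The "return new messages, don't mutate" rule can be checked in isolation. The sketch below uses a plain function with the same shape as env_response(); the message contents and the "moves" state key are illustrative assumptions.

```python
import copy


def env_response(messages: list, state: dict) -> list:
    """Return only the new messages; leave `messages` untouched."""
    last_action = messages[-1]["content"].strip()
    # State is the right place for side effects, not the message history.
    state["moves"] = state.get("moves", []) + [last_action]
    return [{"role": "user", "content": f"Received move: {last_action}"}]


history = [
    {"role": "user", "content": "Your move."},
    {"role": "assistant", "content": " b2 "},
]
snapshot = copy.deepcopy(history)
state: dict = {}
new_messages = env_response(history, state)
```

Comparing history against snapshot after the call is a cheap unit-test assertion that catches accidental in-place edits.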

setup_state(): Optional

Initialize per-rollout resources:
class MyGameEnv(vf.MultiTurnEnv):
    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize game state for this rollout."""
        # Initialize game board
        state["board"] = self.create_empty_board()
        state["score"] = 0
        state["moves"] = []
        
        # Setup external resources
        state["game_session"] = await self.game_api.create_session()
        
        return await super().setup_state(state)
Always call await super().setup_state(state) at the end to ensure parent class initialization runs.

get_prompt_messages(): Optional

Customize how messages are assembled for each turn:
class MyGameEnv(vf.MultiTurnEnv):
    async def get_prompt_messages(self, state: vf.State) -> vf.Messages:
        """Assemble messages with current game state."""
        if len(state["trajectory"]) == 0:
            # First turn
            return [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": self.format_initial_board(state["board"])}
            ]
        
        # Subsequent turns: show conversation + current state
        messages = []
        messages.append({"role": "system", "content": self.system_prompt})
        
        # Add conversation history
        for turn in state["trajectory"]:
            messages.extend(turn["completion"])
        
        # Add environment response with current board
        env_response = await self.env_response(messages, state)
        messages.extend(env_response)
        
        return messages

render_completion(): Optional

Customize how the final conversation is assembled:
class MyGameEnv(vf.MultiTurnEnv):
    async def render_completion(self, state: vf.State):
        """Assemble final completion with game summary."""
        if len(state["trajectory"]) == 0:
            state["completion"] = []
            return
        
        # Get last turn's messages
        last_prompt = state["trajectory"][-1]["prompt"]
        last_completion = state["trajectory"][-1]["completion"]
        
        # Build full conversation
        full_conversation = last_prompt + last_completion
        
        # Add final summary if game ended
        if state.get("final_env_response"):
            full_conversation.extend(state["final_env_response"])
        
        # Extract completion (everything after initial prompt)
        state["completion"] = full_conversation[len(state["prompt"]):]
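
The slicing step is easier to follow with concrete data. The helper below is a hypothetical pure-function version of the logic above, written so it can be run without the library; the sample messages are made up.

```python
def extract_completion(prompt, last_prompt, last_completion, final_env_response=None):
    """Rebuild the full conversation, then drop the initial prompt."""
    full_conversation = list(last_prompt) + list(last_completion)
    if final_env_response:
        full_conversation.extend(final_env_response)
    # Everything after the initial prompt counts as the completion.
    return full_conversation[len(prompt):]


prompt = [{"role": "user", "content": "Let's play."}]
# The last turn's prompt already contains the earlier turns.
last_prompt = prompt + [
    {"role": "assistant", "content": "<row>1</row><col>1</col>"},
    {"role": "user", "content": "Board updated."},
]
last_completion = [{"role": "assistant", "content": "<row>0</row><col>0</col>"}]
final_env_response = [{"role": "user", "content": "Game over!"}]

completion = extract_completion(prompt, last_prompt, last_completion, final_env_response)
roles = [message["role"] for message in completion]
```

Here roles alternates assistant and user, and the original one-message prompt is excluded.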

add_trajectory_step(): Optional

Add metadata to each turn:
class MyGameEnv(vf.MultiTurnEnv):
    async def add_trajectory_step(
        self, 
        state: vf.State, 
        trajectory_step: vf.TrajectoryStep
    ):
        """Enrich trajectory with game-specific metadata."""
        # Add game state snapshot
        trajectory_step["extras"]["board_state"] = state["board"].copy()
        trajectory_step["extras"]["valid_moves"] = get_valid_moves(state["board"])
        trajectory_step["extras"]["score"] = state["score"]
        
        # Set intermediate reward (optional)
        if state.get("won"):
            trajectory_step["reward"] = 1.0
        elif state.get("lost"):
            trajectory_step["reward"] = 0.0
        
        await super().add_trajectory_step(state, trajectory_step)
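
The .copy() in the snapshot matters because the board keeps changing after a step is recorded. The stdlib-only illustration below uses a nested list in place of the board (an assumption made for portability) to contrast storing a reference with storing a copy.

```python
import copy

board = [[0, 0, 0] for _ in range(3)]
by_reference = []  # stores the live board object each turn
by_copy = []       # stores an independent snapshot each turn

for row, col in [(0, 0), (1, 1)]:
    board[row][col] = 1
    by_reference.append({"extras": {"board_state": board}})
    # Nested lists need deepcopy; list.copy() would still share the inner rows.
    by_copy.append({"extras": {"board_state": copy.deepcopy(board)}})

# Count the marks visible in the FIRST recorded step of each list.
marks_in_first_reference = sum(map(sum, by_reference[0]["extras"]["board_state"]))
marks_in_first_copy = sum(map(sum, by_copy[0]["extras"]["board_state"]))
```

The reference-based record shows two marks for the first step because it points at the final board, while the copied record correctly shows one.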

Stop Conditions

Define when rollouts should terminate using the @vf.stop decorator:

Basic Stop Conditions

class MyGameEnv(vf.MultiTurnEnv):
    @vf.stop
    async def game_won(self, state: vf.State) -> bool:
        return state.get("won", False)
    
    @vf.stop
    async def game_lost(self, state: vf.State) -> bool:
        return state.get("lives", 3) <= 0
    
    @vf.stop
    async def timeout(self, state: vf.State) -> bool:
        # Requires `import time`; setup_state() must set state["start_time"] = time.time()
        elapsed = time.time() - state["start_time"]
        return elapsed > self.max_seconds
Built-in stop conditions (always available):
  • has_error — stops if state["error"] is set
  • max_turns_reached — stops after max_turns iterations
  • prompt_too_long — stops if prompt exceeds model context
  • has_final_env_response — stops if early termination signaled

Priority-Based Execution

Control evaluation order with priorities (higher runs first):
class MyGameEnv(vf.MultiTurnEnv):
    @vf.stop(priority=100)  # Check error first
    async def fatal_error(self, state: vf.State) -> bool:
        return state.get("fatal_error") is not None
    
    @vf.stop(priority=10)  # Then check cheap conditions
    async def answer_keyword(self, state: vf.State) -> bool:
        completion = state.get("completion", [])
        if not completion:
            return False
        return "FINAL ANSWER:" in completion[-1].get("content", "")
    
    @vf.stop(priority=-10)  # Finally check expensive conditions
    async def validated_answer(self, state: vf.State) -> bool:
        return await self.validator_api.check_answer(state)
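
One way to picture priority ordering is a toy registry. The stop decorator and ToyEnv below are hypothetical stand-ins, not the library's implementation. The sketch assumes conditions are checked in descending priority and that checking ends at the first one that returns True, so the expensive check is skipped whenever a cheaper one fires.

```python
import asyncio


def stop(func=None, *, priority: int = 0):
    """Mark a coroutine method as a stop condition (toy stand-in for @vf.stop)."""
    def mark(f):
        f._stop_priority = priority
        return f
    # Support both @stop and @stop(priority=...).
    return mark(func) if func is not None else mark


class ToyEnv:
    def __init__(self):
        self.checked = []  # records which conditions actually ran

    @stop(priority=100)
    async def fatal_error(self, state):
        self.checked.append("fatal_error")
        return state.get("fatal_error") is not None

    @stop(priority=10)
    async def answer_keyword(self, state):
        self.checked.append("answer_keyword")
        return "FINAL ANSWER:" in state.get("last_message", "")

    @stop(priority=-10)
    async def validated_answer(self, state):
        self.checked.append("validated_answer")
        return True  # stands in for an expensive external check

    def stop_conditions(self):
        methods = [getattr(self, name) for name in dir(self) if not name.startswith("__")]
        marked = [m for m in methods if hasattr(m, "_stop_priority")]
        return sorted(marked, key=lambda m: m._stop_priority, reverse=True)

    async def is_completed(self, state):
        for condition in self.stop_conditions():
            if await condition(state):
                return True  # short-circuit: lower priorities are not evaluated
        return False
```

With a state whose last message contains "FINAL ANSWER:", only fatal_error and answer_keyword run; validated_answer is never awaited.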

Early Termination from env_response

Signal completion directly from the environment:
class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        # Process move
        result = process_move(messages, state)
        
        # Check if game ended
        if result.game_over:
            final_message = [
                {"role": "user", "content": f"Game over! Score: {state['score']}"}
            ]
            state["final_env_response"] = final_message
            return final_message
        
        # Game continues
        return [{"role": "user", "content": result.feedback}]
Setting state["final_env_response"] triggers the has_final_env_response stop condition.

Resource Management

Cleanup: Per-Rollout

Use @vf.cleanup for per-rollout resource cleanup:
class MyGameEnv(vf.MultiTurnEnv):
    @vf.cleanup
    async def save_game_log(self, state: vf.State):
        """Save game results after each rollout."""
        try:
            await self.db.insert_game_result({
                "game_id": state["game_id"],
                "score": state.get("score", 0),
                "won": state.get("won", False),
                "moves": state.get("moves", []),
            })
        except Exception as e:
            self.logger.error(f"Failed to save game log: {e}")
    
    @vf.cleanup
    async def close_game_session(self, state: vf.State):
        """Close API session."""
        if "game_session" in state:
            try:
                await self.game_api.close_session(state["game_session"])
            except Exception as e:
                self.logger.warning(f"Failed to close session: {e}")

Teardown: Environment Shutdown

Use @vf.teardown for environment-level cleanup:
class MyGameEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.db_connection = None
    
    async def setup_state(self, state: vf.State) -> vf.State:
        # Initialize DB connection lazily
        if self.db_connection is None:
            self.db_connection = await connect_to_db()
        return await super().setup_state(state)
    
    @vf.teardown
    async def close_database(self):
        """Close database connection when environment shuts down."""
        if self.db_connection:
            await self.db_connection.close()
            self.logger.info("Database connection closed")
Idempotency is critical — Cleanup methods may be called multiple times or when resources are in unexpected states. Always:
  • Check if resources exist before cleaning up
  • Handle exceptions gracefully
  • Use try/except blocks
  • Log errors but don’t raise

Error Handling

Verifiers provides structured error handling:

Error Hierarchy

vf.Error                    # Base class
├── vf.ModelError          # Model interaction issues
│   └── vf.EmptyModelResponseError
├── vf.OverlongPromptError # Prompt exceeds context
├── vf.ToolError           # Tool-related errors
│   ├── vf.ToolParseError  # Failed to parse tool call
│   └── vf.ToolCallError   # Tool execution failed
└── vf.InfraError          # Infrastructure failures
    ├── vf.SandboxError
    └── vf.TunnelError

Raising Errors

class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        # Extract the move from the model's last message
        move = self.parser.parse_answer(messages)
        try:
            result = await self.game_api.make_move(state["game_id"], move)
            return [{"role": "user", "content": result.feedback}]
        except GameAPITimeout as e:
            # Infrastructure error - rollout will stop
            raise vf.InfraError(f"Game API timeout: {e}") from e
        except InvalidMoveError as e:
            # Invalid move - let model recover
            return [{"role": "user", "content": f"Invalid move: {e}"}]
When a vf.Error is raised:
  1. Automatically caught by the rollout loop
  2. Stored in state["error"]
  3. Built-in has_error stop condition triggers
  4. Rollout terminates gracefully
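
The four steps can be reproduced with plain Python exceptions. The Error and InfraError classes below are local stand-ins that mirror only the shape of the hierarchy, and flaky_env_step is a made-up step that fails on its second call.

```python
import asyncio


class Error(Exception):
    """Stand-in for the base error class."""


class InfraError(Error):
    """Stand-in for an infrastructure failure."""


async def flaky_env_step(state: dict) -> None:
    state["turn"] += 1
    if state["turn"] == 2:
        raise InfraError("Game API timeout")


async def run_until_error(max_turns: int = 5) -> dict:
    state = {"turn": 0, "error": None}
    # The loop condition plays the role of the has_error stop condition.
    while state["error"] is None and state["turn"] < max_turns:
        try:
            await flaky_env_step(state)
        except Error as exc:
            # Catching the base class also captures every subclass.
            state["error"] = exc
    return state


final = asyncio.run(run_until_error())
```

The loop ends after the second turn with the exception stored in the state instead of propagating to the caller.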

Complete Example: Tic-Tac-Toe

Here’s a complete custom environment:
import verifiers as vf
from datasets import Dataset
import numpy as np

class TicTacToeEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(max_turns=9, **kwargs)
    
    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize empty board."""
        state["board"] = np.zeros((3, 3), dtype=int)
        state["current_player"] = 1  # 1 = X (model), -1 = O (environment)
        state["winner"] = None
        return await super().setup_state(state)
    
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        """Process model's move and make counter-move."""
        # Parse model's move from its latest message
        last_msg = messages[-1]["content"]
        parsed = self.parser.parse(last_msg)
        
        try:
            row, col = int(parsed.row), int(parsed.col)
            if not (0 <= row < 3 and 0 <= col < 3):
                return [{"role": "user", "content": "Invalid position. Use 0-2 for row and col."}]
            if state["board"][row, col] != 0:
                return [{"role": "user", "content": "That position is taken. Try again."}]
        except (TypeError, ValueError, AttributeError):
            # TypeError covers a missing tag, where the parsed field is None
            return [{"role": "user", "content": "Invalid format. Use <row>0</row><col>0</col>."}]
        
        # Apply model's move
        state["board"][row, col] = 1
        
        # Check win/draw
        if self.check_winner(state["board"]) == 1:
            state["winner"] = "model"
            return [{"role": "user", "content": f"You win!\n{self.render_board(state['board'])}"}]
        
        if np.all(state["board"] != 0):
            state["winner"] = "draw"
            return [{"role": "user", "content": f"Draw!\n{self.render_board(state['board'])}"}]
        
        # Environment's move (simple strategy)
        env_row, env_col = self.make_env_move(state["board"])
        state["board"][env_row, env_col] = -1
        
        # Check if environment won
        if self.check_winner(state["board"]) == -1:
            state["winner"] = "environment"
            return [{"role": "user", "content": f"I win!\n{self.render_board(state['board'])}"}]
        
        # Game continues
        return [{"role": "user", "content": self.render_board(state["board"])}]
    
    @vf.stop
    async def game_ended(self, state: vf.State) -> bool:
        return state.get("winner") is not None
    
    def check_winner(self, board):
        """Check for winner. Returns 1 (X wins), -1 (O wins), or 0 (no winner)."""
        # Check rows, cols, diagonals
        for i in range(3):
            if abs(board[i, :].sum()) == 3:
                return board[i, 0]
            if abs(board[:, i].sum()) == 3:
                return board[0, i]
        if abs(board.diagonal().sum()) == 3:
            return board[0, 0]
        if abs(np.fliplr(board).diagonal().sum()) == 3:
            return board[0, 2]
        return 0
    
    def make_env_move(self, board):
        """Simple strategy: take center, then corners, then edges."""
        # Take center if available
        if board[1, 1] == 0:
            return 1, 1
        # Take corners
        for r, c in [(0, 0), (0, 2), (2, 0), (2, 2)]:
            if board[r, c] == 0:
                return r, c
        # Take edges
        for r, c in [(0, 1), (1, 0), (1, 2), (2, 1)]:
            if board[r, c] == 0:
                return r, c
    
    def render_board(self, board):
        """Render board as string."""
        symbols = {0: ".", 1: "X", -1: "O"}
        lines = []
        for row in board:
            lines.append(" ".join(symbols[cell] for cell in row))
        return "\n".join(lines)

# Load environment
def load_environment():
    dataset = Dataset.from_list([
        {"prompt": [{"role": "user", "content": "Let's play tic-tac-toe. You are X. Make your move using <row>0</row><col>0</col> format."}]}
        for _ in range(100)
    ])
    
    parser = vf.XMLParser(["row", "col"])
    
    async def model_won(state) -> float:
        return 1.0 if state.get("winner") == "model" else 0.0
    
    async def draw_bonus(state) -> float:
        return 0.5 if state.get("winner") == "draw" else 0.0
    
    rubric = vf.Rubric(
        funcs=[model_won, draw_bonus],
        weights=[1.0, 1.0],
        parser=parser,
    )
    
    return TicTacToeEnv(dataset=dataset, parser=parser, rubric=rubric)

Testing Custom Environments

1. Unit test individual methods

import numpy as np
import pytest

# Assumes TicTacToeEnv and load_environment (defined above) are importable here

@pytest.mark.asyncio
async def test_env_response():
    # load_environment() wires in the XML parser that env_response() relies on
    env = load_environment()
    state = {"board": np.zeros((3, 3), dtype=int), "current_player": 1, "winner": None}
    messages = [{"role": "assistant", "content": "<row>0</row><col>0</col>"}]
    
    response = await env.env_response(messages, state)
    assert len(response) == 1
    assert state["board"][0, 0] == 1

2. Test with a small evaluation

prime eval run tic-tac-toe -m gpt-4.1-mini -n 3 -r 2 -v

3. Check state tracking

prime eval run tic-tac-toe -m gpt-4.1-mini -n 5 -s \
  -C "board,winner,current_player"

Inspect saved results to verify state is tracked correctly.

Best Practices

Start simple — Build a minimal working version first, then add complexity incrementally.
Test stop conditions — Ensure rollouts don’t run forever. Add timeout conditions as a safety net.
Log liberally — Use self.logger to log state transitions, decisions, and errors during development.
Don’t mutate messages — Always return new message lists from env_response(), never modify in place.
Handle all error cases — Assume the model will send malformed responses. Validate and provide clear feedback.
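
As an illustration of the last practice, the sketch below validates a move string before it reaches any game logic. It uses a stdlib regex as a stand-in for the XML parser, and parse_move and MOVE_PATTERN are hypothetical names, so treat it as a way to unit test the validation rules rather than as the library's parsing behavior.

```python
import re
from typing import Optional, Tuple

# Matches <row>N</row><col>M</col>, tolerating whitespace and negative numbers.
MOVE_PATTERN = re.compile(r"<row>\s*(-?\d+)\s*</row>\s*<col>\s*(-?\d+)\s*</col>")


def parse_move(text: str) -> Optional[Tuple[int, int]]:
    """Return (row, col) for a well-formed, in-range move, else None."""
    match = MOVE_PATTERN.search(text)
    if match is None:
        return None  # malformed or missing tags
    row, col = int(match.group(1)), int(match.group(2))
    if not (0 <= row < 3 and 0 <= col < 3):
        return None  # syntactically valid but off the board
    return row, col
```

Returning None for every failure mode keeps the caller simple: one check decides whether to apply the move or send corrective feedback to the model.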
