Custom Multi-Turn Patterns

When standard environment types don’t fit your use case, MultiTurnEnv provides full control over the rollout loop. This guide covers advanced patterns for building custom environments with complex interaction logic.

When to Customize

Use custom multi-turn environments when you need:

Complex game logic — Board games, simulations, strategy games
Non-linear conversations — State-dependent message assembly
Custom feedback loops — Environment responses based on intermediate state
Specialized stop conditions — Domain-specific termination logic
Advanced state management — Complex per-rollout initialization and cleanup

The Rollout Loop

Understanding the rollout loop is essential for customization:

class MultiTurnEnv(vf.Environment):
    @final
    async def rollout(self, input, client, model, sampling_args) -> State:
        state = await self.init_state(input, client, model, sampling_args)
        
        try:
            try:
                state = await self.setup_state(state)  # 1. Initialize
            except vf.Error as e:
                state["error"] = e
            
            # 2. Main loop
            while not await self.is_completed(state):
                try:
                    prompt_messages = await self.get_prompt_messages(state)
                    if state.get("final_env_response") is not None:
                        continue
                    response = await self.get_model_response(state, prompt_messages)
                    await self.add_model_response(state, prompt_messages, response)
                except vf.Error as e:
                    state["error"] = e
            
            # 3. Finalize
            await self.render_completion(state)
            return state
        finally:
            await self._cleanup(state)  # 4. Cleanup

Never override rollout(). It is marked @final, so customize behavior by overriding these specific methods instead:

setup_state() — Per-rollout initialization
env_response() — Environment feedback after each turn
get_prompt_messages() — Custom message assembly
render_completion() — Final conversation rendering
add_trajectory_step() — Trajectory metadata

Core Methods to Override

env_response(): Required

Defines how the environment responds after each model turn:

class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        """Generate environment response after each model turn."""
        # Parse the model's action
        parsed = self.parser.parse(messages)
        action = parsed.action
        
        # Update game state
        state["board"] = apply_action(state["board"], action)
        
        # Check win condition
        if check_win(state["board"]):
            state["won"] = True
            return [{"role": "user", "content": "You won!"}]
        
        # Generate feedback
        feedback = generate_feedback(state["board"])
        return [{"role": "user", "content": feedback}]

Return value: List of new messages to append (don’t mutate existing messages).
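The return contract can be illustrated without the library. The sketch below is a minimal, hypothetical stand-in (the function name `respond` and the plain-dict message shape are assumptions for illustration, not part of verifiers): it builds a fresh list for the new messages and leaves the incoming history untouched.

```python
from typing import Dict, List

Message = Dict[str, str]


def respond(messages: List[Message], feedback: str) -> List[Message]:
    """Return only the new messages; never append to `messages` in place."""
    # A fresh list keeps the caller's history unchanged.
    return [{"role": "user", "content": feedback}]


history = [{"role": "assistant", "content": "move: a1"}]
new_messages = respond(history, "Board updated.")
```

The caller decides how to append `new_messages`, so the history it passed in is still exactly what it was before the call.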

setup_state(): Optional

Initialize per-rollout resources:

class MyGameEnv(vf.MultiTurnEnv):
    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize game state for this rollout."""
        # Initialize game board
        state["board"] = self.create_empty_board()
        state["score"] = 0
        state["moves"] = []
        
        # Setup external resources
        state["game_session"] = await self.game_api.create_session()
        
        return await super().setup_state(state)

Always call await super().setup_state(state) at the end to ensure parent class initialization runs.

get_prompt_messages(): Optional

Customize how messages are assembled for each turn:

class MyGameEnv(vf.MultiTurnEnv):
    async def get_prompt_messages(self, state: vf.State) -> vf.Messages:
        """Assemble messages with current game state."""
        if len(state["trajectory"]) == 0:
            # First turn
            return [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": self.format_initial_board(state["board"])}
            ]
        
        # Subsequent turns: show conversation + current state
        messages = []
        messages.append({"role": "system", "content": self.system_prompt})
        
        # Add conversation history
        for turn in state["trajectory"]:
            messages.extend(turn["completion"])
        
        # Add environment response with current board
        env_response = await self.env_response(messages, state)
        messages.extend(env_response)
        
        return messages

render_completion(): Optional

Customize how the final conversation is assembled:

class MyGameEnv(vf.MultiTurnEnv):
    async def render_completion(self, state: vf.State):
        """Assemble final completion with game summary."""
        if len(state["trajectory"]) == 0:
            state["completion"] = []
            return
        
        # Get last turn's messages
        last_prompt = state["trajectory"][-1]["prompt"]
        last_completion = state["trajectory"][-1]["completion"]
        
        # Build full conversation
        full_conversation = last_prompt + last_completion
        
        # Add final summary if game ended
        if state.get("final_env_response"):
            full_conversation.extend(state["final_env_response"])
        
        # Extract completion (everything after initial prompt)
        state["completion"] = full_conversation[len(state["prompt"]):]

add_trajectory_step(): Optional

Add metadata to each turn:

class MyGameEnv(vf.MultiTurnEnv):
    async def add_trajectory_step(
        self, 
        state: vf.State, 
        trajectory_step: TrajectoryStep
    ):
        """Enrich trajectory with game-specific metadata."""
        # Add game state snapshot
        trajectory_step["extras"]["board_state"] = state["board"].copy()
        trajectory_step["extras"]["valid_moves"] = get_valid_moves(state["board"])
        trajectory_step["extras"]["score"] = state["score"]
        
        # Set intermediate reward (optional)
        if state.get("won"):
            trajectory_step["reward"] = 1.0
        elif state.get("lost"):
            trajectory_step["reward"] = 0.0
        
        await super().add_trajectory_step(state, trajectory_step)

Stop Conditions

Define when rollouts should terminate using the @vf.stop decorator:

Basic Stop Conditions

import time

class MyGameEnv(vf.MultiTurnEnv):
    @vf.stop
    async def game_won(self, state: vf.State) -> bool:
        return state.get("won", False)
    
    @vf.stop
    async def game_lost(self, state: vf.State) -> bool:
        return state.get("lives", 3) <= 0
    
    @vf.stop
    async def timeout(self, state: vf.State) -> bool:
        # Assumes setup_state() stored state["start_time"] = time.time()
        # and that self.max_seconds was set in __init__.
        elapsed = time.time() - state["start_time"]
        return elapsed > self.max_seconds

Built-in stop conditions (always available):

has_error — stops if state["error"] is set
max_turns_reached — stops after max_turns iterations
prompt_too_long — stops if prompt exceeds model context
has_final_env_response — stops if early termination signaled

Priority-Based Execution

Control evaluation order with priorities (higher runs first):

class MyGameEnv(vf.MultiTurnEnv):
    @vf.stop(priority=100)  # Check error first
    async def fatal_error(self, state: vf.State) -> bool:
        return state.get("fatal_error") is not None
    
    @vf.stop(priority=10)  # Then check cheap conditions
    async def answer_keyword(self, state: vf.State) -> bool:
        completion = state.get("completion", [])
        if not completion:
            return False
        return "FINAL ANSWER:" in completion[-1].get("content", "")
    
    @vf.stop(priority=-10)  # Finally check expensive conditions
    async def validated_answer(self, state: vf.State) -> bool:
        return await self.validator_api.check_answer(state)

Early Termination from env_response

Signal completion directly from the environment:

class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        # Process move
        result = process_move(messages, state)
        
        # Check if game ended
        if result.game_over:
            final_message = [
                {"role": "user", "content": f"Game over! Score: {state['score']}"}
            ]
            state["final_env_response"] = final_message
            return final_message
        
        # Game continues
        return [{"role": "user", "content": result.feedback}]

Setting state["final_env_response"] triggers the has_final_env_response stop condition.

Resource Management

Cleanup: Per-Rollout

Use @vf.cleanup for per-rollout resource cleanup:

class MyGameEnv(vf.MultiTurnEnv):
    @vf.cleanup
    async def save_game_log(self, state: vf.State):
        """Save game results after each rollout."""
        try:
            await self.db.insert_game_result({
                "game_id": state["game_id"],
                "score": state.get("score", 0),
                "won": state.get("won", False),
                "moves": state.get("moves", []),
            })
        except Exception as e:
            self.logger.error(f"Failed to save game log: {e}")
    
    @vf.cleanup
    async def close_game_session(self, state: vf.State):
        """Close API session."""
        if "game_session" in state:
            try:
                await self.game_api.close_session(state["game_session"])
            except Exception as e:
                self.logger.warning(f"Failed to close session: {e}")

Teardown: Environment Shutdown

Use @vf.teardown for environment-level cleanup:

class MyGameEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.db_connection = None
    
    async def setup_state(self, state: vf.State) -> vf.State:
        # Initialize DB connection lazily
        if self.db_connection is None:
            self.db_connection = await connect_to_db()
        return await super().setup_state(state)
    
    @vf.teardown
    async def close_database(self):
        """Close database connection when environment shuts down."""
        if self.db_connection:
            await self.db_connection.close()
            self.logger.info("Database connection closed")

Idempotency is critical — Cleanup methods may be called multiple times or when resources are in unexpected states. Always:

Check if resources exist before cleaning up
Handle exceptions gracefully
Use try/except blocks
Log errors but don’t raise

Error Handling

Verifiers provides structured error handling:

Error Hierarchy

vf.Error                    # Base class
├── vf.ModelError          # Model interaction issues
│   └── vf.EmptyModelResponseError
├── vf.OverlongPromptError # Prompt exceeds context
├── vf.ToolError           # Tool-related errors
│   ├── vf.ToolParseError  # Failed to parse tool call
│   └── vf.ToolCallError   # Tool execution failed
└── vf.InfraError          # Infrastructure failures
    ├── vf.SandboxError
    └── vf.TunnelError

Raising Errors

class MyGameEnv(vf.MultiTurnEnv):
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        # Take the move from the model's latest message
        move = messages[-1]["content"]
        try:
            result = await self.game_api.make_move(state["game_id"], move)
            return [{"role": "user", "content": result.feedback}]
        except GameAPITimeout as e:
            # Infrastructure error - rollout will stop
            raise vf.InfraError(f"Game API timeout: {e}") from e
        except InvalidMoveError as e:
            # Invalid move - let model recover
            return [{"role": "user", "content": f"Invalid move: {e}"}]

When a vf.Error is raised:

Automatically caught by the rollout loop
Stored in state["error"]
Built-in has_error stop condition triggers
Rollout terminates gracefully
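The sequence above can be mimicked with a toy loop that does not depend on the library. `SketchError`, `flaky_turn`, and `run_loop` are hypothetical names used only for this illustration; the real loop lives in rollout() and has more steps.

```python
import asyncio
from typing import Any, Dict


class SketchError(Exception):
    """Stand-in for vf.Error in this sketch."""


async def flaky_turn(state: Dict[str, Any]) -> None:
    """Pretend turn that fails on the second call."""
    state["turns"] += 1
    if state["turns"] == 2:
        raise SketchError("backend timeout")


async def run_loop(max_turns: int = 5) -> Dict[str, Any]:
    state: Dict[str, Any] = {"error": None, "turns": 0, "cleaned_up": False}
    try:
        # The loop condition plays the role of the has_error stop condition.
        while state["error"] is None and state["turns"] < max_turns:
            try:
                await flaky_turn(state)
            except SketchError as exc:
                state["error"] = exc  # captured, not propagated
    finally:
        state["cleaned_up"] = True  # cleanup runs regardless of the outcome
    return state


final_state = asyncio.run(run_loop())
```

The error never escapes the loop. It is recorded in the state, the next condition check ends the loop, and cleanup still runs.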

Complete Example: Tic-Tac-Toe

Here’s a complete custom environment:

import verifiers as vf
from datasets import Dataset
import numpy as np

class TicTacToeEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(max_turns=9, **kwargs)
    
    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize empty board."""
        state["board"] = np.zeros((3, 3), dtype=int)
        state["current_player"] = 1  # 1 = X (model), -1 = O (environment)
        state["winner"] = None
        return await super().setup_state(state)
    
    async def env_response(
        self, 
        messages: vf.Messages, 
        state: vf.State
    ) -> vf.Messages:
        """Process model's move and make counter-move."""
        # Parse the model's move from the text of its latest message
        last_msg = messages[-1]["content"]
        parsed = self.parser.parse(last_msg)
        
        try:
            row, col = int(parsed.row), int(parsed.col)
            if not (0 <= row < 3 and 0 <= col < 3):
                return [{"role": "user", "content": "Invalid position. Use 0-2 for row and col."}]
            if state["board"][row, col] != 0:
                return [{"role": "user", "content": "That position is taken. Try again."}]
        except (ValueError, TypeError, AttributeError):  # int(None) raises TypeError
            return [{"role": "user", "content": "Invalid format. Use <row>0</row><col>0</col>."}]
        
        # Apply model's move
        state["board"][row, col] = 1
        
        # Check win/draw
        if self.check_winner(state["board"]) == 1:
            state["winner"] = "model"
            return [{"role": "user", "content": f"You win!\n{self.render_board(state['board'])}"}]
        
        if np.all(state["board"] != 0):
            state["winner"] = "draw"
            return [{"role": "user", "content": f"Draw!\n{self.render_board(state['board'])}"}]
        
        # Environment's move (simple strategy)
        env_row, env_col = self.make_env_move(state["board"])
        state["board"][env_row, env_col] = -1
        
        # Check if environment won
        if self.check_winner(state["board"]) == -1:
            state["winner"] = "environment"
            return [{"role": "user", "content": f"I win!\n{self.render_board(state['board'])}"}]
        
        # Game continues
        return [{"role": "user", "content": self.render_board(state["board"])}]
    
    @vf.stop
    async def game_ended(self, state: vf.State) -> bool:
        return state.get("winner") is not None
    
    def check_winner(self, board):
        """Check for winner. Returns 1 (X wins), -1 (O wins), or 0 (no winner)."""
        # Check rows, cols, diagonals
        for i in range(3):
            if abs(board[i, :].sum()) == 3:
                return board[i, 0]
            if abs(board[:, i].sum()) == 3:
                return board[0, i]
        if abs(board.diagonal().sum()) == 3:
            return board[0, 0]
        if abs(np.fliplr(board).diagonal().sum()) == 3:
            return board[0, 2]
        return 0
    
    def make_env_move(self, board):
        """Simple strategy: take center, then corners, then edges."""
        # Take center if available
        if board[1, 1] == 0:
            return 1, 1
        # Take corners
        for r, c in [(0, 0), (0, 2), (2, 0), (2, 2)]:
            if board[r, c] == 0:
                return r, c
        # Take edges
        for r, c in [(0, 1), (1, 0), (1, 2), (2, 1)]:
            if board[r, c] == 0:
                return r, c
    
    def render_board(self, board):
        """Render board as string."""
        symbols = {0: ".", 1: "X", -1: "O"}
        lines = []
        for row in board:
            lines.append(" ".join(symbols[cell] for cell in row))
        return "\n".join(lines)

# Load environment
def load_environment():
    dataset = Dataset.from_list([
        {"prompt": [{"role": "user", "content": "Let's play tic-tac-toe. You are X. Make your move using <row>0</row><col>0</col> format."}]}
        for _ in range(100)
    ])
    
    parser = vf.XMLParser(["row", "col"])
    
    async def model_won(state) -> float:
        return 1.0 if state.get("winner") == "model" else 0.0
    
    async def draw_bonus(state) -> float:
        return 0.5 if state.get("winner") == "draw" else 0.0
    
    rubric = vf.Rubric(
        funcs=[model_won, draw_bonus],
        weights=[1.0, 1.0],
        parser=parser,
    )
    
    return TicTacToeEnv(dataset=dataset, parser=parser, rubric=rubric)

Testing Custom Environments

Unit test individual methods

import numpy as np
import pytest

@pytest.mark.asyncio
async def test_env_response():
    # load_environment() is the factory from the Tic-Tac-Toe example above;
    # it configures the XML parser that env_response() relies on.
    env = load_environment()
    state = {"board": np.zeros((3, 3), dtype=int), "current_player": 1, "winner": None}
    messages = [{"role": "assistant", "content": "<row>0</row><col>0</col>"}]
    
    response = await env.env_response(messages, state)
    assert len(response) == 1
    assert state["board"][0, 0] == 1
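One way to make such logic easier to test is to pull the move validation into a pure function that needs neither the environment nor a parser. The sketch below is a hypothetical refactoring, not part of the example above: `validate_move` and the list-of-lists board are assumptions chosen so the checks run with the standard library alone.

```python
from typing import Any, List, Tuple, Union

Board = List[List[int]]


def validate_move(board: Board, row: Any, col: Any) -> Union[Tuple[int, int], str]:
    """Return (row, col) for a legal move, otherwise a feedback string."""
    try:
        r, c = int(row), int(col)
    except (TypeError, ValueError):
        # Covers missing fields (None) and non-numeric text.
        return "Invalid format. Use <row>0</row><col>0</col>."
    if not (0 <= r < 3 and 0 <= c < 3):
        return "Invalid position. Use 0-2 for row and col."
    if board[r][c] != 0:
        return "That position is taken. Try again."
    return r, c


board = [[0, 0, 0] for _ in range(3)]
board[1][1] = -1  # centre already taken by the environment

legal = validate_move(board, "0", "2")
missing = validate_move(board, None, "1")
out_of_range = validate_move(board, "3", "0")
taken = validate_move(board, "1", "1")
```

Each branch can then be asserted directly, and env_response() only has to turn the result into a message.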

Test with small evaluation

prime eval run tic-tac-toe -m gpt-4.1-mini -n 3 -r 2 -v

Check state tracking

prime eval run tic-tac-toe -m gpt-4.1-mini -n 5 -s \
  -C "board,winner,current_player"

Inspect saved results to verify state is tracked correctly.

Best Practices

Start simple — Build a minimal working version first, then add complexity incrementally.

Test stop conditions — Ensure rollouts don’t run forever. Add timeout conditions as a safety net.
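A timeout check is easiest to test when the clock can be injected. The following sketch assumes two hypothetical helpers, `start_clock` and `timed_out`, that a stop condition could delegate to. It uses time.monotonic(), which is not affected by system clock adjustments.

```python
import time
from typing import Any, Dict, Optional


def start_clock(state: Dict[str, Any], now: Optional[float] = None) -> Dict[str, Any]:
    """Record the rollout start; call once during per-rollout setup."""
    state["start_time"] = time.monotonic() if now is None else now
    return state


def timed_out(state: Dict[str, Any], max_seconds: float, now: Optional[float] = None) -> bool:
    """True once more than max_seconds have elapsed since start_clock()."""
    current = time.monotonic() if now is None else now
    return current - state["start_time"] > max_seconds


# Passing explicit timestamps keeps the test fast and deterministic.
state = start_clock({}, now=100.0)
within_budget = timed_out(state, max_seconds=30, now=120.0)
over_budget = timed_out(state, max_seconds=30, now=131.0)
```

In production code the `now` argument is simply omitted, and tests pass fixed values instead of sleeping.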

Log liberally — Use self.logger to log state transitions, decisions, and errors during development.

Don’t mutate messages — Always return new message lists from env_response(), never modify in place.

Handle all error cases — Assume the model will send malformed responses. Validate and provide clear feedback.

Next Steps

Evaluation: Comprehensive testing strategies → Evaluation Guide
Training: Use custom environments for RL → Training Guide
Integration: Connect to external systems → Tool Environments Guide
