MultiTurnEnv

Environment for multi-turn tasks involving back-and-forth interaction between the model and environment.

Overview

MultiTurnEnv enables interactive tasks where:
  • The model generates a response
  • The environment provides feedback via env_response()
  • This continues until a stop condition is met
Common use cases include games, simulations, tool use, and agent interactions.

Inheritance

Environment
└── MultiTurnEnv
    ├── SingleTurnEnv
    └── ToolEnv
        └── StatefulToolEnv

Constructor

MultiTurnEnv(
    max_turns: int = -1,
    **kwargs
)

Parameters

max_turns
int
default: -1
Maximum number of turns before stopping. -1 for unlimited turns.
All other parameters are inherited from Environment.

Core Methods

env_response

async def env_response(
    messages: Messages,
    state: State,
    **kwargs
) -> Messages | str
Abstract method; subclasses must implement it. Generates the environment’s response to the model’s latest message.
messages
Messages
Conversation history including the model’s latest response.
state
State
Current rollout state.
Returns: Messages | str - Environment’s response as messages or string.

setup_state

async def setup_state(state: State) -> State
Override to add environment-specific state fields before the rollout begins.
state
State
Initialized state from init_state().
Returns: State - Modified state.

get_prompt_messages

async def get_prompt_messages(state: State) -> Messages
Construct the prompt for the next model turn. Override for non-linear message sequences.
state
State
Current rollout state.
Returns: Messages - Prompt messages for the model. Default behavior:
  • Turn 0: Returns state["prompt"]
  • Turn N: Concatenates previous turn’s prompt + completion + env_response()
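
The default behavior can be pictured with plain lists of message dicts. The following is a minimal sketch, not the library's implementation: `next_prompt` and the shape of `previous_step` are hypothetical names introduced only for illustration.

```python
from typing import Any, Optional

Message = dict[str, Any]


def next_prompt(
    initial_prompt: list[Message],
    previous_step: Optional[dict[str, list[Message]]],
    env_messages: list[Message],
) -> list[Message]:
    """Hypothetical helper mirroring the documented default behavior."""
    if previous_step is None:
        # Turn 0: the prompt is just the initial prompt.
        return list(initial_prompt)
    # Turn N: previous prompt + previous completion + environment response.
    return previous_step["prompt"] + previous_step["completion"] + env_messages


initial = [{"role": "user", "content": "Guess the number."}]
turn0 = next_prompt(initial, None, [])
step0 = {"prompt": turn0, "completion": [{"role": "assistant", "content": "50"}]}
turn1 = next_prompt(initial, step0, [{"role": "user", "content": "Higher!"}])
```

Here `turn1` holds three messages in user, assistant, user order, so the prompt grows linearly with each turn. An override would be needed only when the conversation should not grow this way.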

render_completion

async def render_completion(state: State)
Render the final state["completion"] after rollout completes. Override for custom completion formatting.
state
State
Completed rollout state.
Default behavior: Extracts all messages after the initial prompt, including the final env_response if present.
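
As a rough illustration of that default, the sketch below slices the initial prompt off a full conversation and appends a final environment message if one exists. It is not the library's code: `render_completion_sketch` is a hypothetical name, and wrapping the final response in a `user` message is an assumption made for this example.

```python
from typing import Any, Optional

Message = dict[str, Any]


def render_completion_sketch(
    prompt: list[Message],
    conversation: list[Message],
    final_env_response: Optional[str] = None,
) -> list[Message]:
    """Hypothetical helper: everything after the initial prompt."""
    completion = list(conversation[len(prompt):])
    if final_env_response is not None:
        # Assumption: the final response is a string shown as a user message.
        completion.append({"role": "user", "content": final_env_response})
    return completion


prompt = [{"role": "user", "content": "Guess the number."}]
conversation = prompt + [
    {"role": "assistant", "content": "50"},
    {"role": "user", "content": "Higher!"},
    {"role": "assistant", "content": "75"},
]
completion = render_completion_sketch(prompt, conversation, "Correct!")
```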

add_trajectory_step

async def add_trajectory_step(
    state: State,
    trajectory_step: TrajectoryStep
)
Add a trajectory step to state["trajectory"]. Override to set intermediate rewards, advantages, or extra metadata.
state
State
Current rollout state.
trajectory_step
TrajectoryStep
Step containing prompt, completion, response, tokens, etc.

Stop Conditions

Stop conditions are methods decorated with @vf.stop that return bool. The rollout continues until any stop condition returns True.
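
To make the "any condition returns True" rule and the role of priority concrete, here is a self-contained toy model. It does not use the library: `toy_stop` is a hypothetical stand-in for the decorator, the default priority of 0 is an assumption, and the only ordering rule taken from this page is that higher priority is checked first.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

State = dict[str, Any]
StopFn = Callable[[State], Awaitable[bool]]


def toy_stop(priority: int = 0) -> Callable[[StopFn], StopFn]:
    """Toy stand-in for a stop decorator: tags a coroutine with a priority."""
    def wrap(fn: StopFn) -> StopFn:
        fn.stop_priority = priority  # type: ignore[attr-defined]
        return fn
    return wrap


@toy_stop(priority=100)
async def has_error(state: State) -> bool:
    return state.get("error") is not None


@toy_stop()
async def target_score_reached(state: State) -> bool:
    return state.get("score", 0) >= 100


async def first_stop(state: State, conditions: list[StopFn]) -> Optional[str]:
    """Return the name of the first condition that fires, or None."""
    ordered = sorted(conditions, key=lambda fn: fn.stop_priority, reverse=True)
    for condition in ordered:
        if await condition(state):
            return condition.__name__
    return None


conditions = [target_score_reached, has_error]
running = asyncio.run(first_stop({"score": 10}, conditions))
won = asyncio.run(first_stop({"score": 120}, conditions))
both = asyncio.run(first_stop({"score": 120, "error": "boom"}, conditions))
```

In this toy, `running` is `None` (the rollout would continue), `won` names `target_score_reached`, and `both` names `has_error` because the higher-priority check runs first.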

Built-in Stop Conditions

has_error

@vf.stop(priority=100)
async def has_error(state: State, **kwargs) -> bool
Stops if state["error"] is set. Highest priority (checked first).

prompt_too_long

@vf.stop
async def prompt_too_long(state: State) -> bool
Stops if state["prompt_too_long"] is True (set when OverlongPromptError occurs).

max_turns_reached

@vf.stop
async def max_turns_reached(state: State) -> bool
Stops when the trajectory length reaches max_turns. The check applies only when max_turns > 0.
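
The turn limit reduces to one comparison. The function below is a standalone sketch of that rule with a hypothetical name and signature, not the method itself:

```python
def turn_limit_hit(trajectory_len: int, max_turns: int) -> bool:
    """Sketch of the documented rule for the turn limit."""
    # A non-positive max_turns (such as the default -1) disables the limit.
    return max_turns > 0 and trajectory_len >= max_turns


at_limit = turn_limit_hit(10, 10)
below_limit = turn_limit_hit(9, 10)
unlimited = turn_limit_hit(1000, -1)
```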

has_final_env_response

@vf.stop
async def has_final_env_response(state: State) -> bool
Stops if state["final_env_response"] is set. Use this to signal termination from env_response():
async def env_response(self, messages, state):
    if game_over:
        state["final_env_response"] = "Game Over!"
        return []
    return normal_response

Custom Stop Conditions

Add custom stop conditions by decorating methods with @vf.stop:
class MyEnv(vf.MultiTurnEnv):
    @vf.stop
    async def target_score_reached(self, state: vf.State) -> bool:
        return state.get("score", 0) >= 100
    
    @vf.stop(priority=50)
    async def budget_exceeded(self, state: vf.State) -> bool:
        return state["usage"]["input_tokens"] > 10000

Rollout Loop

The rollout loop is implemented in the final rollout() method:
async def rollout(
    input: RolloutInput,
    client: Client,
    model: str,
    sampling_args: SamplingArgs | None = None
) -> State
Flow:
  1. Initialize state via init_state()
  2. Call setup_state()
  3. Loop:
    • Check stop conditions via is_completed()
    • Get prompt via get_prompt_messages()
    • Get model response via get_model_response()
    • Add to trajectory via add_model_response() → add_trajectory_step()
    • env_response() for this turn is called inside the next get_prompt_messages() call
  4. Call render_completion()
  5. Return final state
Do NOT override rollout(). Use the provided hooks: setup_state(), env_response(), add_trajectory_step(), and stop conditions.
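
The flow can be mimicked end to end without the library. The sketch below is a toy model of the loop, not the actual rollout() implementation: `fake_model`, `toy_rollout`, and the state layout are hypothetical. Skipping the model call once env_response() has set final_env_response is a choice made for this sketch.

```python
import asyncio
from typing import Any

State = dict[str, Any]
Message = dict[str, str]


async def fake_model(prompt: list[Message]) -> Message:
    """Stand-in for a model call: guesses 1, 2, 3, ... on successive turns."""
    guesses_so_far = sum(1 for m in prompt if m["role"] == "assistant")
    return {"role": "assistant", "content": str(guesses_so_far + 1)}


async def env_response(messages: list[Message], state: State) -> list[Message]:
    """Toy environment: compare the latest guess with the target."""
    if int(messages[-1]["content"]) == state["target"]:
        state["final_env_response"] = "Correct!"
        return []
    return [{"role": "user", "content": "Higher!"}]


async def is_completed(state: State, max_turns: int) -> bool:
    if state["final_env_response"] is not None:
        return True
    return max_turns > 0 and len(state["trajectory"]) >= max_turns


async def get_prompt_messages(state: State) -> list[Message]:
    if not state["trajectory"]:
        return list(state["prompt"])  # turn 0
    last = state["trajectory"][-1]
    messages = last["prompt"] + [last["completion"]]
    # The environment reacts to the previous turn while the next prompt is built.
    return messages + await env_response(messages, state)


async def toy_rollout(target: int, max_turns: int = -1) -> State:
    # Steps 1-2: initialize state, including environment-specific fields.
    state: State = {
        "prompt": [{"role": "user", "content": "Guess the number."}],
        "trajectory": [],
        "final_env_response": None,
        "target": target,
    }
    # Step 3: loop until a stop condition fires.
    while not await is_completed(state, max_turns):
        prompt = await get_prompt_messages(state)
        if state["final_env_response"] is not None:
            continue  # the environment ended the rollout; skip the model call
        completion = await fake_model(prompt)
        state["trajectory"].append({"prompt": prompt, "completion": completion})
    # Steps 4-5: rendering is omitted here; return the final state.
    return state


won = asyncio.run(toy_rollout(target=3))
capped = asyncio.run(toy_rollout(target=5, max_turns=2))
```

With a target of 3 the toy model needs three turns before the environment ends the rollout. With a target of 5 and max_turns=2 the loop stops after two turns without a final environment response.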

Example Usage

Simple Game Environment

import random

import verifiers as vf

class NumberGuessingEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(max_turns=10, **kwargs)
    
    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize game state."""
        state["target_number"] = random.randint(1, 100)
        state["guesses"] = 0
        return state
    
    async def env_response(
        self,
        messages: vf.Messages,
        state: vf.State,
        **kwargs
    ) -> vf.Messages:
        """Process guess and provide feedback."""
        last_message = str(messages[-1].content)
        state["guesses"] += 1
        
        try:
            guess = int(last_message.strip())
        except ValueError:
            return [{"role": "user", "content": "Please guess a number between 1 and 100."}]
        
        target = state["target_number"]
        
        if guess == target:
            state["final_env_response"] = f"Correct! You guessed {target} in {state['guesses']} tries."
            return []
        elif guess < target:
            return [{"role": "user", "content": "Higher!"}]
        else:
            return [{"role": "user", "content": "Lower!"}]
    
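    # Shown to illustrate @vf.stop; the built-in has_final_env_response
    # condition already stops the rollout once this field is set.
    @vf.stop
    async def game_won(self, state: vf.State) -> bool:
        return state.get("final_env_response") is not None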

def load_environment():
    # Create dataset
    dataset = vf.Environment.make_dataset(
        [{"question": "Guess the number between 1 and 100."}]
    )
    
    def success_reward(state: vf.State) -> float:
        """Reward based on number of guesses (fewer is better)."""
        if state.get("final_env_response"):
            return 1.0 / state["guesses"]
        return 0.0
    
    return NumberGuessingEnv(
        dataset=dataset,
        rubric=vf.Rubric(success_reward),
        system_prompt="Guess the secret number. I'll tell you if it's higher or lower."
    )

Text-Based Adventure

import verifiers as vf

class AdventureEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(max_turns=20, **kwargs)
        self.locations = {
            "start": {
                "description": "You are in a dark forest.",
                "exits": {"north": "cave", "south": "village"},
            },
            "cave": {
                "description": "You are in a dark cave. You found treasure!",
                "exits": {"south": "start"},
                "treasure": True,
            },
            "village": {
                "description": "You are in a peaceful village.",
                "exits": {"north": "start"},
            },
        }
    
    async def setup_state(self, state: vf.State) -> vf.State:
        state["location"] = "start"
        state["has_treasure"] = False
        return state
    
    async def env_response(
        self,
        messages: vf.Messages,
        state: vf.State,
        **kwargs
    ) -> vf.Messages:
        action = str(messages[-1].content).lower().strip()
        current_loc = self.locations[state["location"]]
        
        # Parse direction
        direction = None
        for d in ["north", "south", "east", "west"]:
            if d in action:
                direction = d
                break
        
        if direction and direction in current_loc["exits"]:
            new_loc = current_loc["exits"][direction]
            state["location"] = new_loc
            loc_data = self.locations[new_loc]
            
            if loc_data.get("treasure"):
                state["has_treasure"] = True
                state["final_env_response"] = loc_data["description"] + " You win!"
                return []
            
            return [{"role": "user", "content": loc_data["description"]}]
        else:
            return [{"role": "user", "content": "You can't go that way. " + current_loc["description"]}]
    
    @vf.stop
    async def treasure_found(self, state: vf.State) -> bool:
        return state.get("has_treasure", False)

def load_environment():
    dataset = vf.Environment.make_dataset(
        [{"question": "Find the treasure!"}]
    )
    
    def reward_fn(state: vf.State) -> float:
        return 1.0 if state.get("has_treasure") else 0.0
    
    return AdventureEnv(
        dataset=dataset,
        rubric=vf.Rubric(reward_fn),
        system_prompt="You are playing a text adventure. Choose your direction wisely."
    )

With Intermediate Rewards

import verifiers as vf

class TrainingEnv(vf.MultiTurnEnv):
    async def add_trajectory_step(
        self,
        state: vf.State,
        trajectory_step: vf.TrajectoryStep
    ):
        """Set per-step rewards for RL training."""
        # Compute intermediate reward for this step
        step_reward = self.compute_step_reward(trajectory_step, state)
        trajectory_step["reward"] = step_reward
        
        # Add to trajectory
        state["trajectory"].append(trajectory_step)
    
    def compute_step_reward(self, step: vf.TrajectoryStep, state: vf.State) -> float:
        """Compute reward for a single step."""
        # Example: penalize long responses
        completion_length = len(str(step["completion"]))
        return -0.01 * completion_length

Common Patterns

Signal Termination from env_response

Set state["final_env_response"] to stop the rollout:
async def env_response(self, messages, state):
    if is_terminal_state(state):
        state["final_env_response"] = "Terminal message"
        return []  # No more messages needed
    return [{"role": "user", "content": "Continue..."}]

Access Dataset Fields in env_response

Dataset fields are available in state["input"] or directly in state:
async def env_response(self, messages, state):
    ground_truth = state["answer"]  # From dataset
    # or: state["input"]["answer"]
    return process(ground_truth)

Stateful Simulations

Use setup_state() to initialize and env_response() to update:
async def setup_state(self, state):
    state["game_state"] = initialize_game()
    return state

async def env_response(self, messages, state):
    action = parse_action(messages[-1])
    state["game_state"] = update_game(state["game_state"], action)
    return generate_observation(state["game_state"])

Built-in Rubric

MultiTurnEnv includes MultiTurnMonitorRubric which adds:
  • num_turns metric: Number of turns in the trajectory

When to Use

Use MultiTurnEnv for:
  • Games and simulations
  • Multi-step reasoning tasks
  • Environments requiring feedback loops
  • Agent interactions
  • Tool use (or use ToolEnv for structured tool calling)
For single-response tasks, use SingleTurnEnv instead.
