MultiTurnEnv
Environment for multi-turn tasks involving back-and-forth interaction between the model and environment.
Overview
MultiTurnEnv enables interactive tasks where:
- The model generates a response
- The environment provides feedback via env_response()
- This continues until a stop condition is met
- Common use cases: games, simulations, tool use, agent interactions
Inheritance
Environment
└── MultiTurnEnv
    ├── SingleTurnEnv
    └── ToolEnv
        └── StatefulToolEnv
Constructor
MultiTurnEnv(
    max_turns: int = -1,
    **kwargs
)
Parameters
max_turns: Maximum number of turns before stopping. -1 for unlimited turns.
All other parameters are inherited from Environment.
Core Methods
env_response
async def env_response(
    messages: Messages,
    state: State,
    **kwargs
) -> Messages | str
Abstract method - Must be implemented by subclasses.
Generate the environment's response to the model's latest message.
messages: Conversation history including the model's latest response.
state: Current rollout state.
Returns: Messages | str - Environment's response as messages or string.
setup_state
async def setup_state(state: State) -> State
Override to add environment-specific state fields before the rollout begins.
state: Initialized state from init_state().
Returns: State - Modified state.
get_prompt_messages
async def get_prompt_messages(state: State) -> Messages
Construct the prompt for the next model turn. Override for non-linear message sequences.
Returns: Messages - Prompt messages for the model.
Default behavior:
- Turn 0: Returns state["prompt"]
- Turn N: Concatenates the previous turn's prompt + completion + env_response()
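For non-linear sequences, an override typically rebuilds the message list itself. Below is a minimal sketch of one common variant, a sliding context window; `windowed_messages` is a hypothetical helper (not part of the library), and it assumes messages are role/content dicts as in the default layout:

```python
# Hypothetical helper sketching a sliding-window prompt: keep the system
# message, then only the most recent `window` non-system messages.
def windowed_messages(prompt, history, window=4):
    system = [m for m in prompt if m["role"] == "system"]
    rest = [m for m in prompt if m["role"] != "system"] + history
    return system + rest[-window:]
```

A get_prompt_messages() override could apply this to state["prompt"] plus the accumulated turns before returning.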
render_completion
async def render_completion(state: State)
Render the final state["completion"] after rollout completes. Override for custom completion formatting.
Default behavior: Extracts all messages after the initial prompt, including the final env_response if present.
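A custom override might, for example, drop a trailing environment message so the completion ends with the model's final response. A sketch of that trimming logic as a standalone function (hypothetical helper, assuming role/content dicts):

```python
# Hypothetical helper: trim trailing non-assistant messages so the
# rendered completion ends with the model's last response.
def trim_trailing_env_messages(completion):
    out = list(completion)
    while out and out[-1]["role"] != "assistant":
        out.pop()
    return out
```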
add_trajectory_step
async def add_trajectory_step(
    state: State,
    trajectory_step: TrajectoryStep
)
Add a trajectory step to state["trajectory"]. Override to set intermediate rewards, advantages, or extra metadata.
trajectory_step: Step containing prompt, completion, response, tokens, etc.
Stop Conditions
Stop conditions are methods decorated with @vf.stop that return bool. The rollout continues until any stop condition returns True.
Built-in Stop Conditions
has_error
@vf.stop(priority=100)
async def has_error(state: State, **kwargs) -> bool
Stops if state["error"] is set. Highest priority (checked first).
prompt_too_long
@vf.stop
async def prompt_too_long(state: State) -> bool
Stops if state["prompt_too_long"] is True (set when OverlongPromptError occurs).
max_turns_reached
@vf.stop
async def max_turns_reached(state: State) -> bool
Stops when trajectory length reaches max_turns (if > 0).
has_final_env_response
@vf.stop
async def has_final_env_response(state: State) -> bool
Stops if state["final_env_response"] is set. Use this to signal termination from env_response():
async def env_response(self, messages, state):
    if game_over:
        state["final_env_response"] = "Game Over!"
        return []
    return normal_response
Custom Stop Conditions
Add custom stop conditions by decorating methods with @vf.stop:
class MyEnv(vf.MultiTurnEnv):
    @vf.stop
    async def target_score_reached(self, state: vf.State) -> bool:
        return state.get("score", 0) >= 100

    @vf.stop(priority=50)
    async def budget_exceeded(self, state: vf.State) -> bool:
        return state["usage"]["input_tokens"] > 10000
Rollout Loop
The rollout loop is implemented in the final rollout() method:
async def rollout(
    input: RolloutInput,
    client: Client,
    model: str,
    sampling_args: SamplingArgs | None = None
) -> State
Flow:
- Initialize state via init_state()
- Call setup_state()
- Loop:
  - Check stop conditions via is_completed()
  - Get prompt via get_prompt_messages() (the previous turn's env_response() is folded in here)
  - Get model response via get_model_response()
  - Add to trajectory via add_model_response() → add_trajectory_step()
- Call render_completion()
- Return final state
Do NOT override rollout(). Use the provided hooks: setup_state(), env_response(), add_trajectory_step(), and stop conditions.
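Stripped of library details, the loop order can be sketched in plain Python. This is a toy illustration with stubbed calls, not the library's actual implementation (the real method is async and delegates to the hooks above):

```python
# Toy sketch of the rollout loop order (stubbed, synchronous).
def rollout_sketch(max_turns):
    state = {"trajectory": [], "completion": None}        # init_state()
    # setup_state() would run here
    while len(state["trajectory"]) < max_turns:           # is_completed()
        prompt = ["..."]                                  # get_prompt_messages()
        completion = f"turn {len(state['trajectory'])}"   # get_model_response()
        state["trajectory"].append(completion)            # add_trajectory_step()
    state["completion"] = state["trajectory"]             # render_completion()
    return state
```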
Example Usage
Simple Game Environment
import verifiers as vf

class NumberGuessingEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(max_turns=10, **kwargs)

    async def setup_state(self, state: vf.State) -> vf.State:
        """Initialize game state."""
        import random
        state["target_number"] = random.randint(1, 100)
        state["guesses"] = 0
        return state

    async def env_response(
        self,
        messages: vf.Messages,
        state: vf.State,
        **kwargs
    ) -> vf.Messages:
        """Process guess and provide feedback."""
        last_message = str(messages[-1].content)
        state["guesses"] += 1
        try:
            guess = int(last_message.strip())
        except ValueError:
            return [{"role": "user", "content": "Please guess a number between 1 and 100."}]
        target = state["target_number"]
        if guess == target:
            state["final_env_response"] = f"Correct! You guessed {target} in {state['guesses']} tries."
            return []
        elif guess < target:
            return [{"role": "user", "content": "Higher!"}]
        else:
            return [{"role": "user", "content": "Lower!"}]

    @vf.stop
    async def game_won(self, state: vf.State) -> bool:
        return state.get("final_env_response") is not None

def load_environment():
    # Create dataset
    dataset = vf.Environment.make_dataset(
        [{"question": "Guess the number between 1 and 100."}]
    )

    def success_reward(state: vf.State) -> float:
        """Reward based on number of guesses (fewer is better)."""
        if state.get("final_env_response"):
            return 1.0 / state["guesses"]
        return 0.0

    return NumberGuessingEnv(
        dataset=dataset,
        rubric=vf.Rubric(success_reward),
        system_prompt="Guess the secret number. I'll tell you if it's higher or lower."
    )
Text-Based Adventure
import verifiers as vf

class AdventureEnv(vf.MultiTurnEnv):
    def __init__(self, **kwargs):
        super().__init__(max_turns=20, **kwargs)
        self.locations = {
            "start": {
                "description": "You are in a dark forest.",
                "exits": {"north": "cave", "south": "village"},
            },
            "cave": {
                "description": "You are in a dark cave. You found treasure!",
                "exits": {"south": "start"},
                "treasure": True,
            },
            "village": {
                "description": "You are in a peaceful village.",
                "exits": {"north": "start"},
            },
        }

    async def setup_state(self, state: vf.State) -> vf.State:
        state["location"] = "start"
        state["has_treasure"] = False
        return state

    async def env_response(
        self,
        messages: vf.Messages,
        state: vf.State,
        **kwargs
    ) -> vf.Messages:
        action = str(messages[-1].content).lower().strip()
        current_loc = self.locations[state["location"]]
        # Parse direction
        direction = None
        for d in ["north", "south", "east", "west"]:
            if d in action:
                direction = d
                break
        if direction and direction in current_loc["exits"]:
            new_loc = current_loc["exits"][direction]
            state["location"] = new_loc
            loc_data = self.locations[new_loc]
            if loc_data.get("treasure"):
                state["has_treasure"] = True
                state["final_env_response"] = loc_data["description"] + " You win!"
                return []
            return [{"role": "user", "content": loc_data["description"]}]
        else:
            return [{"role": "user", "content": "You can't go that way. " + current_loc["description"]}]

    @vf.stop
    async def treasure_found(self, state: vf.State) -> bool:
        return state.get("has_treasure", False)

def load_environment():
    dataset = vf.Environment.make_dataset(
        [{"question": "Find the treasure!"}]
    )

    def reward_fn(state: vf.State) -> float:
        return 1.0 if state.get("has_treasure") else 0.0

    return AdventureEnv(
        dataset=dataset,
        rubric=vf.Rubric(reward_fn),
        system_prompt="You are playing a text adventure. Choose your direction wisely."
    )
Per-Step Rewards
import verifiers as vf

class TrainingEnv(vf.MultiTurnEnv):
    async def add_trajectory_step(
        self,
        state: vf.State,
        trajectory_step: vf.TrajectoryStep
    ):
        """Set per-step rewards for RL training."""
        # Compute intermediate reward for this step
        step_reward = self.compute_step_reward(trajectory_step, state)
        trajectory_step["reward"] = step_reward
        # Add to trajectory
        state["trajectory"].append(trajectory_step)

    def compute_step_reward(self, step: vf.TrajectoryStep, state: vf.State) -> float:
        """Compute reward for a single step."""
        # Example: penalize long responses
        completion_length = len(str(step["completion"]))
        return -0.01 * completion_length
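A related pattern is to spread a discounted terminal reward backwards over the trajectory once the outcome is known. The discounting arithmetic can be sketched independently of the library (hypothetical helper):

```python
# Hypothetical helper: gamma-discounted credit for a terminal reward.
# Step t of T gets terminal_reward * gamma ** (T - 1 - t), so the final
# step receives the full reward and earlier steps receive less.
def discounted_step_rewards(num_steps, terminal_reward, gamma=0.9):
    return [terminal_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]
```

An add_trajectory_step() override could stash steps as above and then apply such a schedule when the terminal reward becomes available.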
Common Patterns
Signal Termination from env_response
Set state["final_env_response"] to stop the rollout:
async def env_response(self, messages, state):
    if is_terminal_state(state):
        state["final_env_response"] = "Terminal message"
        return []  # No more messages needed
    return [{"role": "user", "content": "Continue..."}]
Access Dataset Fields in env_response
Dataset fields are available in state["input"] or directly in state:
async def env_response(self, messages, state):
    ground_truth = state["answer"]  # From dataset
    # or: state["input"]["answer"]
    return process(ground_truth)
Stateful Simulations
Use setup_state() to initialize and env_response() to update:
async def setup_state(self, state):
    state["game_state"] = initialize_game()
    return state

async def env_response(self, messages, state):
    action = parse_action(messages[-1])
    state["game_state"] = update_game(state["game_state"], action)
    return generate_observation(state["game_state"])
Built-in Rubric
MultiTurnEnv includes MultiTurnMonitorRubric which adds:
- num_turns metric: Number of turns in the trajectory
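Turn count is also easy to fold into a custom reward, e.g. to shape toward shorter episodes. A sketch (hypothetical function; it assumes only that state["trajectory"] holds one entry per turn, and `success_key` is an illustrative state field):

```python
# Hypothetical shaping term: full credit only on success, scaled down
# as the trajectory gets longer (1/num_turns).
def efficiency_reward(state, success_key="solved"):
    if not state.get(success_key):
        return 0.0
    return 1.0 / max(len(state.get("trajectory", [])), 1)
```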
When to Use
Use MultiTurnEnv for:
- Games and simulations
- Multi-step reasoning tasks
- Environments requiring feedback loops
- Agent interactions
- Tool use (or use ToolEnv for structured tool calling)
For single-response tasks, use SingleTurnEnv instead.
See Also