State

The State type is a specialized dict subclass that holds all runtime information for a rollout, including inputs, outputs, trajectory, and metadata.

Overview

State provides a unified interface for accessing both:
  • Input fields: From the dataset row (prompt, answer, task, info, example_id)
  • Runtime fields: Created during rollout execution (trajectory, reward, completion, etc.)
Fields are automatically forwarded to/from the nested input dict for seamless access.

Type Definition

class State(dict):
    INPUT_FIELDS = ["prompt", "answer", "task", "info", "example_id"]
    
    # Rollout inputs and model configuration (dataset row, client, model, sampling args)
    input: RolloutInput
    client: Client
    model: str
    sampling_args: SamplingArgs | None
    
    # Created during rollout
    is_completed: bool
    is_truncated: bool
    stop_condition: str | None
    tool_defs: list[Tool]
    trajectory: list[TrajectoryStep]
    completion: Messages | None
    reward: float | None
    advantage: float | None
    metrics: dict[str, float] | None
    timing: RolloutTiming | None
    error: Error | None
    usage: TokenUsage | None
    usage_tracker: object

Input Fields

These fields come from the dataset and are stored in state["input"]:
  • prompt (Messages): The input prompt as a list of messages.
  • answer (str | Any): Ground truth answer or reference data for scoring.
  • task (str): Task identifier (e.g., “math”, “coding”, “gsm8k”).
  • info (Info): Additional metadata from the dataset (arbitrary dict).
  • example_id (int): Unique integer ID for this example.

Runtime Fields

  • client (Client): The API client instance for model calls.
  • model (str): Model identifier (e.g., “gpt-4”, “claude-3-5-sonnet-20241022”).
  • sampling_args (SamplingArgs | None): Sampling parameters (temperature, top_p, etc.).
  • is_completed (bool): Whether the rollout completed successfully.
  • is_truncated (bool): Whether the rollout was truncated (max turns, length limit, etc.).
  • stop_condition (str | None): Name of the stop condition that ended the rollout.
  • tool_defs (list[Tool]): Tool definitions available during this rollout.
  • trajectory (list[TrajectoryStep]): Complete turn-by-turn trajectory (prompts, completions, rewards).
  • completion (Messages | None): Final completion (last assistant message or concatenated messages).
  • reward (float | None): Total reward for this rollout.
  • advantage (float | None): Advantage value (for group scoring).
  • metrics (dict[str, float] | None): Named metric scores (e.g., {"correctness": 1.0, "length": 0.8}).
  • timing (RolloutTiming | None): Timing information (start_time, generation_ms, scoring_ms, total_ms).
  • error (Error | None): Error object if the rollout failed.
  • usage (TokenUsage | None): Token usage (input_tokens, output_tokens).

Special Behavior

Input Field Forwarding

Accessing input fields automatically looks in state["input"]:
state = State({
    "input": {
        "prompt": [{"role": "user", "content": "Hello"}],
        "answer": "42",
        "task": "qa",
        "example_id": 0
    }
})

# These are equivalent:
state["prompt"]          # Returns the prompt
state["input"]["prompt"] # Same result

# Setting also forwards:
state["answer"] = "43"
assert state["input"]["answer"] == "43"
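
To make the forwarding behavior concrete, here is a minimal sketch of how a dict subclass could route input fields through a nested "input" dict. The class name ForwardingState is hypothetical and this is not the library's implementation; it only reproduces the behavior described above, under the assumption that forwarding applies when the nested dict already holds the key.

```python
from typing import Any


class ForwardingState(dict):
    """Hypothetical stand-in for State; not the library's implementation."""

    INPUT_FIELDS = ["prompt", "answer", "task", "info", "example_id"]

    def __getitem__(self, key: str) -> Any:
        # Read input fields from the nested "input" dict when it holds them.
        if key in self.INPUT_FIELDS and "input" in self:
            nested = super().__getitem__("input")
            if key in nested:
                return nested[key]
        return super().__getitem__(key)

    def __setitem__(self, key: str, value: Any) -> None:
        # Write input fields through to the nested "input" dict when it holds them.
        if key in self.INPUT_FIELDS and "input" in self:
            nested = super().__getitem__("input")
            if key in nested:
                nested[key] = value
                return
        super().__setitem__(key, value)


state = ForwardingState(
    {"input": {"prompt": [{"role": "user", "content": "Hello"}], "answer": "42"}}
)
state["answer"] = "43"  # forwarded into state["input"]
state["reward"] = 1.0   # not an input field, stored at the top level
```

In this sketch the forwarded key never appears at the top level, so there is a single source of truth for each input field.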

get() Method

def get(self, key: str, default: Any = None) -> Any
Safe access with default fallback:
reward = state.get("reward", 0.0)  # Returns 0.0 if not set
task = state.get("task")            # Returns None if not set
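
One reason a forwarding dict subclass would define its own get() is that the built-in dict.get does not call an overridden __getitem__. The sketch below illustrates that Python behavior with two hypothetical classes (PlainForwarding and WithGet); it is an illustration of the general technique, not the library's code.

```python
from typing import Any


class PlainForwarding(dict):
    """Hypothetical: forwards reads to self["input"], but inherits dict.get."""

    def __getitem__(self, key: str) -> Any:
        nested = dict.get(self, "input", {})
        if key in nested:
            return nested[key]
        return super().__getitem__(key)


class WithGet(PlainForwarding):
    """Hypothetical: adds a get() that goes through __getitem__."""

    def get(self, key: str, default: Any = None) -> Any:
        try:
            return self[key]
        except KeyError:
            return default


data = {"input": {"task": "qa"}}
inherited = PlainForwarding(data).get("task")  # dict.get skips __getitem__
overridden = WithGet(data).get("task")         # reaches the nested dict
missing = WithGet(data).get("reward", 0.0)     # falls back to the default
```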

Example Usage

Basic Access

import verifiers as vf

# In a reward function
def reward_fn(state: vf.State) -> float:
    # Access input fields
    answer = state["answer"]
    task = state["task"]
    
    # Access runtime fields
    completion = state["completion"]
    trajectory = state["trajectory"]
    
    # Check completion
    if not state["is_completed"]:
        return 0.0
    
    # Compute reward
    return 1.0 if answer in str(completion) else 0.0
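
Note that str(completion) on a list of messages matches against the whole repr, including any non-assistant messages. A stricter variant might look only at assistant text. The sketch below assumes messages are dicts with "role" and string "content" keys, as in the prompt example earlier; assistant_text and contains_answer_reward are hypothetical helpers, and a plain dict stands in for State.

```python
def assistant_text(completion) -> str:
    """Join the text of assistant messages; accepts None, a str, or message dicts."""
    if completion is None:
        return ""
    if isinstance(completion, str):
        return completion
    parts = []
    for message in completion:
        content = message.get("content")
        if message.get("role") == "assistant" and isinstance(content, str):
            parts.append(content)
    return "\n".join(parts)


def contains_answer_reward(state) -> float:
    """Return 1.0 only if a completed rollout's assistant text contains the answer."""
    if not state.get("is_completed"):
        return 0.0
    text = assistant_text(state.get("completion"))
    return 1.0 if str(state["answer"]) in text else 0.0


# Plain dict standing in for a State.
example = {
    "answer": "42",
    "is_completed": True,
    "completion": [
        {"role": "user", "content": "Is it 42?"},
        {"role": "assistant", "content": "The answer is 41."},
    ],
}
score = contains_answer_reward(example)
```

Here the user message mentions "42" but the assistant does not, so the stricter check scores the rollout as incorrect where a match on str(completion) would not.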

Trajectory Inspection

def analyze_trajectory(state: vf.State) -> dict:
    """Extract statistics from trajectory."""
    trajectory = state["trajectory"]
    
    return {
        "num_turns": len(trajectory),
        "total_tokens": sum(
            len(step["tokens"]["completion_ids"])
            for step in trajectory
            if step.get("tokens")
        ),
        "tool_calls": sum(
            1 for step in trajectory
            if step["response"]["message"].get("tool_calls")
        ),
    }
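
The same statistics can be tried out without running a rollout by feeding in a hand-built trajectory. The sketch below assumes the step shape used in the function above (a "tokens" dict with "completion_ids" and a "response" dict with a "message"); trajectory_stats and fake_trajectory are hypothetical names, and real TrajectoryStep objects may carry more fields.

```python
def trajectory_stats(trajectory: list[dict]) -> dict:
    """Count turns, completion tokens, and steps that contain tool calls."""
    total_tokens = 0
    tool_calls = 0
    for step in trajectory:
        tokens = step.get("tokens")
        if tokens:
            total_tokens += len(tokens["completion_ids"])
        # Tolerate a missing response or message rather than raising.
        message = (step.get("response") or {}).get("message") or {}
        if message.get("tool_calls"):
            tool_calls += 1
    return {
        "num_turns": len(trajectory),
        "total_tokens": total_tokens,
        "tool_calls": tool_calls,
    }


# Hand-built steps; the values are made up for illustration.
fake_trajectory = [
    {
        "tokens": {"completion_ids": [11, 12, 13]},
        "response": {"message": {"content": "hi", "tool_calls": [{"name": "search"}]}},
    },
    {"tokens": None, "response": {"message": {"content": "done"}}},
]
stats = trajectory_stats(fake_trajectory)
```

As in the function above, "tool_calls" counts steps that include at least one tool call, not the number of individual calls.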

Custom State Keys

Environments can add custom keys:
class CustomEnv(vf.MultiTurnEnv):
    async def setup_state(self, state: vf.State) -> vf.State:
        state = await super().setup_state(state)
        
        # Add custom fields
        state["custom_data"] = {"foo": "bar"}
        state["attempt_count"] = 0
        
        return state
    
    async def env_response(
        self,
        messages: vf.Messages,
        state: vf.State,
        **kwargs
    ) -> vf.Messages:
        # Access custom fields
        state["attempt_count"] += 1
        
        if state["attempt_count"] > 3:
            return [{"role": "user", "content": "Too many attempts!"}]
        
        return [{"role": "user", "content": "Try again"}]

Safe Error Access

def handle_errors(state: vf.State):
    error = state.get("error")
    
    if error:
        print(f"Error type: {type(error).__name__}")
        print(f"Error message: {error}")
        
        # Check error type
        if isinstance(error, vf.SandboxError):
            print("Sandbox operation failed")
        elif isinstance(error, vf.InfraError):
            print("Infrastructure error")
    else:
        print("No errors")
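
When many rollouts are inspected at once, the same state.get("error") pattern can feed a simple tally by error class. This is a sketch with a hypothetical helper, error_counts; built-in exceptions and plain dicts stand in for the library's error classes and State objects.

```python
from collections import Counter


def error_counts(states: list[dict]) -> Counter:
    """Count rollouts by error class name; rollouts without an error are skipped."""
    return Counter(
        type(state["error"]).__name__
        for state in states
        if state.get("error") is not None
    )


# Built-in exceptions stand in for the library's error classes here.
batch = [
    {"error": TimeoutError("sandbox timed out")},
    {"error": None},
    {"error": TimeoutError("again")},
    {"error": ValueError("bad output")},
    {},
]
counts = error_counts(batch)
```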

Serialization

State can be converted to RolloutOutput for serialization:
# During environment.generate()
output: vf.RolloutOutput = serialize_state(state)

# RolloutOutput is JSON-serializable:
import json
json.dumps(output)  # Works
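
A conversion step is needed because a raw state holds objects such as the client that json.dumps cannot encode. The sketch below shows the general idea with a hypothetical helper, to_serializable; the set of dropped keys and the error encoding are assumptions for illustration and do not describe what serialize_state or RolloutOutput actually contain.

```python
import json

# Hypothetical set of keys to drop; the library's RolloutOutput may differ.
NON_SERIALIZABLE_KEYS = {"client", "usage_tracker"}


def to_serializable(state: dict) -> dict:
    """Copy a state-like dict, dropping live objects and flattening exceptions."""
    out = {}
    for key, value in state.items():
        if key in NON_SERIALIZABLE_KEYS:
            continue
        if isinstance(value, BaseException):
            value = {"type": type(value).__name__, "message": str(value)}
        out[key] = value
    return out


# Plain dict standing in for a State; object() stands in for a client.
raw = {
    "client": object(),
    "model": "gpt-4",
    "reward": 1.0,
    "error": ValueError("boom"),
    "metrics": {"correctness": 1.0},
}

try:
    json.dumps(raw)
    raw_ok = True
except TypeError:
    raw_ok = False  # the client object cannot be encoded

payload = json.dumps(to_serializable(raw), sort_keys=True)
```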

Type Annotations

from verifiers.types import State, Messages

def my_reward(state: State) -> float:
    # Type checker knows State has these fields
    prompt: Messages = state["prompt"]
    reward: float | None = state.get("reward")
    return reward or 0.0

Common Patterns

Checking Completion

if state["is_completed"]:
    # Rollout finished successfully
    reward = state["reward"]
else:
    # Rollout was interrupted
    error = state.get("error")
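
The completion, truncation, and error flags can be folded into a single label when summarizing rollouts. The helper below, rollout_outcome, is hypothetical, and the precedence it uses (error first, then truncation, then completion) is an assumption rather than documented library behavior.

```python
def rollout_outcome(state: dict) -> str:
    """Map the status flags of a state-like dict to one label."""
    if state.get("error") is not None:
        return "error"
    if state.get("is_truncated"):
        return "truncated"
    if state.get("is_completed"):
        return "completed"
    return "in_progress"


# Plain dicts standing in for State objects.
outcomes = [
    rollout_outcome({"is_completed": True, "is_truncated": False, "error": None}),
    rollout_outcome({"is_completed": True, "is_truncated": True}),
    rollout_outcome({"is_completed": False, "error": RuntimeError("failed")}),
    rollout_outcome({}),
]
```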

Accessing Metadata

info = state.get("info", {})
custom_field = info.get("custom_field", "default")

Iterating Trajectory

for i, step in enumerate(state["trajectory"]):
    print(f"Turn {i}:")
    print(f"  Prompt: {step['prompt']}")
    print(f"  Completion: {step['completion']}")
    print(f"  Reward: {step.get('reward', 0.0)}")
