SingleTurnEnv

Environment for single-turn tasks where the model generates a single response to a prompt.

Overview

SingleTurnEnv is the simplest environment type, designed for Q&A tasks where:
  • The model receives a prompt and generates a single response
  • No multi-turn interaction is needed
  • Scoring is based on the single response
This class inherits from MultiTurnEnv with max_turns=1 and disables env_response().

Inheritance

Environment
└── MultiTurnEnv
    └── SingleTurnEnv

Constructor

SingleTurnEnv(**kwargs)
Accepts all parameters from Environment constructor. The max_turns parameter is automatically set to 1.
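
A minimal construction sketch, reusing the make_dataset helper and reward-function signature shown in the examples below; max_turns is fixed at 1, so you do not pass it yourself:

import verifiers as vf

# Sketch: all keyword arguments are forwarded to the Environment constructor;
# max_turns is always forced to 1 for SingleTurnEnv
dataset = vf.Environment.make_dataset(
    [{"question": "What is 2+2?", "answer": "4"}]
)

def reward(answer: str, completion: vf.Messages) -> float:
    return 1.0 if answer in str(completion) else 0.0

env = vf.SingleTurnEnv(
    dataset=dataset,
    rubric=vf.Rubric(reward),
    system_prompt="Answer concisely.",
)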

Key Differences from MultiTurnEnv

  • max_turns is fixed at 1 (cannot be overridden)
  • env_response() raises NotImplementedError if called
  • Rollout completes after a single model response

Methods

env_response

async def env_response(
    messages: vf.Messages,
    state: vf.State,
    **kwargs
) -> vf.Messages
This method raises NotImplementedError for SingleTurnEnv. It is not used in single-turn scenarios.

render_completion

async def render_completion(state: vf.State)
Renders the final completion from the trajectory. Inherited from MultiTurnEnv.
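
A hedged usage sketch; env and state here stand in for a constructed environment and the state of a finished rollout:

# Sketch: pull the model's final completion out of a rollout's state,
# e.g. for logging or custom post-processing
completion = await env.render_completion(state)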

Example Usage

import verifiers as vf
from datasets import load_dataset

# Question-answering environment
class QAEnv(vf.SingleTurnEnv):
    pass  # No custom logic needed for basic Q&A

def load_environment():
    # Load GSM8K dataset
    dataset = load_dataset("gsm8k", "main", split="train")
    
    # Define reward function
    def correct_answer(answer: str, completion: vf.Messages) -> float:
        """Check if the final answer appears in the completion."""
        # GSM8K answers end with "#### <value>"; compare against that final value
        final_answer = answer.split("####")[-1].strip()
        completion_text = str(completion)
        return 1.0 if final_answer in completion_text else 0.0
    
    return QAEnv(
        dataset=dataset,
        rubric=vf.Rubric(correct_answer),
        system_prompt="Solve the following math problem."
    )

# Usage (run inside an async context, e.g. an async main function)
env = load_environment()
results = await env.evaluate(
    client=vf.ClientConfig(
        provider="openai",
        api_key="sk-..."
    ),
    model="gpt-4",
    num_examples=100,
    rollouts_per_example=1
)

print(f"Accuracy: {results['metadata']['avg_reward']:.2%}")

Custom Dataset Format

import verifiers as vf

def load_environment():
    # Create custom dataset
    dataset = vf.Environment.make_dataset(
        [
            {
                "question": "What is 2+2?",
                "answer": "4",
            },
            {
                "question": "What is the capital of France?",
                "answer": "Paris",
            },
        ]
    )
    
    def exact_match(answer: str, completion: vf.Messages) -> float:
        """Case-insensitive check that the expected answer appears in the completion."""
        completion_text = str(completion).strip()
        return 1.0 if answer.lower() in completion_text.lower() else 0.0
    
    return vf.SingleTurnEnv(
        dataset=dataset,
        rubric=vf.Rubric(exact_match),
        system_prompt="Answer the following question concisely."
    )

With Multiple Metrics

import verifiers as vf
from datasets import load_dataset

def load_environment():
    dataset = load_dataset("gsm8k", "main", split="train")
    
    def correctness(answer: str, completion: vf.Messages) -> float:
        """Primary reward: answer correctness."""
        # Compare against the final value after "####" in the GSM8K answer
        final_answer = answer.split("####")[-1].strip()
        return 1.0 if final_answer in str(completion) else 0.0
    
    def response_length(completion: vf.Messages) -> int:
        """Metric: response length in characters."""
        return len(str(completion))
    
    def has_explanation(completion: vf.Messages) -> float:
        """Metric: whether response includes explanation keywords."""
        text = str(completion).lower()
        keywords = ["because", "therefore", "since", "so"]
        return 1.0 if any(kw in text for kw in keywords) else 0.0
    
    return vf.SingleTurnEnv(
        dataset=dataset,
        rubric=vf.Rubric(
            correctness,  # Primary reward
            response_length,
            has_explanation,
        ),
        system_prompt="Solve the problem and explain your reasoning."
    )

With Few-Shot Examples

import verifiers as vf
from datasets import load_dataset

def load_environment():
    dataset = load_dataset("gsm8k", "main", split="train")
    
    # Few-shot examples
    few_shot = [
        {
            "role": "user",
            "content": "What is 10 + 5?"
        },
        {
            "role": "assistant",
            "content": "The answer is 15."
        },
        {
            "role": "user",
            "content": "What is 20 - 3?"
        },
        {
            "role": "assistant",
            "content": "The answer is 17."
        },
    ]
    
    def correct_answer(answer: str, completion: vf.Messages) -> float:
        # Compare against the final value after "####" in the GSM8K answer
        final_answer = answer.split("####")[-1].strip()
        return 1.0 if final_answer in str(completion) else 0.0
    
    return vf.SingleTurnEnv(
        dataset=dataset,
        rubric=vf.Rubric(correct_answer),
        system_prompt="Solve the following math problem.",
        few_shot=few_shot
    )

Common Patterns

Basic Q&A

Use SingleTurnEnv directly with a dataset and rubric:
env = vf.SingleTurnEnv(
    dataset=dataset,
    rubric=vf.Rubric(reward_fn),
    system_prompt="..."
)

Custom Scoring

Define reward functions that accept fields from your dataset:
def reward_fn(answer: str, completion: vf.Messages) -> float:
    # Dataset fields are passed by parameter name; compute_score stands in
    # for your own task-specific scoring logic
    return compute_score(answer, completion)

rubric = vf.Rubric(reward_fn)
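
For instance, a sketch of a numeric-answer scorer built on the same signature; the regex and extraction strategy are illustrative, not part of the library:

import re
import verifiers as vf

def numeric_match(answer: str, completion: vf.Messages) -> float:
    """Reward 1.0 if the last number in the completion matches the reference answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", str(completion))
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == answer.strip() else 0.0

rubric = vf.Rubric(numeric_match)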

Multiple Rollouts per Example

Generate multiple responses per question for majority voting:
results = await env.evaluate(
    client=client,
    model="gpt-4",
    num_examples=100,
    rollouts_per_example=5,  # Generate 5 responses per question
    sampling_args={"temperature": 0.7}
)
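
Aggregating those rollouts into a majority vote happens outside the environment. A sketch, assuming you have already extracted one short answer string per rollout (how you pull those strings out of the evaluation results depends on your setup):

from collections import Counter

def majority_vote(rollout_answers: list[str]) -> str:
    """Return the most frequent answer string among the rollouts for one question."""
    normalized = [a.strip().lower() for a in rollout_answers]
    return Counter(normalized).most_common(1)[0][0]

# Hypothetical usage with five extracted answers for a single question
print(majority_vote(["72", "72", "68", "72", "72"]))  # -> "72"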

When to Use

Use SingleTurnEnv for:
  • Question answering
  • Text classification
  • Summarization
  • Translation
  • Any task requiring a single model response
For tasks requiring multiple turns of interaction, use MultiTurnEnv instead.

See Also

  • MultiTurnEnv (base class for multi-turn interactions)
  • Environment (base environment class)