Single-turn environments are the simplest type of environment in Verifiers, designed for tasks where the model provides a single response to each prompt. They’re ideal for Q&A tasks, math problems, text transformations, and other one-shot challenges.

Overview

SingleTurnEnv is a specialized version of MultiTurnEnv with max_turns=1. Each rollout follows this simple pattern:
  1. Send the prompt to the model
  2. Receive a single response (the completion)
  3. Score the response using reward functions
No multi-turn interaction, no tools, no complex state management—just prompt, response, and reward.

Your First Environment

Here’s a minimal single-turn environment for math problems:
import verifiers as vf
from datasets import Dataset

def load_environment():
    # Define your task data
    dataset = Dataset.from_list([
        {"prompt": [{"role": "user", "content": "What is 2+2?"}], "answer": "4"},
        {"prompt": [{"role": "user", "content": "What is 3*5?"}], "answer": "15"},
    ])
    
    # Define your reward function
    async def correct_answer(completion, answer) -> float:
        response = completion[-1]["content"]
        return 1.0 if answer in response else 0.0
    
    # Create rubric and environment
    rubric = vf.Rubric(funcs=[correct_answer])
    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
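A substring check like the one above is deliberately simple, but it can over-match: the answer "4" also appears in "14". If that matters for your task, a slightly stricter (hypothetical) variant can compare only the final token of the response:

```python
def exact_final_token(response: str, answer: str) -> float:
    # Compare the last whitespace-separated token, stripped of punctuation.
    tokens = response.strip().split()
    last = tokens[-1].strip(".,!?") if tokens else ""
    return 1.0 if last == answer else 0.0

print(exact_final_token("The answer is 4.", "4"))   # 1.0
print(exact_final_token("The answer is 14.", "4"))  # 0.0
```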
Step 1: Initialize Your Environment

Create a new environment project:

prime env init my-math-env
cd environments/my_math_env

Step 2: Build Your Dataset

You can build datasets in several ways:
Direct Prompts
from datasets import Dataset

dataset = Dataset.from_list([
    {
        "prompt": [{"role": "user", "content": "What is 2+2?"}],
        "answer": "4"
    },
])
The prompt field contains a list of messages ready to send to the model.
Question Column
dataset = Dataset.from_list([
    {"question": "What is 2+2?", "answer": "4"},
])
The environment automatically wraps question strings in a user message.
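In other words, a question row is shorthand for the chat-format prompt you would otherwise write by hand (row values here are illustrative):

```python
# A dataset row with a bare "question" string…
row = {"question": "What is 2+2?", "answer": "4"}

# …is treated as if you had written this prompt yourself:
prompt = [{"role": "user", "content": row["question"]}]
print(prompt)
```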
From Hugging Face
from datasets import load_dataset

dataset = load_dataset("gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {
    "question": x["question"],
    "answer": x["answer"],
})
Load existing datasets and map to the expected format.
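One caveat worth knowing: gsm8k's answer field contains the full worked solution, with the final answer after a "####" marker, so a raw substring check against it will almost always pass. A small (hypothetical) helper can pull out just the final answer:

```python
# gsm8k-style answer: reasoning steps, then "#### <final answer>".
raw = "Natalia sold 48/2 = 24 clips in May.\n#### 72"

def extract_final(answer: str) -> str:
    # Keep only the text after the last "####" marker.
    return answer.split("####")[-1].strip()

print(extract_final(raw))  # 72
```

You could apply this inside the map call above, e.g. `"answer": extract_final(x["answer"])`.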
Step 3: Define Reward Functions

Reward functions score model responses. They request data by naming arguments:
async def correct_answer(completion, answer) -> float:
    """Check if the answer appears in the response."""
    response = completion[-1]["content"]
    return 1.0 if answer in response else 0.0
Available arguments:
  • completion — model’s output (list of messages)
  • prompt — input messages
  • answer — from dataset row
  • info — structured metadata from dataset
  • state — full rollout state
Step 4: Create Your Environment

Combine everything in load_environment():
import verifiers as vf
from datasets import load_dataset

def load_environment():
    dataset = load_dataset("gsm8k", "main", split="train")

    async def correct_answer(completion, answer) -> float:
        response = completion[-1]["content"]
        return 1.0 if answer in response else 0.0

    rubric = vf.Rubric(funcs=[correct_answer])

    return vf.SingleTurnEnv(
        dataset=dataset,
        system_prompt="You are a helpful math tutor.",
        rubric=rubric,
    )
Step 5: Install and Test

Install your environment and run a quick evaluation:

prime env install my-math-env
prime eval run my-math-env -m gpt-4.1-mini -n 5

Expected output:

Running evaluation on my-math-env with gpt-4.1-mini
Progress: 5/5 examples, 15/15 rollouts
Reward: 0.87 ± 0.12

Real Example: Text Reversal

Let’s examine the reverse-text environment from the repository:
environments/reverse_text/reverse_text.py
from datasets import load_dataset
import verifiers as vf

def load_environment(
    dataset_name: str = "PrimeIntellect/Reverse-Text-RL",
    dataset_split: str = "train",
    system_prompt: str | None = "Reverse the text character-by-character. Put your answer in <reversed_text> tags.",
) -> vf.Environment:
    train_dataset = load_dataset(dataset_name, split=dataset_split).map(
        lambda x: {
            "question": x["prompt"],
            "answer": x["prompt"][::-1],
            "info": {},
            "task": "reverse-text",
        }
    )
    train_dataset = train_dataset.remove_columns(["prompt"])

    parser = vf.XMLParser(["reversed_text"], answer_field="reversed_text")

    def lcs_reward_func(completion, answer, **kwargs) -> float:
        """LCS ratio of the reversed prompt and the parsed completion."""
        from difflib import SequenceMatcher
        response = parser.parse_answer(completion) or ""
        return SequenceMatcher(None, response, answer).ratio()

    rubric = vf.Rubric(funcs=[lcs_reward_func], weights=[1.0])

    return vf.SingleTurnEnv(
        dataset=train_dataset,
        system_prompt=system_prompt,
        parser=parser,
        rubric=rubric,
    )

Key features:
• Uses XMLParser to extract structured output from <reversed_text> tags
• Computes continuous reward based on longest common subsequence
• Allows customization via system_prompt parameter
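For intuition about the reward scale: SequenceMatcher.ratio() from the standard library returns 2*M/T, where M is the number of matching characters and T is the combined length of both strings, so an exact reversal scores 1.0 and near misses score just below it:

```python
from difflib import SequenceMatcher

answer = "olleh"   # "hello" reversed
exact = "olleh"    # perfect reversal
partial = "ollh"   # one character missing

print(SequenceMatcher(None, exact, answer).ratio())    # 1.0
print(SequenceMatcher(None, partial, answer).ratio())  # 2*4/(4+5) ≈ 0.889
```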

Advanced Patterns

Multiple Reward Functions

Combine multiple scoring criteria with custom weights:
async def check_keywords(completion, info) -> float:
    """Check for required keywords."""
    response = completion[-1]["content"]
    keywords = info["required_keywords"]
    found = sum(1 for kw in keywords if kw.lower() in response.lower())
    return found / len(keywords)

async def length_reward(completion) -> float:
    """Reward concise responses."""
    response = completion[-1]["content"]
    return 1.0 if len(response) < 500 else 0.5

rubric = vf.Rubric(
    funcs=[check_keywords, length_reward],
    weights=[1.0, 0.1]  # keyword match is primary, length is secondary
)

The final reward is the weighted sum: reward = 1.0 * check_keywords + 0.1 * length_reward
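Concretely, with illustrative scores of 1.0 for check_keywords and 0.5 for length_reward, the arithmetic works out as:

```python
def weighted_sum(scores, weights):
    # Final reward: sum of score_i * weight_i over all reward functions.
    return sum(s * w for s, w in zip(scores, weights))

print(weighted_sum([1.0, 0.5], [1.0, 0.1]))  # 1.0*1.0 + 0.5*0.1 = 1.05
```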

Parsing Structured Output

Use parsers to extract specific fields from model responses:
parser = vf.XMLParser(["reasoning", "answer"], answer_field="answer")

async def correct_with_reasoning(completion, answer, parser) -> float:
    # parse_answer returns the configured answer field as a string (or None)
    parsed_answer = parser.parse_answer(completion) or ""
    return 1.0 if answer in parsed_answer else 0.0

rubric = vf.Rubric(funcs=[correct_with_reasoning], parser=parser)
vf_env = vf.SingleTurnEnv(dataset=dataset, parser=parser, rubric=rubric)

Lazy Dataset Loading

For large datasets, defer loading until first access:
from datasets import load_dataset
import verifiers as vf

def get_dataset_builder(split: str = "train", seed: int = 42):
    """Returns a builder that lazily loads the dataset."""
    def build():
        ds = load_dataset("my-dataset", split=split)
        ds = ds.shuffle(seed=seed)
        return ds
    return build

def load_environment():
    dataset_builder = get_dataset_builder(split="train")
    eval_builder = get_dataset_builder(split="test")

    return vf.SingleTurnEnv(
        dataset=dataset_builder,      # built on first access
        eval_dataset=eval_builder,    # built on first access
        rubric=rubric,
    )

Benefits:
• Avoid loading large datasets during environment initialization
• Better performance when running multiple replicas
• Parameterize dataset creation (splits, shuffling, filtering)

Metrics and Observability

Track additional metrics without affecting the reward:
async def response_length(completion) -> float:
    return float(len(completion[-1]["content"]))

async def has_reasoning(completion) -> float:
    content = completion[-1]["content"]
    return 1.0 if "because" in content.lower() else 0.0

rubric = vf.Rubric(funcs=[correct_answer])  # only this affects reward
rubric.add_metric(response_length)          # weight=0 (tracking only)
rubric.add_metric(has_reasoning)            # weight=0 (tracking only)

All metrics appear in evaluation results:
{
  "reward": 0.8,
  "correct_answer": 0.8,
  "response_length": 127.3,
  "has_reasoning": 0.6
}

Evaluation Datasets

Provide separate train and evaluation datasets:
def load_environment():
    train_dataset = load_dataset("my-dataset", split="train")
    eval_dataset = load_dataset("my-dataset", split="test")

    return vf.SingleTurnEnv(
        dataset=train_dataset,
        eval_dataset=eval_dataset,
        rubric=rubric,
    )

When you run prime eval run, the evaluation dataset is used automatically.

Common Patterns

Math Verification

Use symbolic math checking with the built-in MathRubric:
import re

import verifiers as vf

def extract_boxed_answer(text: str) -> str:
    match = re.search(r'\\boxed\{(.+?)\}', text)
    return match.group(1) if match else ""

parser = vf.Parser(extract_fn=extract_boxed_answer)
math_rubric = vf.MathRubric(parser=parser)  # Uses math-verify library

vf_env = vf.SingleTurnEnv(
    dataset=dataset,
    system_prompt="Solve the problem and put your answer in \\boxed{}.",
    parser=parser,
    rubric=math_rubric,
)
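Note that the non-greedy pattern stops at the first closing brace, which is fine for flat answers but truncates nested LaTeX like \boxed{\frac{1}{2}}. A quick standalone demonstration of that behavior:

```python
import re

def extract_boxed(text: str) -> str:
    # Non-greedy capture: everything up to the first "}".
    match = re.search(r'\\boxed\{(.+?)\}', text)
    return match.group(1) if match else ""

print(extract_boxed(r"The answer is \boxed{42}."))  # 42
print(extract_boxed(r"So \boxed{\frac{1}{2}}"))     # \frac{1  (truncated)
```

If your answers contain nested braces, a brace-balancing parser (or math-verify's own extraction) is more robust than this regex.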
    

LLM-as-Judge

Use another LLM to score responses:
import verifiers as vf

judge_rubric = vf.JudgeRubric(
    judge_model="gpt-4.1-mini",
    judge_prompt="""Is this response correct?

Question: {question}
Ground truth: {answer}
Response: {response}

Answer 'yes' or 'no'."""
)

async def judge_reward(prompt, completion, answer, judge) -> float:
    verdict = await judge(prompt, completion, answer)
    return 1.0 if "yes" in verdict.lower() else 0.0

judge_rubric.add_reward_func(judge_reward)

vf_env = vf.SingleTurnEnv(dataset=dataset, rubric=judge_rubric)

Combining Multiple Rubrics

Use RubricGroup to combine different scoring approaches:
# Symbolic math verification
math_rubric = vf.MathRubric()

# LLM judge for reasoning quality
judge_rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
judge_rubric.add_reward_func(judge_reasoning_quality, weight=0.5)

# Combine both
rubric = vf.RubricGroup([math_rubric, judge_rubric])

vf_env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)

Final reward = math_rubric.reward + judge_rubric.reward
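The arithmetic, with illustrative per-rubric values (math reward 1.0, judge score 0.8 at weight 0.5):

```python
math_reward = 1.0          # math_rubric: symbolic match succeeded
judge_reward = 0.5 * 0.8   # judge_rubric: weight 0.5 * judge score 0.8
total = math_reward + judge_reward
print(total)
```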

Testing Your Environment

After implementing your environment:

Step 1: Install locally

prime env install my-env

Step 2: Run a quick evaluation

prime eval run my-env -m gpt-4.1-mini -n 10 -r 3

This runs 10 examples with 3 rollouts each (30 total rollouts).

Step 3: Check the output

Expected output:

Loading environment: my-env
Running 10 examples × 3 rollouts = 30 total rollouts
Progress: ████████████████████ 30/30 (100%)

Results:
  Reward: 0.73 ± 0.15
  correct_answer: 0.73 ± 0.15
  response_length: 142.3 ± 45.2

Step 4: Save and inspect results

prime eval run my-env -m gpt-4.1-mini -n 10 -s

Results saved to ./environments/my_env/outputs/evals/my-env--gpt-4.1-mini/{run_id}/:

• results.jsonl - detailed rollout data
• metadata.json - configuration and metrics

Next Steps
