
Data Utilities

Utility functions for loading and formatting common benchmark datasets.
These utilities are designed for example datasets and quick prototyping, not core functionality. For production use, load datasets directly with the Hugging Face datasets library.

Overview

The verifiers.utils.data_utils module provides:
  • Pre-configured loaders for common benchmarks (GSM8K, MATH, GPQA, etc.)
  • Dataset formatting helpers
  • Answer extraction utilities
  • System prompts for math tasks

Dataset Loaders

load_example_dataset

def load_example_dataset(
    name: str = "gsm8k",
    split: str | None = None,
    n: int | None = None,
    seed: int = 0
) -> Dataset
Load a preprocessed benchmark dataset.
Parameters:
  • name (str, default "gsm8k"): Dataset name. Supported: "aime2024", "aime2025", "amc2023", "gpqa_diamond", "gpqa_main", "gsm8k", "math", "math500", "mmlu", "mmlu_pro", "openbookqa", "openrs", "openrs_easy", "openrs_hard", "prime_code".
  • split (str | None, default None): Dataset split. If None, uses the default split for that dataset (usually "test" or "train").
  • n (int | None, default None): Number of examples to load. If None, loads all examples.
  • seed (int, default 0): Random seed for shuffling when n is specified.
Returns: HuggingFace Dataset with question and answer columns. Example:
from verifiers.utils.data_utils import load_example_dataset

# Load 100 GSM8K examples
dataset = load_example_dataset("gsm8k", n=100)

# Load all MATH problems
math_dataset = load_example_dataset("math", split="train")

# Load GPQA diamond
gpqa = load_example_dataset("gpqa_diamond")

Formatting Functions

format_dataset

def format_dataset(
    dataset: Dataset,
    system_prompt: str | None = None,
    few_shot: Messages | None = None,
    question_key: str = "question",
    answer_key: str = "answer",
    map_kwargs: dict = {},
) -> Dataset
Add example_id and prompt columns to a dataset.
Parameters:
  • dataset (Dataset, required): Input dataset to format.
  • system_prompt (str | None, default None): System prompt to prepend to all prompts.
  • few_shot (Messages | None, default None): Few-shot examples to include before each question.
  • question_key (str, default "question"): Column name containing questions.
  • answer_key (str, default "answer"): Column name containing answers.
  • map_kwargs (dict, default {}): Additional arguments passed to dataset.map().
Returns: Dataset with example_id and prompt columns. Example:
from verifiers.utils.data_utils import format_dataset, BOXED_SYSTEM_PROMPT
from datasets import load_dataset

# Load raw dataset
raw_dataset = load_dataset("gsm8k", "main", split="test")

# Format with system prompt
formatted = format_dataset(
    raw_dataset,
    system_prompt=BOXED_SYSTEM_PROMPT,
    question_key="question",
    answer_key="answer",
)

# Now has 'prompt' column with messages
print(formatted[0]["prompt"])
# [
#   {"role": "system", "content": "Please reason step by step..."},
#   {"role": "user", "content": "What is 2+2?"}
# ]
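When few_shot is supplied, the messages are inserted between the system prompt and each question. A minimal sketch of the resulting prompt structure, with made-up example messages (the real helper builds this list inside dataset.map()):

```python
# Sketch of the prompt structure produced when few_shot is supplied.
# The example messages here are invented for illustration.
system_prompt = "Please reason step by step, and put your final answer within \\boxed{}."
few_shot = [
    {"role": "user", "content": "What is 3+3?"},
    {"role": "assistant", "content": "3 + 3 = \\boxed{6}"},
]
question = "What is 2+2?"

# System prompt first, then few-shot turns, then the actual question.
prompt = (
    [{"role": "system", "content": system_prompt}]
    + few_shot
    + [{"role": "user", "content": question}]
)
print([m["role"] for m in prompt])  # ['system', 'user', 'assistant', 'user']
```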

Answer Extraction

extract_boxed_answer

def extract_boxed_answer(text: str) -> str
Extract content from LaTeX \boxed{} commands. Finds the last occurrence of \boxed{...} in the text and returns the content between matching braces. If no boxed answer is found or braces don’t match, returns the original text.
Parameters:
  • text (str, required): Text containing a LaTeX boxed answer (e.g., "\boxed{42}").
Returns: str - Extracted answer content, or original text if no valid boxed answer found. Example:
from verifiers.utils.data_utils import extract_boxed_answer

# Simple answer
text = "The answer is \\boxed{42}"
result = extract_boxed_answer(text)  # "42"

# Nested braces
text = "\\boxed{x = \\frac{1}{2}}"
result = extract_boxed_answer(text)  # "x = \\frac{1}{2}"

# Multiple boxed answers - extracts last one
text = "First \\boxed{A}, then \\boxed{B}"
result = extract_boxed_answer(text)  # "B"

# No boxed answer
text = "Just plain text"
result = extract_boxed_answer(text)  # "Just plain text"

extract_hash_answer

def extract_hash_answer(text: str) -> str
Extract answer after #### delimiter (GSM8K format). Returns the text after the first #### marker, stripped of leading/trailing whitespace. If no delimiter is found, returns the original text.
Parameters:
  • text (str, required): Text containing a hash-delimited answer (e.g., "Solution here\n#### 42").
Returns: str - Answer after #### delimiter, or original text if no delimiter found. Example:
from verifiers.utils.data_utils import extract_hash_answer

# Standard GSM8K format
text = "Step 1: Add them.\nStep 2: Get result.\n#### 42"
result = extract_hash_answer(text)  # "42"

# With spaces
text = "Solution goes here #### 100"
result = extract_hash_answer(text)  # "100"

# No delimiter
text = "Just an answer"
result = extract_hash_answer(text)  # "Just an answer"

strip_non_numeric

def strip_non_numeric(text: str) -> str
Remove all non-numeric characters except periods. Example:
from verifiers.utils.data_utils import strip_non_numeric

text = "The answer is $42.5"
result = strip_non_numeric(text)  # "42.5"

System Prompts

BOXED_SYSTEM_PROMPT

BOXED_SYSTEM_PROMPT = "Please reason step by step, and put your final answer within \\boxed{}."
Standard prompt for math problems requiring boxed answers.

THINK_BOXED_SYSTEM_PROMPT

THINK_BOXED_SYSTEM_PROMPT = "Think step-by-step inside <think>...</think> tags. Then, give your final answer inside \\boxed{}."
Prompt encouraging explicit reasoning in XML tags. Example:
import verifiers as vf
from verifiers.utils.data_utils import (
    load_example_dataset,
    format_dataset,
    BOXED_SYSTEM_PROMPT,
)

def load_environment():
    # Load and format dataset
    dataset = load_example_dataset("math", n=100)
    dataset = format_dataset(
        dataset,
        system_prompt=BOXED_SYSTEM_PROMPT,
    )
    
    def correct(answer: str, completion: str, **kwargs) -> float:
        from verifiers.utils.data_utils import extract_boxed_answer
        extracted = extract_boxed_answer(completion)
        return 1.0 if extracted == answer else 0.0
    
    return vf.SingleTurnEnv(
        dataset=dataset,
        rubric=vf.Rubric(funcs=[correct]),
    )

Supported Datasets

aime2024
AIME 2024 math competition (15 problems)
aime2025
AIME 2025 math competition (30 problems, AIME I + II)
amc2023
AMC 2023 math competition
gpqa_diamond
GPQA Diamond subset (high-quality questions)
gpqa_main
GPQA Main dataset
gsm8k
Grade School Math 8K dataset
math
MATH competition dataset
math500
MATH-500 subset
mmlu
Massive Multitask Language Understanding
mmlu_pro
MMLU-Pro (harder variant)
openbookqa
OpenBookQA question answering
openrs / openrs_easy / openrs_hard
OpenRS reasoning problems
prime_code
Prime verifiable coding problems

Preprocessing Functions

Internal preprocessing functions used by load_example_dataset():
def get_preprocess_fn(name: str) -> Callable[[dict], dict]
Returns a preprocessing function for the named dataset. Each preprocessor:
  • Extracts question and answer fields
  • Normalizes format (e.g., strips #### delimiters)
  • Handles dataset-specific quirks
These are internal functions. Use load_example_dataset() instead of calling preprocessors directly.
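For orientation, a preprocessor typically has this shape. The function below is hypothetical, written to mirror the bullet points above; it is not the actual function returned by get_preprocess_fn("gsm8k"):

```python
# Hypothetical GSM8K-style preprocessor: extract the question/answer
# fields and normalize the "reasoning #### answer" format.
def gsm8k_preprocess(example: dict) -> dict:
    answer = example["answer"]
    if "####" in answer:
        answer = answer.split("####", 1)[1].strip()  # strip the #### delimiter
    return {"question": example["question"], "answer": answer}

row = {"question": "What is 2+2?", "answer": "2+2 is 4.\n#### 4"}
print(gsm8k_preprocess(row))  # {'question': 'What is 2+2?', 'answer': '4'}
```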

Custom Dataset Example

import verifiers as vf
from verifiers.utils.data_utils import format_dataset
from datasets import Dataset

# Create custom dataset
raw_data = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "What is 10*5?", "answer": "50"},
]

dataset = Dataset.from_list(raw_data)

# Format with system prompt
formatted = format_dataset(
    dataset,
    system_prompt="Solve the math problem.",
)

# Use in environment (reward functions are passed to Rubric as a list)
def exact_match(answer: str, completion, **kwargs) -> float:
    return 1.0 if answer in completion else 0.0

env = vf.SingleTurnEnv(
    dataset=formatted,
    rubric=vf.Rubric(funcs=[exact_match]),
)
