Data Utilities
Utility functions for loading and formatting common benchmark datasets.These utilities are designed for example datasets and quick prototyping, not core functionality. For production use, load datasets directly using HuggingFace
datasets library.Overview
Theverifiers.utils.data_utils module provides:
- Pre-configured loaders for common benchmarks (GSM8K, MATH, GPQA, etc.)
- Dataset formatting helpers
- Answer extraction utilities
- System prompts for math tasks
Dataset Loaders
load_example_dataset
Dataset name. Supported:
"aime2024", "aime2025", "amc2023", "gpqa_diamond", "gpqa_main", "gsm8k", "math", "math500", "mmlu", "mmlu_pro", "openbookqa", "openrs", "openrs_easy", "openrs_hard", "prime_code".Dataset split. If None, uses the default split for that dataset (usually “test” or “train”).
Number of examples to load. If None, loads all examples.
Random seed for shuffling when
n is specified.Dataset with question and answer columns.
Example:
Formatting Functions
format_dataset
example_id and prompt columns to a dataset.
Input dataset to format.
System prompt to prepend to all prompts.
Few-shot examples to include before each question.
Column name containing questions.
Column name containing answers.
Additional arguments passed to
dataset.map().example_id and prompt columns.
Example:
Answer Extraction
extract_boxed_answer
\boxed{} commands. Finds the last occurrence of \boxed{...} in the text and returns the content between matching braces. If no boxed answer is found or braces don’t match, returns the original text.
Text containing LaTeX boxed answer (e.g.,
"\boxed{42}").str - Extracted answer content, or original text if no valid boxed answer found.
Example:
extract_hash_answer
#### delimiter (GSM8K format). Returns the text after the first #### marker, stripped of leading/trailing whitespace. If no delimiter is found, returns the original text.
Text containing hash-delimited answer (e.g.,
"Solution here\n#### 42").str - Answer after #### delimiter, or original text if no delimiter found.
Example:
strip_non_numeric
System Prompts
BOXED_SYSTEM_PROMPT
THINK_BOXED_SYSTEM_PROMPT
Supported Datasets
aime2024
AIME 2024 math competition (15 problems)
aime2025
AIME 2025 math competition (30 problems, AIME I + II)
amc2023
AMC 2023 math competition
gpqa_diamond
GPQA Diamond subset (high-quality questions)
gpqa_main
GPQA Main dataset
gsm8k
Grade School Math 8K dataset
math
MATH competition dataset
math500
MATH-500 subset
mmlu
Massive Multitask Language Understanding
mmlu_pro
MMLU-Pro (harder variant)
openbookqa
OpenBookQA question answering
openrs / openrs_easy / openrs_hard
OpenRS reasoning problems
prime_code
Prime verifiable coding problems
Preprocessing Functions
Internal preprocessing functions used byload_example_dataset():
- Extracts
questionandanswerfields - Normalizes format (e.g., strips #### delimiters)
- Handles dataset-specific quirks
These are internal functions. Use
load_example_dataset() instead of calling preprocessors directly.Custom Dataset Example
See Also
- Environment.make_dataset() - Create datasets from dicts
- HuggingFace Datasets - Dataset library documentation