ReasoningGymEnv
Wrapper environment for Reasoning Gym procedural reasoning tasks.
Overview
ReasoningGymEnv wraps Reasoning Gym datasets for use in Verifiers. It supports both single datasets and composite mixtures, automatically handles scoring via Reasoning Gym’s built-in evaluators, and provides procedurally generated tasks.
Key features:
- Procedural task generation via seeds
- Support for all Reasoning Gym datasets
- Composite dataset mixing with custom weights
- Automatic task-specific scoring
- Built-in XML parser for structured responses
Installation
Install with Reasoning Gym support:
Inheritance
Constructor
Parameters
Dataset specification. Can be:
- String: Single dataset name (e.g., "arc_1d")
- List of strings: Multiple datasets with equal weights
- List of dicts: Datasets with custom weights and configs using DatasetSpec format
Number of training examples to generate.
Number of evaluation examples to generate.
System prompt for the model. Defaults to Reasoning Gym’s default prompt.
Parser for model responses. If None, uses XMLParser(fields=["answer"]).
Random seed for procedural generation.
Key Methods
build_rg_dataset
- String: Single dataset → rg.create_dataset(gym, size=total_examples, seed=seed)
- List of strings: Multiple datasets with equal weights (1.0 each)
- List of dicts: Datasets with custom DatasetSpec configurations
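The three accepted input forms can be reduced to one uniform list of weighted entries before a dataset is built. The following is a minimal sketch of that normalization, not the library's implementation: the helper name `normalize_gym_spec` is hypothetical, and the dict keys `name`, `weight`, and `config` are assumptions about what a DatasetSpec-style dict carries.

```python
from typing import Any, Union

# The three input forms described above.
GymSpec = Union[str, list[str], list[dict[str, Any]]]


def normalize_gym_spec(gym: GymSpec) -> list[dict[str, Any]]:
    """Hypothetical helper: reduce any supported spec to weighted entries.

    The keys "name", "weight", and "config" are assumed for illustration.
    """
    if isinstance(gym, str):
        # Single dataset name: one entry with the default weight.
        return [{"name": gym, "weight": 1.0, "config": {}}]

    specs: list[dict[str, Any]] = []
    for item in gym:
        if isinstance(item, str):
            # Plain names in a list all get an equal weight of 1.0.
            specs.append({"name": item, "weight": 1.0, "config": {}})
        elif isinstance(item, dict):
            # Dicts may override the weight and pass a per-dataset config.
            specs.append(
                {
                    "name": item["name"],
                    "weight": float(item.get("weight", 1.0)),
                    "config": dict(item.get("config", {})),
                }
            )
        else:
            raise TypeError(f"Unsupported dataset spec: {item!r}")
    return specs


single = normalize_gym_spec("arc_1d")
equal = normalize_gym_spec(["arc_1d", "gsm8k"])
weighted = normalize_gym_spec([{"name": "arc_1d", "weight": 3}])
```

Normalizing first means everything downstream handles a single shape, whichever form the caller passed in.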
rg_to_hf
- question: Task prompt from Reasoning Gym
- answer: Index as string (used to retrieve entry for scoring)
- task: Source dataset name from metadata
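The field mapping above can be illustrated with plain dictionaries. This is a rough sketch rather than the actual `rg_to_hf` code: the function name `rows_from_entries` is hypothetical, the sample entries are made up, and the metadata key `source_dataset` is an assumption about where the source dataset name is stored.

```python
from typing import Any


def rows_from_entries(entries: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Hypothetical helper: map task entries to question/answer/task rows."""
    rows = []
    for index, entry in enumerate(entries):
        rows.append(
            {
                "question": entry["question"],
                # Store the index, not the gold answer, so the full entry
                # can be looked up again at scoring time.
                "answer": str(index),
                # Assumed metadata key for the originating dataset name.
                "task": entry["metadata"]["source_dataset"],
            }
        )
    return rows


# Made-up stand-in entries for illustration only.
entries = [
    {"question": "2 + 2 = ?", "answer": "4", "metadata": {"source_dataset": "demo_math"}},
    {"question": "True and False?", "answer": "False", "metadata": {"source_dataset": "demo_logic"}},
]
rows = rows_from_entries(entries)

# At scoring time the stored index leads back to the original entry.
recovered = entries[int(rows[1]["answer"])]
```

Keeping only the index in the answer column lets a task-specific evaluator see the whole original entry, metadata included, not just a single gold string.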
Scoring
ReasoningGymEnv automatically creates a rubric with a custom reward function:
Example Usage
Single Dataset
Multiple Datasets (Equal Weights)
Composite Dataset (Custom Weights)
Custom Parser
With Custom System Prompt
Available Datasets
Reasoning Gym provides many procedural datasets. Some popular ones:
- Pattern Recognition: arc_1d, arc_2d
- Math: gsm8k, math_count, number_theory
- Logic: boolean_logic, propositional_logic
- Sequences: sequence_next, sequence_missing
- Spatial: grid_navigation, spatial_reasoning
DatasetSpec Format
When using composite datasets with custom weights, use this format:
Procedural Generation
All tasks are procedurally generated using seeds:
- Each example gets a unique seed: seed + index
- Same seed always generates the same task
- Infinite variations possible
- Reproducible across runs
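The `seed + index` scheme can be demonstrated with the standard library alone. The toy generator below is an illustrative stand-in, not a Reasoning Gym dataset: `generate_task` is a hypothetical function that derives a small addition problem from a per-example seed.

```python
import random


def generate_task(seed: int, index: int) -> dict[str, str]:
    """Hypothetical toy generator: one addition task per (seed, index)."""
    # Each example gets its own RNG, seeded with seed + index.
    rng = random.Random(seed + index)
    a = rng.randint(1, 99)
    b = rng.randint(1, 99)
    return {"question": f"What is {a} + {b}?", "answer": str(a + b)}


# The same base seed and index always reproduce the same task.
first_run = [generate_task(42, i) for i in range(5)]
second_run = [generate_task(42, i) for i in range(5)]

# In this sketch, only the sum seed + index matters.
shifted = generate_task(45, 0)
```

One consequence of this sketch is that base seeds closer together than the dataset size produce overlapping tasks, since `generate_task(42, 3)` and `generate_task(45, 0)` share an effective seed. If train and eval splits are drawn from the same stream, it is worth checking how the indices for each split are assigned.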
See Also
- Reasoning Gym Integration Guide - Setup and dataset details
- SingleTurnEnv - Base class documentation
- XMLParser - Parser for structured responses
- Rubric - Reward function configuration