Skip to main content

TextArenaEnv

Wrapper environment for TextArena text-based games.

Overview

TextArenaEnv wraps TextArena game environments for multi-turn interaction with language models. It automatically converts TextArena games into Verifiers datasets and handles game state management. Key features:
  • Automatic dataset generation from TextArena word lists
  • Efficient memory sharing for parallel rollouts
  • Custom feedback transformation via feedback_fn
  • Built-in XML parser for structured responses

Installation

Install with TextArena support:
uv add 'verifiers[ta]'
Or when developing in the verifiers repo:
uv sync --extra ta
See the TextArena integration guide for setup details.

Inheritance

Environment
└── MultiTurnEnv
    └── TextArenaEnv

Constructor

TextArenaEnv(
    game: str = "Wordle-v0",
    num_train_examples: int = 1000,
    num_eval_examples: int = 0,
    system_prompt: str | None = None,
    parser: vf.XMLParser | None = None,
    rubric: vf.Rubric | None = None,
    feedback_fn: Callable[[str], str] = lambda x: x,
    seed: int = 0,
    **kwargs
)

Parameters

game
str
default:"Wordle-v0"
TextArena game ID (e.g., “Wordle-v0”, “TwentyQuestions-v0”).
num_train_examples
int
default:"1000"
Number of training examples to generate.
num_eval_examples
int
default:"0"
Number of evaluation examples to generate.
system_prompt
str | None
default:"None"
System prompt for the model. If None, uses default from MultiTurnEnv.
parser
vf.XMLParser | None
default:"None"
Parser for model responses. If None, uses XMLParser(fields=["think", "guess"], answer_field="guess").
rubric
vf.Rubric | None
default:"None"
Rubric for scoring. If None, uses default rubric.
feedback_fn
Callable[[str], str]
default:"lambda x: x"
Function to transform TextArena observations before presenting to the model. Use this to filter or reformat game state messages.
seed
int
default:"0"
Random seed for dataset generation.
**kwargs
Any
Additional arguments passed to MultiTurnEnv.

Key Methods

setup_state

async def setup_state(
    state: vf.State,
    **kwargs
) -> vf.State
Initialize TextArena environment for this rollout. Implementation details:
  • Creates a deep copy of the TextArena environment with memory sharing optimization
  • Sets the secret word from state["answer"]
  • Stores environment in state["ta_env"]

env_response

async def env_response(
    messages: vf.Messages,
    state: vf.State,
    **kwargs
) -> vf.Messages
Process model’s guess and return game feedback. Flow:
  1. Parse guess from latest message using parser.parse_answer()
  2. Step the TextArena environment with the guess
  3. If game is done, set state["final_env_response"] and return terminal message
  4. Otherwise, get observation and apply feedback_fn before returning

cleanup_ta_env

@vf.cleanup
async def cleanup_ta_env(state: vf.State)
Clean up TextArena environment after rollout by removing ta_env from state.

Example Usage

Basic Wordle Environment

import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

def load_environment():
    return TextArenaEnv(
        game="Wordle-v0",
        num_train_examples=1000,
        num_eval_examples=100,
        seed=0,
    )

Custom Feedback Function

import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

def simplify_feedback(observation: str) -> str:
    """Transform TextArena observation to simpler format."""
    # TextArena often returns full game history,
    # but we only want the latest feedback
    if "Correct!" in observation:
        return "Correct!"
    elif "letters in correct positions" in observation:
        # Extract just the color hints
        lines = observation.split("\n")
        return lines[-1]  # Return just the hint line
    return observation

def load_environment():
    return TextArenaEnv(
        game="Wordle-v0",
        feedback_fn=simplify_feedback,
        num_train_examples=1000,
        system_prompt="You are playing Wordle. Guess 5-letter words. Use <think> tags for reasoning and <guess> for your answer.",
    )

Custom Parser and Rubric

import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

def load_environment():
    parser = vf.XMLParser(
        fields=["reasoning", "guess"],
        answer_field="guess"
    )
    
    def success_reward(state: vf.State) -> float:
        """Reward winning in fewer guesses."""
        num_guesses = len(state["trajectory"])
        if state.get("final_env_response"):
            # Won - reward fewer guesses
            return 1.0 / num_guesses
        return 0.0
    
    rubric = vf.Rubric(success_reward)
    
    return TextArenaEnv(
        game="Wordle-v0",
        parser=parser,
        rubric=rubric,
        num_train_examples=1000,
    )

TwentyQuestions Game

import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

def load_environment():
    parser = vf.XMLParser(
        fields=["think", "question"],
        answer_field="question"
    )
    
    return TextArenaEnv(
        game="TwentyQuestions-v0",
        parser=parser,
        num_train_examples=500,
        num_eval_examples=100,
        system_prompt="You are playing 20 Questions. Ask yes/no questions to guess the secret word. Use <think> for strategy and <question> for your question.",
    )

Memory Optimization

TextArenaEnv uses build_shared_memo() to share immutable data across environment copies:
  • Problem: TextArena’s EnglishDictionary holds ~430K strings in 4 sets (~38MB). Without sharing, each rollout copies this data (~120ms + 38MB per copy).
  • Solution: The shared memo dict allows deep copying to share these immutable objects, saving significant memory and time during parallel rollouts.
This optimization is automatic and requires no user configuration.

Available Games

Some popular TextArena games:
  • Wordle-v0 - Classic word guessing game
  • TwentyQuestions-v0 - 20 questions game
  • Poker-v0 - Poker game
  • Many more available in the TextArena repository
Check the TextArena documentation for the full list of available games.

See Also

Build docs developers (and LLMs) love