TextArenaEnv integration wraps TextArena text-based game environments for multi-turn interaction with language models.
TextArena provides competitive and collaborative text-based games designed for LLM evaluation.
Features
- Text-based games - Wordle, 20 Questions, Poker, and more
- Multi-turn interaction - Games require multiple model responses
- Efficient memory sharing - Optimized for parallel rollouts
- Custom feedback - Transform game observations for better prompting
- XML formatting - Built-in parser for structured responses
Installation
Install with TextArena support. The required packages are:
- textarena - TextArena game library
- nltk - Natural language processing (for word games)
Quick Start
Available Games
TextArena provides several game types:
Word Games
- Wordle-v0 - Classic Wordle game
- WordChain-v0 - Word association chains
- Scrabble-v0 - Scrabble with simplified rules
Logic Games
- TwentyQuestions-v0 - Guess the object
- Mastermind-v0 - Code-breaking game
Strategy Games
- Chess-v0 - Text-based chess
- Go-v0 - Text-based Go
- Poker-v0 - Texas Hold’em
Configuration
Basic Configuration
Custom Parser
By default, TextArena uses XMLParser with <think> and <guess> fields:
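To make the two-field format concrete, here is a minimal standard-library sketch of what extracting <think> and <guess> from a response could look like. It is not the library's XMLParser: the function name `parse_fields` is hypothetical, and the choice to keep the last occurrence of each tag is an assumption made for this illustration.

```python
import re

def parse_fields(text, fields=("think", "guess")):
    """Extract the last <field>...</field> block for each field; None if absent.

    Hypothetical helper for illustration only, not the library's XMLParser.
    """
    parsed = {}
    for name in fields:
        # DOTALL lets the field content span multiple lines.
        matches = re.findall(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        parsed[name] = matches[-1].strip() if matches else None
    return parsed

# Sample response; the exact payload expected inside <guess> is game-specific.
response = "<think>\nThe last letter is confirmed.\n</think>\n<guess>\nslate\n</guess>"
result = parse_fields(response)
missing = parse_fields("no tags here")
```

A response without the expected tags yields `None` for each field, which is a convenient signal for checking format compliance before a guess is passed to the game.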
Custom System Prompt
Custom Feedback Function
TextArena games return full game state, but you may want to render only the delta. Use feedback_fn to transform observations:
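As a rough sketch, a feedback function can be an ordinary string-to-string function that keeps only the newest part of the observation. The `Feedback:` marker and the sample observation text below are assumptions for illustration; check the actual observation format of the game you are wrapping.

```python
def latest_feedback(observation: str, marker: str = "Feedback:") -> str:
    """Return only the text after the last marker, or the observation unchanged.

    The marker string is a hypothetical example, not a documented format.
    """
    if marker in observation:
        # rsplit with maxsplit=1 isolates everything after the final marker.
        return observation.rsplit(marker, 1)[-1].strip()
    return observation

# Hypothetical full-state observation containing two turns of history.
full_state = (
    "You guessed crane.\nFeedback:\nC R A N E\nX X G X Y\n"
    "You guessed slate.\nFeedback:\nS L A T E\nG X G X Y"
)
delta = latest_feedback(full_state)
```

Here `delta` contains only the feedback for the most recent guess, so appending it to the conversation does not repeat earlier turns.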
Verifiers doesn’t allow overwriting past messages, only appending. TextArena games often return full game state rather than turn-level diffs, so feedback_fn is useful for rendering clean, incremental feedback.
Custom Rubric
By default, the game’s built-in reward is used. Override it with a custom rubric.
Full Example
Expected Format
Models should respond with XML-formatted guesses, using the <think> and <guess> fields described under Custom Parser.
Performance Optimization
TextArenaEnv includes memory optimization for parallel rollouts:
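The general idea of sharing immutable data across per-rollout copies can be illustrated with the standard library. This is only a sketch of the technique, not TextArenaEnv's actual implementation: `ToyWordGame` and `clone_for_rollout` are hypothetical names, and the word list stands in for any large read-only resource.

```python
import copy

class ToyWordGame:
    """Hypothetical stand-in for a game object; not a TextArena class."""

    def __init__(self, word_list):
        self.word_list = word_list  # large, treated as read-only
        self.guesses = []           # mutable per-rollout state

def clone_for_rollout(game):
    # Pre-seeding the deepcopy memo makes deepcopy reuse the existing
    # word list object instead of duplicating it for every rollout.
    memo = {id(game.word_list): game.word_list}
    return copy.deepcopy(game, memo)

template = ToyWordGame(["crane", "slate", "pride"])
rollout_a = clone_for_rollout(template)
rollout_b = clone_for_rollout(template)
rollout_a.guesses.append("crane")  # affects only rollout_a
```

Each clone gets its own `guesses` list while all clones point at the same `word_list` object, so memory use grows with the per-rollout state rather than with the size of the shared data.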
Game-Specific Notes
Wordle
- Words are randomly selected from the TextArena word list
- Default max turns: 6
- Reward is based on number of guesses (fewer is better)
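The exact reward formula is not given here. As one possible shape for a "fewer guesses is better" reward, the sketch below decreases linearly with the number of guesses; the function name `guess_count_reward` and the linear schedule are assumptions, not the built-in reward.

```python
def guess_count_reward(num_guesses: int, solved: bool, max_turns: int = 6) -> float:
    """Linear reward: highest for a first-guess solve, 0.0 if unsolved.

    Hypothetical schedule for illustration; the built-in reward may differ.
    """
    if not solved or not 1 <= num_guesses <= max_turns:
        return 0.0
    return (max_turns - num_guesses + 1) / max_turns

first_try = guess_count_reward(1, solved=True)
last_try = guess_count_reward(6, solved=True)
failed = guess_count_reward(6, solved=False)
```

A function of this kind could serve as the basis for a custom rubric when the default reward does not separate early and late solves the way you want.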
TwentyQuestions
- Model asks yes/no questions to guess the object
- Limited to 20 questions
- Reward for correct guess within question limit
Chess
- Moves in coordinate (long algebraic) notation, giving source and destination squares (e.g., “e2e4”)
- Game state includes board representation
- Reward based on game outcome
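The “e2e4” example suggests moves are written as a source square followed by a destination square. Under that assumption, a cheap syntactic pre-check can catch malformed moves before they reach the game. The optional promotion suffix follows the UCI convention and is assumed here; `looks_like_uci_move` is a hypothetical helper that does not check move legality and is not TextArena's validator.

```python
import re

# Source square, destination square, optional promotion piece (UCI convention).
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")

def looks_like_uci_move(move: str) -> bool:
    """Return True if the string has the shape of a coordinate move.

    Checks syntax only; it says nothing about whether the move is legal.
    """
    return UCI_MOVE.fullmatch(move.strip().lower()) is not None

ok = looks_like_uci_move("e2e4")
promotion = looks_like_uci_move("e7e8q")
short_form = looks_like_uci_move("Nf3")
```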
Metrics
| Metric | Meaning |
|---|---|
| reward | Game reward (task-specific) |
| num_turns | Number of turns taken |
| format_reward | XML format compliance (if parser used) |
Best Practices
When wrapping new TextArena games, investigate the source code to understand the observation format. Many games return full state rather than turn-level diffs.
- Use feedback_fn - Transform full-state observations to incremental feedback
- Test locally first - Try a few games manually to understand difficulty
- Validate parsing - Ensure your parser extracts the right fields
- Custom prompts - Game-specific instructions improve performance
- Seed consistency - Use same seed for reproducible experiments
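To illustrate the seed-consistency point, the sketch below shows why a fixed seed gives repeatable episodes: a dedicated random generator seeded with the same value produces the same sequence of choices. The word list and the `sample_targets` helper are hypothetical and do not reflect how TextArena selects words internally.

```python
import random

def sample_targets(word_list, n, seed):
    """Draw n target words reproducibly using a local, seeded generator."""
    rng = random.Random(seed)  # local RNG; leaves the global random state alone
    return [rng.choice(word_list) for _ in range(n)]

words = ["crane", "slate", "pride", "mount", "glory"]
run_a = sample_targets(words, 3, seed=0)
run_b = sample_targets(words, 3, seed=0)  # same seed, same targets as run_a
```

Using a local generator rather than the module-level functions keeps the sequence independent of any other code that draws random numbers in the same process.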
Troubleshooting
NLTK Download Errors
TextArena uses NLTK for word games. If you see download errors, the environment normally handles the download automatically. If issues persist, download the required NLTK data manually.
Invalid Moves
If the model makes invalid moves (e.g., non-existent words in Wordle):
- Improve the system prompt with game rules
- Add examples of valid moves in few-shot prompts
- Use a more capable model
Memory Issues
For large-scale parallel rollouts:
- The environment automatically shares immutable data
- If still seeing issues, reduce num_train_examples
- Consider running evaluation in batches