Overview
The Wordle environment provides:
- Game: 5-letter word guessing with color-coded feedback
- Format: Multi-turn interaction (up to 6 guesses)
- Parsing: XML tags for structured guess extraction
- Rewards: Correctness, efficiency bonus, and partial credit
- Integration: TextArena game library
Complete Implementation
The full working implementation is in environments/wordle/wordle.py.
How It Works
1. TextArena Integration
TextArenaEnv wraps TextArena games for RL training:
- Generates random 5-letter target words
- Accepts guesses and returns color-coded feedback
- Tracks game state (remaining guesses, history)
2. Structured Output Parsing
XMLParser extracts guesses from model responses:
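As a rough stand-in for what tag-based extraction involves, the sketch below pulls a guess out of a model response with a regular expression. It is not the library's XMLParser: the `<guess>` tag name and the `extract_guess` helper are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumption: guesses are wrapped in <guess>...</guess>; the tag name is illustrative.
GUESS_PATTERN = re.compile(r"<guess>\s*(.*?)\s*</guess>", re.DOTALL | re.IGNORECASE)


def extract_guess(response: str) -> Optional[str]:
    """Return the last tagged guess in upper case, or None if no tag is present."""
    matches = GUESS_PATTERN.findall(response)
    if not matches:
        return None
    # Taking the last match lets the model reason (and change its mind) before answering.
    return matches[-1].strip().upper()


if __name__ == "__main__":
    print(extract_guess("Testing vowels first. <guess>crane</guess>"))
```

Returning None for untagged responses keeps parsing failures distinguishable from valid guesses, which is what a format-based reward needs.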
3. Feedback Processing
The wordle_feedback_fn extracts game feedback:
- G = Green (correct letter, correct position)
- Y = Yellow (correct letter, wrong position)
- _ = Gray (letter not in word)
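To make the three symbols concrete, here is a minimal sketch of how such feedback can be derived from a guess and a target using the standard two-pass Wordle rule. The `score_guess` function is hypothetical; the environment itself receives this feedback from TextArena rather than computing it this way.

```python
from collections import Counter


def score_guess(guess: str, target: str) -> str:
    """Return one symbol per letter: G (green), Y (yellow), or _ (gray)."""
    guess, target = guess.upper(), target.upper()
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")

    feedback = ["_"] * len(guess)
    # Target letters that were not matched exactly remain available for yellows.
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)

    # First pass: exact position matches become green.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            feedback[i] = "G"

    # Second pass: remaining letters become yellow while unmatched copies last.
    for i, g in enumerate(guess):
        if feedback[i] == "_" and unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1

    return "".join(feedback)


if __name__ == "__main__":
    print(score_guess("CRANE", "STEAK"))
```

Counting unmatched target letters before assigning yellows prevents a repeated letter in the guess from being marked yellow more times than it occurs in the target.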
4. Multi-Part Reward Function
Four reward components encourage different behaviors:
- Correctness
- Efficiency Bonus
- Partial Credit
- Format Compliance
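The first three components can be sketched as plain functions. The signatures below are simplified assumptions, not the environment's actual reward functions, and the partial-credit formula (a full share per green, half a share per yellow) is purely illustrative. Only the efficiency rule of dividing correctness by the number of guesses mirrors the worked example in this page.

```python
def correct_answer(final_guess: str, target: str) -> float:
    """1.0 if the final guess matches the target, else 0.0."""
    return 1.0 if final_guess.upper() == target.upper() else 0.0


def length_bonus(final_guess: str, target: str, num_guesses: int) -> float:
    """Correctness divided by the number of guesses; losses earn nothing."""
    return correct_answer(final_guess, target) / max(num_guesses, 1)


def partial_answer(last_feedback: str) -> float:
    """Assumed progress score in [0, 1]; 0.0 once the word is solved."""
    if not last_feedback or set(last_feedback) == {"G"}:
        return 0.0
    greens = last_feedback.count("G")
    yellows = last_feedback.count("Y")
    return (greens + 0.5 * yellows) / len(last_feedback)


if __name__ == "__main__":
    print(length_bonus("STEAK", "STEAK", 3))
    print(partial_answer("GGGG_"))
```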
Combined Reward
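One way to combine the components is a weighted sum. The sketch below assumes each component has already been scored; the `combined_reward` helper and the weights in the usage example are hypothetical placeholders, not the values used by the environment.

```python
from typing import Dict, Optional


def combined_reward(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of component scores; unlisted components default to weight 1.0."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in scores.items())


if __name__ == "__main__":
    # A lost game with some progress and a well-formatted final response.
    scores = {
        "correct_answer": 0.0,
        "length_bonus": 0.0,
        "partial_answer": 0.6,
        "format_reward": 1.0,
    }
    # Hypothetical weighting that down-weights formatting relative to gameplay.
    print(combined_reward(scores, weights={"format_reward": 0.2}))
```

Keeping the weights separate from the component functions makes it easy to rebalance the signal without touching the scoring logic.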
Example Game Trajectory
Successful Game
Target word: STEAK
Turn 1
Assistant: I’ll start with a common word to test vowels and common consonants. CRANE
Environment: _ _ Y _ Y
Turn 2
Assistant: A is in position 3 (wrong spot), E is in position 5 (wrong spot). Let me try: STEAL
Environment: G G G G _
Turn 3
Assistant: Almost there! The first four letters are S-T-E-A. Last letter isn’t L. Let me try: STEAK
Environment: G G G G G
Rewards:
- correct_answer: 1.0
- length_bonus: 1.0 / 3 = 0.33
- partial_answer: 0.0 (already correct)
- format_reward: 1.0
- Total: 2.53
Running the Environment
Installation
Quick Evaluation
Training Dataset
Configuration Options
| Parameter | Default | Description |
|---|---|---|
| num_train_examples | 2000 | Number of training games |
| num_eval_examples | 20 | Number of evaluation games |
| system_prompt | DEFAULT_SYSTEM_PROMPT | Instructions for the model |
| seed | 0 | Random seed for word generation |
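The defaults in the table can be mirrored in a small config object, and the role of the seed illustrated with a seeded random generator. `WordleConfig` and `sample_targets` are hypothetical names, and how the environment actually consumes these options is an assumption here.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordleConfig:
    """Hypothetical container mirroring the documented defaults."""
    num_train_examples: int = 2000
    num_eval_examples: int = 20
    system_prompt: str = "DEFAULT_SYSTEM_PROMPT"  # placeholder for the real prompt text
    seed: int = 0


def sample_targets(words: List[str], n: int, seed: int) -> List[str]:
    """Draw n target words reproducibly from a word list."""
    rng = random.Random(seed)  # a local generator avoids touching global random state
    return [rng.choice(words) for _ in range(n)]


if __name__ == "__main__":
    cfg = WordleConfig(num_eval_examples=5)
    words = ["STEAK", "CRANE", "STEAL", "PLANT", "GHOST"]
    print(sample_targets(words, cfg.num_eval_examples, cfg.seed))
```

Using the same seed yields the same sequence of target words, which keeps evaluation runs comparable.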
Key Features
Structured Output with XMLParser
XMLParser provides:
- Extraction: Pulls content from XML tags
- Validation: Checks format compliance
- Format rewards: Built-in reward function for proper formatting
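A format reward of the kind listed above can be approximated with a simple check on the response text. This is a sketch under assumptions, not XMLParser's built-in reward: the `<guess>` tag name and the rule of exactly one 5-letter guess per response are illustrative choices.

```python
import re

# Assumption: a valid response contains exactly one <guess> tag holding five letters.
GUESS_TAG = re.compile(r"<guess>\s*([A-Za-z]{5})\s*</guess>")


def format_reward(response: str) -> float:
    """Return 1.0 for exactly one well-formed 5-letter guess tag, else 0.0."""
    well_formed = GUESS_TAG.findall(response)
    opened = response.count("<guess>")
    closed = response.count("</guess>")
    return 1.0 if len(well_formed) == 1 and opened == 1 and closed == 1 else 0.0


if __name__ == "__main__":
    print(format_reward("Testing vowels first. <guess>crane</guess>"))
    print(format_reward("My guess is crane"))
```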
Multi-Component Rewards
Combining multiple reward signals:
- Sparse signal (correct_answer): Only 1.0 when winning
- Dense signal (partial_answer): Credit for progress
- Efficiency (length_bonus): Reward faster solutions
- Compliance (format_reward): Enforce output format
Game State Tracking
TextArenaEnv automatically tracks:
- Number of guesses made
- Guess history
- Remaining attempts
- Win/loss status
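The tracked quantities listed above can be pictured with a small state object. `WordleState` is a hypothetical illustration of the bookkeeping involved, not TextArenaEnv's internal representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordleState:
    """Hypothetical per-game state: history, remaining attempts, win/loss status."""
    target: str
    max_guesses: int = 6
    history: List[str] = field(default_factory=list)

    @property
    def num_guesses(self) -> int:
        return len(self.history)

    @property
    def remaining(self) -> int:
        return self.max_guesses - len(self.history)

    @property
    def won(self) -> bool:
        return bool(self.history) and self.history[-1] == self.target.upper()

    @property
    def finished(self) -> bool:
        return self.won or self.remaining == 0

    def submit(self, guess: str) -> None:
        """Record a guess; refuse further guesses once the game has ended."""
        if self.finished:
            raise RuntimeError("game is already over")
        self.history.append(guess.upper())


if __name__ == "__main__":
    state = WordleState(target="steak")
    for guess in ["crane", "steal", "steak"]:
        state.submit(guess)
    print(state.num_guesses, state.remaining, state.won)
```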
Metrics Tracked
- correct_answer: 1.0 if word guessed correctly
- length_bonus: Efficiency bonus (0.0 to 1.0)
- partial_answer: Progress score (0.0 to 1.0)
- format_reward: Format compliance (0.0 or 1.0)
- reward: Combined weighted sum
- num_turns: Number of guesses made
Advanced Usage
Custom Reward Weights
Adjust the importance of different reward components.
Different Wordle Variants
TextArena supports multiple Wordle variants.
Other TextArena Games
The same pattern works for other TextArena games.
Related Examples
- GSM8K - Single-turn reasoning
- Math Python - Multi-turn with code execution
- Wiki Search - Multi-turn with custom tools
Next Steps
- Learn about MultiTurnEnv for game environments
- See Parsers for structured output extraction
- Explore Rubrics for custom reward design
- Check out TextArena for more games