This example demonstrates a multi-turn game environment where models play Wordle by making guesses and receiving feedback. It showcases parsing structured output, custom reward functions, and integration with TextArena.

Overview

The Wordle environment provides:
  • Game: 5-letter word guessing with color-coded feedback
  • Format: Multi-turn interaction (up to 6 guesses)
  • Parsing: XML tags for structured guess extraction
  • Rewards: Correctness, efficiency bonus, and partial credit
  • Integration: TextArena game library

Complete Implementation

Here’s the full working implementation from environments/wordle/wordle.py:
import re

import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

DEFAULT_SYSTEM_PROMPT = """You are a competitive game player. \
Make sure you read the game instructions carefully, and always follow the required format.

In each turn, think step-by-step, then give your guess inside <guess>...</guess> tags."""


### feedback functions
def wordle_feedback_fn(observation: str) -> str:
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation


### reward functions
def correct_answer(parser, completion, answer, **kwargs) -> float:
    """Whether the guess is *exactly* correct."""
    guess = parser.parse_answer(completion)
    return 1.0 if guess == "[" + answer + "]" else 0.0


def length_bonus(parser, completion, answer, **kwargs) -> float:
    """Bonus for shorter correct solutions."""
    assistant_messages = parser.get_assistant_messages(completion)
    guesses = [
        x for x in assistant_messages if re.search(r"<guess>.*</guess>", x["content"])
    ]
    is_correct = correct_answer(parser, completion, answer, **kwargs)
    return is_correct / (len(guesses) or 1)


def partial_answer(parser, completion, answer, **kwargs) -> float:
    """Partial credit for the latest guess."""
    if correct_answer(parser, completion, answer, **kwargs):
        return 0.0
    user_messages = parser.get_user_messages(completion)
    for user_message in user_messages[::-1]:
        feedback = user_message["content"].strip()
        feedback_parts = feedback.split("\n")
        if len(feedback_parts) == 3:
            _, scoring, _ = feedback_parts
            scoring = scoring.strip()
            num_greens = scoring.count("G")
            num_yellows = scoring.count("Y")
            return 0.2 * num_greens + 0.1 * num_yellows
    return 0.0


### environment loader
def load_environment(
    num_train_examples: int = 2000,
    num_eval_examples: int = 20,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    seed: int = 0,
    **kwargs,
):
    parser = vf.XMLParser(fields=["guess"], answer_field="guess")

    rubric = vf.Rubric(parser=parser)
    rubric.add_reward_func(correct_answer)
    rubric.add_reward_func(partial_answer)
    rubric.add_reward_func(length_bonus)
    format_reward = parser.get_format_reward_func()
    format_reward.__name__ = "format_reward"
    rubric.add_reward_func(format_reward, weight=0.2)

    return TextArenaEnv(
        game="Wordle-v0",
        num_train_examples=num_train_examples,
        num_eval_examples=num_eval_examples,
        feedback_fn=wordle_feedback_fn,
        seed=seed,
        system_prompt=system_prompt,
        parser=parser,
        rubric=rubric,
        **kwargs,
    )

How It Works

1. TextArena Integration

TextArenaEnv wraps TextArena games for RL training:
return TextArenaEnv(
    game="Wordle-v0",
    num_train_examples=2000,
    num_eval_examples=20,
    feedback_fn=wordle_feedback_fn,
)
TextArena provides text-based game environments. The Wordle game:
  • Generates random 5-letter target words
  • Accepts guesses and returns color-coded feedback
  • Tracks game state (remaining guesses, history)

2. Structured Output Parsing

XMLParser extracts guesses from model responses:
parser = vf.XMLParser(fields=["guess"], answer_field="guess")
Example parsing:
model_output = "I'll try the word CRANE. <guess>[CRANE]</guess>"
parser.parse_answer(model_output)  # Returns: "[CRANE]"
Note that TextArena's Wordle expects guesses wrapped in square brackets, which is why correct_answer compares the parsed guess against "[" + answer + "]".

3. Feedback Processing

The wordle_feedback_fn extracts game feedback:
def wordle_feedback_fn(observation: str) -> str:
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation
Example feedback:
Guess: CRANE
Feedback: [Y][G][_][_][Y]
Remaining guesses: 5
Legend:
  • G = Green (correct letter, correct position)
  • Y = Yellow (correct letter, wrong position)
  • _ = Gray (letter not in word)
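TextArena computes this feedback internally; to make the legend concrete, here is a minimal sketch of the standard two-pass Wordle scoring rules (the function name score_guess is ours, not part of TextArena):

```python
from collections import Counter

def score_guess(guess: str, target: str) -> str:
    # Pass 1: mark greens and count the target letters that remain unmatched
    result = ["_"] * len(guess)
    remaining = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
        else:
            remaining[t] += 1
    # Pass 2: mark yellows, capped by how many of that letter remain,
    # so duplicate letters in the guess are not over-credited
    for i, g in enumerate(guess):
        if result[i] == "_" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(f"[{c}]" for c in result)

score_guess("CRANE", "STEAK")  # "[_][_][Y][_][Y]"
```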

4. Multi-Part Reward Function

Three reward components encourage different behaviors:

correct_answer checks for an exact match:
def correct_answer(parser, completion, answer, **kwargs) -> float:
    """Whether the guess is *exactly* correct."""
    guess = parser.parse_answer(completion)
    return 1.0 if guess == "[" + answer + "]" else 0.0
Returns 1.0 when the parsed guess equals [answer], 0.0 otherwise.

partial_answer applies only when the final guess is wrong: it scans the most recent feedback and awards 0.2 per green and 0.1 per yellow, so the dense progress signal never stacks on top of the win reward.

length_bonus divides the correctness score by the number of <guess> turns, so a win in 3 guesses earns 1/3 and a loss earns 0.
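To see the partial-credit arithmetic in isolation, a small sketch (the helper name partial_credit is ours, mirroring the counting logic of partial_answer):

```python
def partial_credit(scoring: str) -> float:
    # 0.2 per green letter, 0.1 per yellow, as in partial_answer
    return 0.2 * scoring.count("G") + 0.1 * scoring.count("Y")

partial_credit("[G][G][_][Y][_]")  # 2 greens + 1 yellow -> 0.5
```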

Combined Reward

rubric = vf.Rubric(parser=parser)
rubric.add_reward_func(correct_answer, weight=1.0)      # Main reward
rubric.add_reward_func(partial_answer, weight=1.0)      # Progress signal
rubric.add_reward_func(length_bonus, weight=1.0)        # Efficiency
rubric.add_reward_func(format_reward, weight=0.2)       # Format compliance
Total reward = weighted sum of all components
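As a sanity check on the arithmetic, the weighted sum for a game won in 3 guesses with well-formatted output:

```python
# Default weights from the rubric above (format_reward is down-weighted)
weights = {"correct_answer": 1.0, "partial_answer": 1.0,
           "length_bonus": 1.0, "format_reward": 0.2}
# Component scores: win in 3 guesses, no partial credit, clean formatting
scores = {"correct_answer": 1.0, "partial_answer": 0.0,
          "length_bonus": 1.0 / 3, "format_reward": 1.0}
total = sum(weights[k] * scores[k] for k in weights)  # ~1.53
```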

Example Game Trajectory

Target word: STEAK
Turn 1

Assistant: I'll start with a common word to test vowels and common consonants. <guess>[CRANE]</guess>

Environment:
Guess: CRANE
Feedback: [_][_][Y][_][Y]
Remaining guesses: 5

Turn 2

Assistant: A is in position 3 (wrong spot), E is in position 5 (wrong spot). Let me try: <guess>[STEAL]</guess>

Environment:
Guess: STEAL
Feedback: [G][G][G][G][_]
Remaining guesses: 4

Turn 3

Assistant: Almost there! The first four letters are S-T-E-A. Last letter isn't L. Let me try: <guess>[STEAK]</guess>

Environment:
Guess: STEAK
Feedback: [G][G][G][G][G]
You won! The word was STEAK.
Rewards:
  • correct_answer: 1.0
  • partial_answer: 0.0 (final guess already correct)
  • length_bonus: 1.0 / 3 ≈ 0.33
  • format_reward: 1.0 (weight 0.2)
  • Total: 1.0 + 0.0 + 0.33 + 0.2 × 1.0 = 1.53

Running the Environment

Installation

# Install with TextArena integration
prime env install wordle

Quick Evaluation

# Evaluate on 20 games
prime eval run wordle \
  -m openai/gpt-4.1-mini \
  -b https://api.openai.com/v1 \
  -k OPENAI_API_KEY \
  -n 20 \
  -r 5

Training Dataset

# Generate training data with 1000 games
prime eval run wordle \
  -m openai/gpt-4.1-mini \
  -a '{"num_train_examples": 1000, "num_eval_examples": 50}' \
  --split train \
  -n 1000 \
  -r 8

Configuration Options

  • num_train_examples (default: 2000): Number of training games
  • num_eval_examples (default: 20): Number of evaluation games
  • system_prompt (default: DEFAULT_SYSTEM_PROMPT): Instructions for the model
  • seed (default: 0): Random seed for word generation

Key Features

Structured Output with XMLParser

XMLParser provides:
  • Extraction: Pulls content from XML tags
  • Validation: Checks format compliance
  • Format rewards: Built-in reward function for proper formatting
parser = vf.XMLParser(
    fields=["guess"],           # Fields to extract
    answer_field="guess"        # Which field is the answer
)

Multi-Component Rewards

Combining multiple reward signals:
  • Sparse signal (correct_answer): Only 1.0 when winning
  • Dense signal (partial_answer): Credit for progress
  • Efficiency (length_bonus): Reward faster solutions
  • Compliance (format_reward): Enforce output format
This creates a rich learning signal for RL training.

Game State Tracking

TextArenaEnv automatically tracks:
  • Number of guesses made
  • Guess history
  • Remaining attempts
  • Win/loss status

Metrics Tracked

  • correct_answer: 1.0 if word guessed correctly
  • length_bonus: Efficiency bonus (0.0 to 1.0)
  • partial_answer: Progress score (0.0 to 1.0)
  • format_reward: Format compliance (0.0 or 1.0)
  • reward: Combined weighted sum
  • num_turns: Number of guesses made

Advanced Usage

Custom Reward Weights

Adjust the importance of different reward components:
rubric = vf.Rubric(parser=parser)
# Emphasize correctness over efficiency
rubric.add_reward_func(correct_answer, weight=5.0)     # 5x weight
rubric.add_reward_func(partial_answer, weight=2.0)     # 2x weight
rubric.add_reward_func(length_bonus, weight=0.5)       # 0.5x weight
rubric.add_reward_func(format_reward, weight=0.2)      # 0.2x weight

Different Wordle Variants

TextArena supports multiple Wordle variants:
from verifiers.envs.integrations.textarena_env import TextArenaEnv

# 6-letter Wordle
env_6 = TextArenaEnv(game="Wordle-6-v0", ...)

# 7-letter Wordle
env_7 = TextArenaEnv(game="Wordle-7-v0", ...)

Other TextArena Games

The same pattern works for other TextArena games:
# Hangman
env = TextArenaEnv(
    game="Hangman-v0",
    parser=vf.XMLParser(fields=["guess"]),
    ...
)

# 20 Questions
env = TextArenaEnv(
    game="TwentyQuestions-v0",
    parser=vf.XMLParser(fields=["question"]),
    ...
)
