This example demonstrates a multi-turn game environment where models play Wordle by making guesses and receiving feedback. It showcases parsing structured output, custom reward functions, and integration with TextArena.

Overview

The Wordle environment provides:
  • Game: 5-letter word guessing with color-coded feedback
  • Format: Multi-turn interaction (up to 6 guesses)
  • Parsing: XML tags for structured guess extraction
  • Rewards: Correctness, efficiency bonus, and partial credit
  • Integration: TextArena game library

Complete Implementation

Here’s the full working implementation from environments/wordle/wordle.py:
import re

import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

DEFAULT_SYSTEM_PROMPT = """You are a competitive game player. \
Make sure you read the game instructions carefully, and always follow the required format.

In each turn, think step-by-step, then give your guess inside <guess>...</guess> tags."""


### feedback functions
def wordle_feedback_fn(observation: str) -> str:
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation


### reward functions
def correct_answer(parser, completion, answer, **kwargs) -> float:
    """Whether the guess is *exactly* correct."""
    guess = parser.parse_answer(completion)
    return 1.0 if guess == "[" + answer + "]" else 0.0


def length_bonus(parser, completion, answer, **kwargs) -> float:
    """Bonus for shorter correct solutions."""
    assistant_messages = parser.get_assistant_messages(completion)
    guesses = [
        x for x in assistant_messages if re.search(r"<guess>.*</guess>", x["content"])
    ]
    is_correct = correct_answer(parser, completion, answer, **kwargs)
    return is_correct / (len(guesses) or 1)


def partial_answer(parser, completion, answer, **kwargs) -> float:
    """Partial credit for the latest guess."""
    if correct_answer(parser, completion, answer, **kwargs):
        return 0.0
    user_messages = parser.get_user_messages(completion)
    for user_message in user_messages[::-1]:
        feedback = user_message["content"].strip()
        feedback_parts = feedback.split("\n")
        if len(feedback_parts) == 3:
            _, scoring, _ = feedback_parts
            scoring = scoring.strip()
            num_greens = scoring.count("G")
            num_yellows = scoring.count("Y")
            return 0.2 * num_greens + 0.1 * num_yellows
    return 0.0


### environment loader
def load_environment(
    num_train_examples: int = 2000,
    num_eval_examples: int = 20,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    seed: int = 0,
    **kwargs,
):
    parser = vf.XMLParser(fields=["guess"], answer_field="guess")

    rubric = vf.Rubric(parser=parser)
    rubric.add_reward_func(correct_answer)
    rubric.add_reward_func(partial_answer)
    rubric.add_reward_func(length_bonus)
    format_reward = parser.get_format_reward_func()
    format_reward.__name__ = "format_reward"
    rubric.add_reward_func(format_reward, weight=0.2)

    return TextArenaEnv(
        game="Wordle-v0",
        num_train_examples=num_train_examples,
        num_eval_examples=num_eval_examples,
        feedback_fn=wordle_feedback_fn,
        seed=seed,
        system_prompt=system_prompt,
        parser=parser,
        rubric=rubric,
        **kwargs,
    )

How It Works

1. TextArena Integration

TextArenaEnv wraps TextArena games for RL training:
return TextArenaEnv(
    game="Wordle-v0",
    num_train_examples=2000,
    num_eval_examples=20,
    feedback_fn=wordle_feedback_fn,
)
TextArena provides text-based game environments. The Wordle game:
  • Generates random 5-letter target words
  • Accepts guesses and returns color-coded feedback
  • Tracks game state (remaining guesses, history)

2. Structured Output Parsing

XMLParser extracts guesses from model responses:
parser = vf.XMLParser(fields=["guess"], answer_field="guess")
Example parsing:
model_output = "I'll try the word CRANE. <guess>[CRANE]</guess>"
parser.parse_answer(model_output)  # Returns: "[CRANE]"
Note that TextArena's Wordle expects guesses wrapped in square brackets, which is why correct_answer compares the parsed guess against "[" + answer + "]".

3. Feedback Processing

The wordle_feedback_fn extracts game feedback:
def wordle_feedback_fn(observation: str) -> str:
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation
Example feedback:
Guess: CRANE
Feedback: [Y][G][_][_][Y]
Remaining guesses: 5
Legend:
  • G = Green (correct letter, correct position)
  • Y = Yellow (correct letter, wrong position)
  • _ = Gray (letter not in word)
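TextArena computes this feedback internally; to make the legend concrete, here is a minimal sketch of the standard two-pass Wordle scoring rules (the function name score_guess is ours, not part of TextArena):

```python
from collections import Counter

def score_guess(guess: str, target: str) -> str:
    # Pass 1: mark greens and count the target letters that remain unmatched
    result = ["_"] * len(guess)
    remaining = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
        else:
            remaining[t] += 1
    # Pass 2: mark yellows, capped by how many of that letter remain,
    # so duplicate letters in the guess are not over-credited
    for i, g in enumerate(guess):
        if result[i] == "_" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(f"[{c}]" for c in result)

score_guess("CRANE", "STEAK")  # "[_][_][Y][_][Y]"
```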

4. Multi-Part Reward Function

Three reward components encourage different behaviors:

correct_answer checks for an exact match:
def correct_answer(parser, completion, answer, **kwargs) -> float:
    """Whether the guess is *exactly* correct."""
    guess = parser.parse_answer(completion)
    return 1.0 if guess == "[" + answer + "]" else 0.0
Returns 1.0 when the parsed guess equals [answer], 0.0 otherwise.

partial_answer applies only when the final guess is wrong: it scans the most recent feedback and awards 0.2 per green and 0.1 per yellow, so the dense progress signal never stacks on top of the win reward.

length_bonus divides the correctness score by the number of <guess> turns, so a win in 3 guesses earns 1/3 and a loss earns 0.
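To see the partial-credit arithmetic in isolation, a small sketch (the helper name partial_credit is ours, mirroring the counting logic of partial_answer):

```python
def partial_credit(scoring: str) -> float:
    # 0.2 per green letter, 0.1 per yellow, as in partial_answer
    return 0.2 * scoring.count("G") + 0.1 * scoring.count("Y")

partial_credit("[G][G][_][Y][_]")  # 2 greens + 1 yellow -> 0.5
```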

Combined Reward

rubric = vf.Rubric(parser=parser)
rubric.add_reward_func(correct_answer, weight=1.0)      # Main reward
rubric.add_reward_func(partial_answer, weight=1.0)      # Progress signal
rubric.add_reward_func(length_bonus, weight=1.0)        # Efficiency
rubric.add_reward_func(format_reward, weight=0.2)       # Format compliance
Total reward = weighted sum of all components
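As a sanity check on the arithmetic, the weighted sum for a game won in 3 guesses with well-formatted output:

```python
# Default weights from the rubric above (format_reward is down-weighted)
weights = {"correct_answer": 1.0, "partial_answer": 1.0,
           "length_bonus": 1.0, "format_reward": 0.2}
# Component scores: win in 3 guesses, no partial credit, clean formatting
scores = {"correct_answer": 1.0, "partial_answer": 0.0,
          "length_bonus": 1.0 / 3, "format_reward": 1.0}
total = sum(weights[k] * scores[k] for k in weights)  # ~1.53
```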

Example Game Trajectory

Target word: STEAK
Turn 1

Assistant: I'll start with a common word to test vowels and common consonants. <guess>[CRANE]</guess>

Environment:
Guess: CRANE
Feedback: [_][_][Y][_][Y]
Remaining guesses: 5

Turn 2

Assistant: A is in position 3 (wrong spot), E is in position 5 (wrong spot). Let me try: <guess>[STEAL]</guess>

Environment:
Guess: STEAL
Feedback: [G][G][G][G][_]
Remaining guesses: 4

Turn 3

Assistant: Almost there! The first four letters are S-T-E-A. Last letter isn't L. Let me try: <guess>[STEAK]</guess>

Environment:
Guess: STEAK
Feedback: [G][G][G][G][G]
You won! The word was STEAK.
Rewards:
  • correct_answer: 1.0
  • partial_answer: 0.0 (final guess already correct)
  • length_bonus: 1.0 / 3 ≈ 0.33
  • format_reward: 1.0 (weight 0.2)
  • Total: 1.0 + 0.0 + 0.33 + 0.2 × 1.0 = 1.53

Running the Environment

Installation

# Install with TextArena integration
prime env install wordle

Quick Evaluation

# Evaluate on 20 games
prime eval run wordle \
  -m openai/gpt-4.1-mini \
  -b https://api.openai.com/v1 \
  -k OPENAI_API_KEY \
  -n 20 \
  -r 5

Training Dataset

# Generate training data with 1000 games
prime eval run wordle \
  -m openai/gpt-4.1-mini \
  -a '{"num_train_examples": 1000, "num_eval_examples": 50}' \
  --split train \
  -n 1000 \
  -r 8

Configuration Options

  • num_train_examples (default: 2000): Number of training games
  • num_eval_examples (default: 20): Number of evaluation games
  • system_prompt (default: DEFAULT_SYSTEM_PROMPT): Instructions for the model
  • seed (default: 0): Random seed for word generation

Key Features

Structured Output with XMLParser

XMLParser provides:
  • Extraction: Pulls content from XML tags
  • Validation: Checks format compliance
  • Format rewards: Built-in reward function for proper formatting
parser = vf.XMLParser(
    fields=["guess"],           # Fields to extract
    answer_field="guess"        # Which field is the answer
)

Multi-Component Rewards

Combining multiple reward signals:
  • Sparse signal (correct_answer): Only 1.0 when winning
  • Dense signal (partial_answer): Credit for progress
  • Efficiency (length_bonus): Reward faster solutions
  • Compliance (format_reward): Enforce output format
This creates a rich learning signal for RL training.

Game State Tracking

TextArenaEnv automatically tracks:
  • Number of guesses made
  • Guess history
  • Remaining attempts
  • Win/loss status

Metrics Tracked

  • correct_answer: 1.0 if word guessed correctly
  • length_bonus: Efficiency bonus (0.0 to 1.0)
  • partial_answer: Progress score (0.0 to 1.0)
  • format_reward: Format compliance (0.0 or 1.0)
  • reward: Combined weighted sum
  • num_turns: Number of guesses made

Advanced Usage

Custom Reward Weights

Adjust the importance of different reward components:
rubric = vf.Rubric(parser=parser)
# Emphasize correctness over efficiency
rubric.add_reward_func(correct_answer, weight=5.0)     # 5x weight
rubric.add_reward_func(partial_answer, weight=2.0)     # 2x weight
rubric.add_reward_func(length_bonus, weight=0.5)       # 0.5x weight
rubric.add_reward_func(format_reward, weight=0.2)      # 0.2x weight

Different Wordle Variants

TextArena supports multiple Wordle variants:
from verifiers.envs.integrations.textarena_env import TextArenaEnv

# 6-letter Wordle
env_6 = TextArenaEnv(game="Wordle-6-v0", ...)

# 7-letter Wordle
env_7 = TextArenaEnv(game="Wordle-7-v0", ...)

Other TextArena Games

The same pattern works for other TextArena games:
# Hangman
env = TextArenaEnv(
    game="Hangman-v0",
    parser=vf.XMLParser(fields=["guess"]),
    ...
)

# 20 Questions
env = TextArenaEnv(
    game="TwentyQuestions-v0",
    parser=vf.XMLParser(fields=["question"]),
    ...
)
