Overview

ThinkParser extracts the content that follows the last </think> tag in a model response. It is designed for models that always write their reasoning inside <think>...</think> tags and whose output still contains those tags when it reaches the parser.
Parsing fails (parse returns an empty string) if the model does not include <think>...</think> tags, or if the chat template automatically removes them. In particular, you should NOT use this parser with Qwen3 or DeepSeek-R1 models.

Class Signature

class ThinkParser(Parser):
    def __init__(self, extract_fn: Callable[[str], str] = lambda x: x)

Parameters

extract_fn
Callable[[str], str]
default:"lambda x: x"
Optional extraction function to further process the parsed text after removing think tags. For example, you can use this to extract boxed answers from math problems.

Methods

parse

Extracts content after the last </think> tag.
def parse(self, text: str) -> str
text
str
required
The text to parse, potentially containing <think>...</think> tags
Returns: The content after the last </think> tag, or an empty string if no </think> tag is found. The result is then passed through extract_fn (the identity function by default).
Behavior:
  • If </think> is found: Returns everything after the last </think> tag
  • If </think> is NOT found: Returns an empty string (strict enforcement)
  • Whitespace is automatically stripped
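The behavior listed above can be sketched as a standalone function. This is an illustration of the documented rules, not the library's source: the name parse_after_think is hypothetical, and the sketch assumes whitespace is stripped before extract_fn is applied.

```python
from typing import Callable


def parse_after_think(
    text: str, extract_fn: Callable[[str], str] = lambda x: x
) -> str:
    """Hypothetical sketch of the documented parse rules."""
    if "</think>" in text:
        # Keep only what follows the last closing tag
        answer = text.rsplit("</think>", 1)[-1]
    else:
        # Strict enforcement: no closing tag means no answer
        answer = ""
    # Assumption: strip whitespace first, then apply the extraction function
    return extract_fn(answer.strip())


print(parse_after_think("<think>reasoning</think>\n  42  "))  # 42
print(repr(parse_after_think("no tags here")))  # ''
```

Because a missing </think> tag yields an empty string rather than the raw text, any downstream correctness check sees an empty answer for untagged responses.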

get_format_reward_func

Returns a reward function that validates the think tag format in completions.
def get_format_reward_func(self) -> Callable
Returns: A reward function that checks if each assistant message follows the correct format:
  • Must start with <think>
  • Must contain exactly one <think> tag
  • Must contain exactly one </think> tag
  • Must have non-empty content after </think>
The reward function returns the average score across all assistant messages (1.0 for well-formatted, 0.0 for poorly-formatted).
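The four checks and the averaging step can be sketched without the library. The names follows_think_format and format_reward are hypothetical. The sketch also makes three assumptions the text above does not settle: leading whitespace before <think> is ignored, whitespace-only text after </think> counts as empty, and a completion with no assistant messages scores 0.0.

```python
def follows_think_format(text: str) -> float:
    """Score one message: 1.0 if it meets all four format rules, else 0.0."""
    ok = (
        text.strip().startswith("<think>")  # must start with <think>
        and text.count("<think>") == 1  # exactly one opening tag
        and text.count("</think>") == 1  # exactly one closing tag
        and len(text.split("</think>")[-1].strip()) > 0  # non-empty answer
    )
    return 1.0 if ok else 0.0


def format_reward(completion: list[dict]) -> float:
    """Average the per-message score over all assistant messages."""
    contents = [m["content"] for m in completion if m["role"] == "assistant"]
    if not contents:
        return 0.0  # assumption: nothing to score means no reward
    return sum(follows_think_format(c) for c in contents) / len(contents)


mixed = [
    {"role": "assistant", "content": "<think>plan</think>Answer"},
    {"role": "user", "content": "Try again"},
    {"role": "assistant", "content": "Answer without tags"},
]
print(format_reward(mixed))  # 0.5
```

Averaging means a multi-turn completion is rewarded in proportion to how many assistant turns are well formatted, so one malformed turn lowers the score without zeroing it.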

Usage Examples

Basic Usage

import verifiers as vf

# Create parser
parser = vf.ThinkParser()

# Parse text with think tags
text = """<think>
Let me think about this problem.
I need to consider multiple factors.
</think>
The final answer is 42."""

result = parser.parse(text)
print(result)  # "The final answer is 42."

With Custom Extraction Function

import re
import verifiers as vf

def extract_boxed(text: str) -> str:
    r"""Extract content from \boxed{...}"""
    match = re.search(r'\\boxed\{([^}]+)\}', text)
    return match.group(1) if match else text

# Create parser with extraction
parser = vf.ThinkParser(extract_fn=extract_boxed)

text = """<think>
I need to solve this step by step.
</think>
The answer is \\boxed{42}."""

result = parser.parse(text)
print(result)  # "42"

Using Format Reward Function

import verifiers as vf

parser = vf.ThinkParser()
reward_func = parser.get_format_reward_func()

# Well-formatted completion
completion = [
    {"role": "assistant", "content": "<think>Let me think</think>Final answer"}
]
reward = reward_func(completion)
print(reward)  # 1.0

# Poorly-formatted completion (missing think tags)
bad_completion = [
    {"role": "assistant", "content": "Just an answer without thinking"}
]
reward = reward_func(bad_completion)
print(reward)  # 0.0

In a Rubric

import verifiers as vf

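parser = vf.ThinkParser()

def check_correctness(completion, answer, **kwargs):
    """Check if the parsed answer is correct"""
    # Parse the completion first; the raw message list never equals the answer string
    parsed = parser.parse_answer(completion)
    return 1.0 if parsed == answer else 0.0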

rubric = vf.Rubric(
    funcs=[check_correctness, parser.get_format_reward_func()],
    weights=[1.0, 0.1],  # Correctness weighted more than format
    parser=parser
)

Multiple Think Blocks

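When multiple think blocks are present, only the content after the last </think> tag is returned. Such a response still parses, but it scores 0.0 under the format reward function, which requires exactly one <think> tag: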
import verifiers as vf

parser = vf.ThinkParser()

text = """<think>First thought</think>
Some intermediate text.
<think>Second thought</think>
Final answer here."""

result = parser.parse(text)
print(result)  # "Final answer here."

When to Use ThinkParser

Use ThinkParser when:
  • Your model always generates <think>...</think> tags for reasoning
  • The model does NOT automatically strip these tags from responses
  • You want strict enforcement (fail if tags are missing)
  • You need to validate that responses follow the think tag format
Do NOT use ThinkParser with:
  • Qwen3 models (think tags are removed automatically before the parser sees the response)
  • DeepSeek-R1 models (think tags are removed automatically before the parser sees the response)
  • Models that may or may not include think tags (use MaybeThinkParser instead)
  • Non-reasoning models that never use think tags
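One rough way to check these criteria before choosing a parser is to sample a few raw completions from your model and measure how often the closing tag survives. The helper below is a hypothetical sketch, not part of the library, and the sample strings are made up for illustration.

```python
def think_tag_rate(samples: list[str]) -> float:
    """Fraction of raw completions that contain a closing </think> tag."""
    if not samples:
        return 0.0
    return sum("</think>" in s for s in samples) / len(samples)


# Illustrative samples only; replace with real completions from your model
samples = [
    "<think>step one</think>Answer A",
    "Answer B with no reasoning tags",
]
print(think_tag_rate(samples))  # 0.5
```

A rate of 1.0 suggests ThinkParser fits. A rate strictly between 0 and 1 points toward MaybeThinkParser, and a rate of 0.0 suggests the tags are either stripped upstream or never generated.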
