Overview

The Parser class is the base for extracting answers from LLM completions. It accepts completions as plain strings or as lists of messages, and its behavior is customized by passing an extraction function.

Constructor

Parser(extract_fn: Callable[[str], str] = lambda x: x)
extract_fn (Callable[[str], str], default: lambda x: x): Function to extract or transform text. For message-based completions, it is applied to the message content once that content has been extracted.

Methods

parse

def parse(self, text: str) -> Any
Parse text using the configured extraction function.
text (str): The text to parse.
Returns: Result of applying extract_fn to the text.

parse_answer

def parse_answer(self, completion: Messages) -> str | None
Extract the answer from a completion.
completion (Messages): Either a string or list of message dictionaries.
Returns: str | None - Parsed answer from the last assistant message, or None if no assistant messages exist.
For string completions, applies parse() directly. For message lists, extracts the last assistant message’s content and then parses it.
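
The dispatch described above can be sketched in a few lines. This is a standalone illustration, not the library's source: `parse_answer_sketch` and the local `Messages` alias are hypothetical names, and the handling of `None` content (returning an empty string) follows the edge cases documented on this page.

```python
from typing import Any, Callable, Optional, Union

# Hypothetical stand-in for the Messages type: a string or a list of message dicts.
Messages = Union[str, list[dict[str, Any]]]


def parse_answer_sketch(
    completion: Messages,
    extract_fn: Callable[[str], str] = lambda x: x,
) -> Optional[str]:
    """Illustrative only: mirrors the documented behavior, not the real code."""
    if isinstance(completion, str):
        # String completions go straight through the extraction function.
        return extract_fn(completion)
    # Message lists: keep assistant messages and parse the last one.
    assistant = [m for m in completion if m.get("role") == "assistant"]
    if not assistant:
        return None
    # Assumption: missing or None content is treated as an empty string.
    content = assistant[-1].get("content") or ""
    return extract_fn(content)
```

The `None` return is what lets callers distinguish "no assistant turn at all" from "an assistant turn with empty content".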

Helper Methods

These methods help extract specific message types from completions:

get_assistant_messages

def get_assistant_messages(self, completion: Messages) -> Messages
Extract all assistant messages from a completion. Returns: List of messages with role="assistant".

get_system_messages

def get_system_messages(self, completion: Messages) -> Messages
Extract all system messages from a completion. Returns: List of messages with role="system".

get_user_messages

def get_user_messages(self, completion: Messages) -> Messages
Extract all user messages from a completion. Returns: List of messages with role="user".

get_tool_messages

def get_tool_messages(self, completion: Messages) -> Messages
Extract all tool messages from a completion. Returns: List of messages with role="tool".
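
Each of these role-specific helpers amounts to a filter on the role field. A minimal sketch, assuming dictionary-style messages; `filter_by_role` is a hypothetical name used for illustration and is not part of the Parser API.

```python
from typing import Any


def filter_by_role(messages: list[dict[str, Any]], role: str) -> list[dict[str, Any]]:
    """Hypothetical helper: keep messages whose role matches, preserving order."""
    return [m for m in messages if m.get("role") == role]


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "tool", "name": "search", "content": "..."},
    {"role": "assistant", "content": "Based on the search..."},
]

assistant_msgs = filter_by_role(conversation, "assistant")
tool_msgs = filter_by_role(conversation, "tool")
```

Because the filter preserves order, the last element of the assistant list is the most recent assistant turn.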

get_format_reward_func

def get_format_reward_func(self) -> Callable
Return a reward function that checks format compliance. Returns: A reward function. The base implementation performs no check and returns 1.0 for every completion.
Subclasses like XMLParser override this to provide actual format checking.
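
To make the override pattern concrete, here is a standalone sketch. Both classes are hypothetical stand-ins (they do not inherit from the real Parser), and the reward signature `(completion, **kwargs) -> float` is an assumption chosen for illustration.

```python
import re
from typing import Callable


class BaseParserSketch:
    """Hypothetical stand-in for the base class; not the library's code."""

    def get_format_reward_func(self) -> Callable[..., float]:
        def format_reward(completion, **kwargs) -> float:
            # Base behavior as documented: every completion counts as well formed.
            return 1.0

        return format_reward


class AnswerTagParserSketch(BaseParserSketch):
    """Hypothetical subclass that requires an <answer>...</answer> block."""

    _pattern = re.compile(r"<answer>.*?</answer>", re.DOTALL)

    def get_format_reward_func(self) -> Callable[..., float]:
        def format_reward(completion, **kwargs) -> float:
            # Assumption: only plain-string completions are checked here;
            # message lists would need their assistant text extracted first.
            text = completion if isinstance(completion, str) else ""
            return 1.0 if self._pattern.search(text) else 0.0

        return format_reward
```

Returning a closure rather than a bound method keeps the reward function usable anywhere a plain callable is expected.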

Attributes

extract_fn (Callable[[str], str]): The extraction function applied during parsing.
logger (logging.Logger): Logger instance for the parser.

Example Usage

Basic String Parsing

import verifiers as vf

# Default parser (identity function)
parser = vf.Parser()
result = parser.parse("Hello world")
print(result)  # "Hello world"

Custom Extraction Function

import re

import verifiers as vf

def extract_number(text: str) -> str:
    """Extract the first number from text."""
    match = re.search(r'\d+', text)
    return match.group(0) if match else ""

parser = vf.Parser(extract_fn=extract_number)
result = parser.parse("The answer is 42 units")
print(result)  # "42"

Extracting Boxed Answers

from verifiers.utils.data_utils import extract_boxed_answer

# Parser for LaTeX \boxed{} format
parser = vf.Parser(extract_fn=extract_boxed_answer)

completion = "Therefore, the solution is \\boxed{x = 5}"
result = parser.parse(completion)
print(result)  # "x = 5"
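
For intuition about what a boxed-answer extractor has to do, the sketch below matches braces so that nested LaTeX such as \frac{1}{2} survives intact. It is a hypothetical re-implementation for illustration only; the library's extract_boxed_answer may differ, in particular in what it returns when no \boxed{} is present (returning the input unchanged is an assumption here).

```python
def extract_boxed_sketch(text: str) -> str:
    """Hypothetical brace-matching extractor; not the library's implementation."""
    marker = "\\boxed{"
    start = text.rfind(marker)  # use the last \boxed{ in the text
    if start == -1:
        return text  # assumption: no box found, return the input unchanged
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace reached: everything collected is the answer.
                return "".join(chars)
        chars.append(ch)
    return text  # unbalanced braces: give up and return the input


simple = extract_boxed_sketch("Therefore, the solution is \\boxed{x = 5}")
nested = extract_boxed_sketch("So the result is \\boxed{\\frac{1}{2}}.")
```

A regular expression such as `\\boxed\{(.*?)\}` would stop at the first closing brace and truncate the nested case, which is why the sketch counts depth instead.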

Parsing Message Completions

parser = vf.Parser()

# Parse from message list
completion = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "The answer is 4"},
    {"role": "user", "content": "Are you sure?"},
    {"role": "assistant", "content": "Yes, 2+2=4"}
]

answer = parser.parse_answer(completion)
print(answer)  # "Yes, 2+2=4" (last assistant message)

Extracting Specific Message Types

parser = vf.Parser()

completion = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "tool", "name": "search", "content": "..."},
    {"role": "assistant", "content": "Based on the search..."},
]

user_msgs = parser.get_user_messages(completion)
print(len(user_msgs))  # 1

assistant_msgs = parser.get_assistant_messages(completion)
print(len(assistant_msgs))  # 2

tool_msgs = parser.get_tool_messages(completion)
print(len(tool_msgs))  # 1

Multimodal Content

parser = vf.Parser()

# Handle mixed text and image content
completion = [
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "Here's an image:"},
            {"type": "image_url", "image_url": {"url": "..."}},
            {"type": "text", "text": "and more text"}
        ]
    }
]

answer = parser.parse_answer(completion)
print(answer)  # "Here's an image: and more text" (text parts joined)
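
One way to picture the flattening of multimodal content is a helper that keeps only the text parts. This is a sketch under stated assumptions: `content_to_text` is a hypothetical name, and the single-space separator is inferred from the output shown in the example above rather than confirmed from the library.

```python
from typing import Any, Union


def content_to_text(content: Union[str, list[dict[str, Any]], None]) -> str:
    """Hypothetical helper: flatten message content to plain text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # Keep only text parts; images and other non-text parts are skipped.
    texts = [part.get("text", "") for part in content if part.get("type") == "text"]
    # Assumption: parts are joined with a single space.
    return " ".join(texts)


parts = [
    {"type": "text", "text": "Here's an image:"},
    {"type": "image_url", "image_url": {"url": "..."}},
    {"type": "text", "text": "and more text"},
]
flattened = content_to_text(parts)
```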

Using in a Reward Function

def length_reward(completion, parser, **kwargs):
    """Reward based on parsed answer length."""
    answer = parser.parse_answer(completion)
    if answer is None:
        return 0.0
    return min(len(answer) / 100, 1.0)

rubric = vf.Rubric(
    funcs=[length_reward],
    weights=[1.0],
    parser=vf.Parser()
)

Custom Parser Subclass

import json
import re

import verifiers as vf


class JSONParser(vf.Parser):
    """Parser that extracts JSON from code blocks."""

    def __init__(self):
        super().__init__(extract_fn=self._extract_json)

    def _extract_json(self, text: str) -> str:
        # Extract from ```json ... ``` blocks; fall back to the raw text
        match = re.search(r'```json\s*\n(.*?)\n```', text, re.DOTALL)
        if match:
            return match.group(1)
        return text

    def parse(self, text: str) -> dict:
        json_str = super().parse(text)
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            # Malformed JSON yields an empty dict instead of raising
            return {}

# Use custom parser
parser = JSONParser()
text = "Here's the data:\n```json\n{\"key\": \"value\"}\n```"
result = parser.parse(text)
print(result)  # {'key': 'value'}

Message Format Compatibility

The parser handles both dictionary and object message formats:

parser = vf.Parser()

# Dictionary format
msg_dict = {"role": "assistant", "content": "Hello"}
parser.parse_answer([msg_dict])  # "Hello"

# Object format (e.g., OpenAI SDK objects)
class Message:
    def __init__(self, role, content):
        self.role = role
        self.content = content

msg_obj = Message("assistant", "Hello")
parser.parse_answer([msg_obj])  # "Hello"
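
Supporting both formats usually comes down to one accessor that tries dictionary lookup first and attribute access otherwise. A minimal sketch; `get_field` and `MessageObject` are hypothetical names for illustration, not part of the library.

```python
from typing import Any


def get_field(message: Any, name: str, default: Any = None) -> Any:
    """Hypothetical accessor for dict-style or attribute-style messages."""
    if isinstance(message, dict):
        return message.get(name, default)
    return getattr(message, name, default)


class MessageObject:
    """Hypothetical attribute-style message, similar in shape to SDK objects."""

    def __init__(self, role: str, content: str):
        self.role = role
        self.content = content


as_dict = {"role": "assistant", "content": "Hello"}
as_obj = MessageObject("assistant", "Hello")

same_role = get_field(as_dict, "role") == get_field(as_obj, "role")
same_content = get_field(as_dict, "content") == get_field(as_obj, "content")
```

Routing every field read through one accessor keeps the rest of the parsing logic independent of the message representation.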

Edge Cases

parser = vf.Parser()

# No assistant messages
parser.parse_answer([{"role": "user", "content": "Hi"}])  # None

# Empty content
parser.parse_answer([{"role": "assistant", "content": ""}])  # ""

# None content
parser.parse_answer([{"role": "assistant", "content": None}])  # ""

# String completion
parser.parse_answer("direct string")  # "direct string"
