Overview
The Parser class provides the foundation for extracting answers from LLM completions. It handles both string and message-based completions and can be customized with extraction functions.
Constructor
Parser(extract_fn: Callable[[str], str] = lambda x: x)
extract_fn: Function to extract or transform text. Applied to content after extraction from messages. Defaults to the identity function.
Methods
parse
def parse(self, text: str) -> Any
Parse text using the configured extraction function.
Returns: Result of applying extract_fn to the text.
parse_answer
def parse_answer(self, completion: Messages) -> str | None
Extract the answer from a completion.
completion: Either a string or a list of message dictionaries.
Returns: str | None - Parsed answer from the last assistant message, or None if no assistant messages exist.
For string completions, applies parse() directly. For message lists, extracts the last assistant message’s content and then parses it.
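The dispatch described above can be sketched without the library. This is an illustrative stand-in, not the verifiers implementation: parse_answer_sketch and its extract_fn argument are hypothetical names, and the sketch assumes dictionary messages whose content is a string or None.

```python
from typing import Callable, List, Optional, Union

Completion = Union[str, List[dict]]


def parse_answer_sketch(
    completion: Completion,
    extract_fn: Callable[[str], str] = lambda x: x,
) -> Optional[str]:
    """Hypothetical stand-in for the string/message dispatch."""
    # String completions are parsed directly.
    if isinstance(completion, str):
        return extract_fn(completion)
    # Message lists: keep assistant messages, then parse the last one.
    assistant = [m for m in completion if m.get("role") == "assistant"]
    if not assistant:
        return None
    # Treat missing or None content as an empty string (an assumption).
    return extract_fn(assistant[-1].get("content") or "")


print(parse_answer_sketch("direct string"))
print(parse_answer_sketch([{"role": "user", "content": "Hi"}]))
```

Returning None only when no assistant message exists lets callers tell "nothing to parse" apart from "parsed to an empty string".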
Helper Methods
These methods help extract specific message types from completions:
get_assistant_messages
def get_assistant_messages(self, completion: Messages) -> Messages
Extract all assistant messages from a completion.
Returns: List of messages with role="assistant".
get_system_messages
def get_system_messages(self, completion: Messages) -> Messages
Extract all system messages from a completion.
Returns: List of messages with role="system".
get_user_messages
def get_user_messages(self, completion: Messages) -> Messages
Extract all user messages from a completion.
Returns: List of messages with role="user".
get_tool_messages
def get_tool_messages(self, completion: Messages) -> Messages
Extract all tool messages from a completion.
Returns: List of messages with role="tool".
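All four helpers amount to filtering a message list by role. A minimal sketch of that idea follows, assuming messages may be dictionaries or attribute-style objects; get_role and filter_by_role are hypothetical names for illustration, not part of the library.

```python
from typing import Any, List


def get_role(message: Any) -> Any:
    """Read the role from a dict message or an attribute-style message object."""
    if isinstance(message, dict):
        return message.get("role")
    return getattr(message, "role", None)


def filter_by_role(completion: List[Any], role: str) -> List[Any]:
    """Return the messages whose role matches, preserving their order."""
    return [m for m in completion if get_role(m) == role]


completion = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "tool", "name": "search", "content": "..."},
    {"role": "assistant", "content": "Based on the search..."},
]
print(len(filter_by_role(completion, "assistant")))
print(len(filter_by_role(completion, "tool")))
```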
get_format_reward_func
def get_format_reward_func(self) -> Callable
Return a reward function that validates format compliance.
Returns: A reward function that returns 1.0 for any completion (base implementation always validates).
Subclasses like XMLParser override this to provide actual format checking.
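To illustrate the override pattern, here is a self-contained sketch that does not import the library. BaseParserSketch and BoxedFormatParserSketch are hypothetical stand-ins, the reward function signature (completion, **kwargs) is an assumption, and the check handles string completions only.

```python
import re
from typing import Callable


class BaseParserSketch:
    """Hypothetical stand-in for the base behavior: every completion passes."""

    def get_format_reward_func(self) -> Callable[..., float]:
        def format_reward(completion, **kwargs) -> float:
            return 1.0

        return format_reward


class BoxedFormatParserSketch(BaseParserSketch):
    """Hypothetical subclass: reward 1.0 only when a boxed answer is present."""

    def get_format_reward_func(self) -> Callable[..., float]:
        pattern = re.compile(r"\\boxed\{.+?\}")

        def format_reward(completion, **kwargs) -> float:
            # Simplification: only string completions are checked here.
            text = completion if isinstance(completion, str) else ""
            return 1.0 if pattern.search(text) else 0.0

        return format_reward


reward = BoxedFormatParserSketch().get_format_reward_func()
print(reward("The result is \\boxed{4}"))
print(reward("The result is 4"))
```

Returning a closure rather than a bound method keeps the reward function usable wherever a plain callable is expected.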
Attributes
extract_fn: The extraction function applied during parsing.
logger: Logger instance for the parser.
Example Usage
Basic String Parsing
import verifiers as vf
# Default parser (identity function)
parser = vf.Parser()
result = parser.parse("Hello world")
print(result) # "Hello world"
import re
def extract_number(text: str) -> str:
    """Extract the first number from text."""
    match = re.search(r'\d+', text)
    return match.group(0) if match else ""
parser = vf.Parser(extract_fn=extract_number)
result = parser.parse("The answer is 42 units")
print(result) # "42"
from verifiers.utils.data_utils import extract_boxed_answer
# Parser for LaTeX \boxed{} format
parser = vf.Parser(extract_fn=extract_boxed_answer)
completion = "Therefore, the solution is \\boxed{x = 5}"
result = parser.parse(completion)
print(result) # "x = 5"
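For readers curious how a boxed-answer extractor might work, the following is a minimal sketch based on brace matching. It is not the implementation of extract_boxed_answer; extract_boxed_sketch is a hypothetical name, and returning the input unchanged when no box is found is an assumption.

```python
def extract_boxed_sketch(text: str) -> str:
    """Return the contents of the last \\boxed{...}, or the text if none is found."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return text
    begin = start + len(marker)
    depth = 0  # nesting depth of braces opened inside the box
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # This brace closes the \boxed{ itself.
                return text[begin:i]
            depth -= 1
    return text  # unbalanced braces: fall back to the input


print(extract_boxed_sketch("Therefore, the solution is \\boxed{x = 5}"))
```

Counting brace depth, rather than using a regular expression, lets the sketch handle nested braces such as \boxed{\frac{1}{2}}.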
Parsing Message Completions
parser = vf.Parser()
# Parse from message list
completion = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "The answer is 4"},
    {"role": "user", "content": "Are you sure?"},
    {"role": "assistant", "content": "Yes, 2+2=4"},
]
answer = parser.parse_answer(completion)
print(answer) # "Yes, 2+2=4" (last assistant message)
parser = vf.Parser()
completion = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "tool", "name": "search", "content": "..."},
    {"role": "assistant", "content": "Based on the search..."},
]
user_msgs = parser.get_user_messages(completion)
print(len(user_msgs)) # 1
assistant_msgs = parser.get_assistant_messages(completion)
print(len(assistant_msgs)) # 2
tool_msgs = parser.get_tool_messages(completion)
print(len(tool_msgs)) # 1
Multimodal Content
parser = vf.Parser()
# Handle mixed text and image content
completion = [
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "Here's an image:"},
            {"type": "image_url", "image_url": {"url": "..."}},
            {"type": "text", "text": "and more text"},
        ],
    }
]
answer = parser.parse_answer(completion)
print(answer) # "Here's an image: and more text" (text parts joined)
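The flattening of multimodal content can be sketched as follows. This is an illustration rather than the library's code: content_to_text is a hypothetical helper, and joining text parts with a single space is an assumption inferred from the output shown in the example above.

```python
from typing import Any, List, Union


def content_to_text(content: Union[str, List[Any], None]) -> str:
    """Flatten message content to plain text, skipping non-text parts."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # Keep only parts tagged as text; images and other parts are dropped.
    texts = [
        part.get("text", "")
        for part in content
        if isinstance(part, dict) and part.get("type") == "text"
    ]
    return " ".join(texts)


parts = [
    {"type": "text", "text": "Here's an image:"},
    {"type": "image_url", "image_url": {"url": "..."}},
    {"type": "text", "text": "and more text"},
]
print(content_to_text(parts))
```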
Using in a Reward Function
def length_reward(completion, parser, **kwargs):
    """Reward based on parsed answer length."""
    answer = parser.parse_answer(completion)
    if answer is None:
        return 0.0
    return min(len(answer) / 100, 1.0)
rubric = vf.Rubric(
    funcs=[length_reward],
    weights=[1.0],
    parser=vf.Parser(),
)
Custom Parser Subclass
import json
import re

class JSONParser(vf.Parser):
    """Parser that extracts JSON from code blocks."""

    def __init__(self):
        super().__init__(extract_fn=self._extract_json)

    def _extract_json(self, text: str) -> str:
        # Extract from ```json ... ``` blocks; fall back to the raw text
        match = re.search(r'```json\s*\n(.*?)\n```', text, re.DOTALL)
        if match:
            return match.group(1)
        return text

    def parse(self, text: str) -> dict:
        json_str = super().parse(text)
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            # Invalid JSON yields an empty dict instead of raising
            return {}
# Use custom parser
parser = JSONParser()
text = "Here's the data:\n```json\n{\"key\": \"value\"}\n```"
result = parser.parse(text)
print(result) # {'key': 'value'}
The parser handles both dictionary and object message formats:
parser = vf.Parser()
# Dictionary format
msg_dict = {"role": "assistant", "content": "Hello"}
parser.parse_answer([msg_dict]) # "Hello"
# Object format (e.g., OpenAI SDK objects)
class Message:
    def __init__(self, role, content):
        self.role = role
        self.content = content
msg_obj = Message("assistant", "Hello")
parser.parse_answer([msg_obj]) # "Hello"
Edge Cases
parser = vf.Parser()
# No assistant messages
parser.parse_answer([{"role": "user", "content": "Hi"}]) # None
# Empty content
parser.parse_answer([{"role": "assistant", "content": ""}]) # ""
# None content
parser.parse_answer([{"role": "assistant", "content": None}]) # ""
# String completion
parser.parse_answer("direct string") # "direct string"
See Also