
Overview

The Rubric class is the foundation for evaluating LLM responses in Verifiers environments. It manages reward functions and their weights, supports both individual and group-level scoring, and integrates with parsers to extract answers from completions.

Constructor

Rubric(
    funcs: list[RewardFunc | GroupRewardFunc] | None = None,
    weights: list[float] | None = None,
    parser: vf.Parser | None = None,
)
funcs
list[RewardFunc | GroupRewardFunc] | None
default:"None"
List of reward functions to evaluate. Can be individual-level (RewardFunc) or group-level (GroupRewardFunc) functions.
weights
list[float] | None
default:"None"
Weights for each reward function. Must match the length of funcs. Defaults to 1.0 for each function if not provided.
parser
vf.Parser | None
default:"None"
Parser instance for extracting answers from completions. Defaults to vf.Parser() if not provided.

Reward Function Signatures

Individual-level RewardFunc

Reward functions that score single rollouts can accept any combination of:
  • prompt: list[dict[str, str]] | str - The input prompt
  • completion: list[dict[str, str]] | str - The model’s completion
  • answer: Any - Ground truth or metadata for scoring
  • task: str - Task type identifier
  • state: State - Full state dictionary
  • info: dict - Additional metadata
  • **kwargs - Additional keyword arguments
Returns: float
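
As a sketch, an individual-level reward function declares only the parameters it needs; the rubric supplies everything else through **kwargs. (The function name here is illustrative, not part of the library.)

```python
# A minimal individual-level reward function (illustrative sketch).
# It accepts only the parameters it uses; the rubric passes the rest
# via **kwargs.
def exact_match_reward(completion, answer, **kwargs) -> float:
    """Return 1.0 when the final message text equals the expected answer."""
    # Completions may be a raw string or a chat-style list of messages.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return 1.0 if text.strip() == answer else 0.0
```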

Group-level GroupRewardFunc

Reward functions that score multiple rollouts together accept plural forms of the same parameters:
  • prompts: list[...] - List of prompts
  • completions: list[...] - List of completions
  • answers: list[...] - List of answers
  • tasks: list[str] - List of task types
  • states: list[State] - List of states
  • infos: list[dict] - List of metadata
Returns: list[float]
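
Because a group-level function sees all rollouts at once, it can score them relative to each other. A sketch of one comparative scheme (the function name and scoring rule are illustrative, not from the library):

```python
# A group-level reward sketch: score each rollout by its length rank
# within the group, normalized to [0, 1]. Returns one float per rollout.
def length_rank_reward(completions, **kwargs) -> list[float]:
    # Completions may be raw strings or chat-style message lists.
    lengths = [len(c if isinstance(c, str) else c[-1]["content"]) for c in completions]
    # Indices sorted by length, shortest first.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    scores = [0.0] * len(lengths)
    for rank, idx in enumerate(order):
        scores[idx] = rank / max(len(lengths) - 1, 1)
    return scores
```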

Methods

add_reward_func

def add_reward_func(self, func: RewardFunc, weight: float = 1.0)
Add a reward function that contributes to the total reward.
func
RewardFunc
The reward function to add.
weight
float
default:"1.0"
Weight for this function in the total reward calculation.
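
Conceptually, add_reward_func appends to parallel lists of functions and weights that are combined as a weighted sum at scoring time. A simplified sketch of that bookkeeping (not the library's internals; the lambdas are placeholders):

```python
# Simplified sketch: parallel lists of functions and weights, combined
# as a weighted sum when scoring a rollout.
funcs, weights = [], []

def add_reward_func(func, weight=1.0):
    funcs.append(func)
    weights.append(weight)

add_reward_func(lambda completion, **kw: 1.0, weight=1.0)  # stands in for correctness
add_reward_func(lambda completion, **kw: 0.5, weight=0.1)  # stands in for brevity

# Total reward is the weight-scaled sum of each function's score.
reward = sum(w * f(completion="hi") for f, w in zip(funcs, weights))
# 1.0 * 1.0 + 0.1 * 0.5 = 1.05
```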

add_metric

def add_metric(self, func: RewardFunc, weight: float = 0.0)
Add a metric function that is tracked but doesn’t contribute to reward (weight = 0).
func
RewardFunc
The metric function to add.
weight
float
default:"0.0"
Weight for this function (typically 0 for metrics).

add_class_object

def add_class_object(self, name: str, obj: Any)
Register a class object that will be passed to reward functions as a keyword argument.
name
str
The parameter name that reward functions can use to access this object.
obj
Any
The object to make available to reward functions.
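
A sketch of the injection pattern: the registered name becomes a keyword argument that reward functions can declare. KeywordJudge and the parameter name "judge" below are hypothetical stand-ins (e.g. for an LLM-judge or API client), not part of the library.

```python
# Objects registered via add_class_object are passed to reward functions
# by parameter name. KeywordJudge is a toy stand-in for a real judge client.
class KeywordJudge:
    def __init__(self, keyword: str):
        self.keyword = keyword

    def score(self, text: str) -> float:
        return 1.0 if self.keyword in text else 0.0

def judged_reward(completion, judge, **kwargs) -> float:
    # 'judge' matches the name used at registration time.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return judge.score(text)

# With a real rubric you would register the object once:
#   rubric.add_class_object("judge", KeywordJudge("because"))
# and the rubric then passes it to judged_reward as the 'judge' kwarg.
```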

score_rollout

async def score_rollout(self, state: State)
Evaluate all individual-level reward functions for a single rollout. Updates state["reward"] and state["metrics"] in place.
state
State
The state dictionary to score. Must contain prompt, completion, and other required fields.
This method requires at least one individual-level reward function and no group-level functions.

score_group

async def score_group(self, states: list[State])
Score multiple rollouts together. Executes all reward functions (both individual and group-level) and updates each state’s reward, advantage, and metrics fields.
states
list[State]
List of state dictionaries to score together.
Group-level functions see all states at once and can implement comparative scoring strategies.

Attributes

funcs
list[RewardFunc | GroupRewardFunc]
List of registered reward functions.
weights
list[float]
Weights corresponding to each function.
parser
vf.Parser
Parser instance for extracting answers.
class_objects
dict[str, Any]
Dictionary of objects available to reward functions, including the parser.

Example Usage

import verifiers as vf

# Define custom reward functions
def length_reward(completion, **kwargs):
    """Reward longer responses."""
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return min(len(text) / 1000, 1.0)

def correctness_reward(completion, answer, parser, **kwargs):
    """Check if parsed answer matches expected."""
    parsed = parser.parse_answer(completion)
    return 1.0 if parsed == answer else 0.0

# Create rubric with weighted functions
rubric = vf.Rubric(
    funcs=[correctness_reward, length_reward],
    weights=[1.0, 0.1],  # Correctness weighted 10x more than length
    parser=vf.Parser()
)

# Add a metric that's tracked but doesn't affect the reward
# (counts messages in a chat-style completion)
rubric.add_metric(lambda completion, **kw: float(len(completion)), weight=0.0)

# Score a state
state = {
    "prompt": "What is 2+2?",
    "completion": [{"role": "assistant", "content": "4"}],
    "answer": "4",
    "task": "math",
    "timing": {"scoring_ms": 0, "total_ms": 0}
}

await rubric.score_rollout(state)
print(f"Reward: {state['reward']}")  # 1.0 * 1.0 + 0.001 * 0.1 = 1.0001
print(f"Metrics: {state['metrics']}")  # Individual scores

Group Scoring Example

def relative_quality(completions, **kwargs):
    """Group function: reward responses at or above the median length."""
    lengths = [len(c[-1]["content"]) for c in completions]
    median = sorted(lengths)[len(lengths) // 2]
    return [1.0 if length >= median else 0.0 for length in lengths]

rubric = vf.Rubric(
    funcs=[relative_quality],
    weights=[1.0]
)

# Score multiple states together (create_state is a user-defined helper
# that builds State dicts like the one in the previous example)
states = [create_state(i) for i in range(10)]
await rubric.score_group(states)

# Each state now has reward, advantage, and metrics
for state in states:
    print(f"Reward: {state['reward']}, Advantage: {state['advantage']}")
