
Overview

The Rubric class is the foundation for evaluating LLM responses in Verifiers environments. It manages reward functions and their weights, supports both individual and group-level scoring, and integrates with parsers to extract answers from completions.

Constructor

Rubric(
    funcs: list[RewardFunc | GroupRewardFunc] | None = None,
    weights: list[float] | None = None,
    parser: vf.Parser | None = None,
)
funcs
list[RewardFunc | GroupRewardFunc] | None
default:"None"
List of reward functions to evaluate. Can be individual-level (RewardFunc) or group-level (GroupRewardFunc) functions.
weights
list[float] | None
default:"None"
Weights for each reward function. Must match the length of funcs. Defaults to 1.0 for each function if not provided.
parser
vf.Parser | None
default:"None"
Parser instance for extracting answers from completions. Defaults to vf.Parser() if not provided.

Reward Function Signatures

Individual-level RewardFunc

Reward functions that score single rollouts can accept any combination of:
  • prompt: list[dict[str, str]] | str - The input prompt
  • completion: list[dict[str, str]] | str - The model’s completion
  • answer: Any - Ground truth or metadata for scoring
  • task: str - Task type identifier
  • state: State - Full state dictionary
  • info: dict - Additional metadata
  • **kwargs - Additional keyword arguments
Returns: float
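
As a sketch, an individual-level reward function declares only the parameters it needs; the rubric supplies everything else through **kwargs. (The function name here is illustrative, not part of the library.)

```python
# A minimal individual-level reward function (illustrative sketch).
# It accepts only the parameters it uses; the rubric passes the rest
# via **kwargs.
def exact_match_reward(completion, answer, **kwargs) -> float:
    """Return 1.0 when the final message text equals the expected answer."""
    # Completions may be a raw string or a chat-style list of messages.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return 1.0 if text.strip() == answer else 0.0
```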

Group-level GroupRewardFunc

Reward functions that score multiple rollouts together accept plural forms of the same parameters:
  • prompts: list[...] - List of prompts
  • completions: list[...] - List of completions
  • answers: list[...] - List of answers
  • tasks: list[str] - List of task types
  • states: list[State] - List of states
  • infos: list[dict] - List of metadata
Returns: list[float]
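
Because a group-level function sees all rollouts at once, it can score them relative to each other. A sketch of one comparative scheme (the function name and scoring rule are illustrative, not from the library):

```python
# A group-level reward sketch: score each rollout by its length rank
# within the group, normalized to [0, 1]. Returns one float per rollout.
def length_rank_reward(completions, **kwargs) -> list[float]:
    # Completions may be raw strings or chat-style message lists.
    lengths = [len(c if isinstance(c, str) else c[-1]["content"]) for c in completions]
    # Indices sorted by length, shortest first.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    scores = [0.0] * len(lengths)
    for rank, idx in enumerate(order):
        scores[idx] = rank / max(len(lengths) - 1, 1)
    return scores
```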

Methods

add_reward_func

def add_reward_func(self, func: RewardFunc, weight: float = 1.0)
Add a reward function that contributes to the total reward.
func
RewardFunc
The reward function to add.
weight
float
default:"1.0"
Weight for this function in the total reward calculation.
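
Conceptually, add_reward_func appends to parallel lists of functions and weights that are combined as a weighted sum at scoring time. A simplified sketch of that bookkeeping (not the library's internals; the lambdas are placeholders):

```python
# Simplified sketch: parallel lists of functions and weights, combined
# as a weighted sum when scoring a rollout.
funcs, weights = [], []

def add_reward_func(func, weight=1.0):
    funcs.append(func)
    weights.append(weight)

add_reward_func(lambda completion, **kw: 1.0, weight=1.0)  # stands in for correctness
add_reward_func(lambda completion, **kw: 0.5, weight=0.1)  # stands in for brevity

# Total reward is the weight-scaled sum of each function's score.
reward = sum(w * f(completion="hi") for f, w in zip(funcs, weights))
# 1.0 * 1.0 + 0.1 * 0.5 = 1.05
```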

add_metric

def add_metric(self, func: RewardFunc, weight: float = 0.0)
Add a metric function that is tracked but doesn’t contribute to reward (weight = 0).
func
RewardFunc
The metric function to add.
weight
float
default:"0.0"
Weight for this function (typically 0 for metrics).

add_class_object

def add_class_object(self, name: str, obj: Any)
Register a class object that will be passed to reward functions as a keyword argument.
name
str
The parameter name that reward functions can use to access this object.
obj
Any
The object to make available to reward functions.
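
A sketch of the injection pattern: the registered name becomes a keyword argument that reward functions can declare. KeywordJudge and the parameter name "judge" below are hypothetical stand-ins (e.g. for an LLM-judge or API client), not part of the library.

```python
# Objects registered via add_class_object are passed to reward functions
# by parameter name. KeywordJudge is a toy stand-in for a real judge client.
class KeywordJudge:
    def __init__(self, keyword: str):
        self.keyword = keyword

    def score(self, text: str) -> float:
        return 1.0 if self.keyword in text else 0.0

def judged_reward(completion, judge, **kwargs) -> float:
    # 'judge' matches the name used at registration time.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return judge.score(text)

# With a real rubric you would register the object once:
#   rubric.add_class_object("judge", KeywordJudge("because"))
# and the rubric then passes it to judged_reward as the 'judge' kwarg.
```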

score_rollout

async def score_rollout(self, state: State)
Evaluate all individual-level reward functions for a single rollout. Updates state["reward"] and state["metrics"] in place.
state
State
The state dictionary to score. Must contain prompt, completion, and other required fields.
This method requires at least one individual-level reward function and no group-level functions.

score_group

async def score_group(self, states: list[State])
Score multiple rollouts together. Executes all reward functions (both individual and group-level) and updates each state’s reward, advantage, and metrics fields.
states
list[State]
List of state dictionaries to score together.
Group-level functions see all states at once and can implement comparative scoring strategies.

Attributes

funcs
list[RewardFunc | GroupRewardFunc]
List of registered reward functions.
weights
list[float]
Weights corresponding to each function.
parser
vf.Parser
Parser instance for extracting answers.
class_objects
dict[str, Any]
Dictionary of objects available to reward functions, including the parser.

Example Usage

import verifiers as vf

# Define custom reward functions
def length_reward(completion, **kwargs):
    """Reward longer responses."""
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return min(len(text) / 1000, 1.0)

def correctness_reward(completion, answer, parser, **kwargs):
    """Check if parsed answer matches expected."""
    parsed = parser.parse_answer(completion)
    return 1.0 if parsed == answer else 0.0

# Create rubric with weighted functions
rubric = vf.Rubric(
    funcs=[correctness_reward, length_reward],
    weights=[1.0, 0.1],  # Correctness weighted 10x more than length
    parser=vf.Parser()
)

# Add a metric that's tracked but doesn't affect the reward
# (counts messages in a chat-style completion)
rubric.add_metric(lambda completion, **kw: float(len(completion)), weight=0.0)

# Score a state
state = {
    "prompt": "What is 2+2?",
    "completion": [{"role": "assistant", "content": "4"}],
    "answer": "4",
    "task": "math",
    "timing": {"scoring_ms": 0, "total_ms": 0}
}

await rubric.score_rollout(state)
print(f"Reward: {state['reward']}")  # 1.0 * 1.0 + 0.001 * 0.1 = 1.0001
print(f"Metrics: {state['metrics']}")  # Individual scores

Group Scoring Example

def relative_quality(completions, **kwargs):
    """Group function: reward responses at or above the median length."""
    lengths = [len(c[-1]["content"]) for c in completions]
    median = sorted(lengths)[len(lengths) // 2]
    return [1.0 if length >= median else 0.0 for length in lengths]

rubric = vf.Rubric(
    funcs=[relative_quality],
    weights=[1.0]
)

# Score multiple states together (create_state is a user-defined helper
# that builds State dicts like the one in the previous example)
states = [create_state(i) for i in range(10)]
await rubric.score_group(states)

# Each state now has reward, advantage, and metrics
for state in states:
    print(f"Reward: {state['reward']}, Advantage: {state['advantage']}")
