Overview
The Rubric class is the foundation for evaluating LLM responses in Verifiers environments. It manages reward functions and their weights, supports both individual and group-level scoring, and integrates with parsers to extract answers from completions.
Constructor
- funcs: List of reward functions to evaluate. Can be individual-level (RewardFunc) or group-level (GroupRewardFunc) functions.
- weights: Weights for each reward function. Must match the length of funcs. Defaults to 1.0 for each function if not provided.
- parser: Parser instance for extracting answers from completions. Defaults to vf.Parser() if not provided.
Reward Function Signatures
Individual-level RewardFunc
Reward functions that score single rollouts can accept any combination of:
- prompt: list[dict[str, str]] | str - The input prompt
- completion: list[dict[str, str]] | str - The model's completion
- answer: Any - Ground truth or metadata for scoring
- task: str - Task type identifier
- state: State - Full state dictionary
- info: dict - Additional metadata
- **kwargs - Additional keyword arguments

Returns: float
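For illustration, a minimal individual-level reward function might look like the sketch below. The function name and scoring logic are hypothetical; the point is that it accepts a subset of the parameters above and returns a float.

```python
def exact_match_reward(completion, answer, **kwargs) -> float:
    """Return 1.0 when the completion text matches the ground-truth answer."""
    # Completions may be chat-format (a list of message dicts) or plain strings.
    if isinstance(completion, list):
        text = completion[-1]["content"]
    else:
        text = completion
    return 1.0 if text.strip() == str(answer).strip() else 0.0
```

Because unused parameters are absorbed by **kwargs, the same function works whether or not the caller also passes prompt, state, or info.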
Group-level GroupRewardFunc
Reward functions that score multiple rollouts together accept plural parameters:
- prompts: list[...] - List of prompts
- completions: list[...] - List of completions
- answers: list[...] - List of answers
- tasks: list[str] - List of task types
- states: list[State] - List of states
- infos: list[dict] - List of metadata

Returns: list[float]
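A group-level function sees every rollout in the group at once, so it can score comparatively. The sketch below is a hypothetical example that rewards shorter completions relative to the rest of the group; a real group function could instead compare answers, measure diversity, and so on.

```python
def length_rank_reward(completions, **kwargs) -> list[float]:
    """Score each rollout by how short its completion is within the group.

    The shortest completion gets 1.0 and the longest gets 0.0
    (a hypothetical comparative strategy for illustration only).
    """
    def text_of(c):
        # Handle both chat-format and plain-string completions.
        return c[-1]["content"] if isinstance(c, list) else c

    lengths = [len(text_of(c)) for c in completions]
    lo, hi = min(lengths), max(lengths)
    if lo == hi:
        return [1.0] * len(completions)
    return [1.0 - (n - lo) / (hi - lo) for n in lengths]
```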
Methods
add_reward_func
- The reward function to add.
- Weight for this function in the total reward calculation.
add_metric
- The metric function to add.
- Weight for this function (typically 0 for metrics).
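The distinction between reward functions and metrics comes down to their weight. Assuming the total reward is a weighted sum of per-function scores (which the constructor's weights parameter suggests), a weight-0 metric is still recorded per rollout but contributes nothing to the reward. The names and values below are illustrative:

```python
# Hypothetical per-function scores for one rollout.
scores = {"correct_answer": 1.0, "format_ok": 0.5, "response_length": 120.0}

# Reward functions carry nonzero weight; metrics use weight 0.
weights = {"correct_answer": 1.0, "format_ok": 0.2, "response_length": 0.0}

total_reward = sum(scores[name] * weights[name] for name in scores)
# response_length still appears in the recorded metrics, but the 0 weight
# means it adds nothing to total_reward (1.0 * 1.0 + 0.5 * 0.2 + 120.0 * 0.0).
```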
add_class_object
- The parameter name that reward functions can use to access this object.
- The object to make available to reward functions.
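One common way this kind of injection works is signature-based keyword filtering: each registered object is passed to a reward function only if the function declares a parameter with that name. The helper below is an illustrative sketch of that pattern, not the library's actual mechanism.

```python
import inspect

def call_with_available(func, available):
    """Call func, passing only the objects whose names appear in its signature."""
    params = inspect.signature(func).parameters
    kwargs = {k: v for k, v in available.items() if k in params}
    return func(**kwargs)

def uses_parser(completion, parser):
    # This function opts in to the "parser" object by naming it as a parameter.
    return parser(completion)

# str.strip stands in for a registered parser object here.
available = {"completion": "  hi  ", "parser": str.strip, "unused": object()}
result = call_with_available(uses_parser, available)  # "hi"
```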
score_rollout
Scores a single rollout, updating state["reward"] and state["metrics"] in place.
- The state dictionary to score. Must contain prompt, completion, and other required fields.

This method requires at least one individual-level reward function and no group-level functions.
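The in-place update described above can be sketched as follows. The helper and key names mirror this page's description (a weighted sum of per-function scores written into state["reward"], per-function values into state["metrics"]); the exact call signature in the library may differ.

```python
def score_rollout_sketch(state, funcs, weights):
    """Apply each individual-level reward function and update state in place."""
    metrics = {}
    for func in funcs:
        metrics[func.__name__] = func(
            prompt=state["prompt"],
            completion=state["completion"],
            answer=state.get("answer"),
        )
    state["metrics"] = metrics
    # Total reward is the weighted sum of the per-function scores.
    state["reward"] = sum(
        metrics[f.__name__] * w for f, w in zip(funcs, weights)
    )

def exact_match(prompt, completion, answer, **kwargs):
    return 1.0 if completion == answer else 0.0

state = {"prompt": "2+2?", "completion": "4", "answer": "4"}
score_rollout_sketch(state, [exact_match], [1.0])
# state["reward"] == 1.0, state["metrics"] == {"exact_match": 1.0}
```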
score_group
Scores a group of rollouts together, updating each state's reward, advantage, and metrics fields.
- List of state dictionaries to score together.

Group-level functions see all states at once and can implement comparative scoring strategies.
Attributes
- List of registered reward functions.
- Weights corresponding to each function.
- Parser instance for extracting answers.
- Dictionary of objects available to reward functions, including the parser.
Example Usage
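The original example is not reproduced here; as a stand-in, the self-contained sketch below wires the documented pieces together with a minimal Rubric-like class. MiniRubric and both reward functions are illustrative only, not the library's implementation.

```python
class MiniRubric:
    """Toy stand-in mirroring the documented Rubric interface (illustrative)."""

    def __init__(self, funcs=None, weights=None):
        self.funcs = list(funcs or [])
        self.weights = list(weights or [1.0] * len(self.funcs))

    def add_reward_func(self, func, weight=1.0):
        self.funcs.append(func)
        self.weights.append(weight)

    def add_metric(self, func, weight=0.0):
        # Metrics are tracked like reward functions but default to weight 0.
        self.add_reward_func(func, weight)

    def score_rollout(self, state):
        metrics = {f.__name__: f(**state) for f in self.funcs}
        state["metrics"] = metrics
        state["reward"] = sum(
            metrics[f.__name__] * w for f, w in zip(self.funcs, self.weights)
        )

def correct(completion, answer, **kwargs):
    return 1.0 if completion == answer else 0.0

def completion_length(completion, **kwargs):
    return float(len(completion))

rubric = MiniRubric(funcs=[correct], weights=[1.0])
rubric.add_metric(completion_length)  # weight 0: tracked, not rewarded

state = {"prompt": "2+2?", "completion": "4", "answer": "4"}
rubric.score_rollout(state)
# state["reward"] == 1.0; metrics include completion_length == 1.0
```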
Group Scoring Example
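The original example is not reproduced here; the sketch below illustrates the kind of comparative strategy a group-level function enables. Both helpers are hypothetical: one rewards rollouts that agree with the group's modal answer, and one computes each rollout's advantage relative to the group mean (one common definition of advantage).

```python
from collections import Counter

def majority_agreement(completions, **kwargs):
    """Group-level reward: rollouts matching the group's modal answer score 1.0."""
    mode, _ = Counter(completions).most_common(1)[0]
    return [1.0 if c == mode else 0.0 for c in completions]

def group_mean_advantage(rewards):
    """Advantage of each rollout relative to the group mean reward."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

completions = ["4", "4", "5"]
rewards = majority_agreement(completions)   # [1.0, 1.0, 0.0]
advantages = group_mean_advantage(rewards)  # centered around a mean of 2/3
```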
See Also
- JudgeRubric - LLM-as-judge scoring
- MathRubric - Mathematical equivalence checking
- Parser - Base parser class