evaluate()
The evaluate() function is the main entry point for running evaluations.
Import
Signature
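The original signature code is not preserved in this page; the following is a hypothetical reconstruction from the options and return type documented below. The dataset case shape, scorer argument names, and RunSummary fields are assumptions, not the library's actual source.

```typescript
// Hypothetical reconstruction — field names beyond those documented on this
// page (input/expected, passed/failed) are assumptions.
interface EvaluateOptions<T> {
  name: string;
  model: string;
  dataset: Array<{ input: T; expected: string }>;
  task: (input: T) => string | Promise<string>;
  scorers: Record<string, (args: { output: string; expected: string }) => number>;
  [extra: string]: unknown; // reporters, store, suiteId, maxConcurrency, …
}

interface RunSummary { passed: number; failed: number }

// Awaitable and chainable, per the Return Value section below.
interface EvalBuilder extends PromiseLike<RunSummary> {
  failed(): EvalBuilder;
  cases(spec: string): EvalBuilder;
  sample(n: number): EvalBuilder;
  assert(): EvalBuilder;
}

declare function evaluate<T>(options: EvaluateOptions<T>): EvalBuilder;

// The declarations above have no runtime body; a toy builder shows how the
// chainable, awaitable shape fits together:
const toyBuilder: EvalBuilder = {
  failed: () => toyBuilder,
  cases: (_spec: string) => toyBuilder,
  sample: (_n: number) => toyBuilder,
  assert: () => toyBuilder,
  then: (onFulfilled?: any, onRejected?: any) =>
    Promise.resolve({ passed: 2, failed: 0 }).then(onFulfilled, onRejected),
};
```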
Options
EvaluateOptions<T>
Evaluate a single model:
name
Human-readable name for the evaluation run.
model
Model identifier (passed to reporters and stored in the database).
dataset
Dataset of input/expected pairs. See Datasets.
task
Function that calls your model and returns the output.
scorers
Named scoring functions. See Scorers.
reporters
Reporters that receive lifecycle events and produce output.
store
Persistent store for run history.
suiteId (optional)
Associate this run with an existing suite ID; if omitted, a new suite is created from name.
maxConcurrency (optional)
Maximum number of cases to run concurrently.
timeout (optional)
Per-case timeout in milliseconds.
trials (optional)
Number of times to run each case and average the scores.
threshold (optional)
Minimum average score (0–1) required for a case to pass.
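Taken together, trials and threshold mean each case runs several times, its scores are averaged, and the case passes only if the average meets the threshold. A minimal illustration (the values here are arbitrary):

```typescript
// trials: 3, threshold: 0.8 — one score per trial for a single case.
const trials = 3;
const threshold = 0.8;
const trialScores = [1, 0.5, 1];
const average = trialScores.reduce((a, b) => a + b, 0) / trials;
const casePasses = average >= threshold; // 2.5 / 3 ≈ 0.83, so this case passes
```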
EvaluateEachOptions<T, V>
Evaluate multiple model variants:
models
Array of model variants. Each variant must have a name property:
task
Task function receives both the input and the current variant:
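A sketch of the variant flow: every entry in models needs a name, and the task receives the input plus the current variant. The model field and its identifiers below are illustrative assumptions, not values from this page.

```typescript
// Each variant must carry a `name`; other fields are assumptions.
interface Variant { name: string; model: string }

const models: Variant[] = [
  { name: "fast", model: "model-small" },
  { name: "strong", model: "model-large" },
];

// Stand-in for a real model call — tags the output with the variant's model.
const task = (input: string, variant: Variant): string =>
  `${variant.model}:${input}`;

// Conceptually, the runner executes each case against each variant:
const outputs = models.map((v) => task("hello", v));
```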
Return Value
The evaluate() function returns an EvalBuilder that implements PromiseLike, so you can await it directly:
failed()
Run only cases that failed in the previous run:
cases(spec)
Run specific cases by index:
0-10 — Range from 0 to 10 (inclusive)
5 — Single index
0-10,15,20-25 — Multiple ranges and indexes
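The grammar above can be expanded into a set of case indexes with a small parser; this is an illustration of the spec format, not the library's implementation:

```typescript
// Expand a spec like "0-10,15,20-25" into sorted, de-duplicated indexes.
// Ranges are inclusive, per the list above.
function parseCaseSpec(spec: string): number[] {
  const indexes = new Set<number>();
  for (const part of spec.split(",")) {
    const [start, end] = part.split("-").map(Number);
    if (end === undefined) indexes.add(start); // single index, e.g. "5"
    else for (let i = start; i <= end; i++) indexes.add(i); // range, e.g. "0-10"
  }
  return [...indexes].sort((a, b) => a - b);
}
```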
sample(n)
Run a random sample of n cases:
assert()
Throw EvalAssertionError if any cases fail:
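A sketch of the assert() semantics, useful in CI: if the run summary contains any failing case, an EvalAssertionError is thrown. The error's constructor arguments and fields are assumptions for illustration.

```typescript
// Hypothetical error shape — only the class name comes from this page.
class EvalAssertionError extends Error {
  constructor(public failedCount: number) {
    super(`${failedCount} case(s) failed`);
    this.name = "EvalAssertionError";
  }
}

// What assert() conceptually does with the run summary:
function assertRun(summary: { passed: number; failed: number }): void {
  if (summary.failed > 0) throw new EvalAssertionError(summary.failed);
}
```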
Example: Single Model
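The original example code is not preserved in this page. The reconstruction below keeps the option names documented above and substitutes a local stand-in for evaluate() so it is self-contained; the real function takes more options and returns an EvalBuilder. The model identifier, dataset, and scorer are all invented for illustration.

```typescript
// Local stand-in for evaluate(): runs each case once, averages the scorers,
// and counts a case as passed only on a perfect average.
type Case = { input: string; expected: string };

function evaluateStub(opts: {
  name: string;
  model: string;
  dataset: Case[];
  task: (input: string) => string;
  scorers: Record<string, (o: { output: string; expected: string }) => number>;
}): { passed: number; failed: number } {
  let passed = 0;
  let failed = 0;
  for (const c of opts.dataset) {
    const output = opts.task(c.input);
    const scores = Object.values(opts.scorers).map((s) =>
      s({ output, expected: c.expected }),
    );
    const average = scores.reduce((a, b) => a + b, 0) / scores.length;
    average >= 1 ? passed++ : failed++;
  }
  return { passed, failed };
}

const summary = evaluateStub({
  name: "uppercase-eval",
  model: "demo-model", // hypothetical identifier
  dataset: [
    { input: "hello", expected: "HELLO" },
    { input: "world", expected: "WORLD" },
  ],
  task: (input) => input.toUpperCase(), // stand-in for a real model call
  scorers: {
    exact: ({ output, expected }) => (output === expected ? 1 : 0),
  },
});
```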
Example: Multiple Models
Example: Builder Pattern
Types
RunSummary
EvalAssertionError
Next Steps
Datasets
Learn about dataset loading
Scorers
Explore scorer functions
Engine API
Lower-level engine API