Engine API
The engine module orchestrates dataset iteration, task execution, scoring, and persistence.
Import
runEval(config)
Lower-level function to run an evaluation with full control.
Signature
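The signature itself is not reproduced in this section. As a rough guide, the config object might be typed along the following lines. Only the field names are taken from the parameter list below. Every type, the `Scorer` alias, and the `RunEvalConfigSketch` name are assumptions made for illustration, not the library's declarations.

```typescript
// Hypothetical shapes: only the field names come from the parameter list;
// every type here is an assumption.
type Scorer = (output: unknown, expected: unknown) => number;

interface RunEvalConfigSketch {
  name: string;
  model: string;
  dataset: unknown;                          // real dataset type not shown here
  task: (input: unknown) => Promise<unknown>;
  scorers: Record<string, Scorer>;
  store: unknown;                            // real store type not shown here
  emitter?: unknown;
  suiteId?: string;
  config?: Record<string, unknown>;
  maxConcurrency?: number;
  batchSize?: number;
  timeout?: number;                          // milliseconds, per case
  trials?: number;
  threshold?: number;
}

// An example value that satisfies the sketched shape.
const exampleConfig: RunEvalConfigSketch = {
  name: "smoke-test",
  model: "example-model",
  dataset: [{ input: "2+2", expected: "4" }],
  task: async (input) => (input === "2+2" ? "4" : ""),
  scorers: { exact: (output, expected) => (output === expected ? 1 : 0) },
  store: {},
  maxConcurrency: 4,
  timeout: 30000,
};
```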
Parameters
name
Human-readable name for the evaluation run.
model
Model identifier.
dataset
Dataset to evaluate.
task
Task function that calls your model.
scorers
Named scoring functions.
store
Run store for persistence.
emitter (optional)
Event emitter for lifecycle events.
suiteId (optional)
Associate with an existing suite.
config (optional)
Arbitrary configuration metadata to store with the run.
maxConcurrency (optional)
Maximum concurrent cases.
batchSize (optional)
Process cases in batches. Waits for each batch to complete before starting the next.
maxConcurrency).
timeout (optional)
Per-case timeout in milliseconds.
trials (optional)
Run each case multiple times and average scores.
threshold (optional)
Minimum score to count as pass.
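To make `timeout` and `threshold` concrete, here is a minimal sketch of how a per-case timeout and a pass check could be implemented. The helper names `withTimeout` and `passes` are hypothetical, and treating the threshold as inclusive is an assumption. The engine's actual behavior may differ.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`case timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Hypothetical pass check: assumes the minimum score is inclusive.
function passes(score: number, threshold: number): boolean {
  return score >= threshold;
}
```

Note that this kind of timeout only stops waiting for the case. It does not cancel the underlying work unless the task itself supports cancellation.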
Return Value
Returns a RunSummary:
EvalEmitter
Event emitter for evaluation lifecycle events.
Constructor
Events
run:start
Emitted when the run begins.
case:start
Emitted when a case starts executing.
case:scored
Emitted when a case is scored (always fires, even on error).
case:error
Emitted when a case throws an error.
run:end
Emitted when the run completes.
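The five events above could be modeled as a typed event map, so that each listener receives a payload matching its event name. The sketch below is a self-contained stand-in, not the library's `EvalEmitter`: the event names come from the list above, while the payload fields, `EvalEventsSketch`, and `TypedEmitterSketch` are assumptions.

```typescript
// Hypothetical payloads: event names are from the list above, fields are assumed.
interface EvalEventsSketch {
  "run:start": { runId: string };
  "case:start": { caseId: string };
  "case:scored": { caseId: string; scores: Record<string, number> };
  "case:error": { caseId: string; error: Error };
  "run:end": { runId: string };
}

type Listener<T> = (payload: T) => void;

// Minimal stand-in emitter whose `on` and `emit` are checked against the event map.
class TypedEmitterSketch<Events> {
  private listeners = new Map<keyof Events, Listener<any>[]>();

  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): this {
    const existing = this.listeners.get(event) ?? [];
    this.listeners.set(event, [...existing, listener]);
    return this;
  }

  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    for (const listener of this.listeners.get(event) ?? []) listener(payload);
  }
}

const emitter = new TypedEmitterSketch<EvalEventsSketch>();
const seen: string[] = [];
emitter
  .on("case:error", ({ caseId, error }) => { seen.push(`error ${caseId}: ${error.message}`); })
  .on("case:scored", ({ caseId }) => { seen.push(`scored ${caseId}`); });

// A failing case: case:scored is still emitted. The order relative to
// case:error shown here is an assumption.
emitter.emit("case:error", { caseId: "c1", error: new Error("boom") });
emitter.emit("case:scored", { caseId: "c1", scores: {} });
```

Because `case:scored` fires even on error, a listener that only counts completed cases can subscribe to that one event, while `case:error` is needed only for failure details.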
Event Type
All events are typed:
Examples
Basic Usage
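The package's import path is not shown in this section, so the sketch below does not call the real `runEval`. Instead it defines a small stand-in, `runEvalSketch` (a hypothetical name), that walks through the flow the engine is described as orchestrating: dataset iteration, task execution, scoring, and persistence. All types and the shape of the returned summary are assumptions.

```typescript
// Everything here is a hypothetical stand-in, not the library's implementation.
interface EvalCase { id: string; input: string; expected: string }
type Scorer = (output: string, expected: string) => number;
interface CaseResult { caseId: string; output: string; scores: Record<string, number> }

async function runEvalSketch(options: {
  name: string;
  dataset: EvalCase[];
  task: (input: string) => Promise<string>;
  scorers: Record<string, Scorer>;
  store: { save(result: CaseResult): void };
}): Promise<{ name: string; total: number; meanScores: Record<string, number> }> {
  const totals: Record<string, number> = {};
  for (const evalCase of options.dataset) {                      // dataset iteration
    const output = await options.task(evalCase.input);           // task execution
    const scores: Record<string, number> = {};
    for (const [scorerName, scorer] of Object.entries(options.scorers)) {
      const score = scorer(output, evalCase.expected);           // scoring
      scores[scorerName] = score;
      totals[scorerName] = (totals[scorerName] ?? 0) + score;
    }
    options.store.save({ caseId: evalCase.id, output, scores }); // persistence
  }
  const meanScores: Record<string, number> = {};
  for (const [scorerName, sum] of Object.entries(totals)) {
    meanScores[scorerName] = sum / options.dataset.length;
  }
  return { name: options.name, total: options.dataset.length, meanScores };
}

// Usage: a toy "model" that adds two numbers, scored by exact match.
const saved: CaseResult[] = [];
const summaryPromise = runEvalSketch({
  name: "arithmetic",
  dataset: [
    { id: "a", input: "1+1", expected: "2" },
    { id: "b", input: "2+2", expected: "5" }, // deliberately wrong expectation
  ],
  task: async (input) => String(input.split("+").map(Number).reduce((x, y) => x + y, 0)),
  scorers: { exact: (output, expected) => (output === expected ? 1 : 0) },
  store: { save: (result) => { saved.push(result); } },
});
```

With one matching case and one mismatch, the mean `exact` score resolves to 0.5, and the in-memory store ends up holding one result per case.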
Event Listeners
Concurrency Control
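One common way to cap the number of cases in flight, which is what `maxConcurrency` is described as doing, is a fixed pool of worker lanes that each pull the next unclaimed case. The sketch below illustrates that technique in isolation. `mapWithConcurrency` is a hypothetical helper, and whether the engine uses this approach internally is not stated here.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit` in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as `items`, regardless of completion order
}
```

A lane starts its next case as soon as its current one finishes, so a single slow case occupies only one slot while the others keep draining the queue.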
Batch Processing
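Batch processing, as described for `batchSize`, waits for every case in a batch to finish before the next batch starts. A minimal sketch of that behavior follows. `runInBatches` is a hypothetical helper written for illustration, not the engine's own code.

```typescript
// Hypothetical helper: processes `items` in consecutive batches of `batchSize`.
async function runInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // The whole batch must settle before the next one starts.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

The trade-off in this sketch is that one slow case holds up its entire batch, which can be useful when a downstream service expects bursts separated by quiet periods.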
Trials
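The `trials` option is described as running each case multiple times and averaging the scores. The sketch below shows one way that averaging could work for a single case. `averageOverTrials` is a hypothetical helper, and it assumes an arithmetic mean per scorer with every trial reporting the same scorer names.

```typescript
// Hypothetical helper: runs one case `trials` times and averages each scorer's value.
async function averageOverTrials(
  trials: number,
  runOnce: (trial: number) => Promise<Record<string, number>>,
): Promise<Record<string, number>> {
  if (!Number.isInteger(trials) || trials < 1) {
    throw new RangeError("trials must be a positive integer");
  }
  const sums: Record<string, number> = {};
  for (let trial = 0; trial < trials; trial++) {
    const scores = await runOnce(trial);
    for (const [scorerName, value] of Object.entries(scores)) {
      sums[scorerName] = (sums[scorerName] ?? 0) + value;
    }
  }
  // Arithmetic mean per scorer (assumed; the source only says "average").
  const means: Record<string, number> = {};
  for (const [scorerName, sum] of Object.entries(sums)) {
    means[scorerName] = sum / trials;
  }
  return means;
}
```

Averaging over several trials smooths out run-to-run variation from a non-deterministic model, at the cost of multiplying the number of task calls by the trial count.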
Next Steps
Evaluate API
Higher-level evaluate() function
Reporters
Use reporters instead of raw events