RunStore API

The RunStore class provides SQLite-backed persistence for evaluation runs, cases, and scores.

Import

import { RunStore } from '@deepagents/evals/store';

Constructor

`new RunStore(pathOrDb?)`

Create a new run store.

// Default location: .evals/store.db
const store = new RunStore();

// Custom location
const store = new RunStore('./my-evals/results.db');

// In-memory (for testing)
import { DatabaseSync } from 'node:sqlite';
const db = new DatabaseSync(':memory:');
const store = new RunStore(db);

Parameters:

pathOrDb?: string | DatabaseSync — File path or SQLite database instance

The directory is created automatically if it doesn’t exist.

Suites

`createSuite(name)`

Create a new suite.

const suite = store.createSuite('text2sql-accuracy');
// { id: '...', name: 'text2sql-accuracy', created_at: 1234567890 }

Returns: SuiteRow

interface SuiteRow {
  id: string;
  name: string;
  created_at: number;
}

`getSuite(id)`

Get a suite by ID.

const suite = store.getSuite(suiteId);

Returns: SuiteRow | undefined

`findSuiteByName(name)`

Find a suite by name (returns the most recently created if multiple exist).

const suite = store.findSuiteByName('text2sql-accuracy');

Returns: SuiteRow | undefined

`listSuites()`

List all suites, sorted by creation time (newest first).

const suites = store.listSuites();
for (const suite of suites) {
  console.log(suite.name);
}

Returns: SuiteRow[]

`renameSuite(id, name)`

Rename a suite.

store.renameSuite(suiteId, 'new-name');

Runs

`createRun(run)`

Create a new run.

const runId = store.createRun({
  suite_id: suite.id,
  name: 'my-eval',
  model: 'gpt-4o',
  config: { temperature: 0.7 },
});

Parameters:

{
  suite_id: string;
  name: string;
  model: string;
  config?: Record<string, unknown>;
}

Returns: string (run ID)

`getRun(runId)`

Get a run by ID.

const run = store.getRun(runId);

Returns: RunRow | undefined

interface RunRow {
  id: string;
  suite_id: string;
  name: string;
  model: string;
  config: Record<string, unknown> | null;
  started_at: number;
  finished_at: number | null;
  status: 'running' | 'completed' | 'failed';
  summary: RunSummary | null;
}

`listRuns(suiteId?)`

List all runs or filter by suite.

// All runs
const runs = store.listRuns();

// Runs in a specific suite
const runs = store.listRuns(suiteId);

Returns: RunRow[]

`finishRun(runId, status, summary?)`

Mark a run as completed or failed.

store.finishRun(runId, 'completed', summary);

Parameters:

runId: string
status: 'completed' | 'failed'
summary?: RunSummary

`getLatestCompletedRun(suiteId, model?)`

Get the most recent completed run for a suite.

const run = store.getLatestCompletedRun(suiteId);

// Filter by model
const run = store.getLatestCompletedRun(suiteId, 'gpt-4o');

Returns: RunRow | undefined

`renameRun(id, name)`

Rename a run.

store.renameRun(runId, 'new-name');

Cases

`saveCases(cases)`

Save one or more cases.

store.saveCases([
  {
    id: caseId,
    run_id: runId,
    idx: 0,
    input: { question: 'What is 2+2?' },
    output: '4',
    expected: '4',
    latency_ms: 150,
    tokens_in: 10,
    tokens_out: 2,
  },
]);

Parameters:

interface CaseData {
  id: string;
  run_id: string;
  idx: number;
  input: unknown;
  output: string | null;
  expected?: unknown;
  latency_ms: number;
  tokens_in: number;
  tokens_out: number;
  error?: string;
}

`getCases(runId)`

Get all cases for a run, sorted by index.

const cases = store.getCases(runId);
for (const c of cases) {
  console.log(c.idx, c.output);
}

Returns: CaseRow[]

interface CaseRow {
  id: string;
  run_id: string;
  idx: number;
  input: unknown;
  output: string | null;
  expected: unknown | null;
  latency_ms: number;
  tokens_in: number;
  tokens_out: number;
  error: string | null;
}

`getFailingCases(runId, threshold?)`

Get cases that scored below a threshold.

const failing = store.getFailingCases(runId, 0.5);
for (const c of failing) {
  console.log(`Case #${c.idx}:`, c.scores);
}

Parameters:

runId: string
threshold?: number (default: 0.5)

Returns: CaseWithScores[]

interface CaseWithScores extends CaseRow {
  scores: Array<{ scorer_name: string; score: number; reason: string | null }>;
}

Scores

`saveScores(scores)`

Save one or more scores.

store.saveScores([
  {
    id: scoreId,
    case_id: caseId,
    scorer_name: 'exact',
    score: 1.0,
    reason: undefined,
  },
]);

Parameters:

interface ScoreData {
  id: string;
  case_id: string;
  scorer_name: string;
  score: number;
  reason?: string;
}

Summaries

`getRunSummary(runId, threshold?)`

Compute aggregated statistics for a run.

const summary = store.getRunSummary(runId, 0.5);

Parameters:

runId: string
threshold?: number (default: 0.5) — Minimum score to count as “pass”

Returns: RunSummary

interface RunSummary {
  totalCases: number;
  passCount: number;
  failCount: number;
  meanScores: Record<string, number>;
  totalLatencyMs: number;
  totalTokensIn: number;
  totalTokensOut: number;
}

Prompts (Experimental)

`createPrompt(name, content)`

Create a versioned prompt.

const prompt = store.createPrompt('my-prompt', 'You are a helpful assistant.');
// { id: '...', name: 'my-prompt', version: 1, content: '...', created_at: ... }

// Creating again increments version
const prompt2 = store.createPrompt('my-prompt', 'Updated prompt.');
// { id: '...', name: 'my-prompt', version: 2, content: '...', created_at: ... }

Returns: PromptRow

interface PromptRow {
  id: string;
  name: string;
  version: number;
  content: string;
  created_at: number;
}

`listPrompts()`

List all prompts.

const prompts = store.listPrompts();

Returns: PromptRow[]

`getPrompt(id)`

Get a prompt by ID.

const prompt = store.getPrompt(promptId);

Returns: PromptRow | undefined

`deletePrompt(id)`

Delete a prompt.

store.deletePrompt(promptId);

Examples

Creating and Querying

import { RunStore } from '@deepagents/evals/store';

const store = new RunStore('.evals/store.db');

// Create suite
const suite = store.createSuite('text2sql-accuracy');

// Create run
const runId = store.createRun({
  suite_id: suite.id,
  name: 'baseline',
  model: 'gpt-4o',
});

// Save cases and scores
store.saveCases([{ /* ... */ }]);
store.saveScores([{ /* ... */ }]);

// Mark as completed
const summary = store.getRunSummary(runId);
store.finishRun(runId, 'completed', summary);

// Query runs
const runs = store.listRuns(suite.id);
for (const run of runs) {
  console.log(`${run.model}: ${run.status}`);
}

Finding Failed Cases

const failing = store.getFailingCases(runId, 0.8);
for (const c of failing) {
  console.log(`Case #${c.idx} failed:`);
  for (const s of c.scores) {
    console.log(`  ${s.scorer_name}: ${s.score.toFixed(3)}`);
  }
}

Overview

Guides

API Reference

​RunStore API

​Import

​Constructor

​new RunStore(pathOrDb?)

​Suites

​createSuite(name)

​getSuite(id)

​findSuiteByName(name)

​listSuites()

​renameSuite(id, name)

​Runs

​createRun(run)

​getRun(runId)

​listRuns(suiteId?)

​finishRun(runId, status, summary?)

​getLatestCompletedRun(suiteId, model?)

​renameRun(id, name)

​Cases

​saveCases(cases)

​getCases(runId)

​getFailingCases(runId, threshold?)

​Scores

​saveScores(scores)

​Summaries

​getRunSummary(runId, threshold?)

​Prompts (Experimental)

​createPrompt(name, content)

​listPrompts()

​getPrompt(id)

​deletePrompt(id)

​Examples

​Creating and Querying

​Finding Failed Cases

​Next Steps

Persistence Guide

Comparison

Build docs developers (and LLMs) love

RunStore API

Import

Constructor

`new RunStore(pathOrDb?)`

Suites

`createSuite(name)`

`getSuite(id)`

`findSuiteByName(name)`

`listSuites()`

`renameSuite(id, name)`

Runs

`createRun(run)`

`getRun(runId)`

`listRuns(suiteId?)`

`finishRun(runId, status, summary?)`

`getLatestCompletedRun(suiteId, model?)`

`renameRun(id, name)`

Cases

`saveCases(cases)`

`getCases(runId)`

`getFailingCases(runId, threshold?)`

Scores

`saveScores(scores)`

Summaries

`getRunSummary(runId, threshold?)`

Prompts (Experimental)

`createPrompt(name, content)`

`listPrompts()`

`getPrompt(id)`

`deletePrompt(id)`

Examples

Creating and Querying

Finding Failed Cases

Next Steps