Persistence

The RunStore class provides SQLite-backed persistence for evaluation runs, cases, and scores.

Creating a Store

import { RunStore } from '@deepagents/evals/store';

// Default location: .evals/store.db
const store = new RunStore();

// Custom location
const store = new RunStore('./my-evals/results.db');

// In-memory (for testing)
import { DatabaseSync } from 'node:sqlite';
const db = new DatabaseSync(':memory:');
const store = new RunStore(db);

The directory is created automatically if it doesn’t exist.

Database Schema

The store manages four main tables:

Suites

A suite groups related runs together (e.g., all runs for a specific evaluation):

interface SuiteRow {
  id: string;
  name: string;
  created_at: number;
}

Runs

A run represents a single evaluation execution:

interface RunRow {
  id: string;
  suite_id: string;
  name: string;
  model: string;
  config: Record<string, unknown> | null;
  started_at: number;
  finished_at: number | null;
  status: 'running' | 'completed' | 'failed';
  summary: RunSummary | null;
}

Cases

A case is a single test item in a run:

interface CaseRow {
  id: string;
  run_id: string;
  idx: number;
  input: unknown;
  output: string | null;
  expected: unknown | null;
  latency_ms: number;
  tokens_in: number;
  tokens_out: number;
  error: string | null;
}

Scores

A score is the result of running a scorer on a case:

interface ScoreRow {
  id: string;
  case_id: string;
  scorer_name: string;
  score: number;
  reason: string | null;
}

Suites

Create a Suite

const suite = store.createSuite('text2sql-accuracy');
// { id: '...', name: 'text2sql-accuracy', created_at: 1234567890 }

Find Suite by Name

const suite = store.findSuiteByName('text2sql-accuracy');
if (suite) {
  console.log(suite.id);
}

Get Suite by ID

const suite = store.getSuite(suiteId);

List All Suites

const suites = store.listSuites();
for (const suite of suites) {
  console.log(suite.name);
}

Rename a Suite

store.renameSuite(suiteId, 'new-name');

Runs

Create a Run

const runId = store.createRun({
  suite_id: suite.id,
  name: 'my-eval',
  model: 'gpt-4o',
  config: { temperature: 0.7 },
});

Finish a Run

Mark a run as completed or failed:

store.finishRun(runId, 'completed', summary);

Get a Run

const run = store.getRun(runId);
if (run) {
  console.log(run.status);
}

List Runs

List all runs or filter by suite:

// All runs
const runs = store.listRuns();

// Runs in a specific suite
const runs = store.listRuns(suiteId);

Get Latest Completed Run

Get the most recent completed run for a suite:

const run = store.getLatestCompletedRun(suiteId);

// Filter by model
const run = store.getLatestCompletedRun(suiteId, 'gpt-4o');

Rename a Run

store.renameRun(runId, 'new-name');

Cases

Save Cases

Save one or more cases:

store.saveCases([
  {
    id: caseId,
    run_id: runId,
    idx: 0,
    input: { question: 'What is 2+2?' },
    output: '4',
    expected: '4',
    latency_ms: 150,
    tokens_in: 10,
    tokens_out: 2,
  },
]);

Get Cases

Get all cases for a run:

const cases = store.getCases(runId);
for (const c of cases) {
  console.log(c.idx, c.output);
}

Get Failing Cases

Get cases that scored below a threshold:

const failing = store.getFailingCases(runId, 0.5);
for (const c of failing) {
  console.log(`Case #${c.idx} failed with scores:`, c.scores);
}

Returns: CaseWithScores[]

interface CaseWithScores extends CaseRow {
  scores: Array<{ scorer_name: string; score: number; reason: string | null }>;
}

Scores

Save Scores

store.saveScores([
  {
    id: scoreId,
    case_id: caseId,
    scorer_name: 'exact',
    score: 1.0,
    reason: undefined,
  },
]);

Summaries

Get Run Summary

Compute aggregated statistics for a run:

const summary = store.getRunSummary(runId, 0.5);
// {
//   totalCases: 100,
//   passCount: 85,
//   failCount: 15,
//   meanScores: { exact: 0.92, factual: 0.88 },
//   totalLatencyMs: 12340,
//   totalTokensIn: 1024,
//   totalTokensOut: 512,
// }

Parameters:

runId — Run ID
threshold — Minimum score to count as “pass” (default: 0.5)

Returns: RunSummary

interface RunSummary {
  totalCases: number;
  passCount: number;
  failCount: number;
  meanScores: Record<string, number>;
  totalLatencyMs: number;
  totalTokensIn: number;
  totalTokensOut: number;
}

Prompts (Experimental)

The store also supports versioned prompts:

Create a Prompt

const prompt = store.createPrompt('my-prompt', 'You are a helpful assistant.');
// { id: '...', name: 'my-prompt', version: 1, content: '...', created_at: ... }

// Creating again increments version
const prompt2 = store.createPrompt('my-prompt', 'Updated prompt.');
// { id: '...', name: 'my-prompt', version: 2, content: '...', created_at: ... }

List Prompts

const prompts = store.listPrompts();
for (const p of prompts) {
  console.log(`${p.name} v${p.version}`);
}

Get Prompt by ID

const prompt = store.getPrompt(promptId);

Delete a Prompt

store.deletePrompt(promptId);

Transactions

The store uses transactions internally for batch operations. You don’t need to manage transactions manually.

Migrations

The store automatically migrates older database schemas to the latest version:

Prompts versioning — Adds version column to prompts table
Suite foreign keys — Adds ON DELETE CASCADE to runs.suite_id

Migrations run automatically on store initialization.

Example: Querying Historical Data

import { RunStore } from '@deepagents/evals/store';

const store = new RunStore('.evals/store.db');

const suite = store.findSuiteByName('text2sql-accuracy');
if (!suite) throw new Error('Suite not found');

const runs = store.listRuns(suite.id);
console.log(`Found ${runs.length} runs`);

for (const run of runs) {
  if (run.status === 'completed') {
    const summary = store.getRunSummary(run.id);
    console.log(`${run.model}: ${summary.passCount}/${summary.totalCases} passed`);
  }
}

Overview

Guides

API Reference

​Persistence

​Creating a Store

​Database Schema

​Suites

​Runs

​Cases

​Scores

​Suites

​Create a Suite

​Find Suite by Name

​Get Suite by ID

​List All Suites

​Rename a Suite

​Runs

​Create a Run

​Finish a Run

​Get a Run

​List Runs

​Get Latest Completed Run

​Rename a Run

​Cases

​Save Cases

​Get Cases

​Get Failing Cases

​Scores

​Save Scores

​Summaries

​Get Run Summary

​Prompts (Experimental)

​Create a Prompt

​List Prompts

​Get Prompt by ID

​Delete a Prompt

​Transactions

​Migrations

​Example: Querying Historical Data

​Next Steps

Comparison

API Reference

Build docs developers (and LLMs) love

Persistence

Creating a Store

Database Schema

Suites

Runs

Cases

Scores

Suites

Create a Suite

Find Suite by Name

Get Suite by ID

List All Suites

Rename a Suite

Runs

Create a Run

Finish a Run

Get a Run

List Runs

Get Latest Completed Run

Rename a Run

Cases

Save Cases

Get Cases

Get Failing Cases

Scores

Save Scores

Summaries

Get Run Summary

Prompts (Experimental)

Create a Prompt

List Prompts

Get Prompt by ID

Delete a Prompt

Transactions

Migrations

Example: Querying Historical Data

Next Steps