
Dataset API

The dataset module provides lazy, chainable iterables for loading and transforming evaluation data.

Import

import { dataset } from '@deepagents/evals/dataset';

dataset(source)

Create a dataset from various sources.

Signature

function dataset<T>(
  source: T[] | string | AsyncIterable<T>
): Dataset<T>;

Parameters

source: T[]

Inline array:
const ds = dataset([
  { input: 'What is 2+2?', expected: '4' },
]);

source: string

File path (.json, .jsonl, .csv):
const fromJson = dataset('./questions.json');
const fromJsonl = dataset('./questions.jsonl');
const fromCsv = dataset('./questions.csv');

source: AsyncIterable<T>

Custom async iterable:
async function* loadFromDB() {
  const rows = await db.query('SELECT * FROM test_cases');
  for (const row of rows) {
    yield { input: row.question, expected: row.answer };
  }
}

const ds = dataset(loadFromDB());

Dataset<T>

The Dataset class implements AsyncIterable<T> and provides chainable transform methods.
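The lazy, chainable behavior can be pictured with a minimal self-contained sketch (illustrative only, not the library's actual implementation): each transform wraps the source `AsyncIterable` in a new wrapper and defers all work until the result is iterated.

```typescript
// Minimal sketch of a lazy, chainable AsyncIterable wrapper.
// Not the real Dataset class -- just the shape of the idea.
class MiniDataset<T> implements AsyncIterable<T> {
  constructor(private source: AsyncIterable<T>) {}

  [Symbol.asyncIterator]() {
    return this.source[Symbol.asyncIterator]();
  }

  // map returns a new wrapper; fn runs only when items are consumed.
  map<U>(fn: (item: T) => U): MiniDataset<U> {
    const src = this.source;
    return new MiniDataset<U>(
      (async function* () {
        for await (const item of src) yield fn(item);
      })()
    );
  }

  async toArray(): Promise<T[]> {
    const out: T[] = [];
    for await (const item of this) out.push(item);
    return out;
  }
}

async function* numbers() {
  yield 1;
  yield 2;
  yield 3;
}

async function demo() {
  const doubled = await new MiniDataset(numbers())
    .map((n) => n * 2)
    .toArray();
  console.log(doubled); // [ 2, 4, 6 ]
}
demo();
```

Because `map` only wraps the source, building a long chain costs nothing until something (a `for await` loop or `toArray()`) actually pulls items through it.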

map<U>(fn)

Transform each item:
const ds = dataset('./raw-data.json')
  .map((row) => ({
    input: row.question,
    expected: row.answer,
  }));
Type:
map<U>(fn: (item: T) => U): Dataset<U>
Lazy: fn is applied as items are consumed; nothing is buffered.

filter(fn)

Keep only the items for which the predicate returns true:
const ds = dataset('./data.json')
  .filter((row) => row.difficulty === 'hard');
Type:
filter(fn: (item: T) => boolean): Dataset<T>
Lazy: the predicate runs as items are consumed; nothing is buffered.

limit(n)

Cap the dataset at n items:
const ds = dataset('./large-dataset.jsonl')
  .limit(100);
Type:
limit(n: number): Dataset<T>
Lazy: iteration stops once n items have been yielded.

shuffle()

Randomize the order of items:
const ds = dataset('./data.json')
  .shuffle();
Type:
shuffle(): Dataset<T>
Eager: ⚠️ Buffers all items into memory.

sample(n)

Pick n random items:
const ds = dataset('./large-dataset.jsonl')
  .sample(50);
Type:
sample(n: number): Dataset<T>
Eager: ⚠️ Buffers all items into memory.
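Since the docs note that sample buffers all items, one plausible way to implement it (an assumption, not the library's source) is to collect the stream and then run a partial Fisher–Yates shuffle, taking the first n positions:

```typescript
// Sketch: pick n random items from an async iterable by buffering
// everything, then doing a partial Fisher-Yates shuffle.
// Hypothetical implementation shown for illustration.
async function sampleN<T>(source: AsyncIterable<T>, n: number): Promise<T[]> {
  const buf: T[] = [];
  for await (const item of source) buf.push(item); // eager: buffers all items

  const k = Math.min(n, buf.length);
  for (let i = 0; i < k; i++) {
    // Swap position i with a random position in [i, buf.length).
    const j = i + Math.floor(Math.random() * (buf.length - i));
    [buf[i], buf[j]] = [buf[j], buf[i]];
  }
  return buf.slice(0, k);
}

async function* range(n: number) {
  for (let i = 0; i < n; i++) yield i;
}

sampleN(range(10), 3).then((s) => console.log(s.length)); // 3
```

The partial shuffle only randomizes the first n slots, so it stays O(n) after the buffering pass regardless of dataset size.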

pick(indexes)

Select specific items by index:
const ds = dataset('./data.json')
  .pick(new Set([0, 5, 10]));
Type:
pick(indexes: Set<number>): Dataset<T>
Lazy: indexes are matched as items stream past; nothing is buffered.

toArray()

Consume the dataset into a plain array:
const items = await dataset('./data.json').toArray();
console.log(items.length);
Type:
toArray(): Promise<T[]>
Eager: ⚠️ Loads all items into memory.

Hugging Face Datasets

hf(options)

Load datasets from Hugging Face:
import { hf } from '@deepagents/evals/dataset';

const ds = hf({
  repo: 'rajpurkar/squad',
  split: 'validation',
  limit: 100,
});
Options:
interface HfOptions {
  repo: string;       // Hugging Face repo ID
  split: string;      // Dataset split (e.g., 'train', 'validation')
  limit?: number;     // Maximum number of rows to fetch
}

Record Selection

parseRecordSelection(spec)

Parse a record selection string:
import { parseRecordSelection } from '@deepagents/evals/dataset';

const { indexes } = parseRecordSelection('0-10,15,20-25');
// indexes: Set { 0, 1, 2, ..., 10, 15, 20, 21, ..., 25 }
Type:
function parseRecordSelection(spec: string): ParsedRecordSelection;

interface ParsedRecordSelection {
  indexes: Set<number>;
}
Supported formats:
  • 0-10 — Range from 0 to 10 (inclusive)
  • 5 — Single index
  • 0-10,15,20-25 — Multiple ranges and indexes
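The grammar above is simple enough to sketch a parser for (a hypothetical implementation, shown only to make the format concrete):

```typescript
// Sketch of a parser for the "0-10,15,20-25" selection grammar.
// Illustrative only -- not the library's parseRecordSelection source.
function parseSelection(spec: string): Set<number> {
  const indexes = new Set<number>();
  for (const part of spec.split(',')) {
    // A part is either "N" or "N-M" (inclusive range).
    const [startStr, endStr] = part.trim().split('-');
    const start = Number(startStr);
    const end = endStr === undefined ? start : Number(endStr);
    if (Number.isNaN(start) || Number.isNaN(end)) {
      throw new Error(`Invalid selection part: "${part}"`);
    }
    for (let i = start; i <= end; i++) indexes.add(i);
  }
  return indexes;
}

console.log([...parseSelection('0-3,5')]); // [ 0, 1, 2, 3, 5 ]
```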

filterRecordsByIndex(iterable, indexes)

Filter an iterable by index set:
import { filterRecordsByIndex } from '@deepagents/evals/dataset';

const filtered = filterRecordsByIndex(
  dataset('./data.json'),
  new Set([0, 5, 10])
);
Type:
function filterRecordsByIndex<T>(
  iterable: AsyncIterable<T>,
  indexes: Set<number>
): AsyncIterable<T>;
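Index filtering over an async iterable amounts to counting items as they stream past and yielding only the positions in the set. A self-contained sketch matching the documented signature (not the library's source):

```typescript
// Sketch: yield only the items whose zero-based position is in `indexes`.
// Illustrative implementation of the documented behavior.
async function* filterByIndex<T>(
  iterable: AsyncIterable<T>,
  indexes: Set<number>
): AsyncIterableIterator<T> {
  let i = 0;
  for await (const item of iterable) {
    if (indexes.has(i++)) yield item;
  }
}

// Usage: keep the items at positions 0 and 2.
async function* letters() {
  yield 'a';
  yield 'b';
  yield 'c';
}

(async () => {
  const out: string[] = [];
  for await (const item of filterByIndex(letters(), new Set([0, 2]))) {
    out.push(item);
  }
  console.log(out); // [ 'a', 'c' ]
})();
```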

pickFromArray(array, indexes)

Pick specific items from an array:
import { pickFromArray } from '@deepagents/evals/dataset';

const items = [1, 2, 3, 4, 5];
const picked = pickFromArray(items, new Set([0, 2, 4]));
// [1, 3, 5]
Type:
function pickFromArray<T>(array: T[], indexes: Set<number>): T[];
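For the synchronous array case, the documented behavior reduces to a one-line filter on element position (a sketch, not the library's source):

```typescript
// Sketch: keep elements whose zero-based position is in the index set.
// Matches the documented pickFromArray behavior; illustrative only.
function pickFromArray<T>(array: T[], indexes: Set<number>): T[] {
  return array.filter((_, i) => indexes.has(i));
}

console.log(pickFromArray(['a', 'b', 'c', 'd'], new Set([1, 3]))); // [ 'b', 'd' ]
```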

Types

TransformFn<T, U>

type TransformFn<T, U> = (item: T) => U;

PredicateFn<T>

type PredicateFn<T> = (item: T) => boolean;

Examples

Loading from JSON

import { dataset } from '@deepagents/evals/dataset';

const ds = dataset('./questions.json');

for await (const item of ds) {
  console.log(item);
}

Chaining Transforms

const ds = dataset('./large-dataset.jsonl')
  .filter((row) => row.difficulty === 'hard')
  .map((row) => ({ input: row.question, expected: row.answer }))
  .shuffle()
  .limit(100);

Custom Async Iterable

async function* loadFromDB() {
  const rows = await db.query('SELECT * FROM test_cases');
  for (const row of rows) {
    yield { input: row.question, expected: row.answer };
  }
}

const ds = dataset(loadFromDB());

Record Selection

import { parseRecordSelection } from '@deepagents/evals/dataset';

const { indexes } = parseRecordSelection('0-10,15,20-25');
const ds = dataset('./data.json').pick(indexes);

Next Steps

Scorers

Learn about scoring functions

Quickstart

Run your first evaluation
