Datasets

Datasets are the fundamental building blocks for organizing and managing your data in Avala. They can contain individual items or sequences of related data points.

Overview

A dataset is a collection of labeled or unlabeled data that can be used for training, validation, or testing. Each dataset has:

A unique identifier (uid)
A human-readable name and URL-friendly slug
A data type (e.g., image, video, point cloud)
Optional sequence support for time-series or multi-frame data
Visibility controls (public/private)

Creating a dataset

Create a new dataset with required metadata:

const dataset = await avala.datasets.create({
  name: 'My Training Dataset',
  slug: 'my-training-dataset',
  dataType: 'image',
  isSequence: false,
  visibility: 'private',
  createMetadata: true
});

Options

name (string, required) - Human-readable name for the dataset
slug (string, required) - URL-friendly identifier for the dataset
dataType (string, required) - Type of data (e.g., 'image', 'video', 'pointcloud')
isSequence (boolean) - Whether this dataset contains sequences
visibility (string) - Dataset visibility: 'public' or 'private'
createMetadata (boolean) - Automatically create metadata storage
providerConfig (Record<string, unknown>) - Storage provider configuration
ownerName (string) - Owner username or organization slug
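
The options above can be written down as a type, and the slug can be derived from the name rather than typed by hand. The following is a minimal sketch, not the SDK's own definitions: the names CreateDatasetOptions, toSlug, and buildCreateOptions are hypothetical, and the slug rule (lowercase letters, digits, and single hyphens) is an assumption, since this page does not state what Avala accepts.

```typescript
// Hypothetical type mirroring the options listed above. The SDK may export
// its own type under a different name.
interface CreateDatasetOptions {
  name: string;
  slug: string;
  dataType: string;
  isSequence?: boolean;
  visibility?: 'public' | 'private';
  createMetadata?: boolean;
  providerConfig?: Record<string, unknown>;
  ownerName?: string;
}

// Hypothetical helper: derive a URL-friendly slug from a dataset name.
// Assumption: only lowercase ASCII letters, digits, and hyphens are kept.
function toSlug(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse every other run into one hyphen
    .replace(/^-+|-+$/g, '');    // trim leading and trailing hyphens
}

// Hypothetical helper: fill in the required fields and default to a
// private, non-sequence dataset unless the caller overrides it.
function buildCreateOptions(
  name: string,
  dataType: string,
  overrides: Partial<CreateDatasetOptions> = {}
): CreateDatasetOptions {
  return {
    name,
    slug: toSlug(name),
    dataType,
    isSequence: false,
    visibility: 'private',
    ...overrides,
  };
}

console.log(buildCreateOptions('My Training Dataset', 'image').slug);
// my-training-dataset
```

The resulting object could then be passed to avala.datasets.create(...), assuming the call accepts exactly the fields listed above.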

Listing datasets

Retrieve datasets with optional filters:

const page = await avala.datasets.list({
  dataType: 'image',
  status: 'active',
  limit: 20
});

console.log(page.items);
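// Fetch the next page only when more results exist and a cursor was returned.
if (page.hasMore && page.nextCursor) {
  const nextPage = await avala.datasets.list({ cursor: page.nextCursor });
  console.log(nextPage.items);
}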

Results use cursor-based pagination. To fetch the next page, pass the nextCursor value from the current page as the cursor option.

Filter options

dataType - Filter by data type
name - Filter by dataset name
status - Filter by status
visibility - Filter by visibility level
limit - Number of results per page (default: 20)
cursor - Pagination cursor
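
To read every page rather than just the first, the cursor can be followed in a loop. The sketch below is one way to do that under stated assumptions: collectAll is a hypothetical helper, not an SDK function, and fetchPage stands in for any call that returns a page with the CursorPage shape. The maxPages cap is an added safeguard, not something the API requires.

```typescript
// Page shape, copied from the CursorPage<T> response type on this page.
interface CursorPage<T> {
  items: T[];
  nextCursor: string | null;
  previousCursor: string | null;
  hasMore: boolean;
}

// Hypothetical helper: follow nextCursor until the server reports no more
// pages, collecting every item along the way.
async function collectAll<T>(
  fetchPage: (cursor?: string) => Promise<CursorPage<T>>,
  maxPages = 100 // safety cap so a misbehaving cursor cannot loop forever
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    for (const item of page.items) {
      all.push(item);
    }
    // Stop when there is nothing left or no cursor to continue from.
    if (!page.hasMore || page.nextCursor === null) break;
    cursor = page.nextCursor;
  }
  return all;
}
```

With the SDK, this might be called as collectAll((cursor) => avala.datasets.list({ dataType: 'image', cursor })), assuming list accepts an undefined cursor for the first page.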

Getting a dataset

Retrieve a single dataset by its UID:

const dataset = await avala.datasets.get('dataset_uid_here');

console.log(dataset.name);
console.log(dataset.itemCount);

Working with dataset items

Dataset items represent individual data points within a dataset.

List items

Retrieve items from a dataset using the owner and slug:

const items = await avala.datasets.listItems(
  'owner-name',
  'dataset-slug',
  { limit: 50 }
);

for (const item of items.items) {
  console.log(item.key, item.url);
}

Get a specific item

Retrieve detailed information about a single item:

const item = await avala.datasets.getItem(
  'owner-name',
  'dataset-slug',
  'item_uid'
);

console.log(item.metadata);
console.log(item.annotations);

Item properties

Each DatasetItem includes:

uid - Unique identifier
key - Item key within the dataset
url - Primary data URL
thumbnails - Preview image URLs
metadata - Custom metadata object
annotations - Annotation data
exportSnippet - Export format data
createdAt / updatedAt - Timestamps
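
Because each item carries a custom metadata object, items that have already been fetched can be filtered on the client. This is a minimal sketch under assumptions: ItemLike and filterByMetadata are hypothetical names, and metadata is treated as a flat key/value object that may be null, since its real type is not documented in this section.

```typescript
// Minimal stand-in for the DatasetItem fields used here.
// Assumption: metadata is a plain key/value object, or null when absent.
interface ItemLike {
  uid: string;
  key: string;
  metadata: Record<string, unknown> | null;
}

// Hypothetical helper: keep only the items whose metadata matches every
// key/value pair in `wanted` (strict equality, top-level keys only).
function filterByMetadata<T extends ItemLike>(
  items: T[],
  wanted: Record<string, unknown>
): T[] {
  const keys = Object.keys(wanted);
  return items.filter((item) => {
    const meta = item.metadata;
    if (meta === null) return false;
    return keys.every((k) => meta[k] === wanted[k]);
  });
}

// Made-up sample data for illustration only.
const sample: ItemLike[] = [
  { uid: 'a', key: 'img_001.jpg', metadata: { split: 'train', weather: 'rain' } },
  { uid: 'b', key: 'img_002.jpg', metadata: { split: 'val' } },
  { uid: 'c', key: 'img_003.jpg', metadata: null },
];

console.log(filterByMetadata(sample, { split: 'train' }).map((i) => i.uid));
// [ 'a' ]
```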

Working with sequences

Sequences are collections of ordered frames, useful for video or time-series data.

List sequences

const sequences = await avala.datasets.listSequences(
  'owner-name',
  'dataset-slug',
  { limit: 10 }
);

for (const seq of sequences.items) {
  console.log(`Sequence ${seq.key}: ${seq.numberOfFrames} frames`);
}

Get sequence details

const sequence = await avala.datasets.getSequence(
  'owner-name',
  'dataset-slug',
  'sequence_uid'
);

console.log(sequence.views);
console.log(sequence.frames);
console.log(sequence.metrics);

Sequence properties

Each DatasetSequence includes:

uid - Unique identifier
key - Sequence key
numberOfFrames - Frame count
views - Camera or sensor views
frames - Frame data array
metrics - Computed metrics
cropData - Cropping information
predefinedLabels - Label configuration
allowLidarCalibration - LiDAR calibration support

Use sequences for multi-frame data like videos, point cloud streams, or sensor fusion datasets.
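
The numberOfFrames field makes it easy to size up a dataset before downloading any frame data. As a rough sketch, the hypothetical summarizeSequences helper below totals the frames across a list of sequences and reports the longest one. SequenceLike is a stand-in that covers only the two fields used here.

```typescript
// Minimal stand-in for the DatasetSequence fields used here.
interface SequenceLike {
  key: string;
  numberOfFrames: number;
}

interface SequenceSummary {
  totalFrames: number;
  longestKey: string | null; // null when the list is empty
}

// Hypothetical helper: total frame count plus the key of the longest sequence.
function summarizeSequences(sequences: SequenceLike[]): SequenceSummary {
  let totalFrames = 0;
  let longest: SequenceLike | null = null;
  for (const seq of sequences) {
    totalFrames += seq.numberOfFrames;
    if (longest === null || seq.numberOfFrames > longest.numberOfFrames) {
      longest = seq;
    }
  }
  return { totalFrames, longestKey: longest === null ? null : longest.key };
}

// Made-up sample data for illustration only.
const summary = summarizeSequences([
  { key: 'drive_01', numberOfFrames: 10 },
  { key: 'drive_02', numberOfFrames: 25 },
  { key: 'drive_03', numberOfFrames: 5 },
]);

console.log(summary.totalFrames, summary.longestKey); // 40 drive_02
```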

Response types

All dataset operations use these TypeScript interfaces:

interface Dataset {
  uid: string;
  name: string;
  slug: string;
  itemCount: number;
  dataType: string | null;
  createdAt: string | null;
  updatedAt: string | null;
}

interface CursorPage<T> {
  items: T[];
  nextCursor: string | null;
  previousCursor: string | null;
  hasMore: boolean;
}
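
Several Dataset fields are nullable, so code that displays them should handle the null case explicitly. The sketch below shows one way to do that. The helpers parseTimestamp and describeDataset are hypothetical, and the timestamps are assumed to be ISO 8601 strings, which this page does not confirm.

```typescript
// Copied from the Dataset response type on this page.
interface Dataset {
  uid: string;
  name: string;
  slug: string;
  itemCount: number;
  dataType: string | null;
  createdAt: string | null;
  updatedAt: string | null;
}

// Hypothetical helper: parse a nullable timestamp string.
// Assumption: values are ISO 8601 strings. Unparseable input yields null.
function parseTimestamp(value: string | null): Date | null {
  if (value === null) return null;
  const date = new Date(value);
  return isNaN(date.getTime()) ? null : date;
}

// Hypothetical helper: one-line summary with fallbacks for null fields.
function describeDataset(d: Dataset): string {
  const type = d.dataType === null ? 'unknown type' : d.dataType;
  const created = parseTimestamp(d.createdAt);
  // toISOString() is always UTC, so the first 10 characters are the UTC date.
  const day = created === null ? 'unknown date' : created.toISOString().slice(0, 10);
  return `${d.name} (${type}, ${d.itemCount} items, created ${day})`;
}

// Made-up sample data for illustration only.
const example: Dataset = {
  uid: 'dataset_uid_here',
  name: 'My Training Dataset',
  slug: 'my-training-dataset',
  itemCount: 1200,
  dataType: 'image',
  createdAt: '2024-03-01T12:00:00Z',
  updatedAt: null,
};

console.log(describeDataset(example));
// My Training Dataset (image, 1200 items, created 2024-03-01)
```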

Best practices

Use descriptive slugs

Choose URL-friendly slugs that clearly identify your dataset’s purpose

Set visibility appropriately

Use ‘private’ for sensitive data and ‘public’ for shared datasets

Leverage metadata

Store custom metadata with items for filtering and organization

Paginate results

Always handle pagination when listing items or sequences
