Datasets are the fundamental building blocks for organizing and managing your data in Avala. They can contain individual items or sequences of related data points.

Overview

A dataset is a collection of labeled or unlabeled data that can be used for training, validation, or testing. Each dataset has:
  • A unique identifier (uid)
  • A human-readable name and URL-friendly slug
  • A data type (e.g., image, video, point cloud)
  • Optional sequence support for time-series or multi-frame data
  • Visibility controls (public/private)

Creating a dataset

Create a new dataset by passing the required fields (name, slug, and dataType) along with any optional settings:
const dataset = await avala.datasets.create({
  name: 'My Training Dataset',
  slug: 'my-training-dataset',
  dataType: 'image',
  isSequence: false,
  visibility: 'private',
  createMetadata: true
});

Options

  • name (string, required) - Human-readable name for the dataset
  • slug (string, required) - URL-friendly identifier for the dataset
  • dataType (string, required) - Type of data (e.g., 'image', 'video', 'pointcloud')
  • isSequence (boolean) - Whether this dataset contains sequences
  • visibility (string) - Dataset visibility: 'public' or 'private'
  • createMetadata (boolean) - Automatically create metadata storage
  • providerConfig (Record<string, unknown>) - Storage provider configuration
  • ownerName (string) - Owner username or organization slug
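
The options above can be written down as a type so that a missing required field is caught at compile time. The following is a minimal sketch under stated assumptions, not the SDK's published typings: the names CreateDatasetOptions, toSlug, and buildCreateOptions are hypothetical, and the slug rule (lowercase letters, digits, and hyphens) is a common convention that Avala may or may not enforce.

```typescript
// Hypothetical shape mirroring the documented options; the SDK's real typings may differ.
interface CreateDatasetOptions {
  name: string;
  slug: string;
  dataType: string;
  isSequence?: boolean;
  visibility?: 'public' | 'private';
  createMetadata?: boolean;
  providerConfig?: Record<string, unknown>;
  ownerName?: string;
}

// Derive a URL-friendly slug from a display name (assumed convention, not a documented rule).
function toSlug(name: string): string {
  return name
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-') // collapse each run of other characters into one hyphen
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
}

// Build an options object with the three required fields and conservative defaults.
function buildCreateOptions(name: string, dataType: string): CreateDatasetOptions {
  return { name, slug: toSlug(name), dataType, isSequence: false, visibility: 'private' };
}
```

The result of buildCreateOptions('My Training Dataset', 'image') could then be passed to avala.datasets.create, assuming the SDK accepts an object of this shape.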

Listing datasets

Retrieve datasets with optional filters:
const page = await avala.datasets.list({
  dataType: 'image',
  status: 'active',
  limit: 20
});

console.log(page.items);
if (page.hasMore) {
  const nextPage = await avala.datasets.list({ cursor: page.nextCursor });
}
Results use cursor-based pagination. When hasMore is true, pass nextCursor as the cursor option to fetch the next page.

Filter options

  • dataType - Filter by data type
  • name - Filter by dataset name
  • status - Filter by status
  • visibility - Filter by visibility level
  • limit - Number of results per page (default: 20)
  • cursor - Pagination cursor
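
Walking every page by hand gets repetitive, so a small helper can follow the cursor until the results run out. This is a sketch, not part of the SDK: collectAll and ListParams are hypothetical names, and the helper takes the page-fetching function as a parameter so it makes no assumption about how the client is imported. The text does not say whether a cursor already encodes the original filters, so the sketch re-sends them on every request.

```typescript
// Page shape as documented under "Response types".
interface CursorPage<T> {
  items: T[];
  nextCursor: string | null;
  previousCursor: string | null;
  hasMore: boolean;
}

// Hypothetical filter bag: any documented filter plus an optional cursor.
type ListParams = Record<string, unknown> & { cursor?: string };

// Fetch pages until the server reports no more, returning all items in order.
async function collectAll<T>(
  fetchPage: (params: ListParams) => Promise<CursorPage<T>>,
  filters: ListParams = {},
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = filters.cursor;
  while (true) {
    // Re-send the filters each time; only the cursor changes between requests.
    const params: ListParams = cursor ? { ...filters, cursor } : { ...filters };
    const page = await fetchPage(params);
    all.push(...page.items);
    // Stop when there are no more pages or no cursor to follow.
    if (!page.hasMore || page.nextCursor === null) {
      return all;
    }
    cursor = page.nextCursor;
  }
}
```

A call might look like collectAll((p) => avala.datasets.list(p), { dataType: 'image', limit: 20 }). Because this holds every item in memory, it suits modest result sets; for very large listings, processing each page as it arrives is likely the better choice.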

Getting a dataset

Retrieve a single dataset by its UID:
const dataset = await avala.datasets.get('dataset_uid_here');

console.log(dataset.name);
console.log(dataset.itemCount);

Working with dataset items

Dataset items represent individual data points within a dataset.
1. List items

Retrieve items from a dataset using the owner and slug:
const items = await avala.datasets.listItems(
  'owner-name',
  'dataset-slug',
  { limit: 50 }
);

for (const item of items.items) {
  console.log(item.key, item.url);
}
2. Get a specific item

Retrieve detailed information about a single item:
const item = await avala.datasets.getItem(
  'owner-name',
  'dataset-slug',
  'item_uid'
);

console.log(item.metadata);
console.log(item.annotations);

Item properties

Each DatasetItem includes:
  • uid - Unique identifier
  • key - Item key within the dataset
  • url - Primary data URL
  • thumbnails - Preview image URLs
  • metadata - Custom metadata object
  • annotations - Annotation data
  • exportSnippet - Export format data
  • createdAt / updatedAt - Timestamps
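
Since each item carries a custom metadata object, items that have already been fetched can be narrowed down on the client. The sketch below assumes metadata is a plain key-value object that may be null; the field types are not documented here, and ItemLike and filterByMetadata are hypothetical names.

```typescript
// Hypothetical minimal view of a DatasetItem: only the fields this helper reads.
interface ItemLike {
  key: string;
  metadata: Record<string, unknown> | null;
}

// Keep items whose metadata has `field` strictly equal to `value`.
// Items with null metadata or without the field are dropped.
function filterByMetadata<T extends ItemLike>(items: T[], field: string, value: unknown): T[] {
  return items.filter(
    (item) => item.metadata !== null && item.metadata[field] === value,
  );
}
```

For example, filterByMetadata(items.items, 'split', 'train') would keep only items tagged with a hypothetical split field set to 'train'.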

Working with sequences

Sequences are ordered collections of frames, useful for video or time-series data. List them using the owner and slug:
const sequences = await avala.datasets.listSequences(
  'owner-name',
  'dataset-slug',
  { limit: 10 }
);

for (const seq of sequences.items) {
  console.log(`Sequence ${seq.key}: ${seq.numberOfFrames} frames`);
}

Sequence properties

Each DatasetSequence includes:
  • uid - Unique identifier
  • key - Sequence key
  • numberOfFrames - Frame count
  • views - Camera or sensor views
  • frames - Frame data array
  • metrics - Computed metrics
  • cropData - Cropping information
  • predefinedLabels - Label configuration
  • allowLidarCalibration - LiDAR calibration support
Use sequences for multi-frame data like videos, point cloud streams, or sensor fusion datasets.
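
The numberOfFrames property makes it easy to size up a dataset before processing it. As a minimal sketch (SequenceLike, FrameSummary, and summarizeFrames are hypothetical names, and only key and numberOfFrames are assumed to be present):

```typescript
// Hypothetical minimal view of a DatasetSequence: only the fields this helper reads.
interface SequenceLike {
  key: string;
  numberOfFrames: number;
}

interface FrameSummary {
  totalFrames: number;
  longestKey: string | null; // null when the input is empty
}

// Sum frame counts and record the key of the sequence with the most frames.
// On ties, the first sequence encountered wins.
function summarizeFrames(sequences: SequenceLike[]): FrameSummary {
  let totalFrames = 0;
  let longest: SequenceLike | null = null;
  for (const seq of sequences) {
    totalFrames += seq.numberOfFrames;
    if (longest === null || seq.numberOfFrames > longest.numberOfFrames) {
      longest = seq;
    }
  }
  return { totalFrames, longestKey: longest ? longest.key : null };
}
```

Passing sequences.items from the listing call to summarizeFrames gives a total for that page only; combine it with pagination to cover the whole dataset.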

Response types

Dataset operations return values shaped like these TypeScript interfaces; list operations wrap their results in CursorPage:
interface Dataset {
  uid: string;
  name: string;
  slug: string;
  itemCount: number;
  dataType: string | null;
  createdAt: string | null;
  updatedAt: string | null;
}

interface CursorPage<T> {
  items: T[];
  nextCursor: string | null;
  previousCursor: string | null;
  hasMore: boolean;
}
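
Because createdAt and updatedAt are typed as string | null, callers need to handle the null case before working with them as dates. The helper below is a sketch that assumes the strings are in a format the JavaScript Date constructor can parse (such as ISO 8601); the timestamp format is not stated here, and parseTimestamp is a hypothetical name.

```typescript
// Convert a nullable timestamp string into a Date, or null if absent or unparseable.
// Assumes an ISO 8601 style string; other formats may parse inconsistently.
function parseTimestamp(value: string | null): Date | null {
  if (value === null) {
    return null;
  }
  const parsed = new Date(value);
  // An invalid date has a NaN time value.
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}
```

With this in place, parseTimestamp(dataset.createdAt) yields either a usable Date or null, so downstream code has a single case to check.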

Best practices

Use descriptive slugs

Choose URL-friendly slugs that clearly identify your dataset’s purpose

Set visibility appropriately

Use ‘private’ for sensitive data and ‘public’ for shared datasets

Leverage metadata

Store custom metadata with items for filtering and organization

Paginate results

Always handle pagination when listing items or sequences
