The LeRobotDataset class is a PyTorch dataset for working with robot learning data in LeRobot. It supports both loading existing datasets and recording new ones.
Class Definition
src/lerobot/datasets/lerobot_dataset.py:566
Overview
LeRobotDataset provides:
- Loading datasets from Hugging Face Hub or local storage
- Recording new datasets from robot interactions
- Video encoding/decoding for efficient storage
- Episode-based data organization
- Delta timestamps for temporal queries
- Push/pull from Hugging Face Hub
Constructor
Parameters
Repository identifier in the format {username}/{dataset_name} (e.g., lerobot/pusht).
Local directory for dataset storage. Defaults to $HF_LEROBOT_HOME/repo_id.
List of episode indices to load. If None, loads all episodes.
Torchvision transforms to apply to image modalities.
Dictionary mapping feature keys to lists of time offsets (in seconds, relative to the current frame) for temporal queries.
Tolerance in seconds for timestamp validation.
Git revision (branch, tag, or commit hash) for Hugging Face Hub.
If True, refresh local files from Hub even if already cached.
Whether to download video files.
Video decoding backend: "torchcodec", "pyav", or "video_reader". Auto-detects if None.
Number of episodes to accumulate before encoding videos. Set to 1 for immediate encoding.
Video codec: "h264", "hevc", "libsvtav1", "auto", or hardware-specific codecs.
If True, encode video frames in real time during capture instead of writing PNGs first.
Maximum frames to buffer per camera when using streaming encoding.
Number of threads per encoder. None uses codec default.
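The delta timestamps and tolerance parameters above interact: each time offset has to correspond to a whole number of frames at the dataset's fps, within the tolerance. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The helper name `delta_timestamps_to_indices`, the default tolerance of 1e-4 seconds, and the feature keys are assumptions made for illustration.

```python
def delta_timestamps_to_indices(delta_timestamps, fps, tolerance_s=1e-4):
    """Convert time offsets in seconds into frame offsets (hypothetical helper).

    Raises ValueError when an offset does not fall on a frame boundary,
    up to tolerance_s. The default tolerance is an assumption.
    """
    delta_indices = {}
    for key, offsets in delta_timestamps.items():
        indices = []
        for offset in offsets:
            index = round(offset * fps)
            # Distance between the requested offset and the nearest frame time.
            if abs(offset - index / fps) > tolerance_s:
                raise ValueError(
                    f"{key}: offset {offset}s is not a multiple of 1/{fps}s "
                    f"within {tolerance_s}s"
                )
            indices.append(index)
        delta_indices[key] = indices
    return delta_indices


# Illustrative keys; a real dataset defines its own feature names.
example = {
    "observation.image": [-0.2, -0.1, 0.0],  # two past frames and the current one
    "action": [0.0, 0.1, 0.2, 0.3],          # current action and three future ones
}
indices = delta_timestamps_to_indices(example, fps=10)
print(indices)
```

At 10 fps, an offset of 0.05 s falls halfway between two frames, so the sketch rejects it rather than silently snapping to a neighbor.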
Properties
fps
Frames per second used during data collection.
num_frames
Number of frames in selected episodes.
num_episodes
Number of episodes selected.
features
All features contained in the dataset with their metadata (dtype, shape, names).
Methods
__getitem__
Frame index.
Dictionary containing:
- All observation keys (e.g., images, state)
- action: Action taken at this timestep
- episode_index: Episode this frame belongs to
- frame_index: Index within the episode
- timestamp: Time in seconds
- task: Task description string
- Delta timestamp queries if configured
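This page does not say what a delta timestamp query returns when it reaches past the start or end of an episode. One common approach, assumed here for illustration, is to clamp the requested frames to the episode and return a boolean padding mask alongside them. The function name `query_indices` and the exclusive `ep_end` convention are hypothetical.

```python
def query_indices(frame_idx, delta_indices, ep_start, ep_end):
    """Resolve frame offsets around frame_idx within one episode.

    ep_start is the first frame of the episode, ep_end is one past the last.
    Returns (indices, is_pad): clamped frame indices and a mask marking
    entries that fell outside the episode.
    """
    indices, is_pad = [], []
    for delta in delta_indices:
        target = frame_idx + delta
        is_pad.append(target < ep_start or target >= ep_end)
        # Clamp so that the lookup always stays inside the episode.
        indices.append(min(max(target, ep_start), ep_end - 1))
    return indices, is_pad


# Episode covering frames 100..149, queried two frames before its end.
idx, pad = query_indices(148, [0, 1, 2, 3], ep_start=100, ep_end=150)
print(idx, pad)
```

A training loop can then use the mask to exclude padded steps from the loss instead of treating repeated boundary frames as real data.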
push_to_hub
Git branch name. If None, pushes to main.
Tags to add to the dataset card.
Dataset license.
Whether to create a version tag.
Whether to upload video files.
Whether to create a private repository.
add_frame
Dictionary containing observation and action data. Must include:
- All keys from features
- task: Task description string
- Optional timestamp: Time in seconds (auto-generated if not provided)
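The requirements above can be checked before a frame is handed to add_frame. The sketch below is a standalone illustration, not the library's validation code. The helper `prepare_frame` is hypothetical, and deriving a missing timestamp as `frame_index / fps` is an assumption about how auto-generation might work.

```python
def prepare_frame(frame, feature_keys, frame_index, fps):
    """Check a frame dict and fill in a timestamp if it is absent.

    feature_keys: the user-defined feature names the dataset was created with.
    """
    missing = [key for key in feature_keys if key not in frame]
    if "task" not in frame:
        missing.append("task")
    if missing:
        raise KeyError(f"frame is missing required keys: {missing}")

    prepared = dict(frame)  # leave the caller's dict untouched
    # Assumed fallback: evenly spaced timestamps derived from the frame rate.
    prepared.setdefault("timestamp", frame_index / fps)
    return prepared


frame = {
    "observation.state": [0.1, 0.2],
    "action": [0.0, 1.0],
    "task": "Push the block to the target.",
}
ready = prepare_frame(frame, ["observation.state", "action"], frame_index=15, fps=30)
print(ready["timestamp"])
```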
finalize
Creating a New Dataset
Use the create class method to initialize a new dataset.
Dataset identifier.
Frames per second.
Dictionary defining dataset features.
Type of robot used for recording.
Whether to encode images as videos.
Usage Examples
Loading an Existing Dataset
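A minimal sketch of loading a dataset, assuming the lerobot package is installed and the Hub is reachable. The import path follows the source file location given under Class Definition. The helper names `load_pusht_subset` and `describe_item` are illustrative, and the import is done lazily inside the function so the snippet can be read and imported without lerobot present.

```python
def load_pusht_subset(episodes=(0, 1)):
    """Load a few episodes of lerobot/pusht and print basic properties."""
    from lerobot.datasets.lerobot_dataset import LeRobotDataset

    dataset = LeRobotDataset("lerobot/pusht", episodes=list(episodes))
    print(
        f"fps={dataset.fps}, "
        f"episodes={dataset.num_episodes}, "
        f"frames={dataset.num_frames}"
    )
    return dataset


def describe_item(item):
    """Summarize one dataset item as {key: type name}."""
    return {key: type(value).__name__ for key, value in item.items()}
```

Calling `describe_item(load_pusht_subset()[0])` would list the keys of the first frame, which should include the fields described under __getitem__.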
Using with PyTorch DataLoader
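Because LeRobotDataset is a PyTorch dataset, it can be wrapped in a standard `torch.utils.data.DataLoader`. The sketch below assumes torch and lerobot are installed. `make_dataloader` and `steps_per_epoch` are hypothetical helper names, and the batch size is an arbitrary choice.

```python
import math


def make_dataloader(repo_id="lerobot/pusht", batch_size=32, delta_timestamps=None):
    """Wrap a LeRobotDataset in a shuffling DataLoader."""
    import torch
    from lerobot.datasets.lerobot_dataset import LeRobotDataset

    dataset = LeRobotDataset(repo_id, delta_timestamps=delta_timestamps)
    loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=0,  # raise this once loading works single-process
    )
    return dataset, loader


def steps_per_epoch(num_frames, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_frames / batch_size)
```

Each batch produced by the loader is a dictionary with the same keys as a single item, with values stacked along a leading batch dimension by PyTorch's default collation.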
Recording a New Dataset
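A sketch of the recording flow using create, add_frame, and finalize as described on this page. Several details are assumptions: the feature names and shapes, the "dummy" robot type, random data standing in for real observations, and the call to `save_episode()`, which this page does not document but is assumed here to close out an episode before finalizing. lerobot and numpy are imported lazily inside the function.

```python
FPS = 10

# Illustrative feature definitions: dtype, shape, and per-dimension names.
FEATURES = {
    "observation.state": {"dtype": "float32", "shape": (2,), "names": ["x", "y"]},
    "action": {"dtype": "float32", "shape": (2,), "names": ["dx", "dy"]},
}


def record_dummy_episode(repo_id, root=None, num_frames=20):
    """Create a dataset and record one episode of random frames."""
    import numpy as np
    from lerobot.datasets.lerobot_dataset import LeRobotDataset

    dataset = LeRobotDataset.create(
        repo_id=repo_id,
        fps=FPS,
        features=FEATURES,
        root=root,
        robot_type="dummy",  # placeholder, not a real robot type
        use_videos=False,    # no camera features in this sketch
    )
    for _ in range(num_frames):
        dataset.add_frame(
            {
                "observation.state": np.random.rand(2).astype(np.float32),
                "action": np.random.rand(2).astype(np.float32),
                "task": "Move to the target.",
            }
        )
    dataset.save_episode()  # assumed: marks the end of the current episode
    dataset.finalize()
    return dataset
```

In a real recording loop the random arrays would be replaced by robot observations and the actions sent to the robot, with one `add_frame` call per control step.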
Dataset Structure
LeRobotDataset uses a chunked file structure.
See Also
- Robot API - For robot control
- lerobot-record - Script for recording datasets
- lerobot-dataset-viz - Script for visualizing datasets