Dataset Format
LeRobotDataset v3.0 uses a file-based structure optimized for efficient storage and loading.
Key Features
Chunked Storage
Data is organized into chunks for better performance and Hub compatibility. Episodes are consolidated into shared files according to configurable size limits:
- Data files: default maximum of 100 MB per file
- Video files: default maximum of 200 MB per file
- Chunks: at most 1000 files per chunk directory
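The consolidation rule can be pictured as a greedy packer: episodes are appended to the current file until the next one would push it past the size limit, and file indices roll over into a new chunk directory after 1000 files. This is a minimal sketch for illustration, not LeRobot's writer. The helper names assign_files and data_path, the use of binary megabytes, and the chunk-XXX/file-XXX.parquet naming are all assumptions.

```python
from typing import List, Tuple

MB = 1024 * 1024  # bytes per megabyte (binary MB is an assumption)


def assign_files(
    episode_sizes: List[int],
    max_file_bytes: int = 100 * MB,
    files_per_chunk: int = 1000,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: greedily pack episodes into files.

    Returns one (chunk_index, file_index) pair per episode.
    """
    assignments = []
    chunk_idx, file_idx, current_bytes = 0, 0, 0
    for size in episode_sizes:
        # Open a new file if this episode would overflow a non-empty file.
        if current_bytes > 0 and current_bytes + size > max_file_bytes:
            file_idx += 1
            current_bytes = 0
            # Roll over into a new chunk directory once it holds files_per_chunk files.
            if file_idx == files_per_chunk:
                chunk_idx += 1
                file_idx = 0
        assignments.append((chunk_idx, file_idx))
        current_bytes += size
    return assignments


def data_path(chunk_idx: int, file_idx: int) -> str:
    """Assumed naming scheme; check your dataset's info.json for the real template."""
    return f"data/chunk-{chunk_idx:03d}/file-{file_idx:03d}.parquet"


if __name__ == "__main__":
    sizes = [40 * MB, 50 * MB, 30 * MB, 120 * MB, 10 * MB]
    for episode, (chunk, file) in enumerate(assign_files(sizes)):
        print(episode, data_path(chunk, file))
```

In this sketch an episode larger than the limit still gets a file of its own; splitting a single episode across files is outside its scope.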
Video Storage
Visual observations are stored as MP4 videos using efficient codecs:
- Default codec: libsvtav1 (AV1) for the best compression
- Hardware acceleration: automatic detection of hardware encoders (VideoToolbox, NVENC, VAAPI)
- Multiple episodes per file: episodes are concatenated to reduce the number of files
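Because several episodes share one MP4 file, a reader has to know where each episode starts and ends inside it. A minimal sketch, assuming a constant frame rate and episodes appended back to back with no gaps; the function names and the (from_timestamp, to_timestamp) pair representation are illustrative assumptions, not LeRobot's API.

```python
from typing import List, Tuple


def episode_time_ranges(frame_counts: List[int], fps: int) -> List[Tuple[float, float]]:
    """Return (from_timestamp, to_timestamp) in seconds for each concatenated episode.

    Assumes constant fps and no gaps between episodes (illustrative only).
    """
    ranges = []
    start_frame = 0
    for n in frame_counts:
        end_frame = start_frame + n
        # Track offsets as integer frame counts and divide once per boundary.
        ranges.append((start_frame / fps, end_frame / fps))
        start_frame = end_frame
    return ranges


def frame_timestamp(ranges: List[Tuple[float, float]], episode: int, frame: int, fps: int) -> float:
    """Timestamp of a 0-based frame of `episode` inside the shared video file."""
    return ranges[episode][0] + frame / fps


if __name__ == "__main__":
    ranges = episode_time_ranges([30, 60, 15], fps=30)
    print(ranges)
    print(frame_timestamp(ranges, episode=1, frame=15, fps=30))
```

Keeping the running offset as an integer frame count, rather than summing float durations, avoids accumulating rounding error across many episodes.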
Metadata
info.json
Contains dataset-level information.
stats.json
Per-feature statistics used for normalization.
Available Datasets
Browse available datasets on the Hugging Face Hub:
- lerobot/pusht - 2D pushing task (simplest; great for testing)
- lerobot/aloha_sim_insertion_human - Simulated peg insertion
- lerobot/aloha_mobile_cabinet - Real-world cabinet opening
- lerobot/xarm_lift_medium - Object lifting with xArm
Loading Datasets
Basic usage:
Dataset Statistics
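One common way to apply stored statistics is a per-feature z-score, x_norm = (x - mean) / std, with the inverse used to map model outputs back to raw units. The sketch below is a minimal plain-Python illustration, assuming a statistics dictionary keyed by feature name with "mean" and "std" lists. The feature name observation.state, its values, and the eps guard against a zero standard deviation are hypothetical.

```python
from typing import Dict, List, Tuple

Stats = Dict[str, Dict[str, List[float]]]


def _mean_std(feature: str, stats: Stats, n: int) -> Tuple[List[float], List[float]]:
    """Fetch mean/std for a feature and check they match the vector length."""
    mean, std = stats[feature]["mean"], stats[feature]["std"]
    if len(mean) != n or len(std) != n:
        raise ValueError(f"stats for {feature!r} do not match a {n}-dim vector")
    return mean, std


def normalize(feature: str, values: List[float], stats: Stats, eps: float = 1e-8) -> List[float]:
    """Z-score normalize one feature vector with its stored mean and std."""
    mean, std = _mean_std(feature, stats, len(values))
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


def unnormalize(feature: str, values: List[float], stats: Stats, eps: float = 1e-8) -> List[float]:
    """Invert normalize(), e.g. to map predicted actions back to robot units."""
    mean, std = _mean_std(feature, stats, len(values))
    return [v * (s + eps) + m for v, m, s in zip(values, mean, std)]


# Hypothetical statistics for a 2-D state feature.
example_stats: Stats = {"observation.state": {"mean": [1.0, -2.0], "std": [2.0, 0.5]}}
```

For example, normalize("observation.state", [3.0, -1.0], example_stats) gives values very close to [1.0, 2.0], and unnormalize recovers the original vector.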
Datasets include pre-computed statistics for normalization.
Dataset Properties
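Dataset-level properties such as frame rate and episode and frame counts live in info.json, so they can be inspected without loading any data. A minimal standard-library sketch, assuming a local copy of the dataset with the file at meta/info.json and keys named fps, total_episodes, total_frames, and features; these names are assumptions to verify against your own dataset, and load_info and summarize are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_info(dataset_root: Path) -> Dict[str, Any]:
    """Read meta/info.json from a local dataset copy (path layout assumed)."""
    with open(dataset_root / "meta" / "info.json", encoding="utf-8") as f:
        return json.load(f)


def summarize(info: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a few convenience properties from assumed info.json keys."""
    fps = info["fps"]
    total_frames = info["total_frames"]
    return {
        "fps": fps,
        "num_episodes": info["total_episodes"],
        "num_frames": total_frames,
        "duration_s": total_frames / fps,
        "feature_names": sorted(info.get("features", {})),
    }
```

Usage would look like summarize(load_info(Path("path/to/local/dataset"))), where the path is a placeholder for wherever the dataset was downloaded.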
Episode Information
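Because frames from many episodes are stored back to back, episode metadata has to record where each episode begins and ends in the global frame sequence. A minimal sketch of building those boundaries from episode lengths and finding the episode that owns a given frame, assuming half-open [start, end) ranges; episode_boundaries and locate_frame are hypothetical names, not LeRobot's API.

```python
import bisect
from itertools import accumulate
from typing import List, Tuple


def episode_boundaries(lengths: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) global frame ranges for each episode."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def locate_frame(boundaries: List[Tuple[int, int]], global_index: int) -> Tuple[int, int]:
    """Map a global frame index to (episode_index, frame_index_within_episode)."""
    ends = [end for _, end in boundaries]
    # The first episode whose end lies strictly after the index owns that frame.
    episode = bisect.bisect_right(ends, global_index)
    if global_index < 0 or episode >= len(boundaries):
        raise IndexError(f"frame {global_index} is out of range")
    start, _ = boundaries[episode]
    return episode, global_index - start
```

Binary search over the episode end positions keeps the lookup at O(log E) for E episodes, and bisect_right also skips any zero-length episodes correctly.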
Next Steps
- Using LeRobotDataset - Learn how to use the dataset class
- Porting Datasets - Convert your own datasets
- Dataset Tools - Manipulate and modify datasets
- Video Encoding - Advanced video encoding options