The lerobot-edit-dataset command modifies existing datasets. It can delete episodes, split or merge datasets, remove features, modify task descriptions, convert image datasets to video format, and display dataset statistics.
Command
lerobot-edit-dataset [OPTIONS]
Location: src/lerobot/scripts/lerobot_edit_dataset.py
Overview
Supported operations:
- delete_episodes: Remove specific episodes from dataset
- split: Split dataset into train/val/test sets
- merge: Combine multiple datasets into one
- remove_feature: Delete camera or sensor features
- modify_tasks: Update task descriptions
- convert_image_to_video: Convert image dataset to video format
- info: Display dataset statistics
Key Options
Dataset Options
--repo_id: Input dataset repository ID. Required for all operations except merge.
--root: Local path to the input dataset. Defaults to $HF_LEROBOT_HOME/{repo_id}.
--new_repo_id: Output dataset repository ID. Required for merge, optional for other operations.
--new_root: Local path for the output dataset. Defaults to $HF_LEROBOT_HOME/{new_repo_id}.
--push_to_hub: Upload the result to the Hugging Face Hub.
Operation Configuration
--operation.type: Operation to perform: delete_episodes, split, merge, remove_feature, modify_tasks, convert_image_to_video, or info.
Additional options depend on operation type (see examples below).
Usage Examples
Delete Episodes
Remove specific episodes from a dataset:
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--operation.type=delete_episodes \
--operation.episode_indices="[0, 2, 5]"
This creates a backup at pusht_old and modifies the dataset in-place.
Save to New Dataset
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--new_repo_id=lerobot/pusht_filtered \
--operation.type=delete_episodes \
--operation.episode_indices="[0, 2, 5, 10, 15]"
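When the episode list comes from a script rather than being typed by hand, the command line can be assembled programmatically. The following is a minimal sketch, not part of LeRobot: build_delete_command is a hypothetical helper that uses only the flags shown in the examples above, and it assumes the list format "[0, 2, 5]" is what the CLI expects.

```python
from typing import Iterable, List, Optional


def build_delete_command(
    repo_id: str,
    episode_indices: Iterable[int],
    new_repo_id: Optional[str] = None,
) -> List[str]:
    """Build an argv list for a delete_episodes run (hypothetical helper)."""
    indices = sorted(set(episode_indices))  # drop duplicates, keep stable order
    if not indices:
        raise ValueError("at least one episode index is required")
    if indices[0] < 0:
        raise ValueError("episode indices must be non-negative")

    cmd = ["lerobot-edit-dataset", f"--repo_id={repo_id}"]
    if new_repo_id is not None:
        # Writing to a new dataset leaves the original untouched.
        cmd.append(f"--new_repo_id={new_repo_id}")
    cmd.append("--operation.type=delete_episodes")
    # str() of a list of ints yields "[0, 2, 5]", the format used in the examples.
    cmd.append(f"--operation.episode_indices={indices}")
    return cmd


cmd = build_delete_command("lerobot/pusht", [5, 0, 2, 2], "lerobot/pusht_filtered")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting issues around the bracketed list.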
Split Dataset
Split into train/val sets by fractions:
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--operation.type=split \
--operation.splits='{"train": 0.8, "val": 0.2}'
Creates:
- lerobot/pusht_train (80% of episodes)
- lerobot/pusht_val (20% of episodes)
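To see how fractions translate into episode counts, the sketch below assigns contiguous episode ranges to each split. It is illustrative only: plan_fraction_split is a hypothetical function, and rounding down and assigning episodes in order are assumptions, since the tool's actual rounding and ordering are not described here.

```python
from typing import Dict, List


def plan_fraction_split(num_episodes: int, splits: Dict[str, float]) -> Dict[str, List[int]]:
    """Assign contiguous episode ranges to splits by fraction (illustrative only)."""
    total = sum(splits.values())
    if total > 1.0 + 1e-9:
        raise ValueError(f"fractions sum to {total}, which exceeds 1.0")

    plan: Dict[str, List[int]] = {}
    start = 0
    for name, fraction in splits.items():
        # Assumption: round down; the small epsilon guards against float error.
        count = int(num_episodes * fraction + 1e-9)
        count = min(count, num_episodes - start)
        plan[name] = list(range(start, start + count))
        start += count
    return plan


plan = plan_fraction_split(100, {"train": 0.8, "val": 0.2})
print({name: len(episodes) for name, episodes in plan.items()})
```

Under these assumptions, fractions that do not divide the episode count evenly leave a few episodes unassigned, so it is worth checking the sizes of the resulting datasets.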
Split by Episode Indices
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--operation.type=split \
--operation.splits='{"train": [0, 1, 2, 3, 4], "val": [5, 6], "test": [7, 8, 9]}'
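With explicit index lists it is easy to put the same episode in two splits by accident. The sketch below checks for that before the command is run. find_overlaps is a hypothetical helper; whether the tool itself rejects overlapping splits is not stated here.

```python
from typing import Dict, List


def find_overlaps(splits: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Return {episode_index: [split names]} for indices used by more than one split."""
    seen: Dict[int, List[str]] = {}
    for name, indices in splits.items():
        for idx in set(indices):  # ignore duplicates within a single split
            seen.setdefault(idx, []).append(name)
    return {idx: names for idx, names in seen.items() if len(names) > 1}


splits = {"train": [0, 1, 2, 3, 4], "val": [5, 6], "test": [7, 8, 9]}
overlaps = find_overlaps(splits)
print(overlaps or "no overlapping episodes")
```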
Custom Output Location
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--new_root=/path/to/splits \
--operation.type=split \
--operation.splits='{"train": 0.7, "val": 0.15, "test": 0.15}'
Creates splits in /path/to/splits/train, /path/to/splits/val, /path/to/splits/test.
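The naming pattern in the split examples can be summarized in a few lines. This is a sketch of the convention as documented above (a _{split} suffix on the repository ID, or a subdirectory per split under new_root), not the tool's implementation; both function names are hypothetical.

```python
from pathlib import PurePosixPath


def split_repo_id(repo_id: str, split_name: str) -> str:
    """Repository ID of one split when no output root is given."""
    return f"{repo_id}_{split_name}"


def split_root(new_root: str, split_name: str) -> str:
    """Directory of one split when --new_root is given."""
    return str(PurePosixPath(new_root) / split_name)


for name in ("train", "val", "test"):
    print(split_repo_id("lerobot/pusht", name), split_root("/path/to/splits", name))
```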
Merge Datasets
Combine multiple datasets:
lerobot-edit-dataset \
--new_repo_id=lerobot/pusht_merged \
--operation.type=merge \
--operation.repo_ids="['lerobot/pusht_train', 'lerobot/pusht_val']"
Merge from Local Paths
lerobot-edit-dataset \
--new_repo_id=lerobot/combined_dataset \
--operation.type=merge \
--operation.repo_ids="['dataset1', 'dataset2']" \
--operation.roots="['/path/to/dataset1', '/path/to/dataset2']"
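The example above suggests that roots are matched to repo_ids by position. Assuming that is the case, a small pre-flight check like the following can catch mismatched lists. pair_sources is a hypothetical helper and the positional matching is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def pair_sources(
    repo_ids: Sequence[str],
    roots: Optional[Sequence[str]] = None,
) -> List[Tuple[str, Optional[str]]]:
    """Pair each repo ID with its local root; None means the default location."""
    if len(repo_ids) < 2:
        raise ValueError("merging needs at least two datasets")
    if roots is None:
        return [(repo_id, None) for repo_id in repo_ids]
    if len(roots) != len(repo_ids):
        raise ValueError(
            f"got {len(repo_ids)} repo_ids but {len(roots)} roots; provide one root per dataset"
        )
    # Assumption: the i-th root belongs to the i-th repo ID.
    return list(zip(repo_ids, roots))


pairs = pair_sources(["dataset1", "dataset2"], ["/path/to/dataset1", "/path/to/dataset2"])
print(pairs)
```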
Merge and Push to Hub
lerobot-edit-dataset \
--new_repo_id=myuser/merged_dataset \
--operation.type=merge \
--operation.repo_ids="['lerobot/pusht', 'lerobot/aloha_mobile_cabinet']" \
--push_to_hub=true
Remove Feature
Delete camera or sensor features:
lerobot-edit-dataset \
--repo_id=lerobot/aloha_mobile_cabinet \
--operation.type=remove_feature \
--operation.feature_names="['observation.images.top']"
Removes the observation.images.top camera from the dataset.
Remove Multiple Features
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--new_repo_id=lerobot/pusht_minimal \
--operation.type=remove_feature \
--operation.feature_names="['observation.images.wrist', 'observation.depth']"
Modify Tasks
WARNING: Modifies dataset in-place! Back up first.
Set Single Task for All Episodes
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--operation.type=modify_tasks \
--operation.new_task="Push the T-shaped block to the target"
Set Different Tasks per Episode
lerobot-edit-dataset \
--repo_id=myuser/my_dataset \
--operation.type=modify_tasks \
--operation.episode_tasks='{"0": "Pick red cube", "1": "Pick blue cube", "2": "Pick green cube"}'
Set Default with Overrides
lerobot-edit-dataset \
--repo_id=myuser/my_dataset \
--operation.type=modify_tasks \
--operation.new_task="Standard pick and place" \
--operation.episode_tasks='{"5": "Pick fragile object", "10": "Pick heavy object"}'
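The combination of a default task and per-episode overrides can be pictured as a simple lookup. The sketch below assumes that an entry in episode_tasks takes precedence over new_task, as the heading implies, and that episodes covered by neither keep their existing task. resolve_tasks is a hypothetical function, not the tool's code.

```python
from typing import Dict, Optional


def resolve_tasks(
    num_episodes: int,
    new_task: Optional[str] = None,
    episode_tasks: Optional[Dict[str, str]] = None,
) -> Dict[int, str]:
    """Return the task each episode would receive; untouched episodes are omitted."""
    # JSON object keys are strings, so convert them to integer episode indices.
    overrides = {int(key): task for key, task in (episode_tasks or {}).items()}
    for idx in overrides:
        if not 0 <= idx < num_episodes:
            raise ValueError(f"episode {idx} is out of range (0..{num_episodes - 1})")

    resolved: Dict[int, str] = {}
    for episode in range(num_episodes):
        if episode in overrides:
            resolved[episode] = overrides[episode]  # assumed to win over the default
        elif new_task is not None:
            resolved[episode] = new_task
    return resolved


tasks = resolve_tasks(
    12,
    new_task="Standard pick and place",
    episode_tasks={"5": "Pick fragile object", "10": "Pick heavy object"},
)
print(tasks[0], "|", tasks[5], "|", tasks[10])
```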
Convert Image to Video
Convert an image-based dataset to video format to reduce storage size:
lerobot-edit-dataset \
--repo_id=lerobot/pusht_image \
--new_repo_id=lerobot/pusht_video \
--operation.type=convert_image_to_video
Custom Video Settings
lerobot-edit-dataset \
--repo_id=lerobot/pusht_image \
--new_repo_id=lerobot/pusht_video \
--operation.type=convert_image_to_video \
--operation.vcodec=h264_nvenc \
--operation.crf=23 \
--operation.num_workers=8
Convert and Push to Hub
lerobot-edit-dataset \
--repo_id=myuser/dataset_image \
--new_repo_id=myuser/dataset_video \
--operation.type=convert_image_to_video \
--push_to_hub=true
Dataset Info
Display dataset statistics:
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--operation.type=info
Output:
======Info lerobot/pusht
Repository ID: lerobot/pusht
Total episode: 100
Total task: 1
Total frame(Actual Count): 4500(4500)
Average frame per episode: 45.0
Average episode time(sec): 1.5
FPS: 30
Size: 1234.5 MB
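The averages in this sample output are consistent with the totals. The short check below reproduces them, assuming the average episode time is the average frame count divided by the FPS (an inference from the sample values, not a documented formula).

```python
total_episodes = 100
total_frames = 4500
fps = 30

# Average frames per episode: total frames spread over all episodes.
avg_frames = total_frames / total_episodes
# Average episode duration in seconds (assumed: frames divided by FPS).
avg_seconds = avg_frames / fps

print(f"Average frame per episode: {avg_frames}")
print(f"Average episode time(sec): {avg_seconds}")
```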
Show Feature Details
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--operation.type=info \
--operation.show_features=true
Shows full feature schema in JSON format.
Video Encoding Options
For convert_image_to_video operation:
--operation.vcodec: Video codec, for example h264, hevc, libsvtav1, h264_nvenc, or h264_videotoolbox.
Pixel format for compatibility.
GOP size (keyframe interval).
--operation.crf: Constant rate factor (lower = higher quality, larger file).
--operation.num_workers: Number of parallel encoding workers.
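For readers more familiar with ffmpeg, the settings above correspond to standard ffmpeg encoder flags. The sketch below builds an equivalent ffmpeg argument list to show what each setting controls. It is not how lerobot-edit-dataset encodes video, and the function name, the default values, and the frame file pattern are all illustrative assumptions.

```python
from typing import List


def ffmpeg_encode_args(
    frames_pattern: str,
    output: str,
    fps: int,
    vcodec: str = "libsvtav1",  # illustrative default, not the tool's
    pix_fmt: str = "yuv420p",   # widely supported pixel format
    gop: int = 2,               # keyframe interval in frames (illustrative)
    crf: int = 30,              # lower = higher quality, larger file
) -> List[str]:
    """Build an ffmpeg argv list that mirrors the encoding settings described above."""
    return [
        "ffmpeg",
        "-framerate", str(fps),  # input frame rate for the image sequence
        "-i", frames_pattern,    # e.g. frame_%06d.png
        "-c:v", vcodec,
        "-pix_fmt", pix_fmt,
        "-g", str(gop),
        "-crf", str(crf),
        output,
    ]


args = ffmpeg_encode_args("frame_%06d.png", "episode_000000.mp4", fps=30, crf=23)
print(" ".join(args))
```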
In-Place vs New Dataset
In-Place Modification
When new_repo_id and new_root are not specified:
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--operation.type=delete_episodes \
--operation.episode_indices="[0]"
- Creates backup: pusht_old/
- Modifies original dataset location
- Use for permanent changes
New Dataset Creation
When new_repo_id or new_root is specified:
lerobot-edit-dataset \
--repo_id=lerobot/pusht \
--new_repo_id=lerobot/pusht_modified \
--operation.type=delete_episodes \
--operation.episode_indices="[0]"
- Original dataset unchanged
- Creates new dataset at new_root or $HF_LEROBOT_HOME/new_repo_id
- Use when preserving original
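The rule for choosing between the two modes can be written down compactly. The sketch below is a hypothetical helper, not the tool's code; it assumes the backup is a sibling directory named after the dataset with an _old suffix, which matches the pusht_old example but is not spelled out in full here.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def plan_output(
    input_root: str,
    new_repo_id: Optional[str] = None,
    new_root: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (mode, backup_path); backup_path is None when a new dataset is written."""
    if new_repo_id is None and new_root is None:
        root = PurePosixPath(input_root)
        # Assumption: the backup sits next to the dataset, e.g. pusht -> pusht_old.
        backup = root.with_name(root.name + "_old")
        return "in_place", str(backup)
    return "new_dataset", None


print(plan_output("/data/lerobot/pusht"))
print(plan_output("/data/lerobot/pusht", new_repo_id="lerobot/pusht_modified"))
```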
Path Semantics
Default Paths
# Uses $HF_LEROBOT_HOME/lerobot/pusht
lerobot-edit-dataset --repo_id=lerobot/pusht ...
# Uses $HF_LEROBOT_HOME/lerobot/pusht_new
lerobot-edit-dataset --repo_id=lerobot/pusht --new_repo_id=lerobot/pusht_new ...
Explicit Paths
# Uses /path/to/dataset
lerobot-edit-dataset --repo_id=lerobot/pusht --root=/path/to/dataset ...
# Uses /path/to/output
lerobot-edit-dataset --repo_id=lerobot/pusht --new_root=/path/to/output ...
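The default and explicit cases above follow one pattern: an explicit path wins, otherwise the repository ID is appended to $HF_LEROBOT_HOME. The following sketch expresses that pattern. resolve_roots is a hypothetical function, and the home directory is passed in explicitly rather than guessed.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def resolve_roots(
    home: str,
    repo_id: str,
    root: Optional[str] = None,
    new_repo_id: Optional[str] = None,
    new_root: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (input_path, output_path); output_path is None for in-place edits."""
    base = PurePosixPath(home)  # stands in for the value of $HF_LEROBOT_HOME
    input_path = PurePosixPath(root) if root else base / repo_id

    if new_root:
        output_path = PurePosixPath(new_root)  # explicit output path wins
    elif new_repo_id:
        output_path = base / new_repo_id       # default location for the new ID
    else:
        return str(input_path), None           # no output given: in-place

    return str(input_path), str(output_path)


home = "/home/user/lerobot_home"  # example value only
print(resolve_roots(home, "lerobot/pusht"))
print(resolve_roots(home, "lerobot/pusht", new_repo_id="lerobot/pusht_new"))
print(resolve_roots(home, "lerobot/pusht", root="/path/to/dataset", new_root="/path/to/output"))
```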
Programmatic Usage
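from lerobot.scripts.lerobot_edit_dataset import (
    edit_dataset,
    EditDatasetConfig,
    DeleteEpisodesConfig,
)

# Write the filtered result to a new dataset so the original stays unchanged.
config = EditDatasetConfig(
    repo_id="lerobot/pusht",
    new_repo_id="lerobot/pusht_filtered",
    operation=DeleteEpisodesConfig(
        episode_indices=[0, 2, 5],  # episodes to delete
    ),
    push_to_hub=False,  # keep the result local
)

# Run the configured operation.
edit_dataset(config)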
Best Practices
- Back Up First: Always back up the dataset before in-place modifications
- Test on Subset: Test operations on small datasets first
- Verify Output: Check result before pushing to Hub
- Use New Dataset: Prefer new_repo_id over in-place for safety
- Document Changes: Record what operations were performed
See Also