The lerobot-train-tokenizer command trains FAST (Frequency-space Action Sequence Tokenization) action tokenizers that compress robot action sequences into discrete tokens.
Command
src/lerobot/scripts/lerobot_train_tokenizer.py
Overview
FAST tokenizers:
- Compress action sequences using DCT (Discrete Cosine Transform) and BPE (Byte Pair Encoding)
- Reduce action dimensionality for more efficient policy training
- Support delta transforms (relative vs absolute actions)
- Apply normalization for stable encoding
- Achieve compression ratios of 3-10x depending on configuration
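To make the pipeline concrete, here is a minimal conceptual sketch of the DCT-quantize-BPE idea (illustrative only; the actual implementation lives inside the FAST tokenizer, and the BPE step is elided):

```python
# Conceptual sketch of FAST-style encoding; not the script's actual code.
import numpy as np
from scipy.fft import dct, idct

def encode_chunk(chunk: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """chunk: [horizon, action_dim], normalized to roughly [-1, 1]."""
    coeffs = dct(chunk, axis=0, norm="ortho")         # move to frequency space
    quantized = np.round(coeffs * scale).astype(int)  # quantize coefficients
    return quantized.flatten()                        # symbol stream; BPE would compress this

def decode_chunk(symbols: np.ndarray, horizon: int, dim: int, scale: float = 10.0) -> np.ndarray:
    coeffs = symbols.reshape(horizon, dim) / scale
    return idct(coeffs, axis=0, norm="ortho")         # approximate reconstruction

chunk = np.random.uniform(-1, 1, size=(10, 6))
recon = decode_chunk(encode_chunk(chunk), horizon=10, dim=6)
print(np.abs(chunk - recon).max())  # small quantization error
```

A higher scale gives finer quantization and lower reconstruction error at the cost of a larger symbol range, which is the trade-off the Scale Parameter section below describes.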
Key Options
Dataset Options
- LeRobot dataset repository ID (e.g., lerobot/aloha_mobile_cabinet).
- Local dataset directory. Defaults to ~/.cache/huggingface/lerobot/{repo_id}.
- Maximum number of episodes to use. If None, uses all episodes.
- Fraction of action chunks to sample per episode (0.1 = 10%).
Action Encoding Options
- Number of future actions in each chunk.
- Comma-separated dimension ranges to encode (e.g., "0:6" for the first 6 dimensions).
- Comma-separated dimension indices for the delta transform (e.g., "0,1,2,3,4,5").
- Whether to compute relative actions (delta from the current state).
- Dataset key for the state observations used in the delta transform.
Normalization Options
- Normalization method: MEAN_STD, MIN_MAX, QUANTILES, QUANTILE10, or IDENTITY.
Tokenizer Options
- BPE vocabulary size for the tokenizer.
- DCT coefficient scaling factor for quantization.
Output Options
- Directory to save the tokenizer. Defaults to fast_tokenizer_{repo_id}.
- Upload the tokenizer to the Hugging Face Hub.
- Hub repository ID for upload. If None, uses the output directory name.
- Create a private repository on the Hub.
Usage Examples
Basic Training
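A minimal invocation might look like the following. The flag names here are illustrative assumptions, not the script's verified interface; run the script with --help to see the actual options:

```bash
# Flag names are assumed for illustration; verify with --help.
python src/lerobot/scripts/lerobot_train_tokenizer.py \
    --repo_id lerobot/aloha_mobile_cabinet \
    --action_horizon 10 \
    --vocab_size 1024
```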
This trains a tokenizer with:
- 10-step action horizon
- 1024 token vocabulary
- Default normalization (QUANTILES)
- Default encoding of all action dimensions
Training with Delta Transform
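A hypothetical invocation enabling the delta transform (flag names again assumed; observation.state is a typical LeRobot state key):

```bash
# Flag names are assumed for illustration; verify with --help.
python src/lerobot/scripts/lerobot_train_tokenizer.py \
    --repo_id lerobot/aloha_mobile_cabinet \
    --use_delta true \
    --delta_indices "0,1,2,3,4,5" \
    --state_key observation.state
```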
Training with Specific Dimensions
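To encode only a subset of action dimensions, e.g. the first 6 (hypothetical flags):

```bash
# Flag names are assumed for illustration; verify with --help.
python src/lerobot/scripts/lerobot_train_tokenizer.py \
    --repo_id lerobot/aloha_mobile_cabinet \
    --encoded_dims "0:6"
```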
Training with Episode Sampling
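To cap the episode count and subsample action chunks on a large dataset (hypothetical flags):

```bash
# Flag names are assumed for illustration; verify with --help.
python src/lerobot/scripts/lerobot_train_tokenizer.py \
    --repo_id lerobot/aloha_mobile_cabinet \
    --max_episodes 100 \
    --sample_fraction 0.2
```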
Training and Pushing to Hub
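To upload the trained tokenizer to the Hugging Face Hub (hypothetical flags; your-name/fast-tokenizer-aloha is a placeholder repository ID):

```bash
# Flag names are assumed for illustration; verify with --help.
python src/lerobot/scripts/lerobot_train_tokenizer.py \
    --repo_id lerobot/aloha_mobile_cabinet \
    --push_to_hub true \
    --hub_repo_id your-name/fast-tokenizer-aloha \
    --private true
```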
Normalization Modes
QUANTILES (Default)
Normalizes to [-1, 1] using the 1st and 99th percentiles:
- Robust to outliers
- Clips extreme values
- Recommended for most use cases
MEAN_STD
Standardizes using mean and standard deviation:
- (x - mean) / std
- Assumes normal distribution
- May produce values outside [-1, 1]
MIN_MAX
Normalizes to [-1, 1] using min and max:
- 2 * (x - min) / (max - min) - 1
- Sensitive to outliers
- Preserves exact range
QUANTILE10
Normalizes using the 10th and 90th percentiles:
- Less aggressive clipping than QUANTILES
- More conservative range
IDENTITY
No normalization:
- Uses raw action values
- Not recommended (poor compression)
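As a compact reference for the formulas above, here is a sketch of the three main modes (a direct paraphrase of the definitions in this section, not the script's code):

```python
import numpy as np

def normalize_quantiles(x: np.ndarray, low: float = 1.0, high: float = 99.0) -> np.ndarray:
    """QUANTILES (or QUANTILE10 with low=10, high=90): map percentile range to [-1, 1] and clip."""
    q_low, q_high = np.percentile(x, [low, high], axis=0)
    scaled = 2 * (x - q_low) / (q_high - q_low) - 1
    return np.clip(scaled, -1.0, 1.0)  # clips extreme values; robust to outliers

def normalize_min_max(x: np.ndarray) -> np.ndarray:
    """MIN_MAX: exact [-1, 1] range, but sensitive to outliers."""
    return 2 * (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0)) - 1

def normalize_mean_std(x: np.ndarray) -> np.ndarray:
    """MEAN_STD: zero mean, unit variance; values may fall outside [-1, 1]."""
    return (x - x.mean(axis=0)) / x.std(axis=0)
```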
Delta Transform
The delta transform computes relative actions (offsets from the current state).

Without delta (absolute actions):
- Actions represent positions (not velocities)

With delta (relative actions):
- Trajectories are local (small movements)
- The policy learns relative motion patterns
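A minimal sketch of the transform, assuming delta means action minus the current state on the selected dimensions (as the state-key option above suggests):

```python
import numpy as np

def to_delta(action_chunk: np.ndarray, current_state: np.ndarray,
             delta_indices: list[int]) -> np.ndarray:
    """action_chunk: [horizon, action_dim]; subtract state from selected dims."""
    out = action_chunk.copy()
    out[:, delta_indices] -= current_state[delta_indices]
    return out

def from_delta(delta_chunk: np.ndarray, current_state: np.ndarray,
               delta_indices: list[int]) -> np.ndarray:
    """Invert the transform when decoding tokens back to absolute actions."""
    out = delta_chunk.copy()
    out[:, delta_indices] += current_state[delta_indices]
    return out
```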
Output Structure
The tokenizer is saved to the output directory (by default fast_tokenizer_{repo_id}).
metadata.json Contents
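The exact schema is not reproduced here; a plausible sketch assembled from the options documented above (all field names are assumptions) might look like:

```json
{
  "repo_id": "lerobot/aloha_mobile_cabinet",
  "action_horizon": 10,
  "vocab_size": 1024,
  "scale": 10.0,
  "normalization_mode": "QUANTILES",
  "encoded_dims": "0:6",
  "delta_indices": [0, 1, 2, 3, 4, 5]
}
```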
Using the Trained Tokenizer
Load Tokenizer
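A sketch of loading, assuming the saved tokenizer follows the FAST processor format published at physical-intelligence/fast (the output-directory name is a placeholder):

```python
from transformers import AutoProcessor

# trust_remote_code is needed because FAST ships as custom processor code.
tokenizer = AutoProcessor.from_pretrained(
    "fast_tokenizer_my_dataset",  # hypothetical local output directory
    trust_remote_code=True,
)
```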
Encode Actions
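Encoding takes a batch of (normalized) action chunks; this call convention follows the FAST processor's documented usage and should be treated as a sketch:

```python
import numpy as np

# actions: [batch, action_horizon, action_dim], already normalized
actions = np.random.uniform(-1, 1, size=(4, 10, 6))

# The processor returns one variable-length list of integer tokens per chunk.
token_lists = tokenizer(actions)
print(len(token_lists))  # 4
```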
Decode Tokens
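Decoding needs the chunk shape back, since token sequences are variable-length (again following the FAST processor's documented usage):

```python
# Reconstruct approximate action chunks from the tokens.
decoded = tokenizer.decode(token_lists, time_horizon=10, action_dim=6)
print(decoded.shape)  # (4, 10, 6); close to `actions` up to quantization error
```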
Training Tips
Vocabulary Size
- Smaller vocab (256-512): Faster inference, higher compression, more reconstruction error
- Medium vocab (1024-2048): Balanced trade-off (recommended)
- Larger vocab (4096+): Lower error, slower inference, less compression
Action Horizon
- Match your policy’s action chunk size
- Longer horizons (16-32): Better for smooth trajectories
- Shorter horizons (4-10): Lower latency, more reactive
Scale Parameter
- Controls DCT coefficient quantization
- Higher scale (15-20): Finer quantization, larger vocab usage
- Lower scale (5-10): Coarser quantization, smaller vocab usage
- Default (10.0) works well for most cases
Sampling Fraction
- Use 0.1-0.2 for large datasets (over 100 episodes)
- Use 0.5-1.0 for small datasets (under 50 episodes)
- Higher fraction = more training data, longer training time
Programmatic Usage
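The script's internal API is not shown here; a sketch of the underlying workflow using the FAST processor's fit interface (assumed from the physical-intelligence/fast Hub repo) might be:

```python
import numpy as np
from transformers import AutoProcessor

# Start from the published FAST processor (assumed entry point).
tokenizer = AutoProcessor.from_pretrained(
    "physical-intelligence/fast", trust_remote_code=True
)

# action_chunks: [num_chunks, action_horizon, action_dim], normalized to ~[-1, 1]
action_chunks = np.random.uniform(-1, 1, size=(1000, 10, 6))

# Fit a new BPE vocabulary on this dataset's action chunks, then save it.
tokenizer = tokenizer.fit(action_chunks)
tokenizer.save_pretrained("fast_tokenizer_my_dataset")
```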
See Also
- VQ-BeT Policy - Uses action tokenizers
- LeRobotDataset - Dataset format
- lerobot-train - Policy training