Overview
Imitation learning trains a policy to map observations to actions by minimizing the difference between predicted actions and expert actions in the training data; a minimal sketch of this objective follows the list below. This supervised learning approach is particularly effective when:
- You have access to high-quality demonstrations
- The task has a clear success criterion
- The environment is relatively deterministic
- You want faster training compared to reinforcement learning
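Concretely, the objective is ordinary supervised regression on the demonstrated actions. Here is a minimal PyTorch sketch; the network shape and dimensions are made up for illustration and are not LeRobot's actual API:

```python
import torch
import torch.nn as nn

# Hypothetical policy: maps a flat 14-dim observation to a 6-dim action.
policy = nn.Sequential(nn.Linear(14, 256), nn.ReLU(), nn.Linear(256, 6))

obs = torch.randn(32, 14)            # a batch of observations from the dataset
expert_actions = torch.randn(32, 6)  # the actions the demonstrator took

loss = nn.functional.mse_loss(policy(obs), expert_actions)  # imitation = regression
loss.backward()
```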
How It Works
Data Collection
First, collect demonstrations using teleoperation:
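For example, with the `lerobot-record` command shown under Best Practices below (the flags here are illustrative; check the options of your installed LeRobot version):

```bash
lerobot-record --repo-id=your_username/dataset --num-episodes=50
```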
Policy Training
Train a policy to predict actions from observations:
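A typical command looks like the sketch below. The `lerobot-train` entry point and the exact flag names are assumptions that vary across LeRobot versions; consult the training guide linked under Next Steps:

```bash
lerobot-train \
  --dataset.repo_id=your_username/dataset \
  --policy.type=act \
  --output_dir=outputs/train/act_example
```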
Supported Policies
LeRobot provides several state-of-the-art imitation learning policies:
ACT (Action Chunking Transformer)
ACT predicts sequences of actions (chunks) using a transformer architecture with CVAE latent conditioning. Excellent for bimanual manipulation tasks.
Diffusion Policy
Diffusion Policy uses denoising diffusion models to generate smooth, multimodal action distributions. Great for dexterous manipulation.
VQ-BeT
VQ-BeT uses vector quantization and behavior transformers for efficient action prediction with discrete latent representations.
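Trained checkpoints for these policies can be loaded from the Hugging Face Hub. A sketch; the import path and checkpoint name are assumptions that differ across LeRobot versions:

```python
# Import path and checkpoint name are assumptions; adjust to your LeRobot version.
from lerobot.common.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("lerobot/act_aloha_sim_transfer_cube_human")
policy.eval()  # inference mode for evaluation/deployment
```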
Key Concepts
Action Chunking
Many policies predict multiple future actions at once (action chunks) to improve temporal consistency:
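For instance, a chunking head emits the next N actions from a single observation, and the controller steps through the chunk before querying the policy again. A schematic sketch with made-up dimensions (not ACT's real architecture):

```python
import torch
import torch.nn as nn

CHUNK_SIZE = 16  # actions predicted per forward pass
ACTION_DIM = 6

# Hypothetical chunking head: one observation embedding in, CHUNK_SIZE actions out.
head = nn.Linear(256, CHUNK_SIZE * ACTION_DIM)

obs_features = torch.randn(1, 256)
chunk = head(obs_features).view(1, CHUNK_SIZE, ACTION_DIM)  # (batch, chunk, action_dim)

# Execute the whole chunk before re-querying the policy: fewer switches between
# predictions means smoother, more temporally consistent motion.
for t in range(CHUNK_SIZE):
    action = chunk[0, t]  # sent to the robot at each control step
```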
Observation History
Policies can use multiple past observations for better context:
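A simple way to implement this is to keep a small rolling buffer and stack the most recent observations into the policy input; the window size and names below are illustrative:

```python
from collections import deque

import torch

HISTORY_LEN = 2  # number of past observations the policy sees

history = deque(maxlen=HISTORY_LEN)

def policy_input(new_obs: torch.Tensor) -> torch.Tensor:
    """Stack the most recent observations into a single policy input."""
    history.append(new_obs)
    while len(history) < HISTORY_LEN:  # pad by repeating at episode start
        history.appendleft(new_obs)
    return torch.stack(list(history))  # shape: (HISTORY_LEN, obs_dim)
```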
Data Augmentation
Improve generalization by augmenting training data:
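For image observations this usually means standard vision augmentations applied only at training time. A torchvision sketch; these particular transforms are illustrative choices, not LeRobot defaults:

```python
import torch
from torchvision import transforms

# Random photometric and geometric jitter applied to camera frames at train time.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(size=(224, 224), scale=(0.9, 1.0)),
])

frame = torch.rand(3, 240, 320)  # dummy camera image, CHW floats in [0, 1]
augmented = augment(frame)       # a slightly different view of the same scene
```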
Advantages
- Sample Efficient: Learns from demonstrations without environment interaction during training
- Stable Training: Supervised learning is more stable than RL
- Fast Convergence: Can achieve good performance with 50-200 demonstrations
- Interpretable: The objective is transparent; the policy is explicitly trained to reproduce the demonstrated actions
Limitations
- Distribution Shift: Policy may fail on states not seen in demonstrations (covariate shift)
- No Exploration: Cannot discover new behaviors beyond demonstrations
- Data Quality: Heavily dependent on demonstration quality
- Compounding Errors: Small errors can accumulate over time
Best Practices
```bash
# Record episodes with different starting positions
lerobot-record --num-episodes=50 --vary-initial-state

# Visualize and review episodes before training
lerobot-visualize --repo-id=your_username/dataset --episode-index=0
```
Combining with RL
For best results, consider using imitation learning to initialize a policy, then fine-tune with reinforcement learning:
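The mechanical step that matters is weight reuse: the RL stage starts from the behavioral-cloning checkpoint rather than from random initialization. A minimal sketch with hypothetical names:

```python
import torch
import torch.nn as nn

def make_policy() -> nn.Module:
    # Hypothetical stand-in for whatever policy architecture you trained with BC.
    return nn.Sequential(nn.Linear(14, 256), nn.ReLU(), nn.Linear(256, 6))

# 1) Imitation learning stage produces a checkpoint (see Policy Training above).
bc_policy = make_policy()
torch.save(bc_policy.state_dict(), "bc_checkpoint.pt")

# 2) RL stage: start from the BC weights instead of from scratch, then hand the
#    policy to your RL algorithm of choice for reward-driven fine-tuning.
rl_policy = make_policy()
rl_policy.load_state_dict(torch.load("bc_checkpoint.pt"))
```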
Next Steps
- Train Your First Policy - Step-by-step training guide
- Evaluate Policies - Test your trained policy
- ACT Policy Guide - Learn about the ACT architecture
- Diffusion Policy Guide - Learn about Diffusion Policy