## Quick Start
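A minimal invocation might look like the following sketch; the checkpoint name is illustrative, and the `--eval.*` flag values are examples, not requirements:

```bash
# Evaluate a Hub checkpoint for 10 episodes in the PushT simulator.
# The repo id below is illustrative; substitute your own model.
lerobot-eval \
    --policy.path=lerobot/diffusion_pusht \
    --env.type=pusht \
    --eval.n_episodes=10 \
    --eval.batch_size=10
```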
Evaluate a pre-trained model straight from the Hub; checkpoints are downloaded automatically.

## Evaluation in Simulation
### Standard Benchmarks
LeRobot supports popular robotics benchmarks.

#### LIBERO
Evaluate on LIBERO manipulation tasks:

- `libero_spatial` - Spatial reasoning tasks
- `libero_object` - Object manipulation
- `libero_goal` - Goal-oriented tasks
- `libero_10` - 10 diverse tasks
- `libero_90` - 90 task benchmark
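Assuming the LIBERO environments are installed, an evaluation on one of the suites above might look like this sketch; the exact `--env` flag names may differ across LeRobot versions:

```bash
lerobot-eval \
    --policy.path=model \
    --env.type=libero \
    --env.task=libero_spatial \
    --eval.n_episodes=50
```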
#### PushT
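A PushT evaluation might be launched as follows (a sketch; `model` stands in for your checkpoint path, and the episode count is arbitrary):

```bash
lerobot-eval \
    --policy.path=model \
    --env.type=pusht \
    --eval.n_episodes=50 \
    --eval.batch_size=10
```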
Evaluate pushing tasks in the 2D PushT environment.

#### Gymnasium
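Evaluation on a gym-based environment might look like this sketch; `aloha` and the task name are examples, and availability depends on which simulation extras you installed:

```bash
lerobot-eval \
    --policy.path=model \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0 \
    --eval.n_episodes=50
```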
Evaluate on Gymnasium robotics environments.

### Custom Simulation Environments
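If your simulator exposes a Gym-style `reset`/`step` interface, the evaluation loop itself is straightforward. Below is a framework-agnostic sketch with a stub environment and random policy standing in for yours; all names here are hypothetical, not part of the LeRobot API:

```python
import random

class MyCustomEnv:
    """Stub simulator standing in for your own (hypothetical)."""
    def __init__(self, horizon=50):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"state": 0.0}

    def step(self, action):
        self.t += 1
        success = action > 0.9  # toy success condition
        done = success or self.t >= self.horizon
        reward = 1.0 if success else 0.0
        return {"state": float(self.t)}, reward, done, {"is_success": success}


def evaluate(env, policy, n_episodes=20):
    """Roll out `policy` in `env`; report success rate and mean episode length."""
    successes, lengths = 0, []
    for _ in range(n_episodes):
        obs, done, info, steps = env.reset(), False, {}, 0
        while not done:
            obs, reward, done, info = env.step(policy(obs))
            steps += 1
        successes += int(info.get("is_success", False))
        lengths.append(steps)
    return {
        "success_rate": successes / n_episodes,
        "avg_steps": sum(lengths) / len(lengths),
    }


random.seed(0)
print(evaluate(MyCustomEnv(), policy=lambda obs: random.random()))
```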
Evaluate in your own simulation.

## Evaluation on Real Robots
### Using Pre-trained Models
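Deployment typically reuses the recording pipeline with the policy supplying actions. A sketch: the robot type, port, and repo ids below are placeholders, and the flag names follow recent `lerobot-record` conventions, which may differ in your version:

```bash
lerobot-record \
    --robot.type=so100_follower \
    --robot.port=/dev/ttyACM0 \
    --policy.path=${HF_USER}/my_policy \
    --dataset.repo_id=${HF_USER}/eval_my_policy \
    --dataset.num_episodes=10
```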
Deploy a trained policy on your robot.

### Recording Evaluation Videos
Record videos during evaluation for analysis (see the `--eval.save_videos` flags under Best Practices below).

## Metrics and Analysis
### Success Rate
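Success rate is simply the fraction of episodes flagged as successful. A minimal sketch over per-episode results; the field names and numbers here are hypothetical:

```python
# Per-episode results from an evaluation run (numbers are made up).
episodes = [
    {"is_success": True,  "sum_reward": 42.0, "length": 110},
    {"is_success": False, "sum_reward": 10.5, "length": 300},
    {"is_success": True,  "sum_reward": 38.0, "length": 95},
    {"is_success": True,  "sum_reward": 40.2, "length": 120},
]

success_rate = sum(ep["is_success"] for ep in episodes) / len(episodes)
print(f"success rate: {success_rate:.0%}")  # success rate: 75%
```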
Success rate, the fraction of episodes that achieve the goal, is the primary metric for manipulation tasks.

### Reward Statistics
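Beyond the mean, the spread of returns reveals inconsistent behavior. A stdlib-only sketch (the reward values are made up):

```python
import statistics

# Per-episode cumulative rewards from an evaluation run (made-up numbers).
rewards = [42.0, 10.5, 38.0, 40.2, 35.7, 12.3]

print(f"mean:    {statistics.mean(rewards):.2f}")
print(f"stdev:   {statistics.stdev(rewards):.2f}")
print(f"median:  {statistics.median(rewards):.2f}")
print(f"min/max: {min(rewards):.1f} / {max(rewards):.1f}")
```

A bimodal distribution (some high, some near-zero returns) often signals a policy that either solves the task or fails outright, which a mean alone would hide.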
Analyze the reward distribution across evaluation episodes.

### Episode Length Analysis
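Shorter successful episodes indicate a more efficient policy, and failures often run to the time limit, so comparing lengths is a quick diagnostic. A sketch with hypothetical data:

```python
# Episode lengths (in control steps) from an evaluation run (made up).
episodes = [
    {"is_success": True,  "length": 110},
    {"is_success": False, "length": 300},  # failure: likely hit the time limit
    {"is_success": True,  "length": 95},
    {"is_success": True,  "length": 120},
]

success_lengths = [ep["length"] for ep in episodes if ep["is_success"]]
avg_success_len = sum(success_lengths) / len(success_lengths)
print(f"avg steps to success: {avg_success_len:.1f}")  # avg steps to success: 108.3
```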
Track how quickly the policy solves tasks.

## Advanced Evaluation
### Multi-task Evaluation
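One common pattern is looping the CLI over task suites and writing each run to its own output directory. A sketch: the suite names come from the LIBERO list above, and the `--env`/`--output_dir` flag details may differ by version:

```bash
for task in libero_spatial libero_object libero_goal; do
    lerobot-eval \
        --policy.path=model \
        --env.type=libero \
        --env.task=${task} \
        --eval.n_episodes=20 \
        --output_dir=outputs/eval/${task}
done
```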
Evaluate a policy across multiple tasks.

### Robustness Testing
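A cheap robustness probe is re-running evaluation with Gaussian noise injected into observations and watching how the success rate degrades as noise grows. A wrapper sketch; the `reset`/`step` interface and class names are hypothetical:

```python
import random

class NoisyObsWrapper:
    """Adds Gaussian noise to numeric observation fields (interface hypothetical)."""

    def __init__(self, env, sigma=0.05):
        self.env, self.sigma = env, sigma

    def _perturb(self, obs):
        return {k: v + random.gauss(0.0, self.sigma) for k, v in obs.items()}

    def reset(self):
        return self._perturb(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._perturb(obs), reward, done, info


# Demo with a trivial stand-in environment:
class StubEnv:
    def reset(self):
        return {"x": 1.0}

    def step(self, action):
        return {"x": 2.0}, 0.0, True, {}


random.seed(0)
noisy = NoisyObsWrapper(StubEnv(), sigma=0.1)
print(noisy.reset())
```

Sweeping `sigma` over, say, `(0.0, 0.05, 0.1)` and plotting success rate against noise level gives a simple robustness curve.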
Test policy robustness to perturbations.

### Ablation Studies
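Ablations usually reduce to evaluating several checkpoints under identical settings and comparing the reported metrics. A sketch; the checkpoint paths are placeholders:

```bash
for ckpt in outputs/train/with_aug outputs/train/no_aug; do
    lerobot-eval \
        --policy.path=${ckpt} \
        --env.type=pusht \
        --eval.n_episodes=50 \
        --output_dir=outputs/eval/$(basename ${ckpt})
done
```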
Compare different model configurations.

## Best Practices
Reuse the normalization statistics from the training dataset when building the pre/post-processors:

```python
# Import paths may vary across LeRobot versions.
from lerobot.datasets.lerobot_dataset import LeRobotDatasetMetadata
from lerobot.processor import make_pre_post_processors

# Use same normalization stats as training.
# Assumes `policy` was loaded earlier.
dataset_metadata = LeRobotDatasetMetadata("training_dataset")
preprocessor, postprocessor = make_pre_post_processors(
    policy.config,
    dataset_stats=dataset_metadata.stats,  # Critical!
)
```
Save evaluation videos so failure cases can be inspected later:

```bash
lerobot-eval \
    --policy.path=model \
    --env.type=pusht \
    --eval.save_videos=true \
    --eval.video_dir=eval_videos
```
## Next Steps
- Bring Your Own Policy - Deploy custom policies
- Train Your First Policy - Training guide
- Imitation Learning - Learn about policy types
- Bring Your Own Hardware - Integrate custom robots