
Overview

Policies in LeRobot are neural network models that map observations to actions. They are the core component that enables robots to learn from demonstrations and make decisions. LeRobot provides several state-of-the-art policy architectures:
  • Diffusion Policy: Denoising diffusion for smooth action sequences
  • ACT (Action Chunking Transformer): Transformer-based imitation learning
  • VQ-BeT: Vector-quantized behavior transformer
  • TDMPC: Temporal difference model predictive control
  • VLA Policies: Vision-language-action models (PI0, SmolVLA, XVLA)

Policy Interface

All policies inherit from PreTrainedPolicy and share a common interface:
import torch

from lerobot.policies.diffusion import DiffusionPolicy, DiffusionConfig

# Create policy
config = DiffusionConfig()
policy = DiffusionPolicy(config)

# Training mode
loss, output_dict = policy.forward(batch)
loss.backward()

# Inference mode
policy.eval()
with torch.no_grad():
    action = policy.select_action(observation)

Key Methods

forward()

Training method that computes the loss:
def forward(self, batch: dict[str, Tensor]) -> tuple[Tensor, dict | None]:
    """Compute training loss from a batch of data.
    
    Args:
        batch: Dictionary containing observations, actions, etc.
        
    Returns:
        loss: Scalar tensor to optimize
        output_dict: Optional auxiliary outputs
    """
Source: src/lerobot/policies/diffusion/modeling_diffusion.py:141

select_action()

Inference method that generates a single action:
@torch.no_grad()
def select_action(self, batch: dict[str, Tensor]) -> Tensor:
    """Select a single action given current observations.
    
    This method handles:
    - Observation history buffering
    - Action sequence generation
    - Action queue management
    
    Args:
        batch: Current observation dictionary
        
    Returns:
        action: Single action tensor to execute
    """
Source: src/lerobot/policies/diffusion/modeling_diffusion.py:103

reset()

Clears internal state between episodes:
policy.reset()
Must be called when starting a new episode to clear observation and action queues. Source: src/lerobot/policies/diffusion/modeling_diffusion.py:82

Configuration

Each policy has a configuration class that defines its hyperparameters:
from lerobot.policies.diffusion import DiffusionConfig

config = DiffusionConfig(
    # Architecture
    vision_backbone="resnet18",
    
    # Temporal parameters
    n_obs_steps=2,        # Number of observation steps
    horizon=16,           # Action prediction horizon
    n_action_steps=8,     # Number of action steps to execute
    
    # Diffusion
    num_inference_steps=10,
    
    # Features
    image_features=["observation.images.top"],
    state_features=["observation.state"],
)
Source: src/lerobot/policies/diffusion/configuration_diffusion.py

Example: Diffusion Policy

Diffusion Policy uses denoising diffusion to generate smooth action sequences.

Architecture

The Diffusion Policy consists of:
  1. Observation encoder: Processes images and state
  2. Noise prediction network: U-Net that predicts noise to remove
  3. Diffusion scheduler: DDPM/DDIM scheduler for sampling
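The way these three components compose can be sketched as a toy sampling loop. Everything below is a stand-in written for illustration (the stub functions are not LeRobot or diffusers APIs): a "conditioning" scalar plays the role of the observation encoding, and each scheduler step removes a fraction of the predicted noise.

```python
import random

# Toy stubs standing in for the real observation encoder, U-Net, and scheduler.
def encode_obs(obs):
    # A real encoder produces a feature vector; here, a single scalar.
    return sum(obs) / len(obs)

def predict_noise(x, t, cond):
    # A real U-Net predicts the injected noise; here we pretend the
    # residual between x and the conditioning value is all noise.
    return [xi - cond for xi in x]

def scheduler_step(x, eps, alpha=0.5):
    # One denoising update: remove a fraction of the predicted noise.
    return [xi - alpha * ei for xi, ei in zip(x, eps)]

def sample_actions(obs, horizon=16, num_inference_steps=10, seed=0):
    rng = random.Random(seed)
    cond = encode_obs(obs)
    x = [rng.gauss(0.0, 1.0) for _ in range(horizon)]  # start from pure noise
    for t in range(num_inference_steps):
        eps = predict_noise(x, t, cond)
        x = scheduler_step(x, eps)
    return x

actions = sample_actions(obs=[0.2, 0.4], horizon=16)
# Each denoising step halves the distance to the conditioning value,
# so after 10 steps every element sits very close to cond = 0.3.
```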

Temporal Structure

Observation history (n_obs_steps=2):
  [t-1] [t]

Action horizon (horizon=16):
  [t] [t+1] [t+2] ... [t+15]

Executed actions (n_action_steps=8):
  [t] [t+1] [t+2] ... [t+7]
Only n_action_steps actions are executed before predicting again. Source: src/lerobot/policies/diffusion/modeling_diffusion.py:112
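This receding-horizon pattern can be sketched with a simple action queue. The controller class and `predict_action_sequence` below are hypothetical stand-ins for the policy's internal queue management, not LeRobot code:

```python
from collections import deque

def predict_action_sequence(obs, horizon=16):
    # Stand-in for the policy's sampler: returns `horizon` dummy actions.
    return [f"a{t}" for t in range(horizon)]

class ActionChunkingController:
    """Re-plans every n_action_steps, executing queued actions in between."""

    def __init__(self, horizon=16, n_action_steps=8):
        self.horizon = horizon
        self.n_action_steps = n_action_steps
        self.queue = deque()

    def select_action(self, obs):
        if not self.queue:
            # Predict a full horizon, but only queue the first n_action_steps.
            seq = predict_action_sequence(obs, self.horizon)
            self.queue.extend(seq[: self.n_action_steps])
        return self.queue.popleft()

    def reset(self):
        # Mirrors policy.reset(): clear stale actions at episode start.
        self.queue.clear()

ctrl = ActionChunkingController()
executed = [ctrl.select_action(obs=None) for _ in range(10)]
# The first 8 actions come from the first plan; step 9 triggers a re-plan.
```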

Training

import torch
from torch.utils.data import DataLoader
from lerobot.datasets import LeRobotDataset
from lerobot.policies.diffusion import DiffusionPolicy, DiffusionConfig

# Load dataset
dataset = LeRobotDataset(
    repo_id="lerobot/pusht",
    delta_timestamps={
        "observation.state": [-0.033, 0.0],      # 2 steps
        "action": [i * 0.033 for i in range(16)], # 16 steps
    },
)

# Create policy
config = DiffusionConfig(
    n_obs_steps=2,
    horizon=16,
    n_action_steps=8,
)
policy = DiffusionPolicy(config)

# Training loop
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in dataloader:
    loss, _ = policy.forward(batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
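The delta_timestamps above hard-code a 30 Hz frame period (1/30 ≈ 0.033 s). A small helper can derive them from the dataset fps and the temporal config so they stay in sync; this helper is illustrative and not part of LeRobot:

```python
def make_delta_timestamps(fps, n_obs_steps, horizon):
    """Build delta_timestamps matching n_obs_steps past observations
    and a `horizon`-step action window, at the dataset frame rate."""
    dt = 1.0 / fps
    return {
        # Past observations ending at t=0, e.g. [-dt, 0.0] for 2 steps.
        "observation.state": [-(n_obs_steps - 1 - i) * dt for i in range(n_obs_steps)],
        # Future actions starting at t=0: [0, dt, 2*dt, ...].
        "action": [i * dt for i in range(horizon)],
    }

dts = make_delta_timestamps(fps=30, n_obs_steps=2, horizon=16)
```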

Inference

policy.eval()
policy.reset()  # Clear queues at episode start

obs, info = env.reset()
for step in range(1000):
    # Select action
    batch = {
        "observation.state": torch.tensor(obs["state"]),
        "observation.images.top": torch.tensor(obs["image"]),
    }
    action = policy.select_action(batch)
    
    # Execute in environment
    obs, reward, terminated, truncated, info = env.step(action.cpu().numpy())
    
    if terminated or truncated:
        policy.reset()
        obs, info = env.reset()

Policy-Specific Features

ACT (Action Chunking Transformer)

from lerobot.policies.act import ACTPolicy, ACTConfig

config = ACTConfig(
    chunk_size=100,          # Action chunk length
    n_obs_steps=1,
    dim_model=512,
    n_heads=8,
    dim_feedforward=3200,
    n_encoder_layers=4,
    n_decoder_layers=7,
)
policy = ACTPolicy(config)

VQ-BeT (Vector Quantized Behavior Transformer)

from lerobot.policies.vqbet import VQBeTPolicy, VQBeTConfig

config = VQBeTConfig(
    n_obs_steps=1,
    n_action_pred_token=10,   # Number of action prediction tokens
    vocab_size=512,            # Codebook size
    block_size=1000,           # Maximum sequence length
)
policy = VQBeTPolicy(config)

Vision-Language-Action (VLA) Models

from lerobot.policies.pi0 import PI0Policy, PI0Config

config = PI0Config(
    pretrained_model_name="lerobot/pi0-base",
    use_action_prediction_head=True,
)
policy = PI0Policy(config)

# VLAs can take language instructions
batch = {
    "observation.images.top": image,
    "observation.state": state,
    "task": "pick up the red cube",  # Language instruction
}
action = policy.select_action(batch)

Saving and Loading

Save to Disk

policy.save_pretrained("path/to/checkpoint")

Load from Disk

policy = DiffusionPolicy.from_pretrained("path/to/checkpoint")

Push to Hub

policy.push_to_hub(
    repo_id="username/my_policy",
    private=False,
)

Load from Hub

policy = DiffusionPolicy.from_pretrained("username/my_policy")

Feature Configuration

Policies need to know which features to use from the dataset:
config = DiffusionConfig(
    # Visual features
    image_features=[
        "observation.images.top",
        "observation.images.wrist",
    ],
    
    # Proprioceptive features
    state_features=["observation.state"],
    
    # Environment state (optional)
    env_state_feature="observation.environment_state",
    
    # Action features
    action_features=["action"],
)
The policy will automatically extract these features from batches during training.
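Conceptually, that extraction amounts to selecting the configured keys from each batch dictionary. The sketch below is a simplified illustration, not the actual LeRobot internals:

```python
def extract_features(batch, image_features, state_features, action_features):
    """Pull only the configured keys out of a dataset batch;
    anything else in the batch is ignored."""
    return {
        "images": {k: batch[k] for k in image_features},
        "states": {k: batch[k] for k in state_features},
        "actions": {k: batch[k] for k in action_features},
    }

batch = {
    "observation.images.top": "img_top",
    "observation.images.wrist": "img_wrist",
    "observation.state": "state_vec",
    "action": "action_vec",
    "extra_key": "ignored",  # keys not in the config are simply skipped
}
features = extract_features(
    batch,
    image_features=["observation.images.top", "observation.images.wrist"],
    state_features=["observation.state"],
    action_features=["action"],
)
```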

Device and Dtype

Policies automatically detect device and dtype:
# Move to GPU
policy = policy.to("cuda")
policy = policy.to(torch.float16)  # Half precision

# Device is inferred from parameters
device = policy.device
dtype = policy.dtype

Processing Pipelines

Policies can integrate with processing pipelines for normalization and data transformation. See Processors for details.

Best Practices

  • Match training and inference: Ensure delta_timestamps in your dataset match the policy's n_obs_steps and horizon configuration.
  • Reset between episodes: Always call policy.reset() when starting a new episode to clear observation and action queues.
  • Temporal consistency: The relationship n_action_steps <= horizon - n_obs_steps + 1 must hold for proper action execution.
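The temporal-consistency rule can be checked up front with a small helper; this function is hypothetical, written only to make the constraint concrete:

```python
def validate_temporal_config(n_obs_steps, horizon, n_action_steps):
    """Raise if n_action_steps exceeds horizon - n_obs_steps + 1."""
    limit = horizon - n_obs_steps + 1
    if n_action_steps > limit:
        raise ValueError(
            f"n_action_steps={n_action_steps} exceeds limit {limit} "
            f"(horizon={horizon}, n_obs_steps={n_obs_steps})"
        )

validate_temporal_config(n_obs_steps=2, horizon=16, n_action_steps=8)   # ok
# validate_temporal_config(n_obs_steps=2, horizon=16, n_action_steps=16)  # would raise
```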

Available Policies

Policy       Best For              Key Features
Diffusion    Smooth control        Denoising diffusion, action sequences
ACT          Bimanual tasks        Transformer, long action chunks
VQ-BeT       Discrete behaviors    Vector quantization, efficient
TDMPC        Model-based RL        World model, planning
PI0/SmolVLA  Language-conditioned  Vision-language, foundation model
XVLA         Generalist policies   Cross-embodiment, large-scale
