
Overview

The PARC training loop consists of four main stages, preceded by a setup step, that execute sequentially. Each stage runs as a standalone script configured via YAML files, allowing flexible experimentation.

Stage 0: Setup Iteration

Script: scripts/parc_0_setup_iter.py

Before running the PARC loop, this setup script configures all stages:

Key Functions

  1. Generate configuration files for all 4 stages
  2. Compute sampling weights for motion classes (e.g., balancing climbing motions against running motions)
  3. Pre-compute terrain data for augmentation during MDM training
  4. Set up directory structure for outputs
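Step 2 above can be sketched as inverse-frequency class balancing; the function name `class_balanced_weights` and the label values are illustrative, not PARC's actual API:

```python
from collections import Counter

def class_balanced_weights(labels):
    """Per-motion sampling weights so each motion class gets equal total
    probability mass (inverse-frequency weighting), normalized to sum to 1."""
    counts = Counter(labels)
    weights = [1.0 / counts[c] for c in labels]
    total = sum(weights)
    return [w / total for w in weights]

# Three climbing motions share the same mass as the single running motion.
w = class_balanced_weights(["climb", "climb", "climb", "run"])
```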

Configuration Structure

```yaml
output_dir: "path/to/iteration_output"
input_mdm_config_path: "configs/mdm.yaml"
input_kin_gen_config_path: "configs/kin_gen.yaml"
input_tracker_config_path: "configs/tracker.yaml"
input_phys_record_config_path: "configs/phys_record.yaml"
iter_start_dataset_path: "data/motions.yaml"
```

Source: scripts/parc_0_setup_iter.py:21-185
The setup script automatically creates timestamped configs and simple symlinked configs for each stage in subdirectories p1_train_gen, p2_kin_gen, p3_tracker, and p4_phys_record.

Stage 1: Train Generator

Script: scripts/parc_1_train_gen.py

Trains or fine-tunes the motion diffusion model (MDM) on the current dataset.

Process Flow

  1. Load or create motion sampler from dataset
    • Motion sampler handles weighted sampling with replacement
    • Pre-computes heightfield data for each motion
    • Caches sampler to disk for faster subsequent loads
  2. Compute dataset statistics
    • Mean and std for motion features across all sequences
    • Used for feature normalization during training
    • Cached to sampler_stats.txt
  3. Initialize or load MDM model
    • Can start from scratch or fine-tune existing checkpoint
    • Supports EMA (Exponential Moving Average) for stable training
  4. Training loop
    • Random timestep sampling (0 to diffusion_timesteps)
    • Forward diffusion: add noise to clean motions
    • Denoising: predict clean motion from noisy input
    • Multiple loss terms (see Motion Diffusion page)
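The training loop above can be sketched with a stand-in linear noise schedule and an illustrative `model(xt, t)` interface; PARC's actual schedule and loss terms may differ (see the Motion Diffusion page):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                                      # diffusion_timesteps
betas = np.linspace(1e-4, 0.02, T)            # stand-in linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)

def diffuse(x0, t):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

def train_step(model, x0):
    """One training step: sample a random timestep, noise the clean motion,
    and score the model's prediction of the clean motion."""
    t = rng.integers(0, T)
    xt, _ = diffuse(x0, t)
    x0_pred = model(xt, t)
    return np.mean((x0_pred - x0) ** 2)       # simple reconstruction loss
```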

Training Configuration

```yaml
epochs: 1000
batch_size: 512
iters_per_epoch: 100
diffusion_timesteps: 1000
learning_rate: 0.0001
weight_decay: 0.0
```

Source: scripts/parc_1_train_gen.py:30-113

Outputs

  • checkpoints/model_*.ckpt - Periodic checkpoints
  • final_model.ckpt - Final trained model
  • sampler.pkl - Cached motion sampler
  • sampler_stats.txt - Dataset statistics
Set use_wandb: true in the config to enable Weights & Biases logging. Training metrics include per-component losses, wall time, and number of samples processed.

Stage 2: Kinematic Generation

Script: scripts/parc_2_kin_gen.py

Generates new kinematic motion sequences using the trained MDM.

Generation Pipeline

Detailed Steps

1. Terrain Generation

Random procedural terrains:
  • Random boxes - scattered box obstacles
  • Random stairs - ascending/descending staircases
  • Random paths - narrow walkways
Terrains are represented as heightfields (2D elevation grids).
Source: PARC/util/terrain_util.py
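A minimal sketch of the random-boxes case, assuming a plain NumPy heightfield; the function name and parameters are illustrative, not PARC's terrain API:

```python
import numpy as np

def random_box_terrain(rows, cols, n_boxes, max_h=1.0, seed=0):
    """Heightfield (2D elevation grid) with randomly placed box obstacles."""
    rng = np.random.default_rng(seed)
    hf = np.zeros((rows, cols))
    for _ in range(n_boxes):
        r, c = rng.integers(0, rows - 4), rng.integers(0, cols - 4)
        h, w = rng.integers(2, 5), rng.integers(2, 5)   # box footprint in cells
        hf[r:r + h, c:c + w] = rng.uniform(0.1, max_h)  # raise cells to box height
    return hf

hf = random_box_terrain(32, 32, n_boxes=6)
```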

2. Path Planning

A* pathfinding algorithm treats terrain as a graph:
  • Nodes: grid cells in heightfield
  • Edges: traversable connections (consider height changes)
  • Cost: distance + height penalty
Source: PARC/motion_synthesis/procgen/astar.py
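The graph formulation above can be sketched as a standard A* over grid cells, with a height-change penalty folded into the edge cost; the `height_penalty` weight and Manhattan heuristic are illustrative choices, not PARC's exact implementation:

```python
import heapq

def astar(hf, start, goal, height_penalty=5.0):
    """A* over a heightfield grid; edge cost = step distance + penalized height change."""
    rows, cols = len(hf), len(hf[0])
    open_set = [(0.0, start)]
    g = {start: 0.0}
    came = {}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:                       # reconstruct path back to start
            path = [cur]
            while cur in came:
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        r, c = cur
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            cost = 1.0 + height_penalty * abs(hf[nr][nc] - hf[r][c])
            ng = g[cur] + cost
            if ng < g.get((nr, nc), float("inf")):
                g[(nr, nc)] = ng
                came[(nr, nc)] = cur
                h = abs(goal[0] - nr) + abs(goal[1] - nc)   # admissible Manhattan heuristic
                heapq.heappush(open_set, (ng + h, (nr, nc)))
    return None                               # no traversable path

flat = [[0.0] * 5 for _ in range(5)]
path = astar(flat, (0, 0), (4, 4))
```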

3. Autoregressive Generation

MDM generates motions step-by-step along the path:
  • Start with initial pose
  • Generate sequence segment conditioned on heightmap + target direction
  • Use end of previous segment as start of next
  • Repeat until path is complete
Source: PARC/motion_synthesis/procgen/mdm_path.py
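The autoregressive loop can be sketched as follows, where `sample_segment(seed_pose, target)` is a stand-in for MDM sampling conditioned on the local heightmap and target direction (not PARC's actual interface):

```python
def generate_along_path(sample_segment, waypoints, init_pose, overlap=1):
    """Autoregressive generation sketch: each segment is seeded with the tail
    of the previous one, and the overlapping seed frames are dropped on append."""
    motion = [init_pose]
    for target in waypoints:
        seg = sample_segment(motion[-1], target)  # condition on previous end pose
        motion.extend(seg[overlap:])              # skip duplicated seed frames
    return motion
```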

4. Candidate Selection

Generate N candidate sequences (e.g., N=10) and select top K (e.g., K=3) using heuristics:
  • Foot contact consistency
  • Smooth velocity profiles
  • Minimal ground penetration
  • Path following accuracy
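The selection step reduces to ranking candidates by a combined heuristic score; a minimal sketch, assuming each heuristic is a callable that returns higher-is-better scores (the interface is illustrative):

```python
def select_top_k(candidates, score_fns, k=3):
    """Rank N candidate motions by the sum of heuristic scores, keep the top K."""
    scored = sorted(candidates,
                    key=lambda c: sum(f(c) for f in score_fns),
                    reverse=True)             # higher combined score is better
    return scored[:k]
```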

5. Hesitation Removal

Detect and remove frames where character hesitates:
  • Low velocity + low acceleration = hesitation
  • Remove consecutive hesitation frames
  • Re-blend motion segments
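The detection rule (low velocity and low acceleration) can be sketched on root positions with NumPy; the thresholds are illustrative, and real re-blending of the cut boundaries is omitted:

```python
import numpy as np

def remove_hesitation(positions, dt, vel_thresh=0.1, acc_thresh=0.5):
    """Drop frames where both speed and acceleration magnitude fall below
    thresholds. positions has shape (frames, dims)."""
    pos = np.asarray(positions, dtype=float)
    vel = np.gradient(pos, dt, axis=0)        # finite-difference velocity
    acc = np.gradient(vel, dt, axis=0)        # finite-difference acceleration
    speed = np.linalg.norm(vel, axis=-1)
    amag = np.linalg.norm(acc, axis=-1)
    keep = ~((speed < vel_thresh) & (amag < acc_thresh))
    keep[[0, -1]] = True                      # always keep endpoints for re-blending
    return pos[keep]
```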

6. Kinematic Optimization

Refine motion to improve trackability:
  • Foot contact constraints (IK)
  • Body position targets
  • Smooth velocity constraints
  • Collision avoidance
Source: PARC/motion_synthesis/motion_opt/motion_optimization.py

Batch Generation

Stage 2 typically generates motions in batches:
```yaml
kin_gen_num_batches_of_motions: 10
kin_gen_num_motions_per_batch: 50
kin_gen_motion_id_offset: 0
```

This generates 10 batches × 50 motions = 500 new motions.
Source: scripts/parc_0_setup_iter.py:42-145

Outputs

For each batch:
  • ignore/raw/ - Unoptimized generated motions
  • <batch_name>/ - Optimized motions ready for tracking
  • Motion files in .ms format with terrain data
The generated motions are kinematic only - they may not be physically valid. Stage 3 (tracking) validates them in physics simulation.

Stage 3: Train Tracker

Script: scripts/parc_3_tracker.py

Trains a PPO agent to track the generated reference motions in Isaac Gym.

Environment Setup

DeepMimic-style tracking environment:
  • Simulated character - follows PD controller to track reference
  • Reference character - plays back kinematic motion
  • Terrain - heightfield collision geometry
  • Parallel envs - thousands of environments in parallel (one per motion)
Source: PARC/motion_tracker/envs/ig_parkour/dm_env.py:19-93

Observation Space

The agent observes:
  • Character state (joint positions, velocities, root state)
  • Reference motion (target poses from kinematic motion)
  • Heightmap (local terrain geometry)
  • Contact labels (target foot contacts)

Reward Function

DeepMimic-style tracking rewards:
  • Root position error - distance between sim and ref root
  • Root rotation error - angular difference
  • Body position error - per-body position differences
  • Body rotation error - per-body angular differences
  • Joint velocity error - velocity matching
  • End effector error - foot/hand position matching
Early termination occurs when tracking error exceeds threshold, preventing the agent from learning invalid motions.
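Each error term above typically maps to a reward factor of the form exp(-k · err); a minimal sketch with illustrative term names and scales, not PARC's exact weights:

```python
import numpy as np

def tracking_reward(sim, ref, weights=None):
    """DeepMimic-style reward sketch: multiply per-term factors exp(-k * err),
    so perfect tracking yields 1.0 and errors decay the reward smoothly."""
    w = weights or {"root_pos": 2.0, "body_pos": 10.0, "joint_vel": 0.1}
    r = 1.0
    for key, k in w.items():
        err = np.sum((sim[key] - ref[key]) ** 2)  # squared tracking error
        r *= np.exp(-k * err)
    return r
```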

Training Details

  • Algorithm: PPO with GAE (Generalized Advantage Estimation)
  • Policy network: MLP with 2-4 hidden layers (512-1024 units)
  • Value network: Shared or separate MLP
  • Updates: Multiple epochs per rollout buffer
  • Clipping: PPO ratio clipping (ε = 0.2)
Source: PARC/motion_tracker/learning/dm_ppo_agent.py:17-363
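The clipped surrogate objective used by PPO (ε = 0.2) can be written compactly; this is the standard formulation, not code from the agent above:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A),
    negated so gradient descent maximizes the objective."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.mean(np.minimum(ratio * advantage, clipped * advantage))
```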

Adaptive Motion Weighting

Motions are weighted by tracking difficulty:
  • Easier motions (low fail rate) get lower weight
  • Harder motions (high fail rate) get higher weight
  • Fail rates updated via exponential moving average
```python
# Pseudocode: per-motion fail rates tracked with an exponential moving average
ema_weight = 0.01
new_fail_rate = (1 - ema_weight) * old_fail_rate + ema_weight * fail_rate
```
Source: PARC/motion_tracker/envs/ig_parkour/dm_env.py:86-90

Grid Layout

Environments arranged in 2D grid (not a line) to avoid numerical issues with large coordinates:
```text
[Env 0]   [Env 1]   [Env 2]   ...
[Env N]   [Env N+1] ...
...
```
Each environment has its own terrain and reference motion.
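The grid placement amounts to mapping an environment index to a row/column offset; a sketch with an assumed `spacing` parameter (the actual spacing depends on terrain size):

```python
import math

def env_offset(env_id, num_envs, spacing=20.0):
    """Place environments on a square-ish 2D grid so world coordinates stay
    small even with thousands of environments."""
    cols = math.ceil(math.sqrt(num_envs))     # near-square grid width
    row, col = divmod(env_id, cols)
    return row * spacing, col * spacing
```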

Outputs

  • model.pt - Trained PPO policy
  • agent_config.yaml - Agent configuration
  • dm_env.yaml - Environment configuration
  • fail_rates_*.pt - Per-motion failure rates

Stage 4: Physics Recording

Script: scripts/parc_4_phys_record.py

Records physically simulated motions by running the trained controller.

Recording Process

  1. Load trained controller from Stage 3
  2. Load generated motions from Stage 2
  3. Simulate in parallel - run controller on all motions simultaneously
  4. Record successful tracks - save motions that complete without early termination
  5. Retry with offsets - for partial failures, try starting from later frames

Retry Strategy

If tracking fails:
  1. Try starting from beginning (0% into motion)
  2. If fails, try starting from 25% into motion
  3. If fails, try starting from 50% into motion
  4. If all fail, discard motion
This recovers motions where only a small segment is untrackable.
Source: doc/parc_guide.md:63-68
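The retry loop can be sketched as follows, where `try_track(start_frame)` stands in for a tracking attempt that returns the recorded motion on success or None on failure (an illustrative interface, not the script's API):

```python
def record_with_retries(try_track, motion_len, fractions=(0.0, 0.25, 0.5)):
    """Try tracking from progressively later start frames; discard the motion
    (return None) only if every start offset fails."""
    for frac in fractions:
        recorded = try_track(int(frac * motion_len))
        if recorded is not None:
            return recorded
    return None
```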

Quality Control

Only save motions that:
  • Complete without early termination
  • Maintain low tracking error throughout
  • Stay within terrain bounds
  • Don’t violate physics constraints

Outputs

  • Physically valid motion files (.ms format)
  • Only successfully tracked motions included
  • These motions are added to the dataset for next iteration
The recorded motions are subtly different from the kinematic references due to physics simulation, but they are guaranteed to be physically plausible.

Iteration Management

A complete PARC iteration:
```bash
# Setup iteration
python scripts/parc_0_setup_iter.py --config configs/setup.yaml

# Stage 1: Train generator
python scripts/parc_1_train_gen.py --config output/p1_train_gen/mdm_config.yaml

# Stage 2: Generate motions (batches can run in parallel)
for config in output/p2_kin_gen/*/kin_gen_config.yaml; do
    python scripts/parc_2_kin_gen.py --config "$config"
done

# Stage 3: Train tracker
python scripts/parc_3_tracker.py --config output/p3_tracker/tracker.yaml

# Stage 4: Record physics
python scripts/parc_4_phys_record.py --config output/p4_phys_record/phys_record.yaml
```

Next Iteration

For iteration N+1:
  1. Update iter_start_dataset_path to include physically recorded motions from iteration N
  2. Optionally use final MDM checkpoint as input_mdm_model_path
  3. Optionally use final tracker checkpoint as input_tracker_model_path
  4. Run parc_0_setup_iter.py with updated config
  5. Execute 4 stages again

Performance Considerations

Stage 1 (Train Gen)
  • GPU memory: ~8-16GB for batch size 512
  • Training time: Hours to days depending on epochs
  • Checkpoint frequently for long runs
Stage 2 (Kin Gen)
  • CPU intensive for optimization
  • Highly parallelizable (run batches independently)
  • Can take hours per batch with optimization
Stage 3 (Tracker)
  • Requires GPU with Isaac Gym support
  • Most computationally expensive stage
  • Training time: Days for complex motions
  • Scales with number of motions × terrains per motion
Stage 4 (Phys Record)
  • Fast (minutes to hours)
  • Parallel execution across all environments
  • GPU memory scales with number of motions
Run Stage 2 batches in parallel on multiple machines to speed up kinematic generation. Each batch is independent.
