Overview
The `parc_3_tracker.py` script trains a physics-based tracking controller that learns to imitate kinematic reference motions in a physics simulator. It uses reinforcement learning (PPO) in Isaac Gym to train an agent to track motions across varied terrains.
Purpose
This is Stage 3 of the PARC pipeline. It:
- Trains a physics-based tracking controller using RL
- Learns to track all motions in the dataset simultaneously
- Handles terrain-motion pairs in a grid layout in the simulator
- Outputs a trained policy model that can track reference motions
- Generates environment and agent configuration files
Usage
Basic Command
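An illustrative invocation, assuming the script is run from the repository root (the config path is a placeholder, not a shipped file):

```shell
# Path after --config is a placeholder for your tracker config
python scripts/parc_3_tracker.py --config <path-to-tracker-config.yaml>
```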
Default Configuration
Command-Line Arguments
| Argument | Required | Description |
|---|---|---|
| `--config` | No | Path to the tracker training configuration YAML file |
Key Configuration Parameters
Training Settings
- `max_samples`: Maximum number of environment samples for training
- `num_envs`: Number of parallel environments to simulate
- `device`: Training device (e.g., `"cuda:0"`, `"cpu"`)
Environment and Agent Configuration
- `env_config`: Path to environment configuration YAML file
- `agent_config`: Path to agent/policy configuration YAML file
Model Paths
- `in_model_file`: Path to a pretrained model for fine-tuning (optional; use `None` or null to train from scratch)
- `dataset_file`: Path to motion dataset YAML file
- `output_dir`: Directory for saving model checkpoints and logs
Dataset Creation
- `create_dataset_config`: Path to dataset creation config (optional; when provided, the dataset is created before training)
Training Process
Dataset Preparation
If `create_dataset_config` is provided, the script:
- Loads the dataset creation configuration
- Merges motion folders from previous stages (initial dataset + generated motions)
- Creates a unified motion dataset YAML file
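The merge step can be sketched as follows; `merge_motion_folders` and the `.pkl` extension are hypothetical illustrations, not PARC's actual API or schema:

```python
import os

def merge_motion_folders(folders):
    """Collect motion files from several folders into one dataset listing.

    Hypothetical helper: PARC's real dataset schema may differ. Each entry
    records a motion file path and a uniform sampling weight.
    """
    motions = []
    for folder in folders:
        for name in sorted(os.listdir(folder)):
            if name.endswith(".pkl"):  # assumed motion file extension
                motions.append({"file": os.path.join(folder, name),
                                "weight": 1.0})
    return {"motions": motions}
```

The resulting dictionary would then be dumped to the unified motion dataset YAML file.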
Environment Setup
The script creates a DeepMimic-style tracking environment where:
- Each motion-terrain pair is assigned a location in a grid layout
- Multiple agents train in parallel, each tracking different reference motions
- The environment loads terrains and reference motions from the dataset
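The grid placement idea can be illustrated with a small sketch (the function name and spacing value are hypothetical; the real layout logic lives in the environment code):

```python
import math

def grid_offsets(num_pairs, spacing=40.0):
    """Assign each terrain-motion pair an (x, y) offset on a square grid.

    Keeping each pair inside its own nearby cell avoids the floating-point
    precision loss that very large world coordinates would cause.
    Spacing is a hypothetical per-cell size.
    """
    cols = math.ceil(math.sqrt(num_pairs))
    return [((i % cols) * spacing, (i // cols) * spacing)
            for i in range(num_pairs)]
```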
Policy Training
The training uses PPO (Proximal Policy Optimization) with:
- Motion tracking rewards based on pose similarity
- Contact rewards for matching ground contacts
- Early termination for failed tracking attempts
- Normalizer for state observations (computed from data or loaded from checkpoint)
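A pose-similarity reward in the DeepMimic style typically takes the form exp(-k * error). The sketch below shows that shape with a hypothetical scale value; PARC's actual reward combines several weighted terms (pose, velocity, root position, contacts):

```python
import math

def pose_tracking_reward(joint_pos, ref_joint_pos, scale=2.0):
    """Reward in (0, 1]: equals 1 when the pose matches the reference
    exactly, and decays exponentially with squared pose error.
    The scale value is illustrative.
    """
    err = sum((a - b) ** 2 for a, b in zip(joint_pos, ref_joint_pos))
    return math.exp(-scale * err)
```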
Example Configuration
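A tracker config combining the parameters described above might look like the following sketch; all paths and values are hypothetical, not shipped defaults:

```yaml
# Illustrative Stage 3 tracker config -- key names from this document,
# paths and values are hypothetical
device: "cuda:0"
num_envs: 4096
max_samples: 2000000000
env_config: data/configs/tracker_env.yaml
agent_config: data/configs/tracker_agent.yaml
in_model_file: null                 # or a checkpoint path when fine-tuning
dataset_file: data/motion_dataset.yaml
output_dir: output/tracker
create_dataset_config: null         # optional; set to create the dataset first
```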
Output Files
After training, the following files are created:

Environment Configuration
The environment config specifies:
- Motion dataset file path
- Terrain generation/loading settings
- Reward function weights
- Early termination conditions
- Observation and action spaces
Agent Configuration
The agent config specifies:
- Policy network architecture
- PPO hyperparameters (learning rate, clip epsilon, etc.)
- Value function settings
- Normalizer samples (set to 0 when fine-tuning)
Training from Previous Iteration
For iterations after the first, load the previous tracker as a starting point:
- The agent config should set `normalizer_samples: 0` (handled automatically by the script)
- This enables faster convergence on the expanded dataset
- The policy continues learning from where it left off
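In config terms, fine-tuning from a previous iteration might look like the following (paths are hypothetical):

```yaml
# Main config -- point at the previous iteration's checkpoint
in_model_file: output/tracker_iter1/model.pt

# Agent config -- skip normalizer sample collection
normalizer_samples: 0
```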
Implementation Details
Key Files
- `scripts/run_tracker.py`: Main RL training loop implementation
- `parc/motion_tracker/envs/ig_parkour/ig_parkour_env.py`: Main Isaac Gym environment
- `parc/motion_tracker/envs/ig_parkour/dm_env.py`: DeepMimic-style tracking sub-environment
Grid Layout
The simulator arranges terrain-motion pairs in a grid to avoid numerical issues with large coordinate values. This is more stable than a linear arrangement when training on thousands of motions.

Normalizer

When training from scratch:
- The agent collects random samples to compute observation normalization statistics
- This improves training stability

When fine-tuning from a checkpoint:
- The normalizer is loaded from the checkpoint
- No additional samples are needed (`normalizer_samples: 0`)
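The observation-normalization idea can be sketched with a running mean/variance tracker (Welford's algorithm); this is a generic illustration of the technique, not PARC's implementation:

```python
import math

class RunningNormalizer:
    """Tracks per-dimension mean and variance of observations online and
    standardizes new observations. Generic sketch of the technique."""

    def __init__(self, dim, eps=1e-5):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, obs):
        # Welford's online update for mean and variance
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / max(self.count - 1, 1)
            out.append((x - self.mean[i]) / math.sqrt(var + self.eps))
        return out
```

When fine-tuning, the stored statistics (`count`, `mean`, `m2`) would simply be restored from the checkpoint instead of re-collected.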
Hardware Requirements
Training the tracker requires:
- NVIDIA GPU with CUDA support
- Isaac Gym installation (see PARC README for setup)
- Sufficient GPU memory for parallel environments (typically 8GB+ for 4096 envs)
Usage in PARC Pipeline
Monitoring Training
Training progress can be monitored through:
- Console output showing rewards and episode statistics
- `log.txt` file in the output directory
- TensorBoard (if enabled in agent config)
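If TensorBoard logging is enabled, the standard TensorBoard CLI can point at the output directory (path is hypothetical):

```shell
tensorboard --logdir output/tracker
```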
Location
`scripts/parc_3_tracker.py`