PPOAgent
Base Proximal Policy Optimization agent for motion tracking.
Initialization
- config (dict): Agent configuration including:
  - Learning rates, batch size, training parameters
  - Model architecture configuration
- env: Environment instance (IGParkourEnv or DeepMimicEnv)
- device (str): PyTorch device
ppo_agent.py:15-20
Configuration Parameters
ppo_agent.py:22-49
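The authoritative key set lives in ppo_agent.py:22-49. As a rough sketch, a configuration dict along these lines might be passed in; every key name below is an illustrative assumption, not the verified schema:

```python
# Illustrative configuration sketch; all key names are assumptions,
# not taken from the actual ppo_agent.py schema.
config = {
    "actor_learning_rate": 3e-4,   # policy optimizer step size (assumed key)
    "critic_learning_rate": 1e-3,  # value-function optimizer step size (assumed key)
    "batch_size": 256,             # minibatch size for PPO updates (assumed key)
    "update_epochs": 4,            # passes over each rollout (assumed key)
    "discount": 0.99,              # reward discount factor (assumed key)
    "model": {                     # model architecture sub-config (assumed key)
        "actor_hidden": [512, 256],
        "critic_hidden": [512, 256],
    },
}
```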
Core Methods
train_model()
Trains the agent for a specified number of samples.
- max_samples (int): Total environment steps to train
- out_model_file (str): Final model save path
- int_output_dir (str): Checkpoint directory
- log_file (str): Training log file path
- logger_type (str): Logger type ("tensorboard", "wandb")
Defined in base_agent.py; called from the training loop.
test_model()
Evaluates the agent over multiple episodes. Returns test_info (dict):
- mean_return: Average episodic return
- mean_ep_len: Average episode length
- num_eps: Number of episodes evaluated
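The returned fields follow from straightforward aggregation over the evaluated episodes; a minimal sketch (the helper name is hypothetical, only the dict keys come from the documentation above):

```python
import numpy as np

def summarize_episodes(episode_returns, episode_lengths):
    """Aggregate per-episode results into the test_info fields listed above.

    Sketch of the aggregation only; the field names match the documented dict.
    """
    return {
        "mean_return": float(np.mean(episode_returns)),  # average episodic return
        "mean_ep_len": float(np.mean(episode_lengths)),  # average episode length
        "num_eps": len(episode_returns),                 # number of episodes evaluated
    }
```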
step()
Executes a single environment step with the current policy. Returns the environment step outputs plus the action and action metadata.
ppo_agent.py:339-342
Action Selection
_decide_action()
Internal method for action selection with exploration.
ppo_agent.py:84-116
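A common pattern for such a method is to sample from a Gaussian centered on the policy mean when exploring, and return the mean deterministically otherwise; a sketch under that assumption (function name and signature are hypothetical, not the internal API):

```python
import numpy as np

def decide_action(mean_action, action_std, explore=True, rng=None):
    """Select an action from a diagonal Gaussian policy.

    Illustrative sketch only: the exploration scheme of the real
    _decide_action (ppo_agent.py:84-116) is assumed, not verified.
    """
    rng = np.random.default_rng() if rng is None else rng
    if not explore:
        return mean_action                       # deterministic action for evaluation
    noise = rng.normal(size=mean_action.shape)   # unit Gaussian exploration noise
    return mean_action + action_std * noise
```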
Training Iteration
_train_iter()
Single training iteration (rollout + update):
- Rollout: Collect experience in buffer
- Build training data: Compute advantages and target values
- Update: Multiple epochs of PPO updates
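The three phases above can be sketched as one function. Every callable here is a hypothetical stand-in, and this toy version uses undiscounted returns-to-go as targets rather than the agent's actual advantage estimator:

```python
import numpy as np

def train_iter(env_reset, env_step, policy, value_fn, update_fn, horizon=8):
    """Structural sketch of one rollout + update iteration.

    All callables are hypothetical stand-ins; the real _train_iter works
    against the agent's experience buffer and runs several PPO epochs.
    """
    # 1) Rollout: collect experience in a buffer
    obs_buf, rew_buf = [], []
    obs = env_reset()
    for _ in range(horizon):
        action = policy(obs)
        obs_buf.append(obs)
        obs, reward = env_step(action)
        rew_buf.append(reward)
    # 2) Build training data: target values and advantages
    targets = np.cumsum(np.array(rew_buf, dtype=np.float64)[::-1])[::-1]
    advantages = targets - np.array([value_fn(o) for o in obs_buf])
    # 3) Update: the PPO epochs would consume these tensors
    return update_fn(np.array(obs_buf), advantages, targets)
```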
Loss Computation
PPO Actor Loss
ppo_agent.py:275-327
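The clipped surrogate objective that defines PPO can be sketched as below; the actual loss in ppo_agent.py:275-327 may add further terms (e.g. an entropy bonus), which are omitted here:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO-Clip actor loss (negative of the surrogate objective).

    Sketch of the textbook form only, not the exact implementation.
    """
    ratio = np.exp(logp_new - logp_old)                    # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return -np.mean(surrogate)                             # minimize the negative objective
```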
Critic Loss
ppo_agent.py:256-273
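The critic is typically regressed onto the target values with a mean-squared-error loss; a plain sketch (ppo_agent.py:256-273 may instead use value clipping or a 0.5 scaling factor):

```python
import numpy as np

def critic_loss(values_pred, target_values):
    """Mean-squared-error value loss; a common sketch, not the exact code."""
    return np.mean((values_pred - target_values) ** 2)
```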
Advantage Estimation
ppo_agent.py:124-171
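PPO implementations typically use Generalized Advantage Estimation (GAE); a sketch of the standard backward recurrence, with the caveat that the exact bootstrapping in ppo_agent.py:124-171 is not verified:

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    `values` has length T+1 (includes the bootstrap value of the final state).
    Returns advantages and the critic's target values.
    """
    T = len(rewards)
    adv = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]                # zero out bootstrap at episode ends
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        adv[t] = gae
    returns = adv + values[:-1]                     # target values for the critic
    return adv, returns
```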
DMPPOAgent
Extended PPO agent for DeepMimic environments with tracking error metrics.
Initialization
- Tracking error metrics
- Motion failure rate logging
- Contact force recording
- Environment-specific metadata
dm_ppo_agent.py:17-29
Enhanced Training
train_model()
Extends the base implementation with additional logging:
- Motion failure rates per motion ID
- Failure rate quantiles (25%, 50%, 75%)
- Mean/max failure rates per motion class
- Detailed reward component breakdowns
dm_ppo_agent.py:191-233
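The quantile and mean/max aggregation over per-motion failure rates can be sketched as follows (output key names are illustrative assumptions, not the logger's actual field names):

```python
import numpy as np

def failure_rate_stats(fail_rates_by_motion):
    """Quantiles and mean/max of per-motion failure rates, as logged above.

    Sketch of the aggregation only; input maps motion ID -> failure rate.
    """
    rates = np.array(list(fail_rates_by_motion.values()), dtype=np.float64)
    return {
        "fail_q25": float(np.percentile(rates, 25)),   # 25% quantile
        "fail_q50": float(np.percentile(rates, 50)),   # median
        "fail_q75": float(np.percentile(rates, 75)),   # 75% quantile
        "fail_mean": float(rates.mean()),
        "fail_max": float(rates.max()),
    }
```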
Enhanced Testing
test_model()
dm_ppo_agent.py:93-180
Experience Buffer Extensions
dm_ppo_agent.py:239-264
Observation Normalization
dm_ppo_agent.py:48-87
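Observation normalization in RL agents commonly maintains a running mean and variance and whitens incoming observations; a sketch using Welford's online update, with the caveat that the exact scheme in dm_ppo_agent.py:48-87 (e.g. any clipping) is assumed, not verified:

```python
import numpy as np

class RunningObsNorm:
    """Running mean/variance observation normalizer (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)    # running sum of squared deviations
        self.count = 0
        self.eps = eps             # numerical floor for the variance

    def update(self, obs):
        """Fold one observation into the running statistics."""
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs):
        """Whiten an observation with the current statistics."""
        var = self.m2 / max(self.count, 1)
        return (obs - self.mean) / np.sqrt(var + self.eps)
```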
Model Building
Inherits from PPOAgent but uses DMPPOModel for advanced architectures.
dm_ppo_agent.py:43-46