PPOModel
Base actor-critic model for PPO agents.

Initialization
config (dict): Model configuration including:
- Network architectures
- Activation functions
- Action distribution parameters
env: Environment instance for space dimensions
ppo_model.py:8-13
Architecture
Separate actor and critic networks.

Configuration
- "mlp_256_256": 2-layer MLP with 256 units
- "mlp_512_512": 2-layer MLP with 512 units
- "mlp_1024_512": 2-layer MLP with 1024→512 units
- Custom sizes via configuration
ppo_model.py:25-58
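The architecture strings above can be parsed into a network in a few lines. The sketch below is illustrative, not the actual `ppo_model.py` builder; `build_mlp` and its signature are hypothetical:

```python
import torch
import torch.nn as nn

def build_mlp(arch: str, in_dim: int, out_dim: int) -> nn.Sequential:
    # Hypothetical helper: parse "mlp_256_256"-style strings into hidden sizes.
    hidden = [int(s) for s in arch.split("_")[1:]]  # e.g. [256, 256]
    layers, prev = [], in_dim
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))  # linear output head
    return nn.Sequential(*layers)

# "mlp_1024_512" yields a 1024 -> 512 trunk before the output layer
net = build_mlp("mlp_1024_512", in_dim=64, out_dim=8)
out = net(torch.zeros(2, 64))
```

Custom sizes then reduce to supplying a different string (or list) through the config.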
Forward Methods
eval_actor()
Evaluates the actor network to get the action distribution.
action_dist: Distribution object (GaussianDiag)
ppo_model.py:15-18
eval_critic()
Evaluates the critic network to estimate the state value.
values (torch.Tensor): State value estimates
ppo_model.py:20-23
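A minimal sketch of how separate `eval_actor()`/`eval_critic()` methods typically look, assuming a state-independent log-std parameter and using `torch.distributions.Normal` as a stand-in for GaussianDiag (class and layer names here are illustrative, not the actual `ppo_model.py` code):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    # Sketch: separate actor and critic trunks, evaluated independently.
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # learned, state-independent
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def eval_actor(self, obs):
        # Diagonal Gaussian over actions, mean from the actor trunk
        mean = self.actor(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

    def eval_critic(self, obs):
        # Scalar value per state
        return self.critic(obs).squeeze(-1)

model = ActorCritic(obs_dim=10, act_dim=4)
dist = model.eval_actor(torch.zeros(2, 10))
values = model.eval_critic(torch.zeros(2, 10))
```

Keeping the two evaluations separate lets PPO call the critic alone during advantage estimation without building the action distribution.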
Network Building
_build_actor()
Constructs the policy network.
ppo_model.py:30-37
_build_critic()
Constructs the value network.
ppo_model.py:39-48
DMPPOModel
Extended model supporting advanced architectures for DeepMimic.

Initialization
- Vision Transformer (ViT) for observations
- CNN-MLP hybrid networks
- Structured observation processing
dm_ppo_model.py:12-15
Supported Architectures
MLP (Standard)
Vision Transformer
- Tokenizes observations by type
- Transformer encoder layers
- Separate actor/critic output heads
dm_ppo_model.py:34-55
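The three bullets above can be sketched as follows: each observation group becomes one token, a Transformer encoder mixes them, and pooled features feed separate heads. This is a minimal illustration, not the actual `dm_ppo_model.py` implementation; all names and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

class ObsTransformer(nn.Module):
    # Sketch: tokenize observation groups by type, encode, pool, two heads.
    def __init__(self, group_dims, d_model: int = 64, act_dim: int = 8):
        super().__init__()
        # One linear tokenizer per observation group (e.g. proprio, task, terrain)
        self.embed = nn.ModuleList(nn.Linear(d, d_model) for d in group_dims)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.actor_head = nn.Linear(d_model, act_dim)
        self.critic_head = nn.Linear(d_model, 1)

    def forward(self, groups):
        # One token per observation group -> (batch, n_groups, d_model)
        tokens = torch.stack([e(g) for e, g in zip(self.embed, groups)], dim=1)
        feat = self.encoder(tokens).mean(dim=1)  # mean-pool over tokens
        return self.actor_head(feat), self.critic_head(feat).squeeze(-1)

model = ObsTransformer(group_dims=[10, 5])
action_mean, value = model([torch.zeros(2, 10), torch.zeros(2, 5)])
```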
CNN-MLP Hybrid
- CNN processes heightmap observations
- MLP processes proprioceptive observations
- Concatenated features feed actor/critic heads
dm_ppo_model.py:56-78
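The hybrid path described above can be sketched as a small CNN over the heightmap concatenated with an MLP over proprioception. A minimal sketch with assumed shapes and layer sizes, not the actual `dm_ppo_model.py` code:

```python
import torch
import torch.nn as nn

class CnnMlpHybrid(nn.Module):
    # Sketch: CNN encodes the heightmap, MLP encodes proprioception,
    # concatenated features feed separate actor/critic heads.
    def __init__(self, map_hw=(16, 16), proprio_dim: int = 32, act_dim: int = 8):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        cnn_out = 16 * (map_hw[0] // 4) * (map_hw[1] // 4)  # two stride-2 convs
        self.mlp = nn.Sequential(nn.Linear(proprio_dim, 64), nn.ReLU())
        self.actor_head = nn.Linear(cnn_out + 64, act_dim)
        self.critic_head = nn.Linear(cnn_out + 64, 1)

    def forward(self, heightmap, proprio):
        feat = torch.cat([self.cnn(heightmap), self.mlp(proprio)], dim=-1)
        return self.actor_head(feat), self.critic_head(feat).squeeze(-1)

model = CnnMlpHybrid()
action_mean, value = model(torch.zeros(2, 1, 16, 16), torch.zeros(2, 32))
```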
Observation Structure
DMPPOModel handles structured observations:
- Parse observation components
- Apply component-specific encoders
- Aggregate features
- Output action distribution / value
dm_ppo_model.py:36-42
Action Distribution
Gaussian Diagonal Distribution
dm_ppo_model.py:17-30
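A self-contained sketch of a diagonal-Gaussian distribution with the usual `sample`/`log_prob` interface; the actual GaussianDiag API in `dm_ppo_model.py` may differ:

```python
import math
import torch

class GaussianDiag:
    # Sketch: Gaussian with diagonal covariance, parameterized by mean and log-std.
    def __init__(self, mean: torch.Tensor, log_std: torch.Tensor):
        self.mean, self.log_std = mean, log_std
        self.std = log_std.exp()

    def sample(self):
        # Reparameterization-style sample: mean + std * noise
        return self.mean + self.std * torch.randn_like(self.mean)

    def log_prob(self, actions):
        # Independent dimensions => per-dim log-densities sum over the last axis
        var = self.std ** 2
        lp = (-((actions - self.mean) ** 2) / (2 * var)
              - self.log_std - 0.5 * math.log(2 * math.pi))
        return lp.sum(dim=-1)

dist = GaussianDiag(torch.zeros(2, 3), torch.zeros(3))
actions = dist.sample()
```

The summed log-probability is what PPO's clipped ratio `exp(log_prob_new - log_prob_old)` consumes.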