
PPOConfig

The PPOConfig dataclass defines all configuration parameters for PPO reinforcement learning training and the CL1 neural hardware interface.

Environment Configuration

doom_config
str
default:"progressive_deathmatch.cfg"
Path to the VizDoom configuration file that defines the game scenario
screen_resolution
str
default:"RES_320X240"
Screen resolution for the DOOM game buffer. Valid values are VizDoom resolution constants such as RES_320X240 or RES_640X480
use_screen_buffer
bool
default:"true"
Whether to enable the screen buffer for visual observations
max_turn_delta
float
default:"360.0"
Maximum absolute degrees for TURN_LEFT_RIGHT_DELTA action
turn_step_degrees
float
default:"30.0"
Discrete turn step size in degrees when using turn buttons
camera_std_init
float
default:"3.0"
Initial standard deviation (in degrees) for camera delta distribution
use_discrete_action_set
bool
default:"false"
Whether to use a single categorical action space instead of the combinatorial action space

Neural Interface - Channel Configuration

num_channels
int
default:"64"
Total number of channels available on the CL1 hardware
encoding_channels
List[int]
default:"[8, 9, 10, 17, 18, 25, 27, 28, 57]"
Channel indices used for encoding game state into stimulation patterns
move_forward_channels
List[int]
default:"[41, 42, 49]"
Channel indices that decode to forward movement actions
move_backward_channels
List[int]
default:"[50, 51, 58]"
Channel indices that decode to backward movement actions
move_left_channels
List[int]
default:"[13, 14, 21]"
Channel indices that decode to strafe left actions
move_right_channels
List[int]
default:"[45, 46, 53]"
Channel indices that decode to strafe right actions
turn_left_channels
List[int]
default:"[29, 30, 31, 37]"
Channel indices that decode to turn left actions
turn_right_channels
List[int]
default:"[59, 60, 61, 62]"
Channel indices that decode to turn right actions
attack_channels
List[int]
default:"[32, 33, 34]"
Channel indices that decode to attack/fire actions
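
The encoding and motor-decoding channel groups above should partition cleanly: no index reused across groups, and every index within the 64-channel array. A small self-contained check of the documented defaults (the helper name is illustrative, not part of the project API):

```python
# Sanity-check the default CL1 channel layout: the encoding channels and the
# seven motor-decoding groups must not reuse any index, and every index must
# fit on the 64-channel array. Values are the defaults documented above;
# check_channel_layout is an illustrative helper, not a project function.
encoding = [8, 9, 10, 17, 18, 25, 27, 28, 57]
decode_groups = {
    "move_forward": [41, 42, 49],
    "move_backward": [50, 51, 58],
    "move_left": [13, 14, 21],
    "move_right": [45, 46, 53],
    "turn_left": [29, 30, 31, 37],
    "turn_right": [59, 60, 61, 62],
    "attack": [32, 33, 34],
}

def check_channel_layout(encoding, decode_groups, num_channels=64):
    """Return True if no channel index is reused and all are in range."""
    used = list(encoding) + [ch for chs in decode_groups.values() for ch in chs]
    in_range = all(0 <= ch < num_channels for ch in used)
    disjoint = len(used) == len(set(used))
    return in_range and disjoint
```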

Stimulation Design Parameters

Parameters used to construct cl.StimDesign objects for biphasic electrical stimulation.
phase1_duration
float
default:"160.0"
Duration of the first (negative) phase in microseconds (μs)
phase2_duration
float
default:"160.0"
Duration of the second (positive) phase in microseconds (μs)
min_amplitude
float
default:"1.0"
Minimum stimulation amplitude in microamps (μA). Used as the magnitude for phase1 (negative)
max_amplitude
float
default:"2.5"
Maximum stimulation amplitude in microamps (μA). Used as the magnitude for phase2 (positive)
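
As a quick arithmetic illustration of these defaults (not project code): charge per phase is amplitude times duration, and with the stated convention that min_amplitude drives phase 1 and max_amplitude drives phase 2, each phase's delivered charge follows directly:

```python
# Per-phase charge of the default biphasic pulse: charge = amplitude x duration,
# and microamps x microseconds gives picocoulombs. Uses the stated convention
# that min_amplitude is the phase-1 magnitude and max_amplitude the phase-2
# magnitude. Arithmetic illustration only, not project code.
PHASE1_DURATION_US = 160.0  # μs
PHASE2_DURATION_US = 160.0  # μs
MIN_AMPLITUDE_UA = 1.0      # μA, phase-1 (negative) magnitude
MAX_AMPLITUDE_UA = 2.5      # μA, phase-2 (positive) magnitude

def phase_charge_pc(amplitude_ua: float, duration_us: float) -> float:
    """Charge in picocoulombs delivered by one phase."""
    return amplitude_ua * duration_us

phase1_pc = phase_charge_pc(MIN_AMPLITUDE_UA, PHASE1_DURATION_US)  # 160.0 pC
phase2_pc = phase_charge_pc(MAX_AMPLITUDE_UA, PHASE2_DURATION_US)  # 400.0 pC
```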

Burst Design Parameters

Parameters used to construct cl.BurstDesign objects that define stimulation frequency and duration.
min_frequency
float
default:"4.0"
Minimum burst frequency in Hertz (Hz)
max_frequency
float
default:"40.0"
Maximum burst frequency in Hertz (Hz)
burst_count
int
default:"500"
Number of pulses per burst. The default of 500 ensures stimulation continues across the interval between game ticks
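
Two consequences of these defaults can be checked with simple arithmetic. The burst-duration calculation follows directly from the values above; the linear feature-to-frequency interpolation is an assumption, since this reference does not specify the encoding scheme:

```python
# Burst length at the configured frequency range, plus a hypothetical linear
# map from a normalized feature in [0, 1] onto [min_frequency, max_frequency].
# The interpolation is an assumption; the duration arithmetic is exact.
MIN_FREQUENCY_HZ = 4.0
MAX_FREQUENCY_HZ = 40.0
BURST_COUNT = 500

def burst_duration_s(frequency_hz: float, pulse_count: int = BURST_COUNT) -> float:
    """Seconds a burst lasts when pulses are delivered at frequency_hz."""
    return pulse_count / frequency_hz

def feature_to_frequency(x: float) -> float:
    """Clamp x to [0, 1] and interpolate into the burst frequency range."""
    x = min(max(x, 0.0), 1.0)
    return MIN_FREQUENCY_HZ + x * (MAX_FREQUENCY_HZ - MIN_FREQUENCY_HZ)
```

Even at the maximum frequency of 40 Hz, a 500-pulse burst spans 12.5 s, comfortably outlasting a single game tick, presumably so the next tick's command takes over before the burst ends.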

PPO Hyperparameters

learning_rate
float
default:"3e-4"
Learning rate for the Adam optimizer
gamma
float
default:"0.99"
Discount factor for future rewards
gae_lambda
float
default:"0.95"
Lambda parameter for Generalized Advantage Estimation (GAE)
clip_epsilon
float
default:"0.2"
Clipping parameter for PPO policy updates
value_loss_coef
float
default:"0.3"
Coefficient for value function loss in the total loss
entropy_coef
float
default:"0.02"
Coefficient for entropy bonus to encourage exploration
max_grad_norm
float
default:"3"
Maximum gradient norm for gradient clipping. Can be reduced to 1 or 0.5 for more conservative updates
normalize_returns
bool
default:"true"
Whether to normalize returns for critic training. Stabilizes the critic but may affect learning dynamics
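
As a reminder of what clip_epsilon controls, here is a minimal single-sample sketch of the PPO clipped surrogate objective in pure Python. This is an illustration of the hyperparameter, not the project's training loop:

```python
import math

# Single-sample PPO clipped surrogate with the default clip_epsilon of 0.2.
# Illustration only, not the project's actual training code.
CLIP_EPSILON = 0.2

def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip_epsilon: float = CLIP_EPSILON) -> float:
    """min(r * A, clip(r, 1 - eps, 1 + eps) * A) with r = pi_new / pi_old."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_epsilon), 1.0 + clip_epsilon)
    return min(ratio * advantage, clipped * advantage)
```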

Training Configuration

num_envs
int
default:"1"
Number of parallel environments to run
steps_per_update
int
default:"2048"
Number of environment steps to collect per policy update (per environment)
batch_size
int
default:"256"
Minibatch size for PPO updates
num_epochs
int
default:"4"
Number of epochs to train on each batch of collected experience
max_episodes
int
default:"2000"
Maximum number of episodes to train for
use_hardware
bool
default:"true"
Whether to use CL1 hardware or run in simulation mode
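
The training defaults imply the following per-update bookkeeping (simple arithmetic mirroring the config fields above, not project code): 1 environment collecting 2048 steps yields 2048 samples, split into 256-sample minibatches over 4 epochs:

```python
# Per-update bookkeeping implied by the training defaults. Names mirror the
# config fields above; the helper is illustrative, not a project function.
NUM_ENVS = 1
STEPS_PER_UPDATE = 2048
BATCH_SIZE = 256
NUM_EPOCHS = 4

def gradient_steps_per_update(num_envs=NUM_ENVS, steps=STEPS_PER_UPDATE,
                              batch_size=BATCH_SIZE, epochs=NUM_EPOCHS) -> int:
    """Minibatch gradient steps performed per PPO policy update."""
    total_samples = num_envs * steps               # 1 * 2048 samples collected
    return epochs * (total_samples // batch_size)  # 4 epochs * 8 minibatches
```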

Network Architecture

hidden_size
int
default:"128"
Hidden layer size for encoder, decoder, and value networks

Logging and Checkpointing

log_dir
str
default:"checkpoints/l5_2048_rand/logs"
Directory for TensorBoard logs
checkpoint_dir
str
default:"checkpoints/l5_2048_rand"
Directory for saving model checkpoints
save_interval
int
default:"100"
Save checkpoint every N episodes
eval_interval
int
default:"50"
Evaluate policy every N episodes

Reward Shaping

feedback_positive_threshold
float
default:"1"
Reward threshold above which positive feedback is triggered. Requires tuning
feedback_negative_threshold
float
default:"-1"
Reward threshold below which negative feedback is triggered. Requires tuning
armor_terminal_reward
float
default:"1000.0"
Terminal reward bonus for armor-related objectives. Requires tuning
aim_alignment_gain
float
default:"2.5"
Gain multiplier for aim alignment reward shaping
aim_alignment_max_distance
float
default:"250.0"
Maximum distance at which aim alignment reward is computed
aim_alignment_bonus
float
default:"2.5"
Bonus reward for accurate aim alignment
aim_alignment_bonus_deg
float
default:"4.0"
Angle threshold in degrees for aim alignment bonus
movement_velocity_reward_scale
float
default:"0.01"
Scaling factor for movement velocity rewards
simplified_reward
bool
default:"true"
Use the simplified reward function. When false, the manually shaped aim-alignment and velocity rewards are also applied
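
A hypothetical sketch of how the feedback thresholds gate step-level stimulation, based only on the descriptions above (the function name is illustrative, not from the project):

```python
# Rewards strictly above the positive threshold trigger positive feedback,
# rewards strictly below the negative threshold trigger negative feedback,
# and anything in between triggers none. Hypothetical gating logic.
FEEDBACK_POSITIVE_THRESHOLD = 1.0
FEEDBACK_NEGATIVE_THRESHOLD = -1.0

def feedback_sign(reward: float) -> int:
    """+1 for positive feedback, -1 for negative, 0 for none."""
    if reward > FEEDBACK_POSITIVE_THRESHOLD:
        return 1
    if reward < FEEDBACK_NEGATIVE_THRESHOLD:
        return -1
    return 0
```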

Feedback Stimulation - Step-level Rewards

feedback_positive_amplitude
float
default:"2.0"
Stimulation amplitude (μA) for positive step-level feedback
feedback_positive_frequency
float
default:"20.0"
Stimulation frequency (Hz) for positive step-level feedback
feedback_positive_pulses
int
default:"30"
Number of pulses for positive step-level feedback
feedback_negative_amplitude
float
default:"2.0"
Stimulation amplitude (μA) for negative step-level feedback
feedback_negative_frequency
float
default:"60.0"
Stimulation frequency (Hz) for negative step-level feedback
feedback_negative_pulses
int
default:"90"
Number of pulses for negative step-level feedback

Feedback Stimulation - Episode-level Rewards

feedback_episode_positive_pulses
int
default:"80"
Number of pulses for positive episode-level feedback
feedback_episode_positive_frequency
float
default:"40.0"
Stimulation frequency (Hz) for positive episode-level feedback
feedback_episode_negative_pulses
int
default:"160"
Number of pulses for negative episode-level feedback
feedback_episode_negative_frequency
float
default:"120.0"
Stimulation frequency (Hz) for negative episode-level feedback
episode_only_feedback
bool
default:"false"
Whether to provide feedback only at episode end (disables step-level feedback)

Reward Feedback Channels

use_reward_feedback
bool
default:"true"
Whether to enable reward-based feedback stimulation
reward_feedback_positive_channels
List[int]
default:"[19, 20, 22]"
Channel indices for positive reward feedback
reward_feedback_negative_channels
List[int]
default:"[23, 24, 26]"
Channel indices for negative reward feedback

Event-based Feedback

event_movement_distance_threshold
float
default:"10.0"
Distance threshold for triggering movement-based events
event_feedback_settings
Dict[str, EventFeedbackConfig]
Dictionary mapping event names to EventFeedbackConfig objects. Contains configurations for:
  • enemy_kill: Feedback when killing an enemy
  • armor_pickup: Feedback when collecting armor
  • took_damage: Feedback when taking damage
  • ammo_waste: Feedback when wasting ammo
  • approach_target: Feedback when moving closer to target
  • retreat_target: Feedback when moving away from target
See EventFeedbackConfig for detailed field documentation.

Decoder Configuration

decoder_enforce_nonnegative
bool
default:"false"
Whether to enforce non-negative weights in decoder linear readout heads. Experimental - requires testing
decoder_freeze_weights
bool
default:"false"
Whether to freeze decoder weights during training. Experimental - requires testing
decoder_zero_bias
bool
default:"true"
Whether to zero out decoder biases. Recommended to be true - bias can cause decoder to generate its own predictions
decoder_use_mlp
bool
default:"false"
Whether to use an MLP decoder instead of a linear readout. Prefer false: an MLP can learn to play the game itself instead of relying on neural responses
decoder_mlp_hidden
Optional[int]
default:"32"
Hidden layer size for MLP decoder. Only used if decoder_use_mlp is true. Experimental value - requires testing
decoder_weight_l2_coef
float
default:"0.0"
L2 regularization coefficient for decoder weights. Untuned
decoder_bias_l2_coef
float
default:"0.0"
L2 regularization coefficient for decoder biases. Untuned
decoder_ablation_mode
str
default:"none"
Ablation mode for testing decoder behavior. Valid values:
  • none: Normal operation
  • random: Replace spike features with random values
  • zero: Replace spike features with zeros
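
The three ablation modes can be sketched as follows (illustrative only, not the project's implementation):

```python
import random

# Semantics of decoder_ablation_mode: "none" passes spike features through,
# "random" replaces them with uniform noise, and "zero" replaces them with
# zeros, so the decoder's dependence on real activity can be tested.
def ablate_features(features, mode="none", rng=None):
    """Apply an ablation mode to a list of spike features."""
    if mode == "none":
        return list(features)
    if mode == "zero":
        return [0.0] * len(features)
    if mode == "random":
        rng = rng or random.Random(0)
        return [rng.random() for _ in features]
    raise ValueError(f"unknown ablation mode: {mode}")
```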

Wall Detection

wall_ray_count
int
default:"12"
Number of raycasts for wall detection. Likely less important when the CNN encoder is enabled
wall_ray_max_range
int
default:"64"
Maximum range for wall detection raycasts. Keep as is
wall_depth_max_distance
float
default:"18.0"
Maximum depth distance for wall detection normalization. Already calibrated - keep as is
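
A plausible reading of wall_depth_max_distance is a clamp-and-scale normalization of ray depth into [0, 1]; the exact formula is not documented here, so treat this sketch as an assumption:

```python
# Assumed normalization implied by wall_depth_max_distance: ray depths are
# clipped at the maximum and scaled into [0, 1]. Illustration only.
WALL_DEPTH_MAX_DISTANCE = 18.0

def normalize_wall_depth(depth: float,
                         max_distance: float = WALL_DEPTH_MAX_DISTANCE) -> float:
    """Depth feature in [0, 1]; readings beyond max_distance saturate at 1."""
    return min(max(depth, 0.0), max_distance) / max_distance
```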

Encoder Configuration

encoder_trainable
bool
default:"true"
Whether encoder network weights are trainable. Recommended to be true for reasonable PPO policy gradients, especially with decoder_use_mlp: false
encoder_entropy_coef
float
default:"-0.10"
Entropy penalty coefficient for the encoder (uses Beta distribution sampling)
encoder_use_cnn
bool
default:"true"
Whether to use a CNN for visual feature extraction. Testing shows the CNN does not overfit or learn to play on its own, so keeping this true is useful
encoder_cnn_channels
int
default:"16"
Number of base channels for CNN encoder. Arbitrary value - can be adjusted
encoder_cnn_downsample
int
default:"4"
Downsampling factor for CNN input. Arbitrary value - can be adjusted

Episode Feedback Events

episode_positive_feedback_event
Optional[str]
default:"None"
Specific event name to use for positive episode feedback. If None, uses overall episode reward
episode_negative_feedback_event
Optional[str]
default:"None"
Specific event name to use for negative episode feedback. If None, uses overall episode reward

Surprise-based Feedback Scaling

feedback_surprise_gain
float
default:"0.25"
Gain for surprise-based feedback scaling. Tune as needed based on neuron responses
feedback_surprise_max_scale
float
default:"2.0"
Maximum scaling factor for surprise-based feedback. Tune as needed based on neuron responses
feedback_surprise_freq_gain
Optional[float]
default:"0.65"
Frequency gain for surprise-based feedback. Tune as needed based on neuron responses
feedback_surprise_amp_gain
Optional[float]
default:"0.35"
Amplitude gain for surprise-based feedback. Tune as needed based on neuron responses
feedback_surprise_freq_max_scale
Optional[float]
default:"2.0"
Maximum frequency scale for surprise-based feedback. Tune as needed based on neuron responses
feedback_surprise_amp_max_scale
Optional[float]
default:"1.5"
Maximum amplitude scale for surprise-based feedback. Tune as needed based on neuron responses
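
The exact surprise-scaling formula is not documented in this reference. One plausible interpretation of each gain/max-scale pair is a clamped linear scale applied to the base feedback parameters; the sketch below is an assumption, not the project's code:

```python
# Assumed interpretation of the surprise gain/max-scale pairs: a scale factor
# that starts at 1.0, grows linearly with the surprise signal, and saturates
# at the configured maximum. The actual formula is not documented here.
FEEDBACK_SURPRISE_FREQ_GAIN = 0.65
FEEDBACK_SURPRISE_FREQ_MAX_SCALE = 2.0

def surprise_scale(surprise: float, gain: float, max_scale: float) -> float:
    """Clamped linear scale in [1.0, max_scale]."""
    return min(max_scale, 1.0 + gain * max(surprise, 0.0))

freq_scale = surprise_scale(1.0, FEEDBACK_SURPRISE_FREQ_GAIN,
                            FEEDBACK_SURPRISE_FREQ_MAX_SCALE)
```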

Distance Normalization

enemy_distance_normalization
float
default:"1312.0"
Normalization constant for enemy distance features. Already calibrated - do not change

Usage Example

from ppo_doom import PPOConfig

# Create default configuration
config = PPOConfig()

# Override specific parameters
config = PPOConfig(
    learning_rate=1e-4,
    max_episodes=5000,
    encoder_use_cnn=True,
    decoder_use_mlp=False,
    use_hardware=True
)

# Access channel configuration
print(config.encoding_channels)  # [8, 9, 10, 17, 18, 25, 27, 28, 57]
print(config.attack_channels)    # [32, 33, 34]

# Access event feedback settings
kill_feedback = config.event_feedback_settings['enemy_kill']
print(kill_feedback.base_frequency)  # 20.0
