PPOConfig
The PPOConfig dataclass defines all configuration parameters for PPO reinforcement learning training and for the CL1 neural hardware interface.
Environment Configuration
Path to the VizDoom configuration file that defines the game scenario
Screen resolution for the DOOM game buffer. Valid values are VizDoom resolution constants such as RES_320X240 and RES_640X480
Whether to enable the screen buffer for visual observations
Maximum absolute degrees for TURN_LEFT_RIGHT_DELTA action
Discrete turn step size in degrees when using turn buttons
Initial standard deviation (in degrees) for camera delta distribution
Toggle for single categorical action space vs. combinatorial action space
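The environment options above can be pictured as a small config fragment. This is a minimal sketch; the field names and default values are assumptions for illustration, not the actual PPOConfig attribute names.

```python
from dataclasses import dataclass

@dataclass
class EnvConfig:
    # Hypothetical names mirroring the environment options described above.
    scenario_path: str = "scenarios/basic.cfg"  # VizDoom .cfg scenario file
    screen_resolution: str = "RES_320X240"      # VizDoom resolution constant
    use_screen_buffer: bool = True              # visual observations on/off
    max_turn_delta_deg: float = 45.0            # cap for TURN_LEFT_RIGHT_DELTA
    turn_step_deg: float = 15.0                 # discrete turn step size
    camera_delta_std_deg: float = 10.0          # initial camera-delta std
    single_action_space: bool = True            # categorical vs. combinatorial

cfg = EnvConfig()
```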
Neural Interface - Channel Configuration
Total number of channels available on the CL1 hardware
Channel indices used for encoding game state into stimulation patterns
Channel indices that decode to forward movement actions
Channel indices that decode to backward movement actions
Channel indices that decode to strafe left actions
Channel indices that decode to strafe right actions
Channel indices that decode to turn left actions
Channel indices that decode to turn right actions
Channel indices that decode to attack/fire actions
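A channel layout like the one above partitions the hardware into encoding channels and per-action decoding groups. The sketch below uses a hypothetical 64-channel split with illustrative indices (not the actual CL1 mapping) and shows the sanity checks such a layout should pass: groups disjoint, indices in range, no encode/decode overlap.

```python
# Hypothetical channel assignment; indices and group sizes are illustrative.
num_channels = 64
encode_channels = list(range(0, 32))  # game state -> stimulation patterns
decode_forward  = [32, 33, 34, 35]
decode_backward = [36, 37, 38, 39]
decode_strafe_l = [40, 41, 42, 43]
decode_strafe_r = [44, 45, 46, 47]
decode_turn_l   = [48, 49, 50, 51]
decode_turn_r   = [52, 53, 54, 55]
decode_attack   = [56, 57, 58, 59]

decode_groups = [decode_forward, decode_backward, decode_strafe_l,
                 decode_strafe_r, decode_turn_l, decode_turn_r, decode_attack]

# Sanity checks: action groups must be disjoint, in range, and must not
# overlap the encoding channels.
all_decode = [ch for group in decode_groups for ch in group]
assert len(all_decode) == len(set(all_decode)), "decode groups overlap"
assert all(0 <= ch < num_channels for ch in all_decode + encode_channels)
assert not set(all_decode) & set(encode_channels), "encode/decode overlap"
```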
Stimulation Design Parameters
Parameters used to construct cl.StimDesign objects for biphasic electrical stimulation.
Duration of the first (negative) phase in microseconds (μs)
Duration of the second (positive) phase in microseconds (μs)
Minimum stimulation amplitude in microamps (μA). Used as the magnitude for phase1 (negative)
Maximum stimulation amplitude in microamps (μA). Used as the magnitude for phase2 (positive)
Burst Design Parameters
Parameters used to construct cl.BurstDesign objects that define stimulation frequency and duration.
Minimum burst frequency in Hertz (Hz)
Maximum burst frequency in Hertz (Hz)
Number of pulses per burst. Set to 500 so that stimulation spans the full interval between game ticks
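The stim and burst parameters above are related by a simple duration check: a burst of N pulses at f Hz lasts roughly N / f seconds, and with 500 pulses it comfortably outlasts the interval between game ticks. The numeric values below are illustrative, not the shipped defaults; only the relationships come from the descriptions above.

```python
# Illustrative stimulation parameters (not the actual defaults).
phase1_duration_us = 100.0  # first (negative) phase, microseconds
phase2_duration_us = 100.0  # second (positive) phase, microseconds
stim_amp_min_ua = 1.0       # magnitude used for phase 1 (negative)
stim_amp_max_ua = 2.0       # magnitude used for phase 2 (positive)

burst_freq_hz = 20.0        # pulses per second within a burst
pulses_per_burst = 500

# A burst of N pulses at f Hz lasts about N / f seconds, so with 500 pulses
# the burst outlives the gap between game ticks and stimulation is
# effectively continuous until the next tick issues a new burst.
burst_duration_s = pulses_per_burst / burst_freq_hz
tick_interval_s = 1.0 / 35.0  # VizDoom runs at 35 tics per second
assert burst_duration_s > tick_interval_s
```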
PPO Hyperparameters
Learning rate for the Adam optimizer
Discount factor for future rewards
Lambda parameter for Generalized Advantage Estimation (GAE)
Clipping parameter for PPO policy updates
Coefficient for value function loss in the total loss
Coefficient for entropy bonus to encourage exploration
Maximum gradient norm for gradient clipping. Can be reduced to 1 or 0.5 for more conservative updates
Whether to normalize returns for critic training. Stabilizes the critic but may affect learning dynamics
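To make the roles of the hyperparameters above concrete, here is a minimal single-sample sketch of the PPO objective they feed into: the clipped surrogate policy loss, a squared-error value loss weighted by the value coefficient, and an entropy bonus weighted by the entropy coefficient. The numeric values are illustrative.

```python
import math

# Illustrative hyperparameter values (see the descriptions above).
clip_eps = 0.2       # PPO clipping parameter
value_coef = 0.5     # value-function loss coefficient
entropy_coef = 0.01  # entropy-bonus coefficient

def ppo_loss(log_prob_new, log_prob_old, advantage, value, value_target, entropy):
    # Probability ratio between the new and old policies.
    ratio = math.exp(log_prob_new - log_prob_old)
    # Clip the ratio into [1 - eps, 1 + eps] and take the pessimistic surrogate.
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    policy_loss = -min(ratio * advantage, clipped * advantage)
    value_loss = (value - value_target) ** 2
    # Entropy is subtracted: higher entropy lowers the loss, encouraging exploration.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

# With identical old/new log-probs the ratio is 1 and the surrogate
# reduces to -advantage.
loss = ppo_loss(0.0, 0.0, advantage=1.0, value=0.0, value_target=0.0, entropy=0.0)
```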
Training Configuration
Number of parallel environments to run
Number of environment steps to collect per policy update (per environment)
Minibatch size for PPO updates
Number of epochs to train on each batch of collected experience
Maximum number of episodes to train for
Whether to use CL1 hardware or run in simulation mode
Network Architecture
Hidden layer size for encoder, decoder, and value networks
Logging and Checkpointing
Directory for TensorBoard logs
Directory for saving model checkpoints
Save checkpoint every N episodes
Evaluate policy every N episodes
Reward Shaping
Reward threshold above which positive feedback is triggered. Requires tuning
Reward threshold below which negative feedback is triggered. Requires tuning
Terminal reward bonus for armor-related objectives. Requires tuning
Gain multiplier for aim alignment reward shaping
Maximum distance at which aim alignment reward is computed
Bonus reward for accurate aim alignment
Angle threshold in degrees for aim alignment bonus
Scaling factor for movement velocity rewards
Use simplified reward function. When false, the reward additionally includes the manually shaped aim-alignment and velocity terms
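One way the aim-alignment parameters above could combine is sketched below: a dense term scaled by the gain, gated by the maximum distance, plus a sparse bonus when the aim angle falls within the threshold. The formula and values are assumptions for illustration, not the project's exact shaping function.

```python
# Illustrative shaping parameters (names are assumptions).
aim_gain = 0.1                 # gain multiplier for alignment reward
aim_max_distance = 300.0       # beyond this, no alignment reward
aim_bonus = 1.0                # sparse bonus for accurate aim
aim_angle_threshold_deg = 5.0  # angular window for the bonus

def aim_alignment_reward(angle_to_enemy_deg, distance):
    if distance > aim_max_distance:
        return 0.0  # enemy too far: no shaping signal
    # Dense term: grows as the crosshair approaches the enemy.
    reward = aim_gain * max(0.0, 1.0 - abs(angle_to_enemy_deg) / 180.0)
    # Sparse bonus when aim is within the angular threshold.
    if abs(angle_to_enemy_deg) <= aim_angle_threshold_deg:
        reward += aim_bonus
    return reward
```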
Feedback Stimulation - Step-level Rewards
Stimulation amplitude (μA) for positive step-level feedback
Stimulation frequency (Hz) for positive step-level feedback
Number of pulses for positive step-level feedback
Stimulation amplitude (μA) for negative step-level feedback
Stimulation frequency (Hz) for negative step-level feedback
Number of pulses for negative step-level feedback
Feedback Stimulation - Episode-level Rewards
Number of pulses for positive episode-level feedback
Stimulation frequency (Hz) for positive episode-level feedback
Number of pulses for negative episode-level feedback
Stimulation frequency (Hz) for negative episode-level feedback
Whether to provide feedback only at episode end (disables step-level feedback)
Reward Feedback Channels
Whether to enable reward-based feedback stimulation
Channel indices for positive reward feedback
Channel indices for negative reward feedback
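Putting the feedback options together, step-level feedback might be gated and routed as sketched below: disabled entirely, suppressed when episode-only feedback is selected, or routed to the positive or negative channel group by comparing the step reward against the thresholds. Flag names, channel indices, and thresholds are assumptions for illustration.

```python
# Hypothetical feedback routing state (names and values are assumptions).
reward_feedback_enabled = True
episode_feedback_only = False
positive_feedback_channels = [60, 61]
negative_feedback_channels = [62, 63]

def step_feedback(reward, positive_threshold=0.5, negative_threshold=-0.5):
    """Return (channels, polarity) for a step reward, or None for no feedback."""
    if not reward_feedback_enabled or episode_feedback_only:
        return None  # feedback disabled, or deferred to episode end
    if reward > positive_threshold:
        return positive_feedback_channels, "positive"
    if reward < negative_threshold:
        return negative_feedback_channels, "negative"
    return None  # reward inside the dead zone: no stimulation
```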
Event-based Feedback
Distance threshold for triggering movement-based events
Dictionary mapping event names to EventFeedbackConfig objects. Contains configurations for:
enemy_kill: Feedback when killing an enemy
armor_pickup: Feedback when collecting armor
took_damage: Feedback when taking damage
ammo_waste: Feedback when wasting ammo
approach_target: Feedback when moving closer to the target
retreat_target: Feedback when moving away from the target
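The event dictionary above might look like the following. The shape of EventFeedbackConfig shown here (channels, amplitude, frequency, pulse count) is an assumption modeled on the stimulation parameters elsewhere in this config; the real class and values may differ.

```python
from dataclasses import dataclass

@dataclass
class EventFeedbackConfig:
    # Hypothetical fields; the real EventFeedbackConfig may differ.
    channels: list        # feedback channels to stimulate
    amplitude_ua: float   # stimulation amplitude in microamps
    frequency_hz: float   # burst frequency in Hz
    num_pulses: int       # pulses per feedback burst

event_feedback = {
    "enemy_kill":      EventFeedbackConfig([60, 61], 2.0, 50.0, 10),
    "armor_pickup":    EventFeedbackConfig([60, 61], 1.5, 40.0, 10),
    "took_damage":     EventFeedbackConfig([62, 63], 2.0, 5.0, 10),
    "ammo_waste":      EventFeedbackConfig([62, 63], 1.0, 5.0, 5),
    "approach_target": EventFeedbackConfig([60, 61], 1.0, 30.0, 5),
    "retreat_target":  EventFeedbackConfig([62, 63], 1.0, 5.0, 5),
}
```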
Decoder Configuration
Whether to enforce non-negative weights in decoder linear readout heads. Experimental - requires testing
Whether to freeze decoder weights during training. Experimental - requires testing
Whether to zero out decoder biases. Recommended to be true - bias can cause decoder to generate its own predictions
Whether to use MLP decoder instead of linear readout. Prefer false - MLP can learn to play the game instead of relying on neural responses
Hidden layer size for the MLP decoder. Only used if decoder_use_mlp is true. Experimental value - requires testing
L2 regularization coefficient for decoder weights. Untuned
L2 regularization coefficient for decoder biases. Untuned
Ablation mode for testing decoder behavior. Valid values:
none: Normal operation
random: Replace spike features with random values
zero: Replace spike features with zeros
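The three ablation modes amount to a simple substitution on the decoder's input features, sketched below (function name is illustrative): pass the spike features through unchanged, zero them out, or replace them with random noise as a control for whether the decoder relies on the neural signal at all.

```python
import random

def apply_ablation(spike_features, mode="none"):
    """Replace decoder input features according to the ablation mode."""
    if mode == "none":
        return spike_features                             # normal operation
    if mode == "zero":
        return [0.0] * len(spike_features)                # remove neural signal
    if mode == "random":
        return [random.random() for _ in spike_features]  # noise control
    raise ValueError(f"unknown ablation mode: {mode}")
```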
Wall Detection
Number of raycasts for wall detection. Probably less necessary with CNN encoder
Maximum range for wall detection raycasts. Keep as is
Maximum depth distance for wall detection normalization. Already calibrated - keep as is
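The wall-detection raycasts plausibly feed the policy as clipped, normalized depth values, along the lines of the sketch below. The clip-then-scale formula and the numeric constants are assumptions for illustration.

```python
# Hypothetical wall-detection constants (values are illustrative).
num_raycasts = 16
max_raycast_range = 500.0  # rays are clipped to this distance
wall_depth_norm = 500.0    # normalization constant for depth features

def normalize_wall_depths(depths):
    # Clip each ray's hit distance to the maximum range, then scale to [0, 1].
    return [min(d, max_raycast_range) / wall_depth_norm for d in depths]
```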
Encoder Configuration
Whether encoder network weights are trainable. Recommended to be true for reasonable PPO policy gradients, especially with decoder_use_mlp: false
Entropy penalty coefficient for the encoder (uses Beta distribution sampling)
Whether to use a CNN for visual feature extraction. Testing shows the CNN does not overfit or learn the task on its own, so keeping this true is useful
Number of base channels for CNN encoder. Arbitrary value - can be adjusted
Downsampling factor for CNN input. Arbitrary value - can be adjusted
Episode Feedback Events
Specific event name to use for positive episode feedback. If None, uses overall episode reward
Specific event name to use for negative episode feedback. If None, uses overall episode reward
Surprise-based Feedback Scaling
Gain for surprise-based feedback scaling. Tune as needed based on neuron responses
Maximum scaling factor for surprise-based feedback. Tune as needed based on neuron responses
Frequency gain for surprise-based feedback. Tune as needed based on neuron responses
Amplitude gain for surprise-based feedback. Tune as needed based on neuron responses
Maximum frequency scale for surprise-based feedback. Tune as needed based on neuron responses
Maximum amplitude scale for surprise-based feedback. Tune as needed based on neuron responses
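One plausible way the surprise gains and caps above could interact is sketched below: a surprise score scaled by the overall gain and capped, which then inflates the feedback frequency and amplitude up to their respective maxima. The formula is an assumption; only the parameter roles come from the descriptions above.

```python
# Hypothetical surprise-scaling parameters (values are illustrative).
surprise_gain = 1.0
surprise_max_scale = 3.0
surprise_freq_gain = 1.0
surprise_amp_gain = 0.5
surprise_max_freq_scale = 2.0
surprise_max_amp_scale = 1.5

def scaled_feedback(base_freq_hz, base_amp_ua, surprise):
    # Overall surprise score, scaled by the gain and capped.
    s = min(surprise_gain * surprise, surprise_max_scale)
    # Per-dimension scaling, each with its own gain and ceiling.
    freq_scale = min(1.0 + surprise_freq_gain * s, surprise_max_freq_scale)
    amp_scale = min(1.0 + surprise_amp_gain * s, surprise_max_amp_scale)
    return base_freq_hz * freq_scale, base_amp_ua * amp_scale
```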
Distance Normalization
Normalization constant for enemy distance features. Already calibrated - do not change