Overview
ppo_doom.py is the main training script for the DOOM Neuron project. It trains biological neurons to play DOOM using PPO (Proximal Policy Optimization) reinforcement learning with direct CL1 hardware integration.
Key Features:
- PPO policy with encoder-decoder neural architecture
- Direct CL1 SDK integration for real-time neural stimulation and spike recording
- VizDoom environment with customizable scenarios
- Tensorboard logging and checkpoint management
- Hardware loop running at configurable tick frequency
source/ppo_doom.py
Command-Line Arguments
Basic Options
Execution mode for the script.
Choices: train, watch
- train: Full training mode with CL1 hardware and gradient updates
- watch: Observe neural activity without training (inference mode)
Path to a checkpoint file for loading pre-trained weights.
Example: --checkpoint checkpoints/l5_2048_rand/checkpoint_7900.pt

Maximum number of training episodes before termination.
PyTorch device for gradient computation.
Choices: cpu, cuda

Neural Interface Options
Ablation mode for diagnostic testing of decoder dependency on spikes.
Choices: none, zero, random
- none: Normal operation; use real spike features
- zero: Replace spike features with zeros (tests decoder bias)
- random: Replace spike features with random values (tests decoder robustness)
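The three ablation modes can be illustrated with a small helper. This is a hypothetical sketch — the actual function name and signature in ppo_doom.py may differ:

```python
import numpy as np

def ablate_spikes(spike_features: np.ndarray, mode: str = "none") -> np.ndarray:
    """Apply an ablation mode to the decoder's spike-feature input.

    Hypothetical helper illustrating the three modes described above.
    """
    if mode == "none":
        return spike_features                 # normal operation: real spikes
    if mode == "zero":
        return np.zeros_like(spike_features)  # tests decoder bias
    if mode == "random":
        rng = np.random.default_rng()
        # Random features test whether the decoder depends on spike structure
        return rng.random(spike_features.shape).astype(spike_features.dtype)
    raise ValueError(f"unknown ablation mode: {mode!r}")
```

If the decoder performs equally well under zero or random ablation, its outputs are driven by its bias terms rather than by neural activity.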
Enable a CNN encoder over the screen buffer in addition to scalar features.
When enabled, the encoder processes both scalar observations (player stats, enemy positions) and a downsampled screen buffer through a convolutional network.
Hardware & Recording
Display the VizDoom game window during training.
Useful for debugging and visualization, but may impact performance.
Directory path for saving CL1 recordings.
Recordings contain raw neural data captured during training sessions.
Frequency (Hz) for running the CL1 hardware loop.
Controls how many times per second the system:
- Applies neural stimulation
- Collects spike responses
- Updates game state
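A fixed-rate loop like this can be sketched with deadline-based scheduling. This is a minimal illustration, assuming a blocking single-threaded loop rather than the script's actual scheduler:

```python
import time

def run_hardware_loop(tick_hz: float, num_ticks: int, on_tick) -> None:
    """Invoke `on_tick` at a fixed frequency (sketch, not the real loop)."""
    period = 1.0 / tick_hz
    next_deadline = time.monotonic()
    for tick in range(num_ticks):
        on_tick(tick)  # stimulate, collect spikes, update game state
        next_deadline += period
        sleep_for = next_deadline - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)  # absorb jitter; skip sleeping if behind
```

Tracking an absolute deadline (rather than sleeping for a fixed period after each tick) prevents per-tick overhead from accumulating into drift.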
Usage Examples
Training from Scratch
Resume from Checkpoint
Watch Mode (Inference Only)
Ablation Testing
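The four scenarios above might be invoked roughly as follows. These commands are hypothetical: apart from --checkpoint (shown earlier), the exact flag names and whether the mode is positional are assumptions that may differ from the real CLI.

```shell
# Training from scratch
python source/ppo_doom.py train

# Resume from a checkpoint
python source/ppo_doom.py train --checkpoint checkpoints/l5_2048_rand/checkpoint_7900.pt

# Watch mode (inference only)
python source/ppo_doom.py watch --checkpoint checkpoints/l5_2048_rand/checkpoint_7900.pt

# Ablation testing (hypothetical --ablation flag)
python source/ppo_doom.py train --ablation zero
```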
Architecture Overview
PPOConfig Dataclass
The script uses a PPOConfig dataclass for configuration. Key parameters:
- Environment: doom_config, screen_resolution, max_turn_delta
- Neural Interface: channel assignments for encoding, movement, camera, and attack
- Stimulation: phase1_duration, phase2_duration, min_amplitude, max_amplitude
- Burst Design: min_frequency, max_frequency, burst_count
- PPO Hyperparameters: learning_rate, gamma, gae_lambda, clip_epsilon
- Training: num_envs, steps_per_update, batch_size, num_epochs
- Network: hidden_size, encoder/decoder configuration
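A skeleton of such a dataclass, using the field names listed above. All default values here are illustrative guesses, not the script's actual defaults:

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    """Configuration sketch; defaults are illustrative, not authoritative."""
    # Environment
    doom_config: str = "basic.cfg"
    screen_resolution: str = "RES_320X240"
    max_turn_delta: float = 10.0
    # Stimulation (durations in microseconds, amplitudes in microamps -- assumed units)
    phase1_duration: int = 100
    phase2_duration: int = 100
    min_amplitude: float = 0.1
    max_amplitude: float = 1.0
    # Burst design
    min_frequency: float = 4.0
    max_frequency: float = 40.0
    burst_count: int = 5
    # PPO hyperparameters
    learning_rate: float = 3e-4
    gamma: float = 0.99
    gae_lambda: float = 0.95
    clip_epsilon: float = 0.2
    # Training
    num_envs: int = 1
    steps_per_update: int = 128
    batch_size: int = 32
    num_epochs: int = 4
    # Network
    hidden_size: int = 256
```

A dataclass keeps every experiment knob in one place and makes checkpointed runs reproducible by serializing the config alongside the weights.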
Neural Networks
EncoderNetwork
Encodes game state into CL1 stimulation parameters (frequencies and amplitudes).
- Optional CNN for screen buffer processing
- Beta distribution outputs for trainable encoder
- Supports both deterministic and stochastic encoding
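A minimal sketch of such an encoder, assuming a PyTorch MLP body and a per-channel Beta parameterization (the layer sizes and exact head design are assumptions):

```python
import torch
import torch.nn as nn

class EncoderNetwork(nn.Module):
    """Sketch: game state -> bounded stimulation parameters per channel."""

    def __init__(self, obs_dim: int, n_channels: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
        )
        # Two heads parameterize a Beta distribution per channel, yielding
        # (0, 1) samples that can be rescaled to frequency/amplitude ranges.
        self.alpha_head = nn.Linear(hidden, n_channels)
        self.beta_head = nn.Linear(hidden, n_channels)

    def forward(self, obs: torch.Tensor, deterministic: bool = False):
        h = self.body(obs)
        # softplus + 1 keeps both concentrations >= 1 (unimodal Beta)
        alpha = nn.functional.softplus(self.alpha_head(h)) + 1.0
        beta = nn.functional.softplus(self.beta_head(h)) + 1.0
        dist = torch.distributions.Beta(alpha, beta)
        # Deterministic encoding uses the mean; stochastic encoding samples.
        sample = dist.mean if deterministic else dist.rsample()
        return sample, dist
```

The Beta distribution is a natural choice for bounded outputs because its support is exactly (0, 1), avoiding the clipping artifacts a Gaussian head would need.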
DecoderNetwork
Decodes spike responses into game actions.
- Linear readout heads with optional non-negative constraints
- Separate heads for movement, camera, and attack
- Optional MLP architecture (experimental)
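A sketch of the linear-readout design, assuming PyTorch and a ReLU-clamped weight matrix for the non-negative constraint (the head dimensions are assumptions):

```python
import torch
import torch.nn as nn

class DecoderNetwork(nn.Module):
    """Sketch: spike counts -> separate movement, camera, and attack outputs."""

    def __init__(self, n_spike_features: int, nonneg: bool = True):
        super().__init__()
        self.nonneg = nonneg
        self.move_head = nn.Linear(n_spike_features, 2)    # forward/back, strafe
        self.camera_head = nn.Linear(n_spike_features, 1)  # turn delta
        self.attack_head = nn.Linear(n_spike_features, 1)  # attack logit

    def _readout(self, head: nn.Linear, x: torch.Tensor) -> torch.Tensor:
        w = head.weight
        if self.nonneg:
            # Clamp weights to be non-negative so more spikes can only push
            # an action harder, keeping the spike-to-action mapping monotone.
            w = nn.functional.relu(w)
        return nn.functional.linear(x, w, head.bias)

    def forward(self, spikes: torch.Tensor):
        return (self._readout(self.move_head, spikes),
                self._readout(self.camera_head, spikes),
                self._readout(self.attack_head, spikes))
```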
ValueNetwork
Estimates state value for the PPO critic.
- Multi-layer perceptron with SiLU activation
- Single output predicting expected return
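The critic described above reduces to a small MLP; this sketch assumes PyTorch and illustrative layer sizes:

```python
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Critic sketch: MLP with SiLU activations and one scalar output
    estimating expected return. Hidden size is an assumption."""

    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),  # expected return for the input state
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)
```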
PPOPolicy Class
Main policy class combining encoder, decoder, and value network:
- sample_encoder(): Generate stimulation parameters
- decode_spikes_to_action(): Convert spikes to actions
- evaluate_actions(): Compute log probabilities and entropy for the PPO update
- apply_stimulation(): Send stimulation to CL1 hardware
- collect_spikes(): Gather spike counts from a CL1 tick
- get_value(): Get state value estimate
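The methods above compose into a single hardware tick roughly as follows. This is a sketch: the argument shapes and how the CL1 handle is passed around are assumptions, not the script's actual control flow:

```python
def run_tick(policy, cl1, obs):
    """One hardware tick composed from the PPOPolicy methods (sketch).

    `cl1` stands in for the CL1 SDK handle; its real API is abstracted away.
    """
    stim_params, _ = policy.sample_encoder(obs)   # game state -> stimulation
    policy.apply_stimulation(cl1, stim_params)    # drive the electrodes
    spikes = policy.collect_spikes(cl1)           # read the neural response
    action = policy.decode_spikes_to_action(spikes)
    value = policy.get_value(obs)                 # critic estimate for PPO
    return action, value
```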
Hardware Loop
The training loop integrates directly with CL1 hardware: on each tick it applies stimulation, collects spike responses, and updates the game state.

Checkpoints

Checkpoints are saved every save_interval episodes to checkpoint_dir.
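Periodic checkpointing might look like the following sketch; the file naming matches the --checkpoint example above, but the saved keys and exact layout are assumptions:

```python
import os
import torch

def save_checkpoint(policy_state: dict, optimizer_state: dict,
                    episode: int, checkpoint_dir: str) -> str:
    """Save a training checkpoint (sketch; real keys/layout may differ)."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    # Matches the checkpoint_<episode>.pt naming seen in the CLI example
    path = os.path.join(checkpoint_dir, f"checkpoint_{episode}.pt")
    torch.save({"policy": policy_state,
                "optimizer": optimizer_state,
                "episode": episode}, path)
    return path
```

Saving the optimizer state alongside the policy weights is what makes resuming training (rather than just inference) possible from a checkpoint.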
See Also
- training_server.py - Remote training with UDP communication
- cl1_neural_interface.py - CL1 hardware interface server
- udp_protocol.py - UDP packet formats