Common Issues
No Spikes Detected
Symptoms:
- TensorBoard shows Spikes/total_count = 0
- Agent behavior is random/static
- Episode rewards flat or declining

Possible causes:
- CL1 device not connected
- UDP port mismatch
- Firewall blocking UDP
- Network latency/packet loss
Verify spike collection loop
From cl1_neural_interface.py:200-220, add debug logging to the spike collection loop.
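A minimal sketch of what such a debug hook could look like; the function name and log format are illustrative, not the project's actual API:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("spikes")

def log_spike_batch(spike_counts, step):
    """Hypothetical debug hook for the spike collection loop: flag empty or
    invalid batches so a silent CL1 link shows up immediately in the logs."""
    total = sum(spike_counts)
    if total == 0:
        log.warning("step %d: received 0 spikes across %d channels",
                    step, len(spike_counts))
    elif any(c < 0 for c in spike_counts):
        log.error("step %d: negative spike count in %s", step, spike_counts)
    else:
        log.debug("step %d: %d spikes", step, total)
    return total
```

A warning on every step is the fastest way to distinguish "no spikes arriving" from "spikes arriving but dropped downstream".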
Check stimulation application
From cl1_neural_interface.py:173-200, verify the stimulation parameters:
- frequencies in range [4.0, 40.0] Hz
- amplitudes in range [1.0, 2.5] μA
- phase1_duration and phase2_duration = 160 μs
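The ranges above can be checked with a small validator before parameters are sent to the device; the helper below is a sketch (the function and constant names are assumptions, only the numeric ranges come from the text):

```python
# Ranges documented above for CL1 stimulation.
FREQ_RANGE = (4.0, 40.0)   # Hz
AMP_RANGE = (1.0, 2.5)     # μA
PHASE_DURATION_US = 160    # both biphasic phase durations

def validate_stim_params(frequencies, amplitudes,
                         phase1_duration=PHASE_DURATION_US,
                         phase2_duration=PHASE_DURATION_US):
    """Return a list of human-readable problems (empty list = all good)."""
    problems = []
    for i, f in enumerate(frequencies):
        if not FREQ_RANGE[0] <= f <= FREQ_RANGE[1]:
            problems.append(f"frequency[{i}]={f} Hz outside {FREQ_RANGE}")
    for i, a in enumerate(amplitudes):
        if not AMP_RANGE[0] <= a <= AMP_RANGE[1]:
            problems.append(f"amplitude[{i}]={a} μA outside {AMP_RANGE}")
    if phase1_duration != PHASE_DURATION_US or phase2_duration != PHASE_DURATION_US:
        problems.append("phase1_duration and phase2_duration must both be 160 μs")
    return problems
```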
Training Divergence
Symptoms:
- Reward suddenly drops to zero
- Policy outputs NaN values
- Gradient norms explode (> 100)

Possible causes:
- Learning rate too high
- Gradient clipping too weak
- Unnormalized returns causing value explosion
Monitor gradient norms
In TensorBoard, watch these metrics:
- Training/Policy_Grad_Norm (should stay < 10)
- Training/Value_Grad_Norm (should stay < 10)
- Training/Encoder_Grad_Norm (if encoder trainable)

If a norm keeps hitting max_grad_norm, gradients are being clipped on every update. Reduce the learning rate.
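The clip-ceiling check can be sketched in plain Python (with PyTorch you would read `p.grad` after `loss.backward()`; the helper names here are illustrative):

```python
import math

def global_grad_norm(grads):
    """Global L2 norm over per-parameter gradient lists — the same quantity
    PyTorch's clip_grad_norm_ computes before clipping (plain lists here)."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))

def lr_probably_too_high(norm_history, max_grad_norm=3.0, window=100, frac=0.9):
    """Heuristic from the text: if the norm hit the clip ceiling in most
    recent updates, gradients are being clipped constantly — reduce LR."""
    recent = norm_history[-window:]
    hits = sum(1 for n in recent if n >= max_grad_norm)
    return hits / len(recent) > frac
```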
Check for NaN in policy outputs
Add logging to ppo_doom.py. Common causes:
- Division by zero in normalization
- Exploding decoder weights (check L2 regularization)
- Invalid spike counts (negative values)
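Two of these causes can be guarded with a few lines; this is a sketch with plain floats (with tensors, use `torch.isfinite(t).all()` instead), and the helper names are assumptions:

```python
import math

def find_nonfinite(name_to_values):
    """Return {name: index_of_first_bad_value} for NaN/inf entries —
    a cheap check to drop on policy logits each update."""
    bad = {}
    for name, values in name_to_values.items():
        for i, v in enumerate(values):
            if not math.isfinite(v):
                bad[name] = i
                break
    return bad

def safe_normalize(returns, eps=1e-8):
    """Normalize returns while guarding against division by zero —
    one of the common NaN sources listed above."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in returns]
```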
Low Reward Despite Good Behavior
Symptoms:
- Agent visibly plays well (kills enemies, picks up items)
- TensorBoard shows low Episode_Reward
- Feedback stimulation seems ineffective

Possible causes:
- Reward shaping misconfigured
- Feedback thresholds too strict
- Event feedback channels misconfigured
From ppo_doom.py:165-257, check event_feedback_settings.
Enable simplified reward
From README.md line 149: for deadly corridor, simplified_reward=True disables the manually shaped aim-alignment and velocity terms. I did more tuning on False, so it’s probably better kept this way.
Verify feedback stimulation
Check UDP packet transmission: inspect the feedback_port logs for:
- Packet send count
- Amplitude/frequency values
- Channel assignments
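These fields can be captured with a thin wrapper around the UDP send; the class and JSON payload below are illustrative, not the project's actual wire format:

```python
import json
import logging
import socket

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("feedback")

class LoggingFeedbackSender:
    """Hypothetical wrapper that counts and logs every feedback packet
    before sending it over UDP."""
    def __init__(self, host="127.0.0.1", port=9999):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sent = 0

    def send(self, channels, frequency_hz, amplitude_ua):
        payload = json.dumps({"channels": channels,
                              "freq": frequency_hz,
                              "amp": amplitude_ua}).encode()
        self.sock.sendto(payload, self.addr)
        self.sent += 1
        log.info("packet %d: ch=%s freq=%.1f Hz amp=%.2f uA",
                 self.sent, channels, frequency_hz, amplitude_ua)
```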
In training_server.py, feedback is sent via the feedback port.

Decoder Bias Dominating
Symptoms:
- Decoder/forward_wx_bias_ratio < 1.0 (bias larger than weight*input)
- Ablation modes (zero, random) show similar performance to real spikes
- Agent behavior unchanged when neurons are silenced

Possible causes:
- decoder_zero_bias=False (the default in some configs)
- Decoder MLP learning a static policy
- Encoder not trainable
Run ablation tests
See the Ablation Modes page. Quick test: run once with real spikes and once with an ablation mode, then compare the TensorBoard metrics. If both show similar reward curves, decoder bias is dominating.
Monitor wx/bias ratio
From ppo_doom.py:718-744, check in TensorBoard:
- Decoder/forward_wx_bias_ratio (should be > 1.0)
- Decoder/attack_wx_bias_ratio
- Decoder/camera_wx_bias_ratio
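A sketch of how such a ratio might be computed for one linear layer, using plain lists; the exact definition in ppo_doom.py:718-744 may differ, so treat this as an assumption about the metric's meaning (mean |W·x| over mean |b|):

```python
def wx_bias_ratio(weights, inputs, bias):
    """Mean |W·x| over mean |b| for a single linear layer.
    A ratio > 1.0 means the spike-driven term dominates the bias."""
    wx = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    mean_wx = sum(abs(v) for v in wx) / len(wx)
    mean_b = sum(abs(b) for b in bias) / len(bias)
    return mean_wx / (mean_b + 1e-12)  # guard against a zeroed bias
```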
Network Connectivity Issues
Symptoms:
- cl1_neural_interface.py reports “Connection timeout”
- Training server logs “No spike data received”
- Sporadic packet loss

Possible causes:
- IP address mismatch
- Port already in use
- Network latency
Verify UDP socket setup
From cl1_neural_interface.py:145-171, verify the socket setup output.

Test UDP connectivity
- Firewall rules (sudo ufw status)
- Routing tables (ip route)
- Network interface config (ifconfig)
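A minimal UDP self-test can rule out local socket problems before blaming the network; this sketch binds a throwaway receiver and confirms a probe packet arrives (host and port are placeholders — point them at the actual CL1 spike/feedback port to test the real path):

```python
import socket

def udp_roundtrip_ok(host="127.0.0.1", port=0, timeout=2.0):
    """Send one probe datagram to a locally bound receiver and confirm it
    arrives. port=0 picks a free ephemeral port for a pure self-test."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.settimeout(timeout)
    rx.bind((host, port))
    actual_port = rx.getsockname()[1]
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        tx.sendto(b"ping", (host, actual_port))
        data, _ = rx.recvfrom(1024)
        return data == b"ping"
    except socket.timeout:
        return False
    finally:
        rx.close()
        tx.close()
```

If this passes locally but the CL1 port times out, the problem is between the hosts (firewall, routing), not in the Python socket code.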
TensorBoard Monitoring
Essential Metrics
Key Plots
Training Progress

Episode Reward (Training/Episode_Reward)
- Should increase over time
- High variance early (exploration)
- Plateaus indicate convergence or need for curriculum change

Kill Count (Training/Kill_Count)
- Tracks combat effectiveness
- Should correlate with reward
- Flat = agent not engaging enemies

Survival Time (Training/Survival_Time)
- Longer = better policy
- Sudden drops = environment difficulty increase or policy collapse
Neural Activity

Total Spike Count (Spikes/total_count)
- Should be > 0 every episode
- If zero, check CL1 connection and stimulation

Per-Channel Spike Counts (Spikes/encoding, Spikes/move_forward, etc.)
- Shows which channels are active
- Uneven distribution may indicate channel imbalance

Encoder Outputs (Encoder/freq_mean, Encoder/amp_mean)
- Frequencies should be in [4.0, 40.0] Hz
- Amplitudes in [1.0, 2.5] μA
- Stuck values = encoder not learning
Policy Health

Policy Gradient Norm (Training/Policy_Grad_Norm)
- Should stay < max_grad_norm (default 3.0)
- Consistently hitting the limit = reduce learning rate

Entropy (Training/Entropy)
- Measures policy randomness
- High early (exploration), decreases over time
- Too low too fast = premature convergence

KL Divergence (Training/KL_Divergence)
- Measures policy change between updates
- Should be small (< 0.1)
- Large spikes = policy instability
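For reference, the two quantities behind the entropy and KL metrics can be computed directly from categorical action probabilities (this is the standard definition; the project's own implementation may differ in detail):

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a categorical action distribution (natural log),
    the quantity behind Training/Entropy."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """KL(p || q) between old and new action distributions, the quantity
    behind Training/KL_Divergence; values above ~0.1 signal instability."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```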
Decoder Diagnostics
Decoder Diagnostics
wx/bias Ratio (
Decoder/forward_wx_bias_ratio)- Should be > 1.0 (weights dominate bias)
- < 1.0 = decoder bias is compensating for spikes
- If
decoder_zero_bias=True, bias metrics will be zero
Decoder/weight_l2)- Tracks decoder weight magnitude
- Explosion (>1000) = add L2 regularization
Decoder/forward_bias_abs_mean)- Should be near zero if
decoder_zero_bias=True - Growing bias = decoder learning static policy
Custom Logging
Add debugging metrics to ppo_doom.py.
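One pattern for this is a small buffer that collects extra scalars during an update and flushes them through whatever writer the trainer already uses; the class below is a sketch (any object with an `add_scalar(tag, value, step)` method works, e.g. `torch.utils.tensorboard.SummaryWriter`):

```python
class DebugMetrics:
    """Hypothetical helper for batching extra debug scalars each PPO
    update, then flushing them to a TensorBoard-style writer."""
    def __init__(self, prefix="Debug"):
        self.prefix = prefix
        self.buffer = {}

    def record(self, name, value):
        # Overwrites within one update, so flush() writes one point per tag.
        self.buffer[f"{self.prefix}/{name}"] = float(value)

    def flush(self, writer, step):
        for tag, value in self.buffer.items():
            writer.add_scalar(tag, value, step)
        self.buffer.clear()
```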
Debugging Commands
Check Process Status
Monitor Resource Usage
Inspect Checkpoints
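A quick way to see what checkpoints a run has produced, sketched with stdlib only; the `checkpoints/` directory and `.pt` extension are assumptions about the run layout (loading a file's contents would additionally require `torch.load`):

```python
import glob
import os
import time

def list_checkpoints(ckpt_dir="checkpoints", pattern="*.pt"):
    """List checkpoint files newest-first as (path, size_bytes, mtime_str)."""
    paths = glob.glob(os.path.join(ckpt_dir, pattern))
    paths.sort(key=os.path.getmtime, reverse=True)
    return [(p, os.path.getsize(p), time.ctime(os.path.getmtime(p)))
            for p in paths]
```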
Test Neural Interface
Getting Help
Collecting Diagnostic Info
Common Error Messages
RuntimeError: CUDA out of memory

Solution: reduce the batch size or steps per update, or switch to CPU.
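For example, as a config fragment; the field names here are illustrative — check PPOConfig in ppo_doom.py for the real ones:

```python
# Hypothetical PPOConfig overrides to shrink GPU memory use.
config_overrides = dict(
    batch_size=64,          # halve again if OOM persists
    steps_per_update=1024,  # fewer rollout steps held in memory at once
    device="cpu",           # fall back to CPU if the GPU is too small
)
```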
ConnectionRefusedError: [Errno 111] Connection refused

Solution: the CL1 device is not running, or the configured IP address is wrong.
ValueError: Channel X is reserved and cannot be used

Solution: from ppo_doom.py:285-350, certain channels are reserved. Edit the channel assignments to avoid them.

Reporting Issues
Include:
- Full command used to start training/CL1
- TensorBoard screenshots of key metrics
- Last 50 lines of logs
- Configuration used (PPOConfig values)
- Network topology (IP addresses, ports)
- System specs (GPU, RAM, OS)