Overview

The encoder network converts game observations into stimulation parameters (frequency and amplitude) for biological neurons. The decoder network reads spike features from neurons and outputs action logits. Together they form the biological neural interface.
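The round trip can be sketched as a single control step. This is an illustrative dataflow only; the function and argument names here are hypothetical, not the project's actual API.

```python
def bio_interface_step(observation, encoder, neurons, decoder):
    """One control step through the interface (hypothetical names for illustration)."""
    freq, amp = encoder(observation)   # game observation -> stimulation parameters
    spikes = neurons(freq, amp)        # stimulate the culture, read back spike features
    action_logits = decoder(spikes)    # spike features -> action logits
    return action_logits
```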

Encoder Configuration

Trainability

encoder_trainable
bool
default:"True"
Whether the encoder weights are trainable via backpropagation. When True, the encoder learns to generate optimal stimulation parameters using Beta distributions. When False, a fixed sigmoid-based mapping is used.
Code comment: “Can try turning it False but I would say True is needed for reasonable PPO policy gradients especially if decoder_use_mlp: False”
config = PPOConfig(
    encoder_trainable=True  # Enable encoder learning
)

Entropy Coefficient

encoder_entropy_coef
float
default:"-0.10"
Entropy penalty coefficient for encoder Beta distributions. A negative value acts as an entropy penalty (encourages more deterministic stimulation). Positive values would encourage exploration in stimulation space.
config = PPOConfig(
    encoder_entropy_coef=-0.10  # Penalty for encoder randomness
)
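To see why a negative coefficient acts as a penalty, consider the usual PPO convention of subtracting `coef * entropy` from the total loss. This is a sketch of that sign logic, assuming the standard convention; it is not the project's exact loss code.

```python
def encoder_entropy_term(coef, entropy):
    """Entropy contribution to the total loss, assuming the common PPO
    convention: loss -= coef * entropy (a sketch, not the exact implementation)."""
    return -coef * entropy

# With the default coef of -0.10, higher entropy *increases* the loss,
# pushing the encoder toward more deterministic stimulation parameters.
```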

CNN Visual Processing

encoder_use_cnn
bool
default:"True"
Enable CNN processing of the visual screen buffer. When enabled, adds a convolutional neural network that processes the downsampled game screen before the encoder MLP.
Code comment: “With my testing it seems like the CNN does not overfit/learn on its own, seems useful to keep True”
encoder_cnn_channels
int
default:"16"
Base number of CNN channels in the first convolutional layer. The CNN architecture uses progressive channel expansion:
  • Layer 1: encoder_cnn_channels (default 16)
  • Layer 2: encoder_cnn_channels * 2 (default 32)
  • Layer 3: encoder_cnn_channels * 4 (default 64)
In training_server.py, this is increased to 64 channels per the DOOM Initial Report for better visual feature extraction.
encoder_cnn_downsample
int
default:"4"
Downsampling factor applied to the screen buffer before CNN processing. The original resolution is divided by this factor. For example, with 320×240 resolution and downsample=4, the CNN processes 80×60 images.
config = PPOConfig(
    encoder_use_cnn=True,
    encoder_cnn_channels=64,    # Increased capacity
    encoder_cnn_downsample=4    # 4x downsampling
)

CNN Architecture Details

The encoder CNN uses the following architecture:
nn.Sequential(
    # Layer 1: base_channels filters
    nn.Conv2d(1, base_channels, kernel_size=3, stride=1, padding=1),
    nn.SiLU(),
    nn.MaxPool2d(2),
    
    # Layer 2: base_channels * 2 filters
    nn.Conv2d(base_channels, base_channels * 2, kernel_size=3, stride=1, padding=1),
    nn.SiLU(),
    nn.MaxPool2d(2),
    
    # Layer 3: base_channels * 4 filters
    nn.Conv2d(base_channels * 2, base_channels * 4, kernel_size=3, stride=1, padding=1),
    nn.SiLU(),
    nn.AdaptiveAvgPool2d((1, 1))  # Global pooling to single vector
)
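One consequence of the final `AdaptiveAvgPool2d((1, 1))` is that the feature vector length depends only on the channel count, not on the input resolution or downsample factor. A small sketch of that arithmetic (the helper name is ours, for illustration):

```python
def encoder_cnn_feature_dim(base_channels):
    """Length of the CNN's output feature vector (illustrative sketch).
    The conv layers are shape-preserving (stride 1, padding 1), each
    MaxPool2d(2) halves the spatial dims, and AdaptiveAvgPool2d((1, 1))
    collapses whatever spatial extent remains, so only the final channel
    count (base_channels * 4) determines the output size."""
    return base_channels * 4
```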

Decoder Configuration

Architecture Type

decoder_use_mlp
bool
default:"False"
Use MLP decoder instead of linear readout heads.
  • False: Direct linear readout from spike features (recommended)
  • True: 2-layer MLP processes spikes before action heads
Code comment: “Prefer to be false, causes decoder to learn how to play the game but was tested on random spikes, could be different in prod”
decoder_mlp_hidden
int
default:"32"
Hidden layer size when decoder_use_mlp=True. In ppo_doom.py the default is 32; in training_server.py it is increased to 256.
# Linear decoder (recommended)
config = PPOConfig(
    decoder_use_mlp=False
)

# MLP decoder (experimental)
config = PPOConfig(
    decoder_use_mlp=True,
    decoder_mlp_hidden=256
)

Weight Constraints

decoder_enforce_nonnegative
bool
default:"False"
Enforce non-negative weights in the decoder's linear readout heads. When True, applies a softplus activation to the weights: weight = softplus(raw_weight). This ensures all spike contributions are positive, which can be biologically interpretable.
decoder_freeze_weights
bool
default:"False"
Freeze all decoder parameters (no gradient updates). Useful for testing whether the encoder alone can learn, or for transfer learning scenarios.
decoder_zero_bias
bool
default:"True"
Force decoder bias terms to zero and disable bias gradients.
Code comment: “Prefer to be true, needs testing, bias tends to cause the decoder to generate its own predictions for movement”
Setting bias to zero ensures actions are driven entirely by spike activity, not learned biases.
config = PPOConfig(
    decoder_enforce_nonnegative=False,  # Allow negative weights
    decoder_freeze_weights=False,       # Train decoder
    decoder_zero_bias=True              # Force zero bias
)
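The two constraints can be illustrated with a toy linear readout. This is a pure-Python sketch with hypothetical helper names, not the project's decoder: softplus maps raw weights to strictly positive values, and zero bias means silent neurons produce zero logits.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def linear_readout(spikes, raw_weights, bias, enforce_nonnegative=False):
    """Toy linear readout head under the two weight constraints (sketch)."""
    rows = [[softplus(w) if enforce_nonnegative else w for w in row]
            for row in raw_weights]
    return [sum(w * s for w, s in zip(row, spikes)) + b
            for row, b in zip(rows, bias)]

# With zero bias, zero spike input yields zero logits: actions are driven
# entirely by neural activity. With enforce_nonnegative, even negative raw
# weights contribute positively after softplus.
```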

L2 Regularization

decoder_weight_l2_coef
float
default:"0.0"
L2 regularization coefficient for decoder weights. Penalizes large weights to encourage simpler linear readouts. Currently untuned (set to 0.0).
decoder_bias_l2_coef
float
default:"0.0"
L2 regularization coefficient for decoder biases. Currently untuned (set to 0.0).
config = PPOConfig(
    decoder_weight_l2_coef=0.001,  # Add weight regularization
    decoder_bias_l2_coef=0.0
)
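Assuming the standard form of L2 regularization (a sum of squared parameters scaled by the coefficient), the penalty added to the loss looks like the following sketch. The helper name is ours; the actual implementation may aggregate parameters differently.

```python
def decoder_l2_penalty(weights, biases, weight_coef, bias_coef):
    """L2 regularization term added to the loss (sketch of the standard form)."""
    return (weight_coef * sum(w * w for w in weights)
            + bias_coef * sum(b * b for b in biases))

# With both coefficients at their 0.0 defaults, the penalty vanishes entirely.
```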

Ablation Testing

decoder_ablation_mode
str
default:"'none'"
Ablation mode for testing decoder learning.
  • 'none': Normal operation, use real spike features
  • 'zero': Replace spike features with zeros
  • 'random': Replace spike features with random values
Used to test whether the decoder is learning on its own versus relying on neural activity.
config = PPOConfig(
    decoder_ablation_mode='zero'  # Test with no neural input
)
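The three modes amount to a simple substitution on the spike features before they reach the decoder. A minimal sketch, with a hypothetical helper name mirroring the option semantics:

```python
import random

def apply_ablation(spike_features, mode, rng=None):
    """Substitute spike features per decoder_ablation_mode (illustrative sketch)."""
    if mode == "none":
        return list(spike_features)          # pass real spikes through
    if mode == "zero":
        return [0.0] * len(spike_features)   # decoder sees no neural input
    if mode == "random":
        rng = rng or random.Random(0)
        return [rng.gauss(0.0, 1.0) for _ in spike_features]  # noise input
    raise ValueError(f"unknown ablation mode: {mode}")
```

If the policy still improves under 'zero' or 'random', the decoder is learning to play on its own rather than reading out neural activity.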

Network Architecture

Hidden Layer Size

hidden_size
int
default:"128"
Hidden layer size for the encoder, decoder MLP, and value network. Used across all network components for consistency.
config = PPOConfig(
    hidden_size=256  # Increase model capacity
)

Example Configurations

Minimal Linear Decoder

# Pure linear readout from spikes - maximum biological interpretability
minimal_config = PPOConfig(
    encoder_trainable=True,
    encoder_use_cnn=False,
    encoder_entropy_coef=-0.10,
    decoder_use_mlp=False,
    decoder_enforce_nonnegative=True,   # Positive weights only
    decoder_zero_bias=True,             # No bias
    decoder_freeze_weights=False,
    hidden_size=128
)

CNN-Based Encoder

# Visual processing with CNN encoder
visual_config = PPOConfig(
    encoder_trainable=True,
    encoder_use_cnn=True,
    encoder_cnn_channels=64,            # High capacity
    encoder_cnn_downsample=4,
    encoder_entropy_coef=-0.10,
    decoder_use_mlp=False,
    decoder_zero_bias=True,
    hidden_size=256
)

MLP Decoder (Experimental)

# Non-linear decoder with MLP
mlp_config = PPOConfig(
    encoder_trainable=True,
    encoder_use_cnn=True,
    encoder_cnn_channels=32,
    decoder_use_mlp=True,
    decoder_mlp_hidden=256,
    decoder_zero_bias=False,            # MLP can use bias
    decoder_enforce_nonnegative=False,
    hidden_size=128
)

Frozen Decoder Testing

# Test encoder learning with fixed decoder
frozen_config = PPOConfig(
    encoder_trainable=True,
    encoder_use_cnn=True,
    decoder_use_mlp=False,
    decoder_freeze_weights=True,        # Freeze decoder
    decoder_zero_bias=True,
    hidden_size=128
)

PPO Hyperparameters

Learning rate, gamma, GAE settings

Feedback Tuning

Stimulation feedback parameters
