Overview

Beyond the encoder-decoder loop, the DOOM Neuron system provides event-based feedback stimulation to biological neurons. This feedback acts as an auxiliary teaching signal, delivering reward/punishment information through dedicated neural channels based on game events and temporal-difference (TD) prediction errors.

Feedback Architecture

The feedback system operates in parallel to the main encoder-decoder loop:
┌─────────────────────────────────────────────────────────┐
│                    Training System                      │
│                                                         │
│  Game Events ──▶ Surprise Calculation ──▶ Feedback     │
│  (kills, damage)  (TD error scaling)     Commands      │
│                                                         │
│                           │                             │
│                           ▼                             │
│                    UDP Feedback Port                    │
│                       (12348)                           │
└─────────────────────────────────────────────────────────┘

                            │ UDP packets

┌─────────────────────────────────────────────────────────┐
│                      CL1 Device                         │
│                                                         │
│                  Feedback Socket                        │
│                         │                               │
│                         ▼                               │
│           apply_feedback_command()                      │
│                         │                               │
│         ┌───────────────┼───────────────┐               │
│         ▼               ▼               ▼               │
│   Reward Channels  Event Channels  Interrupt           │
│   [19, 20, 22]    [35, 36, 38]    (stop stim)          │
│   [23, 24, 26]    [44, 47, 48]                          │
│                   [39, 40, 43]                          │
│                   [52, 54, 55]                          │
│                   [5, 6, 11]                            │
│                   [12, 15, 16]                          │
└─────────────────────────────────────────────────────────┘

Feedback Channel Types

Reward Feedback Channels

Dedicated channels for general positive/negative reward signals:
# From PPOConfig in training_server.py:182-183
reward_feedback_positive_channels = [19, 20, 22]  # 3 channels
reward_feedback_negative_channels = [23, 24, 26]  # 3 channels
Purpose: Provide binary reward/punishment feedback based on step-level rewards:
if reward > feedback_positive_threshold:  # Default: +1
    # Stimulate positive reward channels (20 Hz, 2.0 μA)
    send_feedback(channels=[19, 20, 22], frequency=20, amplitude=2.0)
elif reward < feedback_negative_threshold:  # Default: -1
    # Stimulate negative reward channels (60 Hz, 2.0 μA)
    send_feedback(channels=[23, 24, 26], frequency=60, amplitude=2.0)
Reward feedback channels are separate from encoding/action channels to avoid confounding the encoder-decoder learning. The decoder doesn’t read from these channels.

Event Feedback Channels

Dedicated channels for specific game events with surprise-based scaling:
# From PPOConfig.event_feedback_settings in training_server.py:185-249
event_feedback_settings = {
    'enemy_kill': EventFeedbackConfig(
        channels=[35, 36, 38],
        base_frequency=20.0,
        base_amplitude=2.5,
        base_pulses=40,
        info_key='event_enemy_kill',
        td_sign='positive',  # Reward event
        freq_gain=0.20,
        freq_max_scale=2.5,
    ),
    'took_damage': EventFeedbackConfig(
        channels=[44, 47, 48],
        base_frequency=90.0,
        base_amplitude=2.2,
        base_pulses=50,
        info_key='event_took_damage',
        td_sign='negative',  # Punishment event
        unpredictable=True,  # Enable unpredictable stimulation
        unpredictable_frequency=5.0,
        unpredictable_duration_sec=4.0,
    ),
    'armor_pickup': EventFeedbackConfig(
        channels=[39, 40, 43],
        base_frequency=20.0,
        base_amplitude=2.0,
        base_pulses=35,
        td_sign='positive',
    ),
    # ... more events ...
}
Event Types:
Positive Events (Rewards):
  • enemy_kill [35, 36, 38]: Agent eliminates an enemy
  • armor_pickup [39, 40, 43]: Agent collects armor item
  • approach_target [5, 6, 11]: Agent moves closer to enemy
Negative Events (Punishments):
  • took_damage [44, 47, 48]: Agent receives damage
  • ammo_waste [52, 54, 55]: Agent shoots without hitting
  • retreat_target [12, 15, 16]: Agent moves away from enemy

Surprise Scaling

Temporal-Difference (TD) Error

Feedback intensity is modulated by surprise, a measure of how unexpected the event was:
# Conceptual implementation
td_error = reward + gamma * next_value - current_value

# For positive events (kills, armor)
if td_sign == 'positive':
    surprise = max(0, td_error)  # Positive surprise only

# For negative events (damage, ammo waste)
elif td_sign == 'negative':
    surprise = max(0, -td_error)  # Negative surprise (unexpected bad outcome)

# For magnitude-based scaling
elif td_sign == 'absolute':
    surprise = abs(td_error)
TD error measures prediction error:
  • Positive TD: Event better than expected (surprising reward)
  • Negative TD: Event worse than expected (surprising punishment)
  • Zero TD: Event perfectly predicted (no surprise)
By scaling feedback with TD error, the system emphasizes unexpected outcomes that provide the most learning value.
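The three cases above collapse into a single helper. A minimal sketch, assuming the behavior described here; compute_surprise is an illustrative name, not necessarily the function in training_server.py:
# Minimal sketch; compute_surprise is an illustrative name
def compute_surprise(td_error: float, td_sign: str) -> float:
    """Map a TD error to a non-negative surprise value."""
    if td_sign == 'positive':
        return max(0.0, td_error)   # only better-than-expected outcomes count
    if td_sign == 'negative':
        return max(0.0, -td_error)  # only worse-than-expected outcomes count
    return abs(td_error)            # 'absolute': magnitude in either direction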

Surprise-Scaled Parameters

Feedback stimulation parameters scale with surprise magnitude:
# From EventFeedbackConfig fields
class EventFeedbackConfig:
    # Base parameters (used when surprise = 0)
    base_frequency: float = 20.0    # Hz
    base_amplitude: float = 2.5     # μA
    base_pulses: int = 40           # Number of pulses
    
    # Surprise scaling gains
    freq_gain: float = 0.20         # How much surprise affects frequency
    amp_gain: float = 0.20          # How much surprise affects amplitude  
    pulse_gain: float = 0.20        # How much surprise affects pulse count
    
    # Maximum scaling factors
    freq_max_scale: float = 2.5     # Max frequency multiplier
    amp_max_scale: float = 1.6      # Max amplitude multiplier
    pulse_max_scale: float = 2.5    # Max pulse count multiplier

# Scaling computation
freq_scale = 1.0 + min(freq_gain * surprise, freq_max_scale - 1.0)
amp_scale = 1.0 + min(amp_gain * surprise, amp_max_scale - 1.0)
pulse_scale = 1.0 + min(pulse_gain * surprise, pulse_max_scale - 1.0)

final_frequency = base_frequency * freq_scale
final_amplitude = base_amplitude * amp_scale
final_pulses = int(base_pulses * pulse_scale)
Example: Enemy kill at low vs. high surprise
# Low surprise (kill was expected): td_error = 0.5
frequency = 20.0 * (1.0 + 0.20 * 0.5)  # = 22.0 Hz
amplitude = 2.5 * (1.0 + 0.20 * 0.5)   # = 2.75 μA
pulses = 40 * (1.0 + 0.20 * 0.5)       # = 44

# High surprise (unexpected kill): td_error = 8.0
frequency = 20.0 * (1.0 + min(0.20 * 8.0, 1.5))  # = 20.0 * 2.5 = 50.0 Hz
amplitude = 2.5 * (1.0 + min(0.20 * 8.0, 0.6))   # = 2.5 * 1.6 = 4.0 μA
pulses = 40 * (1.0 + min(0.20 * 8.0, 1.5))       # = 40 * 2.5 = 100
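As a quick sanity check, a one-line helper reproduces the high-surprise numbers (scale_param is an illustrative name, not from the codebase):
# Illustrative helper; reproduces the high-surprise numbers above
def scale_param(base: float, gain: float, max_scale: float, surprise: float) -> float:
    return base * (1.0 + min(gain * surprise, max_scale - 1.0))

surprise = 8.0                                    # high-surprise enemy kill
print(scale_param(20.0, 0.20, 2.5, surprise))     # 50.0 Hz
print(scale_param(2.5, 0.20, 1.6, surprise))      # 4.0 μA
print(int(scale_param(40, 0.20, 2.5, surprise)))  # 100 pulses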
Surprise scaling acts as a natural curriculum: Early in training, most events are surprising (high TD errors), producing strong feedback. As the value function improves, only genuinely unexpected events trigger strong feedback.

Exponential Moving Average (EMA)

To stabilize surprise estimates, TD errors are smoothed over time:
# From EventFeedbackConfig
ema_beta: float = 0.99  # Smoothing factor

# Update formula
ema_td_error = ema_beta * ema_td_error + (1 - ema_beta) * current_td_error

surprise = abs(ema_td_error)
EMA prevents single outlier TD errors from dominating feedback:
  • ema_beta = 0.99: Heavy smoothing, slow adaptation
  • ema_beta = 0.90: Faster adaptation to changing predictions
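A minimal stateful sketch of this update, assuming the EMA starts at zero (TDErrorEMA is an illustrative name):
# Minimal EMA tracker sketch; the zero initial value is an assumption
class TDErrorEMA:
    def __init__(self, beta: float = 0.99):
        self.beta = beta
        self.value = 0.0

    def update(self, td_error: float) -> float:
        """Fold in a new TD error and return the smoothed surprise."""
        self.value = self.beta * self.value + (1 - self.beta) * td_error
        return abs(self.value)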

Unpredictable Stimulation

Damage Aversion Learning

Certain negative events (like taking damage) use unpredictable stimulation to create aversion:
# From EventFeedbackConfig for 'took_damage'
unpredictable: bool = True
unpredictable_frequency: float = 5.0      # Hz (low frequency, irregular)
unpredictable_duration_sec: float = 4.0  # 4 seconds of stimulation
unpredictable_rest_sec: float = 4.0      # 4 seconds rest
unpredictable_channels: List[int] = [44, 47, 48]
unpredictable_amplitude: float = 2.2     # μA
Purpose: Create persistent, uncomfortable stimulation that the agent learns to avoid.
Mechanism:
  1. Agent takes damage
  2. Trigger unpredictable stimulation on damage channels
  3. Low-frequency (5 Hz) irregular pulses for 4 seconds
  4. Rest for 4 seconds
  5. Repeat pattern if damage continues
Unpredictable stimulation differs from regular feedback:
  • Duration: Lasts seconds, not milliseconds
  • Pattern: Irregular, low-frequency (harder for neurons to adapt)
  • Channels: Same as event feedback but with different parameters
  • Goal: Aversion learning, not just event signaling
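One way to realize this on/off pattern is to jitter the inter-pulse intervals around the 5 Hz mean. A sketch under that assumption (the uniform jitter model is illustrative, not taken from the CL1 implementation):
import random

# Irregular ~5 Hz pulse schedule for the "on" window; the uniform jitter
# model is an assumption, not the CL1 implementation
def unpredictable_pulse_times(frequency=5.0, duration_sec=4.0, jitter=0.5):
    """Yield pulse onset times (seconds) with jittered inter-pulse intervals."""
    t = 0.0
    mean_interval = 1.0 / frequency
    while t < duration_sec:
        yield t
        t += mean_interval * random.uniform(1.0 - jitter, 1.0 + jitter)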

Feedback Command Protocol

UDP Packet Format

# From udp_protocol.py:235-309
def pack_feedback_command(
    feedback_type: str,        # "interrupt", "event", or "reward"
    channels: List[int],       # Channel numbers to stimulate
    frequency: int,            # Hz
    amplitude: float,          # μA
    pulses: int,               # Number of pulses
    unpredictable: bool,       # Unpredictable stimulation flag
    event_name: str            # Event identifier
) -> bytes:
    # Returns 120-byte binary packet
Packet Structure (120 bytes total):
[8 bytes]  timestamp (microseconds)
[1 byte]   feedback_type (0=interrupt, 1=event, 2=reward)
[1 byte]   num_channels
[64 bytes] channel array (0xFF padding for unused)
[4 bytes]  frequency (int)
[4 bytes]  amplitude (float)
[4 bytes]  pulses (int)
[1 byte]   unpredictable flag (0 or 1)
[32 bytes] event_name (null-padded string)
[1 byte]   padding
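A minimal sketch of this layout using Python's struct module. Little-endian byte order and the exact field encodings are assumptions; the real pack_feedback_command in udp_protocol.py may differ in detail:
import struct
import time

FEEDBACK_TYPE_CODES = {"interrupt": 0, "event": 1, "reward": 2}

# Sketch of the 120-byte layout; '<' (little-endian, no alignment padding)
# is an assumption
def pack_feedback_sketch(feedback_type, channels, frequency, amplitude,
                         pulses, unpredictable, event_name):
    channel_array = bytes(channels) + b"\xff" * (64 - len(channels))
    return struct.pack(
        "<QBB64sifiB32sB",
        int(time.time() * 1e6),              # [8]  timestamp (microseconds)
        FEEDBACK_TYPE_CODES[feedback_type],  # [1]  feedback_type
        len(channels),                       # [1]  num_channels
        channel_array,                       # [64] channels, 0xFF-padded
        frequency,                           # [4]  frequency (int)
        amplitude,                           # [4]  amplitude (float)
        pulses,                              # [4]  pulses (int)
        int(unpredictable),                  # [1]  unpredictable flag
        event_name.encode("ascii"),          # [32] event_name, null-padded
        0,                                   # [1]  padding
    )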

Feedback Types

Type 0: Interrupt
feedback_type = "interrupt"
channels = [19, 20, 22, 23, 24, 26]  # All reward channels
frequency = 0
amplitude = 0
pulses = 0
Stops ongoing stimulation on specified channels. Used to clear feedback before new events.

Type 1: Event Feedback
feedback_type = "event"
channels = [35, 36, 38]  # Enemy kill channels
frequency = 50  # Hz (surprise-scaled)
amplitude = 4.0  # μA (surprise-scaled)
pulses = 100  # (surprise-scaled)
event_name = "enemy_kill"
Delivers event-specific feedback with surprise scaling.

Type 2: Reward Feedback
feedback_type = "reward"
channels = [19, 20, 22]  # Positive reward channels
frequency = 20  # Hz
amplitude = 2.0  # μA
pulses = 30
event_name = "positive_reward"
Delivers binary reward/punishment signals based on step rewards.
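Putting the three types together, a training-side send might look like the following. This assumes pack_feedback_command from udp_protocol.py; the CL1 address is illustrative, and 12348 is the feedback port from the architecture diagram:
import socket

from udp_protocol import pack_feedback_command

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
cl1_addr = ("127.0.0.1", 12348)  # address is an assumption; port from above

# Clear ongoing reward stimulation, then signal a surprising enemy kill
sock.sendto(pack_feedback_command("interrupt", [19, 20, 22, 23, 24, 26],
                                  0, 0.0, 0, False, ""), cl1_addr)
sock.sendto(pack_feedback_command("event", [35, 36, 38],
                                  50, 4.0, 100, False, "enemy_kill"), cl1_addr)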

CL1 Feedback Application

Applying Feedback to Hardware

# From cl1_neural_interface.py:238-290
def apply_feedback_command(
    self,
    neurons: cl.Neurons,
    feedback_type: str,
    channels: list,
    frequency: int,
    amplitude: float,
    pulses: int,
    unpredictable: bool,
    event_name: str
):
    """Apply feedback stimulation to neural hardware."""
    
    # Handle interrupt command
    if feedback_type == "interrupt":
        if channels:
            channel_set = cl.ChannelSet(*channels)
            neurons.interrupt(channel_set)
        return
    
    # Skip invalid parameters
    if not channels or frequency <= 0 or amplitude <= 0:
        return
    
    # Create channel set
    channel_set = cl.ChannelSet(*channels)
    
    # Create stimulation design (cached)
    # Include pulses so cached burst designs match the requested pulse count
    cache_key = (feedback_type, tuple(channels), frequency, round(amplitude, 4), pulses)
    
    def _factory():
        stim_design = cl.StimDesign(
            phase1_duration=120,
            phase1_amplitude=-amplitude,
            phase2_duration=120,
            phase2_amplitude=amplitude
        )
        burst_design = cl.BurstDesign(pulses, frequency)
        return (stim_design, burst_design)
    
    stim_design, burst_design = self._stim_cache.get_or_set(cache_key, _factory)
    
    # Apply stimulation
    neurons.stim(channel_set, stim_design, burst_design)
    self.feedback_commands_received += 1
Key Points:
  1. Interrupt commands clear ongoing feedback
  2. Event/reward feedback uses same biphasic pulse design as encoder
  3. Stimulation designs are cached (LRU, maxsize=2048)
  4. Non-blocking socket prevents loop stalls (see the receive-loop sketch below)
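A sketch of that non-blocking poll, run once per tick; unpack_feedback_command is a hypothetical counterpart to pack_feedback_command, not a confirmed API:
import socket

# Illustrative device-side poll; unpack_feedback_command is hypothetical
feedback_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
feedback_sock.bind(("0.0.0.0", 12348))  # UDP feedback port
feedback_sock.setblocking(False)

def poll_feedback(interface, neurons):
    try:
        packet, _ = feedback_sock.recvfrom(120)
    except BlockingIOError:
        return  # no command this tick; the loop never stalls
    # unpack_feedback_command (hypothetical) would invert the packet format
    interface.apply_feedback_command(neurons, *unpack_feedback_command(packet))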

Feedback Timing

Step-Level Feedback

Reward feedback is sent after each environment step:
# In training loop
for step in range(steps_per_update):
    # ... encoder, stimulation, spike collection, action ...
    
    reward, done, info = env.step(actions)
    
    # Send reward feedback
    if use_reward_feedback and not episode_only_feedback:
        if reward > feedback_positive_threshold:
            send_feedback_command(
                type="reward",
                channels=reward_feedback_positive_channels,
                frequency=feedback_positive_frequency,
                amplitude=feedback_positive_amplitude,
                pulses=feedback_positive_pulses
            )
        elif reward < feedback_negative_threshold:
            send_feedback_command(
                type="reward",
                channels=reward_feedback_negative_channels,
                frequency=feedback_negative_frequency,
                amplitude=feedback_negative_amplitude,
                pulses=feedback_negative_pulses
            )

Event-Level Feedback

Event feedback is sent when specific events occur:
# Check for events
for event_name, event_config in event_feedback_settings.items():
    if info[event_config.info_key] > 0:  # Event occurred
        # Compute surprise
        td_error = compute_td_error(reward, value, next_value)
        surprise = scale_surprise(td_error, event_config)
        
        # Scale feedback parameters, clamped as in "Surprise-Scaled Parameters"
        freq_scale = 1 + min(surprise * event_config.freq_gain, event_config.freq_max_scale - 1)
        amp_scale = 1 + min(surprise * event_config.amp_gain, event_config.amp_max_scale - 1)
        pulse_scale = 1 + min(surprise * event_config.pulse_gain, event_config.pulse_max_scale - 1)
        freq = event_config.base_frequency * freq_scale
        amp = event_config.base_amplitude * amp_scale
        pulses = int(event_config.base_pulses * pulse_scale)
        
        # Send feedback
        send_feedback_command(
            type="event",
            channels=event_config.channels,
            frequency=freq,
            amplitude=amp,
            pulses=pulses,
            unpredictable=event_config.unpredictable,
            event_name=event_name
        )

Episode-Level Feedback

Optional feedback at episode end based on total episode performance:
if use_episode_feedback and done:
    if total_episode_reward > 0:
        # Positive episode outcome
        send_feedback_command(
            type="event",
            channels=event_feedback_settings['enemy_kill'].channels,
            frequency=feedback_episode_positive_frequency,
            amplitude=feedback_positive_amplitude,
            pulses=feedback_episode_positive_pulses
        )
    else:
        # Negative episode outcome
        send_feedback_command(
            type="event",
            channels=event_feedback_settings['took_damage'].channels,
            frequency=feedback_episode_negative_frequency,
            amplitude=feedback_negative_amplitude,
            pulses=feedback_episode_negative_pulses
        )

Configuration

Feedback Parameters

# From PPOConfig in training_server.py

# General feedback settings
use_reward_feedback: bool = True
use_episode_feedback: bool = True
episode_only_feedback: bool = False  # If True, skip step-level feedback
episode_feedback_surprise_scaling: bool = True

# Reward feedback thresholds
feedback_positive_threshold: float = 1.0
feedback_negative_threshold: float = -1.0

# Step-level reward feedback
feedback_positive_frequency: float = 20.0   # Hz
feedback_positive_amplitude: float = 2.0    # μA
feedback_positive_pulses: int = 30

feedback_negative_frequency: float = 60.0   # Hz
feedback_negative_amplitude: float = 2.0    # μA
feedback_negative_pulses: int = 90

# Episode-level feedback
feedback_episode_positive_frequency: float = 40.0
feedback_episode_positive_pulses: int = 80

feedback_episode_negative_frequency: float = 120.0
feedback_episode_negative_pulses: int = 160

# Surprise scaling
feedback_surprise_gain: float = 0.25
feedback_surprise_max_scale: float = 2.0
feedback_surprise_freq_gain: float = 0.65
feedback_surprise_amp_gain: float = 0.35
Start with conservative feedback parameters (low amplitude, low pulse counts) and gradually increase if neurons don’t respond. Excessive feedback can cause adaptation or desensitization.
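For example, a conservative starting point might halve the amplitudes and cut the pulse counts (these values are illustrative, not recommended defaults):
# Illustrative conservative overrides; values are examples only
conservative_overrides = dict(
    feedback_positive_amplitude=1.0,  # μA, half the 2.0 default
    feedback_negative_amplitude=1.0,
    feedback_positive_pulses=10,      # down from 30
    feedback_negative_pulses=30,      # down from 90
)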

Design Rationale

Why Separate Feedback Channels?

  1. Avoid Confounding: Encoder-decoder loop learns from spike responses without reward information leaking in
  2. Clear Attribution: Feedback channels explicitly signal reward, not game state
  3. Biological Plausibility: Mimics reward pathways (dopamine, etc.) separate from sensory processing
  4. Debugging: Can disable feedback without affecting encoder-decoder functionality

Why Surprise Scaling?

  1. Learning Efficiency: Focus neural resources on unexpected events
  2. Curriculum Learning: Automatic adjustment as value function improves
  3. Biological Relevance: Mimics prediction error signals in animal brains
  4. Sample Efficiency: Strong feedback when it matters most

Why Unpredictable Stimulation?

  1. Aversion Learning: Irregular patterns harder to adapt to, maintaining discomfort
  2. Safety Incentive: Encourages damage avoidance behaviors
  3. Biological Realism: Pain responses in animals are persistent and irregular

Monitoring Feedback

The CL1 interface logs feedback commands:
# From cl1_neural_interface.py:454-455
if self.feedback_commands_received <= 5:
    print(f"[FEEDBACK] {feedback_type} on {len(channels)} channels: "
          f"{frequency}Hz, {amplitude}μA, {pulses} pulses ({event_name})")
Statistics:
Stats: 1000 ticks | Recv: 10.0 pkt/s | Send: 10.0 pkt/s | Events: 15 | Feedback: 42 | Avg spikes: 12.34/tick
  • Events: Episode metadata logged
  • Feedback: Total feedback commands processed
  • Avg spikes: Overall neural activity level

Future Directions

Adaptive Feedback Scaling
  • Automatically tune base_amplitude and base_frequency based on neural response
  • Detect and compensate for neural adaptation over time
Multi-Modal Feedback
  • Combine frequency/amplitude/pulse count scaling
  • Explore temporal patterns (bursts, ramps)
Channel-Specific Learning
  • Learn which channels are most effective for reward signaling
  • Adaptively allocate feedback across channel subsets
Closed-Loop Feedback
  • Adjust feedback based on decoder confidence
  • Reduce feedback when decoder is certain, increase when uncertain
