Overview
The deadly corridor curriculum provides a structured progression from basic survival skills to complex combat scenarios. Training on levels 1-4 builds fundamental movement and targeting policies, while level 5 serves as the ultimate benchmark.
Available Scenarios
Deadly Corridor Curriculum
All configs use `deadly_corridor.wad`:
| Config | Difficulty | Description |
|---|---|---|
| `deadly_corridor_1.cfg` | Beginner | Minimal enemies, ample resources |
| `deadly_corridor_2.cfg` | Easy | Slightly more enemies, tighter spacing |
| `deadly_corridor_3.cfg` | Medium | Balanced challenge, requires dodging |
| `deadly_corridor_4.cfg` | Hard | Dense enemy placement, ammo scarcity |
| `deadly_corridor_5.cfg` | Benchmark | Extreme difficulty, official test level |
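The progression above can be expressed as an ordered list of configs to work through during training. The sketch below assumes the standard ViZDoom Python API (`DoomGame.load_config`) and that the config files sit in the working directory:

```python
# Ordered curriculum: train on 1-4, then benchmark/fine-tune on 5.
CURRICULUM = [f"deadly_corridor_{i}.cfg" for i in range(1, 6)]

def make_game(config_path):
    """Build a ViZDoom game for one curriculum stage (standard ViZDoom API)."""
    import vizdoom as vzd  # deferred import; requires vizdoom to be installed
    game = vzd.DoomGame()
    game.load_config(config_path)  # the config references deadly_corridor.wad
    game.init()
    return game
```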
Other Scenarios
Progressive Deathmatch (Default)
- Similar to survival, but kills don't reset ammo count
- Encourages proper ammo management
- Movement tweaks make training easier
- Uses `progressive_deathmatch.wad`
Survival
- Classic survival mode
- Uses `survival.wad`
Curriculum Design Principles
From README.md (lines 22-23): Files `deadly_corridor_1.cfg` to `deadly_corridor_4.cfg` ramp difficulty gradually, but `deadly_corridor_5.cfg` is a significant jump (and the actual benchmark). Progressing through 1-4 builds basic policies yet may result in movement habits that underperform on 5 (e.g., running straight toward the armor). Adjust curriculum pacing accordingly.
Why Progressive Training Matters
Starting directly on level 5 often results in:
- Random exploration with minimal reward signal
- High variance in policy gradients
- Slow or failed convergence
- Neurons receiving noisy, uninformative feedback

Progressive training instead provides:
- Gradual skill acquisition (movement → targeting → tactics)
- Stronger reward signals early in training
- More stable policy updates
- Conditioned neurons with meaningful stimulus-response mappings
Configuration for Deadly Corridor
Architecture & Feedback Tuning
From README.md lines 25-41:
PPO Hyperparameters
From README.md lines 19-20:
Training Progression
Stage 1: Basic Movement (Level 1)
- Agent consistently moves forward
- Picks up armor/health
- Survival time > 30 seconds
Stage 2: Targeting (Levels 2-3)
- Agent turns toward enemies
- Kill count increasing
- Dodges incoming fire
Stage 3: Tactics (Level 4)
- Strategic positioning
- Ammo conservation
- Multi-enemy engagement
Stage 4: Fine-Tuning (Level 5)
Consider fine-tuning on deadly_corridor_5.cfg with a lower learning rate to adapt movement behavior.
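As a concrete illustration of the reduced learning rate (the 3x factor here is an assumption, consistent with the 2-3x reduction this guide suggests for level transitions):

```python
def finetune_lr(base_lr: float, factor: float = 3.0) -> float:
    """Return a reduced learning rate for the level-5 fine-tuning stage.

    The factor is illustrative; a 2-3x reduction matches the guidance
    for level transitions elsewhere in this guide.
    """
    return base_lr / factor

# e.g. finetune_lr(3e-4) yields 1e-4
```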
Monitoring Curriculum Progress
TensorBoard Metrics
| Metric | Level 1 Target | Level 2-3 Target | Level 4 Target | Level 5 Target |
|---|---|---|---|---|
| Episode Reward | > 100 | > 300 | > 500 | > 800 |
| Kill Count | 1-2 | 3-5 | 5-8 | 8+ |
| Survival Time | 30s | 45s | 60s | 90s+ |
| Ammo Waste | High | Medium | Low | Minimal |
Transition Criteria
Move to the next level when:
- Reward plateau: no improvement for 100 episodes
- Consistency: 80% of episodes achieve target metrics
- Skill demonstration: agent exhibits desired behaviors (verify with recorded gameplay)
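The first two criteria can be combined into a simple automated gate. This is a hypothetical helper (the function name, the 1% plateau tolerance, and the window sizes are all assumptions, not project code); skill demonstration remains a manual check on recorded gameplay:

```python
def should_advance(episode_rewards, target, window=100, consistency=0.8):
    """Advance when rewards have plateaued over `window` episodes AND
    at least `consistency` of recent episodes hit the target metric."""
    if len(episode_rewards) < 2 * window:
        return False  # not enough history to judge
    recent = episode_rewards[-window:]
    previous = episode_rewards[-2 * window:-window]
    # Plateau: recent mean is no more than 1% above the previous window's.
    plateaued = sum(recent) / window <= 1.01 * sum(previous) / window
    consistent = sum(r >= target for r in recent) / window >= consistency
    return plateaued and consistent
```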
Checkpoint Management
Saving Checkpoints
Checkpoints are automatically saved every 100 episodes (configurable):
Loading Between Stages
Common Curriculum Issues
Agent learns bad habits on early levels
Symptom: Works well on levels 1-3, fails catastrophically on level 5.
Causes:
- Over-optimization on easy levels (e.g., always running straight)
- Insufficient exploration on harder levels
Solutions:
- Reduce `steps_per_update` on level 5 for more frequent updates
- Increase `entropy_coef` temporarily to encourage exploration
- Lower the learning rate to prevent catastrophic forgetting
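One way to make the temporary `entropy_coef` increase concrete is a decaying schedule; all numbers below are illustrative assumptions, not project defaults:

```python
def entropy_coef_schedule(step, base=0.01, boost=0.05, boost_steps=10_000):
    """Start at `boost` right after switching to level 5, then decay
    linearly back to `base` over `boost_steps` environment steps."""
    if step >= boost_steps:
        return base
    frac = step / boost_steps
    return boost + (base - boost) * frac
```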
No learning on level 1
Symptom: Reward stays flat even on the easiest level.
Causes:
- Neurons not responding to stimulation
- Feedback channels misconfigured
- Ablation mode accidentally enabled
Solutions:
- Check `decoder_ablation_mode='none'`
- Verify spike counts > 0 (TensorBoard: `Spikes/total_count`)
- Inspect feedback amplitude/frequency in the logs
- Test with `--show_window` to observe behavior
Training unstable when transitioning levels
Symptom: Large reward variance when loading a checkpoint on a new level.
Causes:
- Learning rate too high for the new scenario
- Value network hasn't adapted to the new reward distribution
Solutions:
- Always reduce the learning rate 2-3x when changing levels
- Use `normalize_returns=True` for value stability
- Run 50-100 episodes on the new level before judging performance
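`normalize_returns=True` typically means standardizing returns with running statistics. A minimal sketch in that spirit, using Welford's online mean/variance (the project's actual implementation may differ):

```python
class ReturnNormalizer:
    """Running-statistics normalizer: standardize each return by the
    mean and variance of everything seen so far (Welford's algorithm)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / (var ** 0.5 + self.eps)
```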
Action Space Considerations
Hybrid vs. Discrete Actions
From README.md line 21:
Hybrid action spaces are used (and greatly preferred) unless `use_discrete_action_set=True`. Realistically, you only flip this flag if all else fails to reduce entropy, as it greatly reduces the movement fidelity of the agent and just doesn't look as cool.
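For intuition, a discrete action set replaces continuous movement deltas with a fixed menu of button combinations; the button names below are illustrative, not the project's exact set:

```python
# Hypothetical button layout for illustration only.
BUTTONS = ["MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "ATTACK"]

def discrete_action_set():
    """One-hot press of each button, plus a no-op: 5 discrete actions
    instead of a hybrid continuous/discrete space."""
    actions = [[0] * len(BUTTONS)]  # no-op
    for i in range(len(BUTTONS)):
        a = [0] * len(BUTTONS)
        a[i] = 1
        actions.append(a)
    return actions
```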
Visualizing Training
From USAGE.md lines 35-39: Open `visualisation.html` in a browser and update the IP to your training server.