Overview
The backfill system ensures Junkie has access to historical messages for every channel. It automatically fetches and caches message history when the bot starts, enabling context-aware conversations even after restarts.
Backfill Strategy
The system uses a two-phase approach:
- Catch-up Phase: Fetch newer messages (after the latest cached message)
- Backfill Phase: Fetch older messages (before the oldest cached message)
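The ordering of the two phases can be sketched as a small planning function. This is only an illustration: `CacheState` and `plan_fetches` are hypothetical names that do not come from `discord_bot/backfill.py`, and the sketch assumes that an empty cache triggers the initial fetch described later on this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CacheState:
    """Hypothetical snapshot of what is cached for one channel."""
    count: int
    oldest_id: Optional[int]
    latest_id: Optional[int]


def plan_fetches(state: CacheState, target: int) -> List[Tuple[str, Optional[int]]]:
    """Return the ordered fetch steps as (direction, anchor message ID)."""
    if state.count == 0 or state.latest_id is None:
        # Nothing cached yet: assumed to fall through to an initial fetch.
        return [("initial", None)]
    # Catch-up comes first: messages newer than the latest cached one.
    steps: List[Tuple[str, Optional[int]]] = [("after", state.latest_id)]
    if state.count < target:
        # Backfill second: messages older than the oldest cached one.
        steps.append(("before", state.oldest_id))
    return steps
```

For a channel with 50 cached messages and a target of 200, the plan is a catch-up step anchored at the latest ID followed by a backfill step anchored at the oldest ID.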
Core Function
The backfill_channel function orchestrates the entire backfill process:
discord_bot/backfill.py:14-33
Thread Safety
Per-channel locks prevent race conditions during concurrent backfill:
discord_bot/backfill.py:11-25
Phase 1: Catch-Up
The catch-up phase fetches messages newer than the latest cached message:
discord_bot/backfill.py:43-56
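A minimal sketch of per-channel locking, assuming the bot runs on asyncio and uses `asyncio.Lock` (the actual lock type in `discord_bot/backfill.py` is not shown here). The names `_channel_locks`, `backfill_with_lock`, and `demo_same_channel` are hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, List

# Hypothetical registry: one asyncio.Lock per channel ID, created on first use.
_channel_locks: DefaultDict[int, asyncio.Lock] = defaultdict(asyncio.Lock)


async def backfill_with_lock(channel_id: int, work: Callable[[], Awaitable[None]]) -> None:
    """Run `work` while holding the lock for this channel only."""
    async with _channel_locks[channel_id]:
        await work()


async def demo_same_channel() -> List[str]:
    """Two backfills of the same channel are serialized, not interleaved."""
    events: List[str] = []

    async def work(tag: str) -> None:
        events.append(f"start {tag}")
        await asyncio.sleep(0.01)  # stands in for a Discord API call
        events.append(f"end {tag}")

    await asyncio.gather(
        backfill_with_lock(1, lambda: work("a")),
        backfill_with_lock(1, lambda: work("b")),
    )
    return events
```

Because the lock is keyed by channel ID, backfills of different channels can still proceed at the same time; only duplicate work on one channel is forced to wait.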
Why Catch Up First?
- Ensures recent messages are available immediately
- More important for active conversations
- Newer messages have higher chance of being relevant
Phase 2: Initial Fetch
If no data exists, perform a full initial fetch:
discord_bot/backfill.py:57-69
Phase 3: Deepen (Iterative Backfill)
The deepening phase iteratively fetches older messages until the target is reached:
discord_bot/backfill.py:71-91
Iterative Strategy
The system fetches in batches to handle large histories:
discord_bot/backfill.py:94-112
Completion Detection
The system detects when no more history exists:
discord_bot/backfill.py:119-130
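The batching loop and its stop conditions can be sketched together as follows. This is an illustration under stated assumptions rather than the code in `discord_bot/backfill.py`: the `Fetcher` signature, `deepen`, `make_fake_fetcher`, and the `batch_size` default are all hypothetical, and the sketch assumes a short or empty batch means the start of the channel has been reached.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional

# Hypothetical fetcher: (before_id, limit) -> message IDs, newest first.
Fetcher = Callable[[Optional[int], int], Awaitable[List[int]]]


async def deepen(fetch: Fetcher, cached: List[int], target: int,
                 batch_size: int = 100, max_iterations: int = 10) -> List[int]:
    """Fetch older messages in batches until the target, the start of
    history, or the iteration cap is reached."""
    for _ in range(max_iterations):
        needed = target - len(cached)
        if needed <= 0:
            break  # target reached
        limit = min(needed, batch_size)
        oldest = min(cached) if cached else None
        batch = await fetch(oldest, limit)
        if not batch:
            break  # nothing older exists
        cached.extend(batch)
        if len(batch) < limit:
            break  # short batch: assumed to mean history is exhausted
    return cached


def make_fake_fetcher(history: Iterable[int]) -> Fetcher:
    """In-memory stand-in for the Discord history call, for demonstration."""
    ordered = sorted(history, reverse=True)

    async def fetch(before_id: Optional[int], limit: int) -> List[int]:
        older = [m for m in ordered if before_id is None or m < before_id]
        return older[:limit]

    return fetch
```

With a 250-message channel, 10 messages already cached, and a target of 300, the loop runs three iterations and stops on the short final batch with all 250 messages cached.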
Progress Tracking
discord_bot/backfill.py:133-143
Multi-Channel Backfill
The start_backfill_task function coordinates backfill across all channels:
discord_bot/backfill.py:156-165
Concurrency Control
Semaphores limit concurrent operations to respect Discord rate limits:
discord_bot/backfill.py:167-182
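A minimal sketch of semaphore-limited backfill, assuming `asyncio.Semaphore` and `asyncio.gather`. The names `backfill_all`, `backfill_one`, and `demo_concurrency` are hypothetical, and the default of 3 is a placeholder, not the project's actual setting.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Tuple


async def backfill_all(channel_ids: Iterable[int],
                       backfill_one: Callable[[int], Awaitable[int]],
                       concurrency: int = 3) -> Tuple[int, int]:
    """Backfill every channel with at most `concurrency` running at once.
    Returns (successes, errors)."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(channel_id: int) -> int:
        async with semaphore:
            return await backfill_one(channel_id)

    # return_exceptions=True keeps one failing channel from cancelling the rest.
    results = await asyncio.gather(*(guarded(c) for c in channel_ids),
                                   return_exceptions=True)
    errors = sum(isinstance(r, Exception) for r in results)
    return len(results) - errors, errors


async def demo_concurrency() -> Tuple[int, int, int]:
    """Run six fake channels with a limit of two and record the peak overlap."""
    active = 0
    peak = 0

    async def fake_backfill(channel_id: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for Discord API calls
        active -= 1
        if channel_id == 3:
            raise RuntimeError("simulated channel failure")
        return channel_id

    ok, failed = await backfill_all(range(6), fake_backfill, concurrency=2)
    return ok, failed, peak
```

In the demo, five channels succeed, one fails, and no more than two backfills are ever in flight at once.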
Summary Reporting
discord_bot/backfill.py:184-189
Configuration
Environment Variables
Key Parameters
- BACKFILL_CONCURRENCY: Number of channels to backfill simultaneously
- BACKFILL_MAX_ITERATIONS: Maximum deepen loops per channel
- BACKFILL_PARALLEL_BATCHES: Planned parallel batch support (currently sequential)
- BACKFILL_MAX_FETCH_LIMIT: Safety cap on fetch size
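One way these parameters might be read from the environment is sketched below. The helper `env_int` is hypothetical, and every default value shown is a placeholder chosen for illustration; the project's real defaults are not listed on this page.

```python
import os


def env_int(name: str, default: int, minimum: int = 1) -> int:
    """Read an integer environment variable, falling back to `default`
    when it is unset or malformed and clamping to `minimum`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return max(value, minimum)


# Placeholder defaults (assumptions, not the project's actual values).
BACKFILL_CONCURRENCY = env_int("BACKFILL_CONCURRENCY", 3)
BACKFILL_MAX_ITERATIONS = env_int("BACKFILL_MAX_ITERATIONS", 10)
BACKFILL_PARALLEL_BATCHES = env_int("BACKFILL_PARALLEL_BATCHES", 1)
BACKFILL_MAX_FETCH_LIMIT = env_int("BACKFILL_MAX_FETCH_LIMIT", 1000)
```

Clamping to a minimum of 1 guards against a zero or negative value silently disabling backfill.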
Integration with Bot Startup
Backfill runs automatically when the bot starts:
discord_bot/chat_handler.py:63-95
Performance Optimizations
Skip Fully Cached Channels
Channels that already have enough cached messages to meet the target are skipped:
discord_bot/backfill.py:31-33
Rate Limit Protection
discord_bot/backfill.py:142-143
Graceful Failure Handling
discord_bot/backfill.py:182
Logging
The backfill system provides detailed logging:
- Channel selection: Found {n} channels to backfill
- Catch-up phase: ↑ Catching up {channel} (after ID {id})
- Initial fetch: ⚡ No existing data for {channel}. Performing initial fetch...
- Deepen phase: ↓ {channel} iteration {n}: {current}/{target} (need {needed} more)
- Progress: ✓ Progress: {current}/{target} ({pct}%)
- Completion: ✓ Completed {channel}: {count}/{target} ({pct}%) - Fetched {n} messages
- Summary: Summary: {successes}/{total} channels successful, {errors} failed
Best Practices
- Target Limit: Set CONTEXT_AGENT_MAX_MESSAGES to match your context needs
  - Too low: Limited conversation context
  - Too high: Slower backfill, higher database usage
- Concurrency: Adjust BACKFILL_CONCURRENCY based on bot load
  - Higher values: Faster backfill
  - Lower values: Better rate limit compliance
- Monitoring: Watch backfill logs on startup to detect issues
  - Look for repeated failures on specific channels
  - Check for rate limit errors
- Database Performance: Ensure PostgreSQL is optimized
  - Index on channel_id and message_id
  - Regular VACUUM operations