
Overview

Hot reload enables configuration changes to take effect without restarting pods. The client libraries automatically detect when values files are updated and reload them with validation. Changes propagate end to end in about one to two minutes, with full observability into timing and delays.

Reload lifecycle

The complete reload cycle from code commit to active values:
1. Update values.yaml in sentry-options-automator
   └─> Commit and merge to main

2. CI validates and generates ConfigMaps
   └─> sentry-options-cli write (generates values.json)

3. CD applies ConfigMaps to clusters
   └─> kubectl apply -f sentry-options-{namespace}.yaml

4. Kubelet syncs ConfigMap to pod volumes
   └─> ~1-2 minutes (kubelet sync period)
   └─> Updates /etc/sentry-options/values/{namespace}/values.json

5. Client library detects file change
   └─> ~5 seconds (polling interval)
   └─> Triggers reload and validation

6. New values active in application
   └─> Total latency: ~1-2 minutes
No pod restart is required. The application continues running with updated configuration.

File watching implementation

Polling strategy

The client libraries use polling instead of filesystem events (inotify/FSEvents):
  • Polling interval: 5 seconds (POLLING_DELAY constant in lib.rs:30)
  • Mechanism: Check modification time of all values.json files
  • Reload trigger: Any file mtime change triggers full reload
Why polling?
  • Works reliably with Kubernetes ConfigMaps (which don’t trigger inotify events)
  • Compatible with NFS and other virtual filesystems
  • Simple implementation with predictable behavior
  • 5 seconds is acceptable overhead given ~1-2 min ConfigMap propagation
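The polling strategy above can be sketched in Python (latest_mtime and poll_once are hypothetical helper names; the real implementation is the Rust watcher described below):

```python
from pathlib import Path
from typing import Callable, Optional

POLLING_DELAY = 5  # seconds, mirroring the Rust constant

def latest_mtime(values_dir: Path) -> Optional[float]:
    """Newest modification time across all {namespace}/values.json files."""
    mtimes = [
        (child / "values.json").stat().st_mtime
        for child in values_dir.iterdir()
        if (child / "values.json").is_file()
    ]
    return max(mtimes) if mtimes else None

def poll_once(
    values_dir: Path,
    last_seen: Optional[float],
    reload_cb: Callable[[], None],
) -> Optional[float]:
    """One polling step: trigger a full reload if any file's mtime advanced."""
    current = latest_mtime(values_dir)
    if current is not None and (last_seen is None or current > last_seen):
        reload_cb()
        return current
    return last_seen
```

In production this step runs in a loop, sleeping POLLING_DELAY seconds between iterations.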

Watcher thread

The ValuesWatcher struct (lib.rs:483-672) runs a background thread:
pub struct ValuesWatcher {
    stop_signal: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}
Lifecycle:
  1. Created during Options::new() initialization
  2. Spawns background thread named "sentry-options-watcher"
  3. Thread polls every 5 seconds until stopped
  4. Automatically stopped when ValuesWatcher is dropped
Thread safety:
  • Values stored in Arc<RwLock<ValuesByNamespace>>
  • Multiple readers can access values concurrently
  • Writer (reload thread) gets exclusive access during updates
  • If thread panics, it’s caught and logged (lib.rs:509-515)
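The same lifecycle can be sketched in Python with a stoppable daemon thread (a hypothetical analog of the Rust ValuesWatcher; reload_fn and the 5-second default mirror the behavior described above):

```python
import threading

class ValuesWatcher:
    """Stoppable background poller sketching the Rust watcher lifecycle."""

    def __init__(self, reload_fn, interval: float = 5.0):
        self._stop = threading.Event()
        self._reload_fn = reload_fn
        self._thread = threading.Thread(
            target=self._run,
            args=(interval,),
            name="sentry-options-watcher",
            daemon=True,
        )
        self._thread.start()

    def _run(self, interval: float) -> None:
        # Event.wait doubles as an interruptible sleep: it returns True
        # (and ends the loop) as soon as stop() sets the event.
        while not self._stop.wait(interval):
            self._reload_fn()

    def stop(self) -> None:
        """Equivalent of Drop in the Rust version: signal, then join."""
        self._stop.set()
        self._thread.join()
```

Using an Event rather than a bare sleep lets stop() interrupt the watcher immediately instead of waiting out a full polling interval.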

Modification time tracking

The watcher tracks the most recent modification time across all namespace values.json files:
// From lib.rs:548-583 (simplified)
fn get_mtime(values_dir: &Path) -> Option<SystemTime> {
    let mut latest_mtime: Option<SystemTime> = None;
    
    for entry in fs::read_dir(values_dir).ok()?.flatten() {
        let values_file = entry.path().join("values.json");
        if let Ok(metadata) = fs::metadata(&values_file) {
            if let Ok(mtime) = metadata.modified() {
                if latest_mtime.map_or(true, |latest| mtime > latest) {
                    latest_mtime = Some(mtime);
                }
            }
        }
    }
    
    latest_mtime
}
If any values.json file has a newer mtime than previously seen, all namespaces are reloaded.

Reload behavior

Atomic updates

When a change is detected:
  1. Load all values from disk into new HashMap
  2. Validate all namespaces against schemas
  3. If all valid: Replace entire values map atomically
  4. If any invalid: Keep old values entirely, log error
// From lib.rs:586-610 (simplified)
fn reload_values(
    values_path: &Path,
    registry: &SchemaRegistry,
    values: &Arc<RwLock<ValuesByNamespace>>,
) {
    match registry.load_values_json(values_path) {
        Ok((new_values, generated_at_by_namespace)) => {
            // Atomic update: acquire write lock and replace entire map
            let mut guard = values.write().unwrap();
            *guard = new_values;
            
            emit_reload_spans(...);
        }
        Err(e) => {
            // Keep old values on error
            eprintln!("Failed to reload values: {}", e);
        }
    }
}
If any namespace fails validation, all old values are kept. This prevents partial updates with inconsistent state.

Validation during reload

All values are re-validated against schemas on every reload:
  1. Read all {namespace}/values.json files
  2. Parse JSON and extract options object
  3. Validate each namespace’s options against its schema
  4. If validation passes for all namespaces, update values
  5. If any validation fails, keep old values and log error
This ensures the application never sees invalid configuration, even if a bad ConfigMap is deployed.
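A sketch of this all-or-nothing behavior in Python, with a toy per-key type check standing in for real JSON Schema validation (reload_all and the schemas mapping are hypothetical):

```python
import json
from pathlib import Path

def reload_all(values_dir: Path, schemas: dict, current: dict) -> dict:
    """All-or-nothing reload: return the new values map only if every
    namespace validates; otherwise return `current` unchanged.

    `schemas` maps namespace -> {option name -> expected Python type};
    this toy check stands in for real schema validation.
    """
    new_values = {}
    for ns_dir in values_dir.iterdir():
        values_file = ns_dir / "values.json"
        if not values_file.is_file():
            continue
        ns = ns_dir.name
        options = json.loads(values_file.read_text())["options"]
        for key, value in options.items():
            expected = schemas.get(ns, {}).get(key)
            if expected is not None and not isinstance(value, expected):
                # One invalid namespace rejects the whole reload.
                print(f"Failed to reload values: {ns}.{key} "
                      f"is not {expected.__name__}")
                return current
        new_values[ns] = options
    return new_values
```

Note that validation happens before any value is swapped in, so a bad namespace never produces a half-updated map.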

Reader/writer synchronization

The values map uses RwLock for concurrent access:
  • Readers (application code calling opts.get()): Multiple concurrent reads allowed
  • Writer (reload thread): Exclusive write access blocks all readers
  • Poisoning handled: If lock is poisoned (writer panicked), readers still work (lib.rs:79)
Potential starvation: With a steady stream of readers, the writer may wait to acquire the lock. In practice, this is rare because reads are fast.
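In Python, an equivalent guarantee can be had without a reader lock by swapping a whole-map reference atomically (a hypothetical sketch, not the client library's actual internals; rebinding a reference is atomic in CPython, so readers see either the old map or the new one, never a mix):

```python
import threading

class OptionsStore:
    """Readers take a snapshot reference; the reload thread swaps the
    entire dict at once, so readers never observe a partial update."""

    def __init__(self, initial: dict):
        self._values = initial
        self._write_lock = threading.Lock()  # serializes writers only

    def get(self, namespace: str, key: str):
        snapshot = self._values  # atomic read, no reader lock needed
        return snapshot[namespace][key]

    def replace(self, new_values: dict) -> None:
        with self._write_lock:
            self._values = new_values  # atomic reference swap
```

Because readers never block, this variant also sidesteps the writer-starvation concern entirely.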

Propagation timing

ConfigMap propagation delay

Kubernetes takes time to sync ConfigMap updates to pods:
  • Kubelet sync period: Default ~1 minute (configurable)
  • Cache TTL: Additional delay from kubelet cache
  • Total typical delay: 1-2 minutes from kubectl apply to file update
This is a Kubernetes limitation, not specific to sentry-options.

Total end-to-end latency

ConfigMap update
  ↓  ~1-2 minutes (kubelet sync)
File modified on disk
  ↓  ~0-5 seconds (polling interval)
Change detected
  ↓  ~10-50 ms (reload + validation)
New values active

Total: ~1-2 minutes

Observability metrics

The reload process emits detailed timing metrics via Sentry transactions.

Observability

Sentry integration

The validation library includes a dedicated Sentry Hub for tracking reloads (lib.rs:42-63):
const SENTRY_OPTIONS_DSN: &str = 
    "https://[email protected]/4510750163927040";

static SENTRY_HUB: OnceLock<Arc<sentry::Hub>> = OnceLock::new();
Key points:
  • Completely isolated from host application’s Sentry setup
  • Uses separate DSN and client configuration
  • 100% sample rate for all reload transactions
  • Disabled in tests (empty DSN)

Reload transactions

One transaction emitted per namespace on each reload (lib.rs:612-644):
fn emit_reload_spans(
    namespaces: &[String],
    reload_duration: Duration,
    generated_at_by_namespace: &HashMap<String, String>,
) {
    let hub = get_sentry_hub();
    let applied_at = Utc::now();
    let reload_duration_ms = reload_duration.as_millis() as u64;
    
    for namespace in namespaces {
        let transaction = hub.start_transaction(
            TransactionContext::new(namespace, "sentry_options.reload")
        );
        
        transaction.set_data("reload_duration_ms", reload_duration_ms.into());
        transaction.set_data("applied_at", applied_at.to_rfc3339().into());
        if let Some(ts) = generated_at_by_namespace.get(namespace) {
            transaction.set_data("generated_at", ts.clone().into());
            if let Ok(generated_time) = DateTime::parse_from_rfc3339(ts) {
                let delay_secs = (applied_at - generated_time.with_timezone(&Utc))
                    .num_milliseconds() as f64 / 1000.0;
                transaction.set_data("propagation_delay_secs", delay_secs.into());
            }
        }
        
        transaction.finish();
    }
}

Available metrics

Metric                  Description                           Source
reload_duration_ms      Time to load and validate values      Measured in reload function
generated_at            When ConfigMap was generated          From values.json metadata
applied_at              When application loaded values        Current timestamp
propagation_delay_secs  Time from generation to application   Calculated: applied_at - generated_at

Transaction name: Namespace (e.g., "seer", "relay")
Transaction type: "sentry_options.reload"
Sample rate: 100% (all reloads tracked)

Propagation delay tracking

The generated_at timestamp is embedded in values.json by the CLI:
{
  "options": { ... },
  "generated_at": "2024-01-21T18:30:00.123456+00:00"
}
The client library calculates the delay:
if let Ok(generated_time) = DateTime::parse_from_rfc3339(ts) {
    let delay_secs = (applied_at - generated_time.with_timezone(&Utc))
        .num_milliseconds() as f64 / 1000.0;
    transaction.set_data("propagation_delay_secs", delay_secs.into());
}
This reveals:
  • How long ConfigMaps took to propagate to pods
  • Bottlenecks in the deployment pipeline
  • Whether kubelet sync is slower than expected
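The same calculation in Python, for reference (propagation_delay_secs here is a hypothetical helper; datetime.fromisoformat parses the RFC 3339 offset format shown above):

```python
from datetime import datetime

def propagation_delay_secs(generated_at: str, applied_at: datetime) -> float:
    """Seconds between ConfigMap generation and the application loading
    the values. `generated_at` is the RFC 3339 timestamp embedded in
    values.json; `applied_at` must be timezone-aware."""
    generated = datetime.fromisoformat(generated_at)
    return (applied_at - generated).total_seconds()
```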

Error handling

Missing values directory

If the values directory doesn’t exist:
  • Watcher creation: Logs warning but continues (lib.rs:496-498)
  • Polling: Returns None for mtime, no reload triggered
  • Application: Uses schema defaults for all options
Set SENTRY_OPTIONS_SUPPRESS_MISSING_DIR=1 to suppress warnings.

Validation failures

If reloaded values fail validation:
  1. Error logged to stderr:
    Failed to reload values from /etc/sentry-options/values: 
    Value error for seer:
        feature.rate-limit "invalid" is not of type "integer"
    
  2. Old values retained - application continues with previous configuration
  3. No Sentry transaction emitted (only on successful reload)
This prevents bad ConfigMaps from breaking running applications.

Thread panics

If the watcher thread panics:
let result = panic::catch_unwind(AssertUnwindSafe(|| {
    Self::run(thread_signal, thread_path, thread_registry, thread_values);
}));
if let Err(e) = result {
    eprintln!("Watcher thread panicked with: {:?}", e);
}
  • Panic is caught and logged
  • Thread terminates
  • No new reloads occur
  • Application continues with last known values
There is no automatic restart mechanism for the watcher thread. If it panics and dies, the application continues running but won’t pick up new configuration until restarted.
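A Python analog of this failure mode (a hypothetical sketch, mirroring the Rust catch_unwind behavior): the exception is logged, the loop exits, and nothing restarts it.

```python
import threading

def run_watcher(poll_fn, stop: threading.Event, interval: float = 5.0) -> None:
    """Watcher body with the same failure mode as the Rust thread: an
    unexpected exception is logged and the loop exits for good."""
    try:
        while not stop.wait(interval):
            poll_fn()
    except Exception as e:  # analogous to catch_unwind
        print(f"Watcher thread panicked with: {e!r}")
    # The thread returns here; the application keeps its last known
    # values, and no new reloads occur until the process restarts.
```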

Lock poisoning

If the reload thread panics while holding the write lock:
let values_guard = self.values.read()
    .unwrap_or_else(|poisoned| poisoned.into_inner());
Readers recover from poisoned locks and continue accessing values. This prevents a panic in the reload thread from breaking the entire application.

Testing hot reload

Local testing

To test hot reload locally:
  1. Create test values:
    mkdir -p sentry-options/values/test
    cat > sentry-options/values/test/values.json << 'EOF'
    {
      "options": {
        "feature.enabled": false
      }
    }
    EOF
    
  2. Start your application:
    import time

    from sentry_options import init, options
    init()
    opts = options('test')
    
    while True:
        print(f"enabled: {opts.get('feature.enabled')}")
        time.sleep(1)
    
  3. Modify values while running:
    # In another terminal
    cat > sentry-options/values/test/values.json << 'EOF'
    {
      "options": {
        "feature.enabled": true
      }
    }
    EOF
    
  4. Observe reload:
    • Within 5 seconds, application prints enabled: True
    • No restart required

Reload timing tests

The test suite includes timing tests (lib.rs:1319-1386):
#[test]
fn test_reload_values_updates_map() {
    // Create initial values
    let values = Arc::new(RwLock::new(initial_values));
    
    // Modify files on disk
    fs::write(values_dir.join("ns1/values.json"), new_values).unwrap();
    
    // Force reload
    ValuesWatcher::reload_values(&values_dir, &registry, &values);
    
    // Verify new values active
    let guard = values.read().unwrap();
    assert_eq!(guard["ns1"]["enabled"], json!(true));
}
These tests verify:
  • Modification time detection
  • Atomic value updates
  • Old values persist on validation errors
  • Thread creation and termination

Best practices

Graceful value changes

When changing configuration:
  1. Consider active requests: Values can change mid-request
  2. Use circuit breakers: For critical flags that affect traffic routing
  3. Monitor metrics: Watch Sentry transactions to confirm propagation
  4. Test locally first: Verify reload works with your schema

Reload-safe code patterns

# ❌ BAD: Caching option value at module level
FEATURE_ENABLED = opts.get('feature.enabled')

def process_request():
    if FEATURE_ENABLED:  # Stale value, won't update on reload
        ...

# ✅ GOOD: Read option value when needed
def process_request():
    if opts.get('feature.enabled'):  # Fresh value on every call
        ...

Monitoring reload health

Query Sentry transactions to track reload behavior:
  • Propagation delays: Check propagation_delay_secs distribution
  • Reload frequency: Count transactions per namespace
  • Reload duration: Monitor reload_duration_ms for performance
Alert on:
  • Propagation delays > 5 minutes (indicates kubelet sync issues)
  • Reload duration > 1 second (indicates slow disk or large values)
  • Missing reload transactions (watcher thread may be dead)
