Pipeline Overview
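The audio pipeline transforms platform-specific audio formats into network-ready PCM data in five stages: capture, format normalization, ring buffering, a prefill gate, and paced network transmission.
Stage 1: Audio Capture (cpal)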
Device Selection
Device enumeration (audio.rs:1209-1300) scans all available host APIs:
- Linux: Excludes default, sysdefault, null, and hw: devices (unreliable)
- Windows: Output devices are available as loopback sources
- macOS: All input devices (requires microphone permission)
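The Linux exclusion rule can be sketched as a name filter. This is a minimal sketch, not the code in audio.rs: the function name is hypothetical, and it assumes device names follow the ALSA "base:args" convention (for example hw:CARD=PCH,DEV=0), so only the part before the first colon is compared.

```rust
/// Hypothetical filter mirroring the Linux exclusion list.
/// Assumes ALSA-style names: only the base before the first ':' is checked,
/// so "hw:CARD=PCH,DEV=0" is excluded while "plughw:CARD=PCH,DEV=0" is kept.
fn is_usable_linux_device(name: &str) -> bool {
    const EXCLUDED: [&str; 4] = ["default", "sysdefault", "null", "hw"];
    // split() always yields at least one item, so unwrap_or is only a fallback.
    let base = name.split(':').next().unwrap_or(name);
    !EXCLUDED.contains(&base)
}
```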
Format Detection
The pipeline dynamically detects the best supported format (audio.rs:1015-1092), preferring F32 when the device offers it. F32:
- Is the native format for PipeWire (Linux)
- Eliminates clipping on Linux (v1.8.0 improvement)
- Matches the internal processing format, so no conversion is needed
- Offers higher precision for RMS calculations
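The preference order can be expressed as a lookup over the formats a device reports. In this sketch the SampleFormat enum is a local stand-in (cpal has its own type of that name with more variants), and ranking I16 above U16 is an assumption; the text above only establishes that F32 comes first.

```rust
/// Local stand-in for the three sample formats this pipeline handles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SampleFormat {
    F32,
    I16,
    U16,
}

/// Returns the most preferred format the device supports, if any.
/// Assumed order: F32, then I16, then U16.
fn pick_format(supported: &[SampleFormat]) -> Option<SampleFormat> {
    const PREFERENCE: [SampleFormat; 3] = [SampleFormat::F32, SampleFormat::I16, SampleFormat::U16];
    PREFERENCE.iter().copied().find(|f| supported.contains(f))
}
```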
Stream Configuration
Stream buffer size trades latency against stability:
- Smaller buffers: lower latency, higher CPU usage
- Larger buffers: higher latency, more stable
- Loopback uses BufferSize::Default (typically 10 ms on Windows)
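For a fixed buffer size, cpal's BufferSize::Fixed takes a frame count (one sample per channel), so a millisecond target has to be converted first. A minimal sketch of that arithmetic with hypothetical helper names: at 48 kHz, 10 ms is 480 frames, or 960 interleaved samples in stereo.

```rust
/// Frames (one sample per channel) needed to cover `ms` milliseconds.
/// Computed in u64 so large rates and durations cannot overflow.
fn frames_for_ms(sample_rate: u32, ms: u32) -> u32 {
    (sample_rate as u64 * ms as u64 / 1000) as u32
}

/// Interleaved samples for the same duration across all channels.
fn samples_for_ms(sample_rate: u32, channels: u16, ms: u32) -> u32 {
    frames_for_ms(sample_rate, ms) * channels as u32
}
```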
Stage 2: Format Normalization (Producer)
AudioProcessor Design
The AudioProcessor struct (audio.rs:1108-1142) handles format-agnostic processing: it converts each callback's samples to F32 and pushes them into the ring buffer.
Format-Specific Conversion
F32 Input (Passthrough)
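A minimal sketch of the passthrough path, assuming the processor appends each callback's samples to an intermediate Vec<f32>. The function name and the Vec are hypothetical; the real code may push straight into the ring buffer.

```rust
/// F32 input already matches the internal format, so samples are copied unchanged.
fn f32_passthrough(input: &[f32], out: &mut Vec<f32>) {
    out.extend_from_slice(input);
}
```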
I16 Input (Normalization)
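Normalization maps the signed 16-bit range onto [-1.0, 1.0). The sketch below divides by 32768, a common convention; the exact constant used in audio.rs is not shown here, so treat it as an assumption, as is the function name.

```rust
/// Maps i16 samples to f32: -32768 becomes -1.0 and 32767 becomes just under 1.0.
/// The 32768 divisor is an assumed convention.
fn i16_to_f32(s: i16) -> f32 {
    s as f32 / 32768.0
}

/// Converts a whole callback buffer of interleaved i16 samples.
fn convert_i16(input: &[i16], out: &mut Vec<f32>) {
    out.extend(input.iter().map(|&s| i16_to_f32(s)));
}
```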
U16 Input (Unsigned to Signed)
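Unsigned 16-bit samples are centred on 32768 rather than 0, so the offset is removed before scaling. The same assumptions apply as for I16: the function name is hypothetical and 32768 is the assumed divisor.

```rust
/// Maps u16 samples to f32: 0 becomes -1.0, 32768 (the midpoint) becomes 0.0.
fn u16_to_f32(s: u16) -> f32 {
    // Shift to a signed range first, then scale like the I16 path.
    (s as i32 - 32768) as f32 / 32768.0
}
```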
Overflow Handling
When the ring buffer is full (audio.rs:1125-1138):
- push_slice() returns the number of samples actually written
- If pushed < data.len(), an overflow is detected
- A rate-limited warning is emitted (at most once every 5 seconds)
- The samples that did not fit are dropped: push_slice() does not overwrite queued data, so the part of the incoming slice beyond the free space is lost
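The overflow path can be sketched without the ringbuf crate. Here BoundedFifo is a hypothetical stand-in whose push_slice() behaves like ringbuf's (it writes only what fits and returns the count), and OverflowWarner is a hypothetical rate limiter implementing the one-warning-per-5-seconds rule. Neither name comes from audio.rs.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical bounded FIFO: push_slice() writes only what fits.
struct BoundedFifo {
    buf: VecDeque<f32>,
    capacity: usize,
}

impl BoundedFifo {
    fn new(capacity: usize) -> Self {
        Self { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Returns how many samples were actually written; the rest are not stored.
    fn push_slice(&mut self, data: &[f32]) -> usize {
        let n = data.len().min(self.capacity - self.buf.len());
        self.buf.extend(&data[..n]);
        n
    }
}

/// Allows at most one warning per `interval`.
struct OverflowWarner {
    interval: Duration,
    last_warning: Option<Instant>,
}

impl OverflowWarner {
    fn new(interval: Duration) -> Self {
        Self { interval, last_warning: None }
    }

    /// Returns true (and records `now`) if a warning may be logged at `now`.
    fn should_warn(&mut self, now: Instant) -> bool {
        match self.last_warning {
            Some(t) if now.duration_since(t) < self.interval => false,
            _ => {
                self.last_warning = Some(now);
                true
            }
        }
    }
}

/// Pushes one callback's samples and returns how many were dropped.
fn push_with_overflow_check(fifo: &mut BoundedFifo, warner: &mut OverflowWarner, data: &[f32], now: Instant) -> usize {
    let pushed = fifo.push_slice(data);
    let dropped = data.len() - pushed;
    if dropped > 0 && warner.should_warn(now) {
        eprintln!("ring buffer overflow: dropped {dropped} samples");
    }
    dropped
}
```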
Stage 3: Ring Buffer (Lock-Free FIFO)
See Buffering System for detailed analysis. Key points:
- Type: HeapRb<f32>
- Size: Adaptive (2000-15000 ms worth of samples)
- Thread-safe without mutexes
- Producer and consumer operate independently
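Sizing the buffer from a duration is plain arithmetic: capacity = sample_rate × channels × ms / 1000. The helper below is hypothetical; it only clamps to the 2000-15000 ms range quoted above, since how the adaptive size is chosen within that range is covered in Buffering System. At 48 kHz stereo, 2000 ms is 192,000 samples and 15000 ms is 1,440,000.

```rust
/// Hypothetical helper: ring capacity in interleaved samples for a target
/// duration, clamped to the documented 2000-15000 ms range.
fn ring_capacity_samples(sample_rate: u32, channels: u16, buffer_ms: u32) -> usize {
    let ms = buffer_ms.clamp(2_000, 15_000) as u64;
    (sample_rate as u64 * channels as u64 * ms / 1_000) as usize
}
```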
Stage 4: Prefill Gate (v1.8.1)
Before transmission begins, the system waits for the ring buffer to reach a prefill level (audio.rs:569-591). The prefill:
- Absorbs initial network handshake latency
- Prevents “cold start” dropouts
- Is small enough to avoid noticeable delay
- Works uniformly across Windows, Linux, and macOS
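A prefill gate reduces to polling the buffer's occupancy until it reaches a threshold. In this sketch the occupancy is supplied as a closure so it is not tied to a particular ring buffer type; the threshold, the poll interval and the timeout (added so the loop cannot spin forever on a stalled device) are all assumptions, not values taken from audio.rs.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Blocks until `occupied()` reports at least `threshold` samples or `timeout` elapses.
/// Returns true if the prefill level was reached.
fn wait_for_prefill<F: FnMut() -> usize>(mut occupied: F, threshold: usize, poll: Duration, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if occupied() >= threshold {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}
```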
Stage 5: Network Transmission (Consumer)
Strict Pacing Algorithm
The consumer thread uses a tick-based pacer (audio.rs:593-638):
- Normal: Sleeps until next scheduled tick
- High buffer: Drains immediately to prevent overflow
- Massive lag: Resets clock to prevent drift accumulation
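The three pacing rules can be written as a pure decision function, which keeps them easy to test. This is a sketch under stated assumptions: PacerAction, PacerConfig, the high-water mark and the lag limit are hypothetical, and checking the buffer level before the lag is an assumed priority.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum PacerAction {
    Sleep(Duration), // normal: wait until the next scheduled tick
    DrainNow,        // buffer above high-water mark: send without waiting
    ResetClock,      // too far behind schedule: re-anchor the tick clock
}

struct PacerConfig {
    high_water_samples: usize, // hypothetical "high buffer" threshold
    max_lag: Duration,         // hypothetical "massive lag" threshold
}

fn decide(now: Instant, next_tick: Instant, buffered: usize, cfg: &PacerConfig) -> PacerAction {
    if buffered > cfg.high_water_samples {
        PacerAction::DrainNow
    } else if now > next_tick && now - next_tick > cfg.max_lag {
        PacerAction::ResetClock
    } else {
        // On time or slightly late: sleep for whatever remains (zero if late).
        PacerAction::Sleep(next_tick.saturating_duration_since(now))
    }
}
```

On ResetClock the caller would set next_tick to now plus one tick; otherwise it would advance next_tick by one tick after each send, so scheduling stays anchored to absolute times instead of accumulating sleep overshoot.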
F32 → I16 LE Conversion
At the network edge, F32 samples are converted to 16-bit little-endian PCM (audio.rs:772-779):
- Sample Size: 16-bit signed integer
- Byte Order: Little-endian
- Channels: 2 (interleaved L-R-L-R…)
- Sample Rate: 44100 or 48000 Hz
- Bitrate: sample_rate * 2 channels * 16 bits, i.e. 48000 * 2 * 16 = 1,536,000 bit/s (~1.5 Mbps) @ 48 kHz
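A minimal sketch of the edge conversion and the bitrate arithmetic. It assumes clamping to [-1.0, 1.0] and scaling by 32767 so both extremes stay in range; the scaling and rounding in audio.rs:772-779 may differ, and the function names are hypothetical.

```rust
/// Converts interleaved F32 samples to 16-bit little-endian PCM bytes.
/// Out-of-range input is clamped to [-1.0, 1.0] before scaling by 32767.
fn f32_to_i16_le(samples: &[f32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * 32767.0) as i16;
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}

/// Uncompressed stereo 16-bit PCM bitrate in bits per second.
fn pcm_bitrate_bps(sample_rate: u32) -> u32 {
    sample_rate * 2 * 16
}
```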
RMS Calculation (Removed in v1.6.0+)
Previous versions calculated RMS for silence detection in the audio thread. This has been removed to simplify the pipeline. The current implementation focuses on reliable transmission without silence detection in the producer.
Error Handling
Stream Errors
The error callback (audio.rs:980-990) logs cpal stream errors, such as:
- Device disconnected
- Sample rate change
- Buffer underrun (cpal level)
- Permission denied (microphone access)
Recovery Strategy
- Graceful degradation: Send silence on buffer underrun
- No auto-restart: Frontend must manually restart stream
- Network errors: Handled separately by consumer thread
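Sending silence on underrun amounts to padding whatever the ring can supply with zeros. This is a sketch assuming the consumer fills fixed-size packets; VecDeque stands in for the ring buffer consumer, and fill_packet is a hypothetical name.

```rust
use std::collections::VecDeque;

/// Fills `packet` from `ring`; any shortfall is padded with silence (0.0).
/// Returns how many real samples were used.
fn fill_packet(ring: &mut VecDeque<f32>, packet: &mut [f32]) -> usize {
    let real = packet.len().min(ring.len());
    for (slot, sample) in packet.iter_mut().zip(ring.drain(..real)) {
        *slot = sample;
    }
    packet[real..].fill(0.0);
    real
}
```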
Performance Optimization
Why Lock-Free Matters
- No priority inversion
- Deterministic latency
- No syscalls in audio thread
- Wait-free for producer
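To illustrate why neither side needs a mutex, here is a minimal single-producer/single-consumer ring built on two atomic counters. It is not ringbuf's HeapRb, only a sketch of the same idea: the producer owns the write counter, the consumer owns the read counter, and Acquire/Release ordering publishes samples without locks. push_slice() finishes in a bounded number of steps, which is the wait-free property claimed for the producer. All names are hypothetical, and the counters are assumed not to overflow (true in practice for a 64-bit usize).

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct Ring {
    slots: Box<[UnsafeCell<f32>]>,
    read: AtomicUsize,  // total samples consumed (stored only by the consumer)
    write: AtomicUsize, // total samples produced (stored only by the producer)
}

// Safety: a slot is accessed by one side at a time; Acquire/Release on the
// counters hands slots between the producer and consumer threads.
unsafe impl Sync for Ring {}

/// Non-cloneable handles guarantee exactly one producer and one consumer.
pub struct Producer(Arc<Ring>);
pub struct Consumer(Arc<Ring>);

pub fn spsc(capacity: usize) -> (Producer, Consumer) {
    assert!(capacity > 0);
    let slots = (0..capacity).map(|_| UnsafeCell::new(0.0f32)).collect();
    let ring = Arc::new(Ring { slots, read: AtomicUsize::new(0), write: AtomicUsize::new(0) });
    (Producer(ring.clone()), Consumer(ring))
}

impl Producer {
    /// Writes as many samples as fit and returns the count; never blocks.
    pub fn push_slice(&mut self, data: &[f32]) -> usize {
        let ring = &*self.0;
        let cap = ring.slots.len();
        let write = ring.write.load(Ordering::Relaxed); // only this side stores it
        let read = ring.read.load(Ordering::Acquire); // observe the consumer's finished reads
        let n = data.len().min(cap - (write - read));
        for (i, &s) in data[..n].iter().enumerate() {
            // Safety: slots in [write, read + cap) are not being read by the consumer.
            unsafe { *ring.slots[(write + i) % cap].get() = s };
        }
        ring.write.store(write + n, Ordering::Release); // publish the new samples
        n
    }
}

impl Consumer {
    /// Reads up to out.len() samples and returns the count; never blocks.
    pub fn pop_slice(&mut self, out: &mut [f32]) -> usize {
        let ring = &*self.0;
        let cap = ring.slots.len();
        let read = ring.read.load(Ordering::Relaxed);
        let write = ring.write.load(Ordering::Acquire); // observe the producer's writes
        let n = out.len().min(write - read);
        for (i, slot) in out[..n].iter_mut().enumerate() {
            // Safety: slots in [read, write) were fully written before `write` was published.
            *slot = unsafe { *ring.slots[(read + i) % cap].get() };
        }
        ring.read.store(read + n, Ordering::Release); // hand the slots back
        n
    }
}
```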
Memory Layout
CPU Impact
- Audio thread: ~1-2% (format conversion + ring write)
- Network thread: ~2-3% (ring read + TCP write + stats)
- Total: <5% on modern CPUs
Future Enhancements
- Codec support (Opus, FLAC) for bandwidth reduction
- Adaptive sample rate switching
- Multi-channel support (5.1, 7.1)
- Hardware-accelerated resampling