Overview

TCP Streamer implements an audio streaming pipeline that captures audio from input devices and transmits raw PCM data over TCP. The system uses a lock-free ring buffer and deadline-based pacing to keep delivery smooth and consistent, and it favors stability over low latency (see Latency Breakdown).

Audio Pipeline Architecture

The streaming system consists of two independent threads:
Input Device → cpal → Producer → Ring Buffer → Consumer (Thread) → TCP Stream

Producer Thread (Audio Capture)

  • Captures audio via the cpal library (cross-platform audio)
  • Converts all input formats (F32, I16, U16) to internal F32 format
  • Pushes audio frames to a lock-free ring buffer
  • Runs at the device’s native sample rate and buffer size

Consumer Thread (Network Transmission)

  • Reads audio chunks from the ring buffer
  • Converts F32 samples back to I16 PCM for transmission
  • Sends data over TCP with precision pacing
  • Monitors network quality and adjusts behavior dynamically
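
The two-thread layout above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: a bounded `std::sync::mpsc::sync_channel` stands in for the lock-free ring buffer, a constant test signal replaces cpal capture, and `run_pipeline` is a hypothetical name.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Runs a producer and a consumer thread and returns the I16 samples the
/// consumer would hand to the network. Hypothetical helper for illustration.
fn run_pipeline(chunks: usize, chunk_len: usize) -> Vec<i16> {
    // Bounded channel: a stand-in for the lock-free ring buffer (assumption).
    let (tx, rx) = sync_channel::<Vec<f32>>(8);

    // Producer: emits chunks of F32 samples, as a capture callback would.
    let producer = thread::spawn(move || {
        for _ in 0..chunks {
            let chunk = vec![0.5_f32; chunk_len];
            if tx.send(chunk).is_err() {
                break; // the consumer has gone away
            }
        }
        // `tx` is dropped here, which closes the channel and ends the consumer loop.
    });

    // Consumer: drains chunks and converts F32 back to I16 for the wire.
    let consumer = thread::spawn(move || {
        let mut out = Vec::new();
        for chunk in rx {
            out.extend(chunk.iter().map(|&s| (s.clamp(-1.0, 1.0) * 32767.0) as i16));
        }
        out
    });

    producer.join().expect("producer panicked");
    consumer.join().expect("consumer panicked")
}
```

Calling `run_pipeline(4, 256)` moves four 256-sample chunks through the channel. The bounded capacity makes the producer block when the consumer falls behind, which is one way this stand-in differs from a real-time-safe ring buffer.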

PCM Audio Format

TCP Streamer transmits audio in raw PCM format:
// Output Format (Network Edge)
Format: Raw PCM
Bit Depth: 16-bit signed integers (I16)
Byte Order: Little-endian
Channels: 2 (stereo)
Sample Rates: 44.1 kHz or 48 kHz

// Internal Processing Format
Internal Format: 32-bit floating point (F32)
Range: -1.0 to +1.0 (normalized)
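
As a rough sketch of the network-edge conversion, the function below turns normalized F32 samples into little-endian 16-bit PCM bytes. The function name, the 32767 scale factor, and the explicit clamp are assumptions of this example; the source does not show how the project scales or rounds.

```rust
/// Converts normalized F32 samples to little-endian 16-bit PCM bytes.
/// The 32767 scale factor and the clamp are assumptions of this sketch.
fn f32_to_i16_le(samples: &[f32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp so values outside [-1.0, 1.0] saturate at full scale.
        let v = (s.clamp(-1.0, 1.0) * 32767.0) as i16;
        // Little-endian: low byte first, matching the output format above.
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}
```

With this scaling, an input of `1.0` becomes the bytes `FF 7F` and `-1.0` becomes `01 80`. An out-of-range input such as `2.0` saturates to the same bytes as `1.0`.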

Why F32 Internally?

As of version 1.8.0, TCP Streamer uses native F32 processing throughout the pipeline:
  • Reduces clipping: F32 can represent values beyond ±1.0 during intermediate calculations, so clipping can only happen at the final conversion to I16
  • Better precision: Avoids repeated integer rounding during processing
  • Linux compatibility: PipeWire and PulseAudio prefer F32 natively
  • Future-proof: Enables potential DSP features without format conversions
The system automatically detects the best input format (audio.rs:1023-1042).
Priority Order:
  1. F32 (preferred - native on PipeWire/Linux)
  2. I16 (standard - WASAPI/CoreAudio)
  3. U16 (fallback)
Loopback devices on Windows may only support specific formats, so the system uses relaxed format matching.
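
The priority order and the conversion to F32 can be sketched as follows. `SampleKind`, `pick_format`, and the scaling constants are illustrative assumptions; the project relies on cpal's own format types, and its exact conversion is not shown in the source.

```rust
/// Hypothetical stand-in for the capture formats listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SampleKind {
    F32,
    I16,
    U16,
}

/// Returns the first format in priority order that the device supports.
fn pick_format(supported: &[SampleKind]) -> Option<SampleKind> {
    [SampleKind::F32, SampleKind::I16, SampleKind::U16]
        .iter()
        .copied()
        .find(|k| supported.contains(k))
}

/// Signed 16-bit to normalized F32 (assumed scale: divide by 32768).
fn i16_to_f32(s: i16) -> f32 {
    s as f32 / 32768.0
}

/// Unsigned 16-bit to normalized F32: remove the 32768 midpoint, then scale.
fn u16_to_f32(s: u16) -> f32 {
    (s as f32 - 32768.0) / 32768.0
}
```

A device that reports only U16 and I16 would resolve to I16 here. A device that reports none of the three yields `None`, which the caller has to treat as an error.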

Token Bucket Pacing Algorithm

To prevent network micro-bursts and keep chunk timing steady, TCP Streamer paces transmission with a Strict Clock Strategy: each chunk is sent against an absolute deadline that advances by exactly one chunk duration.

How It Works

1. Calculate Tick Duration

Each audio chunk has a precise duration based on sample rate:
// audio.rs:594-596
let tick_duration = Duration::from_micros(
    (chunk_size as u64 * 1_000_000) / sample_rate as u64
);
let mut next_tick = Instant::now();
Example: At 48 kHz with 1024-sample chunks:
  • Tick duration = (1024 × 1,000,000) / 48,000 = 21,333 microseconds (21.3ms)

2. Sleep Until Next Tick

The network thread sleeps precisely until the next scheduled transmission:
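// audio.rs:625-637
// `now` is an Instant captured earlier in the loop (not shown in this excerpt)
if now < next_tick {
    // On schedule: sleep until the deadline, then advance it by one tick
    thread::sleep(next_tick - now);
    next_tick += tick_duration;
} else {
    if now.duration_since(next_tick) > Duration::from_millis(200) {
        // Massive lag detected (>200ms): reset the clock instead of catching up
        next_tick = Instant::now() + tick_duration;
    } else {
        // Slightly late: send immediately and keep the original cadence
        next_tick += tick_duration;
    }
}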
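
The scheduling decision in this step can be isolated into a pure function, which makes the three cases easy to check. This is a sketch only: `tick_duration` repeats the formula from step 1, while `schedule` is a hypothetical helper that returns the sleep time and the next deadline instead of sleeping itself.

```rust
use std::time::{Duration, Instant};

/// Duration of one chunk at the given sample rate (same formula as step 1).
fn tick_duration(chunk_size: usize, sample_rate: u32) -> Duration {
    Duration::from_micros((chunk_size as u64 * 1_000_000) / sample_rate as u64)
}

/// Returns how long to sleep before sending and the deadline for the next tick.
fn schedule(now: Instant, next_tick: Instant, tick: Duration) -> (Duration, Instant) {
    if now < next_tick {
        // On time: sleep until the deadline, then advance it by one tick.
        (next_tick - now, next_tick + tick)
    } else if now.duration_since(next_tick) > Duration::from_millis(200) {
        // More than 200ms late: drop the backlog and restart the clock.
        (Duration::ZERO, now + tick)
    } else {
        // Slightly late: send immediately and keep the original cadence.
        (Duration::ZERO, next_tick + tick)
    }
}
```

Because the deadline advances from `next_tick` rather than from `now` in the first and third cases, a late wake-up on one chunk shortens the following sleep instead of shifting every later chunk.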

3. Adaptive Drain Mode

If the buffer fills beyond the high water mark, the system enters drain mode:
// audio.rs:605-612
let high_water_mark = prefill_samples + (sample_rate as usize / 10);
if current_buffered > high_water_mark && current_stream.is_some() {
    // DRAIN MODE: Process immediately, reset next_tick
    next_tick = now + tick_duration;
}
This prevents buffer overflow while maintaining smooth playback.
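
To put numbers on the threshold, the sketch below reproduces the high water mark formula and the drain condition as two small functions. The function names are hypothetical, and `connected` stands in for `current_stream.is_some()`.

```rust
/// High water mark: the prefill level plus 100ms worth of samples
/// (sample_rate / 10), as in the excerpt above.
fn high_water_mark(prefill_samples: usize, sample_rate: u32) -> usize {
    prefill_samples + (sample_rate as usize / 10)
}

/// Drain mode applies only while a stream is connected and the buffer
/// holds more than the high water mark.
fn should_drain(buffered: usize, high_water_mark: usize, connected: bool) -> bool {
    buffered > high_water_mark && connected
}
```

At 48 kHz with a 48,000-sample prefill, the threshold works out to 48,000 + 4,800 = 52,800 samples.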
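Why Precision Matters: Without strict pacing, the network thread would send bursts of packets, causing jitter spikes and potential buffer underruns on the receiver side. Because each deadline advances by a fixed tick rather than by the measured sleep time, timing errors do not accumulate and the long-run send rate matches the audio rate. Per-chunk accuracy still depends on the operating system's sleep granularity.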

Prefill Gate (Startup Buffering)

To eliminate “cold start” stuttering, TCP Streamer implements a prefill gate that waits for the buffer to fill before transmission begins (v1.8.1).
// audio.rs:572-590
let prefill_samples = sample_rate as usize * 1; // 1000ms of audio

emit_log(&app_handle_net, "info",
    format!("Buffering... waiting for {} samples (1000ms)", prefill_samples)
);

while cons.len() < prefill_samples && is_running_clone.load(Ordering::Relaxed) {
    thread::sleep(Duration::from_millis(10));
}

emit_log(&app_handle_net, "success",
    "Buffer prefilled! Starting transmission.".to_string()
);

Configuration

| Parameter | Value | Purpose |
| --- | --- | --- |
| Prefill Duration | 1000ms | Ensures stable startup across all platforms |
| Check Interval | 10ms | Polling frequency for buffer level |
| Platforms | Windows, Linux, macOS | Works equally on all operating systems |
Trade-off: The prefill gate adds ~1 second of startup latency, but this is essential for preventing audio glitches during the critical first moments of streaming. Once streaming, latency is determined by the ring buffer size (typically 2-8 seconds).
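
The polling loop can be sketched without the ring buffer. An `AtomicUsize` stands in for `cons.len()`, and `wait_for_prefill` is a hypothetical helper, not the project's function. One deliberate difference: the excerpt as shown logs success after the loop regardless of why it ended, whereas this sketch returns a flag so the caller can tell a full buffer from a stop request.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Blocks until `level` reaches `target` or `running` is cleared.
/// Returns true if the buffer was prefilled, false if streaming was stopped.
fn wait_for_prefill(
    level: &AtomicUsize,
    target: usize,
    running: &AtomicBool,
    poll: Duration,
) -> bool {
    while level.load(Ordering::Relaxed) < target {
        if !running.load(Ordering::Relaxed) {
            return false; // stop requested before the buffer filled
        }
        thread::sleep(poll); // the excerpt above polls every 10ms
    }
    true
}
```

In a real pipeline the capture side would raise the level while this function polls. With the 1000ms target and a 10ms poll interval, the gate opens at most about one poll interval after the target is reached.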

Connection Management

TCP Streamer uses advanced socket configuration for reliable streaming:

Socket Options

// audio.rs:706-734
let socket = Socket::new(Domain::IPV4, Type::STREAM, Some(Protocol::TCP))?;

// Large send buffer (1MB) to absorb OS scheduling jitter
socket.set_send_buffer_size(1024 * 1024)?;

// TCP Keepalive to detect dead connections
let keepalive = TcpKeepalive::new()
    .with_time(Duration::from_secs(5))
    .with_interval(Duration::from_secs(2));
socket.set_tcp_keepalive(&keepalive)?;

// DSCP/TOS for QoS tagging
let tos_value = match dscp_strategy.as_str() {
    "voip" => 0xB8,       // EF (Expedited Forwarding)
    "lowdelay" => 0x10,   // IPTOS_LOWDELAY
    "throughput" => 0x08, // IPTOS_THROUGHPUT
    "besteffort" => 0x00,
    _ => 0xB8,
};
socket.set_tos(tos_value)?;

// Disable Nagle's algorithm for low latency
// (`stream` is the connected TcpStream; the connect step is not shown in this excerpt)
stream.set_nodelay(true)?;

// Write timeout to detect hangs
stream.set_write_timeout(Some(Duration::from_secs(5)))?;
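
The TOS values in the match above pack a 6-bit DSCP code point into the upper bits of the byte, with the lower two bits reserved for ECN. The sketch below mirrors the mapping and extracts the code point. Both function names are hypothetical.

```rust
/// Maps a strategy name to the IPv4 TOS byte, mirroring the match above.
fn tos_for_strategy(strategy: &str) -> u32 {
    match strategy {
        "voip" => 0xB8,       // DSCP 46 (EF) shifted into the upper six bits
        "lowdelay" => 0x10,   // legacy IPTOS_LOWDELAY flag
        "throughput" => 0x08, // legacy IPTOS_THROUGHPUT flag
        "besteffort" => 0x00,
        _ => 0xB8,            // unknown names fall back to EF, as in the excerpt
    }
}

/// Extracts the 6-bit DSCP code point from a TOS byte.
fn dscp_of(tos: u32) -> u32 {
    (tos >> 2) & 0x3F
}
```

For example, `dscp_of(tos_for_strategy("voip"))` gives 46, the Expedited Forwarding code point. Whether routers and switches honor the marking depends on the network's QoS policy.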

Graceful Shutdown

To prevent zombie connections, TCP Streamer explicitly sends TCP FIN packets:
// audio.rs:334-357
fn close_tcp_stream(stream: TcpStream, context: &str, app_handle: &AppHandle) {
    use std::net::Shutdown;
    
    if let Err(e) = stream.shutdown(Shutdown::Both) {
        emit_log(app_handle, "debug",
            format!("TCP shutdown {} ({}): socket may already be closed", context, e)
        );
    } else {
        emit_log(app_handle, "debug",
            format!("TCP connection closed gracefully ({})", context)
        );
    }
}

Auto-Reconnect Logic

When disconnected, TCP Streamer uses exponential backoff with jitter:
// audio.rs:537-552
let mut retry_delay = Duration::from_secs(2); // Minimum 2s to prevent storms
const MAX_RETRY_DELAY: Duration = Duration::from_secs(60);

// Add ±500ms jitter to prevent thundering herd
fn add_jitter(base: Duration) -> Duration {
    let jitter_ms = (SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap_or_default()
        .subsec_nanos() % 1000) as i64 - 500;
    let ms = base.as_millis() as i64 + jitter_ms;
    Duration::from_millis(ms.max(2000) as u64)
}
Backoff Sequence: 2s → 4s → 8s → 16s → 32s → 60s (max)
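
The sequence above is consistent with doubling the delay after each failed attempt and capping it at the maximum. The sketch below assumes that rule, since the source excerpt does not show the update step, and leaves jitter out so the output is deterministic. `MIN_RETRY_DELAY`, `next_delay`, and `backoff_sequence` are hypothetical names.

```rust
use std::time::Duration;

const MIN_RETRY_DELAY: Duration = Duration::from_secs(2);
const MAX_RETRY_DELAY: Duration = Duration::from_secs(60);

/// Doubles the delay and caps it at the maximum (assumed update rule).
fn next_delay(current: Duration) -> Duration {
    (current * 2).min(MAX_RETRY_DELAY)
}

/// Lists the base delays, in seconds, for the first `attempts` retries.
fn backoff_sequence(attempts: usize) -> Vec<u64> {
    let mut delay = MIN_RETRY_DELAY;
    let mut out = Vec::with_capacity(attempts);
    for _ in 0..attempts {
        out.push(delay.as_secs());
        delay = next_delay(delay);
    }
    out
}
```

Seven attempts yield 2, 4, 8, 16, 32, 60, 60: the sixth delay would be 64s without the cap. In the excerpt, `add_jitter` is then applied to each base delay and never returns less than 2000ms.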

Configuration Options

Sample Rates

44.1 kHz

  • Standard CD quality
  • 1,411.2 kbps bitrate (stereo, 16-bit)
  • Ideal for music playback

48 kHz

  • Professional audio standard
  • 1,536 kbps bitrate (stereo, 16-bit)
  • Recommended for modern systems
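
The bitrates above follow directly from sample rate × channels × bit depth. A small sketch, with a hypothetical function name:

```rust
/// Raw PCM bitrate in bits per second.
fn pcm_bitrate_bps(sample_rate: u32, channels: u32, bits_per_sample: u32) -> u32 {
    sample_rate * channels * bits_per_sample
}
```

For stereo 16-bit audio this gives 1,411,200 bps at 44.1 kHz and 1,536,000 bps at 48 kHz, or 176,400 and 192,000 bytes per second of payload before TCP/IP overhead.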

Buffer Sizes (Hardware Latency)

| Buffer Size | Latency (48kHz) | Use Case |
| --- | --- | --- |
| 256 samples | 5.3ms | Ultra-low latency (may cause dropouts) |
| 512 samples | 10.7ms | Low latency (balanced) |
| 1024 samples | 21.3ms | Standard (recommended) |
| 2048 samples | 42.7ms | High stability (WiFi/loaded systems) |
WASAPI Loopback: On Windows loopback mode, TCP Streamer uses BufferSize::Default (audio.rs:1078-1082) because fixed buffer sizes often fail with loopback devices. The system relies on the larger ring buffer for stability instead.
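
The latency column in the table is the buffer length divided by the sample rate. The sketch below assumes the buffer size counts frames per channel, and the function name is hypothetical.

```rust
/// Latency contributed by one hardware buffer, in milliseconds.
fn buffer_latency_ms(frames: u32, sample_rate: u32) -> f64 {
    frames as f64 * 1000.0 / sample_rate as f64
}
```

The same buffer takes slightly longer at 44.1 kHz: 1024 frames is about 23.2ms there, compared with 21.3ms at 48 kHz.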

Ring Buffer Duration

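The ring buffer absorbs network jitter and provides latency tolerance. In the excerpt below, the configured duration is raised to a minimum of 5000ms (8000ms for WASAPI loopback):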
// audio.rs:449-466
let adjusted_ring_buffer_duration_ms = if is_loopback {
    8000.max(ring_buffer_duration_ms)  // WASAPI loopback: at least 8000ms
} else {
    5000.max(ring_buffer_duration_ms)  // Standard: at least 5000ms
};

let ring_buffer_size = (sample_rate as usize) * 2 
                     * (adjusted_ring_buffer_duration_ms as usize) / 1000;
Recommended Values:
  • Ethernet (wired): 2000ms
  • WiFi (standard): 4000-5000ms
  • WiFi (poor signal): 8000ms+
  • WASAPI Loopback: 8000ms (accounts for Windows timing variability)
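
The sizing rule from the excerpt can be written as a standalone function to see how a configured duration maps to a buffer length. The function name and parameter types are assumptions. The factor of 2 is the stereo channel count.

```rust
/// Ring buffer capacity in samples for a requested duration.
/// Durations below the floor (5000ms, or 8000ms for loopback) are raised to it.
fn ring_buffer_samples(sample_rate: u32, requested_ms: u32, is_loopback: bool) -> usize {
    let floor_ms: u32 = if is_loopback { 8000 } else { 5000 };
    let effective_ms = requested_ms.max(floor_ms);
    sample_rate as usize * 2 * effective_ms as usize / 1000
}
```

At 48 kHz, a requested 2000ms on a standard device still yields 480,000 samples (5000ms) because of the floor. Loopback capture yields at least 768,000 samples (8000ms).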

Performance Characteristics

CPU Usage

Typical CPU Load:
  • Producer thread: <1% CPU (audio capture is hardware-accelerated)
  • Consumer thread: 1-3% CPU (depends on chunk size and sample rate)
  • Total: ~2-4% CPU on modern systems

Memory Usage

Ring buffer memory consumption:
Buffer Size (bytes) = sample_rate × 2 channels × 4 bytes/sample × duration_seconds

Examples:
- 2000ms @ 48kHz: 48,000 × 2 × 4 × 2 = 768 KB
- 8000ms @ 48kHz: 48,000 × 2 × 4 × 8 = 3.07 MB

Latency Breakdown

1. Capture Latency: Hardware buffer size (5-43ms depending on setting)
2. Ring Buffer Latency: Configured duration (2000-8000ms typical)
3. Network Transmission: Depends on network quality (typically <10ms on LAN)
4. Total End-to-End: 2-8 seconds typical (dominated by ring buffer for stability)
Latency vs Stability Trade-off: TCP Streamer prioritizes stability over latency. The large ring buffer ensures dropout-free playback even on WiFi networks with occasional jitter spikes. For synchronized multi-room audio (e.g., Snapcast), this latency is acceptable and consistent across all clients.

Troubleshooting

Audio dropouts or stuttering

Causes:
  • Ring buffer too small for network conditions
  • CPU throttling (especially on laptops)
  • Network congestion
Solutions:
  • Increase ring buffer duration to 8000ms or higher
  • Enable adaptive buffering (see Adaptive Buffering)
  • Use Ethernet instead of WiFi if possible
  • Enable high-priority thread option in Advanced settings

Connection failures or disconnects

Causes:
  • Server not responding
  • Firewall blocking connection
  • Write timeout triggered (5s)
Solutions:
  • Enable auto-reconnect in Automation settings
  • Check server logs for errors
  • Verify firewall rules allow TCP on the specified port
  • Test with nc -l <port> to verify TCP connectivity

High CPU usage

Causes:
  • Small chunk size (more frequent processing)
  • Low hardware buffer size (more audio callbacks)
Solutions:
  • Increase chunk size to 2048 or 4096 in Advanced tab
  • Increase hardware buffer size to 1024 or 2048
  • Disable high-priority thread if not needed

Silence Detection

Learn how RMS-based silence detection saves bandwidth

Adaptive Buffering

Automatic buffer sizing based on network jitter

Profiles

Save configurations for different streaming scenarios
