
Overview

Unmute includes several debugging tools to help diagnose issues with audio quality, latency, service connectivity, and system behavior. This guide covers both development and production debugging techniques.

Development Mode

Enabling Dev Mode

Unmute includes a hidden debug view for development. Enable it by modifying frontend/src/app/useKeyboardShortcuts.ts:
const ALLOW_DEV_MODE = true;  // Change from false to true
Restart the frontend, then press D to toggle the debug view.

Debug Information Available

The debug view displays real-time information from self.debug_dict in unmute_handler.py. You can add custom debug data:
class UnmuteHandler:
    def __init__(self):
        self.debug_dict = {}
    
    async def some_method(self):
        # Add debug information
        self.debug_dict["stt_latency"] = latency_ms
        self.debug_dict["tts_queue_size"] = queue.qsize()
        self.debug_dict["llm_tokens"] = total_tokens
        
        # Emit to frontend
        await self.output_queue.put(
            AdditionalOutputs(debug_dict=self.debug_dict)
        )

Subtitles Mode

Press S to enable subtitles showing:
  • User speech transcription (from STT)
  • Bot responses (text and audio)
Useful for verifying STT accuracy and response content.

Logging

Backend Logging

Unmute uses Python’s standard logging with structured output:
from logging import getLogger

logger = getLogger(__name__)

# Different log levels
logger.debug("Detailed diagnostic information")
logger.info("General informational messages")
logger.warning("Warning messages for unexpected behavior")
logger.error("Error messages for recoverable errors")
logger.critical("Critical errors requiring immediate attention")
View logs:
# Docker Compose
docker compose logs -f backend
docker compose logs -f tts
docker compose logs -f stt
docker compose logs -f llm

# Docker Swarm
docker service logs -f llm-wrapper_backend

Log Configuration

From loadtest_client.py example:
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(process)d %(name)s %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
Set level=logging.DEBUG for verbose output.

Service-Specific Logs

TTS and STT services write logs to dedicated volumes:
tts:
  volumes:
    - ./volumes/tts-logs:/tmp/unmute_logs

stt:
  volumes:
    - ./volumes/stt-logs:/tmp/unmute_logs
Access logs on the host machine:
tail -f ./volumes/tts-logs/*.log

Recording Sessions

Enabling Recordings

Set the KYUTAI_RECORDINGS_DIR environment variable:
backend:
  environment:
    - KYUTAI_RECORDINGS_DIR=/recordings
  volumes:
    - recordings:/recordings
Unmute records:
  • User audio input
  • Bot audio output
  • Session metadata
Access recordings:
ls -lh ./volumes/recordings/

Recording Format

Recordings use the Recorder class (from unmute_handler.py):
from unmute.recorder import Recorder

self.recorder = Recorder(RECORDINGS_DIR) if RECORDINGS_DIR else None
Audio is saved in WAV format at 24kHz sample rate.
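As a quick sanity check, you can inspect a recorded file with Python's standard wave module to confirm the expected 24 kHz sample rate. This is just an inspection sketch; the file path in the comment is a placeholder.

```python
import wave

def describe_recording(path: str) -> dict:
    """Return basic properties of a recorded WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,  # expected: 24000 for Unmute recordings
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }

# Example (path is hypothetical):
# print(describe_recording("./volumes/recordings/session.wav"))
```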

Load Testing

Running Load Tests

The loadtest_client.py script simulates realistic user conversations:
uv run unmute/loadtest/loadtest_client.py \
  --server-url ws://localhost:8000 \
  --n-workers 16 \
  --n-conversations 100 \
  --audio-dir ./unmute/loadtest/voices
Parameters:
  • --n-workers: Parallel connections (simulates concurrent users)
  • --n-conversations: Total conversations to run
  • --audio-dir: Directory with test audio files (MP3 format)
  • --listen: Play received audio for manual verification

Interpreting Results

Load test output includes detailed timing:
{
  "stt_latencies": {
    "count": 523,
    "mean": 0.034,
    "median": 0.031,
    "p90": 0.048,
    "p95": 0.056
  },
  "vad_latencies": {...},
  "llm_latencies": {...},
  "tts_start_latencies": {...},
  "tts_realtime_factors": {...}
}
Key metrics:
  • Mean/Median: Average performance
  • p90/p95: Tail latencies (worst-case scenarios)
  • Realtime factor: Less than 1.0 means TTS generates faster than playback (good)
  • OK fraction: Success rate (should be >0.95)
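For reference, the summary fields in the report can be reproduced from raw latency samples. The sketch below uses a simple nearest-rank percentile, which may differ slightly from the load test script's exact method:

```python
import statistics

def summarize(latencies: list[float]) -> dict:
    """Summary statistics in the same shape as the load test report."""
    ordered = sorted(latencies)

    def pct(p: float) -> float:
        # Nearest-rank percentile: value below which ~p of samples fall.
        idx = min(len(ordered) - 1, int(p * len(ordered)))
        return ordered[idx]

    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p90": pct(0.90),
        "p95": pct(0.95),
    }

# summarize([0.031, 0.034, 0.048, 0.056, 0.029])
```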

Debugging with Load Tests

# Test with verbose output and audio playback
uv run unmute/loadtest/loadtest_client.py \
  --server-url ws://localhost:8000 \
  --n-workers 1 \
  --listen
This plays back audio and shows detailed logs for manual inspection.

Health Checks

Backend Health Endpoint

curl http://localhost:8000/v1/health
Healthy response:
{
  "ok": true
}
Unhealthy response:
{
  "ok": false,
  "error": "STT service unavailable"
}

Service Discovery

Unmute uses service discovery to find available STT/TTS instances (from service_discovery.py):
from unmute.service_discovery import find_instance

stt_url = await find_instance("stt", base_url="ws://tasks.stt:8080")
Debug service discovery:
# Check Docker Swarm service discovery
docker exec <backend_container> nslookup tasks.tts

# Should return multiple IPs if replicas exist
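The same check can be scripted in Python from inside the backend container. This sketch uses standard DNS resolution; with Swarm's tasks.&lt;service&gt; names, each replica should contribute one distinct IP:

```python
import socket

def resolve_replicas(hostname: str, port: int = 8080) -> set[str]:
    """Return the distinct IPs a service name resolves to
    (one per Swarm replica when using tasks.<service> DNS)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}

# Inside the backend container:
# print(resolve_replicas("tasks.tts"))
```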

Common Issues

Issue: STT Not Transcribing

Symptoms: No subtitles appear; the silence timeout triggers
Debug steps:
  1. Check STT service logs: docker logs <stt_container>
  2. Verify microphone permissions in browser
  3. Enable subtitles (press S) to see if any text appears
  4. Check STT metrics: worker_stt_recv_words_total (should increase)
  5. Test with known audio file using load test
Common causes:
  • Microphone not connected/permitted
  • Echo cancellation consuming speech
  • STT service out of memory

Issue: High Latency

Symptoms: Delayed responses, choppy audio
Debug steps:
  1. Check Grafana dashboards for latency spikes
  2. Monitor GPU utilization: nvidia-smi -l 1
  3. Check service metrics:
    curl http://localhost:8000/metrics | grep ttft
    
  4. Run load test to isolate bottleneck
  5. Review self.debug_dict in dev mode
Common causes:
  • GPU shared between services (use multi-GPU setup)
  • High context length (--max-model-len too large)
  • Network latency between services
  • Insufficient GPU memory causing swapping

Issue: TTS Audio Choppy

Symptoms: Audio stutters or drops frames
Debug steps:
  1. Check output frame size in unmute_handler.py:
    output_frame_size=480  # IMPORTANT! Higher values cause choppy audio
    
  2. Monitor TTS realtime factor:
    rate(worker_tts_gen_duration_sum[1m]) / rate(worker_tts_audio_duration_sum[1m])
    
    Should be less than 1.0 (faster than realtime)
  3. Check TTS service logs for errors
  4. Verify GPU not overloaded
Common causes:
  • Output frame size too large
  • TTS generation slower than realtime
  • Network congestion
  • CPU throttling
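The realtime-factor check above amounts to a simple ratio over a time window. As a sketch (the Prometheus counter deltas would come from the query shown earlier):

```python
def realtime_factor(gen_seconds_delta: float, audio_seconds_delta: float) -> float:
    """TTS realtime factor over a window: generation time divided by the
    duration of audio produced. < 1.0 means faster than realtime (good)."""
    if audio_seconds_delta <= 0:
        raise ValueError("no audio produced in the window")
    return gen_seconds_delta / audio_seconds_delta

# 12 s of compute producing 30 s of audio → factor 0.4, comfortably realtime
```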

Issue: Service Connection Failures

Symptoms: worker_stt_misses or worker_tts_misses increasing
Debug steps:
  1. Check service health:
    docker ps  # All services should be "Up"
    
  2. Test connectivity from backend:
    docker exec <backend> curl http://stt:8080/health
    
  3. Check service discovery:
    docker exec <backend> nslookup tasks.stt
    
  4. Review service logs for crashes
Common causes:
  • Service crashed and restarting
  • GPU out of memory
  • Network misconfiguration
  • Too many concurrent requests

Issue: LLM Timeouts

Symptoms: worker_vllm_hard_errors increasing, responses cut off
Debug steps:
  1. Check LLM service logs:
    docker logs <llm_container> | grep -i error
    
  2. Monitor GPU memory:
    nvidia-smi
    
  3. Check request context length:
    histogram_quantile(0.95, rate(worker_vllm_request_length_bucket[5m]))
    
  4. Review LLM configuration in docker-compose.yml
Common causes:
  • Context window too large for GPU memory
  • --gpu-memory-utilization too high
  • Long conversation history exceeding --max-model-len
  • Model loading failure

Debugging Tools

1. Audio Debugging

From loadtest_client.py:
def preview_audio(audio: np.ndarray, playback_speed: float = 1.0):
    """Play audio for manual verification"""
    audio = audio_to_float32(audio)
    if playback_speed != 1.0:
        audio = librosa.effects.time_stretch(audio, rate=playback_speed)
    audio_segment = pydub.AudioSegment(
        data=audio_to_int16(audio),
        sample_width=2,
        frame_rate=SAMPLE_RATE,
        channels=1,
    )
    pydub.playback.play(audio_segment)
Use this to verify audio quality at each pipeline stage.

2. Timing Analysis

Unmute uses PhasesStopwatch for detailed timing:
from unmute.timer import PhasesStopwatch

stopwatch = PhasesStopwatch(["response_created", "text_start", "audio_start", "audio_end"])

# Mark phase transitions
stopwatch.time_phase_if_not_started("response_created")
# ... processing ...
stopwatch.time_phase_if_not_started("text_start")
# ... more processing ...
stopwatch.time_phase_if_not_started("audio_start")
# ... final processing ...
stopwatch.time_phase_if_not_started("audio_end")

# Get timing report
phases = stopwatch.phase_dict()
print(f"Total latency: {phases['audio_end'] - phases['response_created']}s")
Add custom stopwatches to measure specific operations.
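Conceptually, such a stopwatch just records the first time each phase is reached. A minimal re-implementation sketch (the real unmute.timer.PhasesStopwatch may differ in API details):

```python
import time

class SimplePhasesStopwatch:
    """Minimal sketch of a phase stopwatch for custom measurements."""

    def __init__(self, phases: list[str]):
        self.phases = phases
        self.times: dict[str, float] = {}

    def time_phase_if_not_started(self, phase: str) -> None:
        # Record the first time a phase is reached; later calls are no-ops.
        self.times.setdefault(phase, time.monotonic())

    def phase_dict(self) -> dict[str, float]:
        return dict(self.times)
```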

3. WebSocket Message Inspection

Log all WebSocket messages:
import logging

emit_logger = logging.getLogger("emit")
receive_logger = logging.getLogger("receive")

emit_logger.setLevel(logging.DEBUG)
receive_logger.setLevel(logging.DEBUG)
Inspect message payloads to debug protocol issues.

4. Metrics Endpoint

Query Prometheus metrics directly:
# Get all metrics
curl http://localhost:8000/metrics

# Filter specific metrics
curl http://localhost:8000/metrics | grep worker_stt

# Get current active sessions
curl -s http://localhost:8000/metrics | grep worker_active_sessions | grep -v '#'
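When grep isn't enough, the exposition text is easy to parse in Python. A sketch that extracts sample values (metric names keep any labels attached; HELP/TYPE metadata is skipped):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and HELP/TYPE metadata
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # skip lines without a numeric sample
    return samples

# Fetch and inspect, e.g.:
# import urllib.request
# text = urllib.request.urlopen("http://localhost:8000/metrics").read().decode()
# print(parse_metrics(text).get("worker_active_sessions"))
```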

Debug Environment Variables

From unmute_handler.py:
# Enable TTS text debugging (bypass STT/LLM)
TTS_DEBUGGING_TEXT = "What's 'Hello world'?"

# Override audio input with file (bypass microphone)
AUDIO_INPUT_OVERRIDE = Path.home() / "audio/test.mp3"

# User silence timeout
USER_SILENCE_TIMEOUT = 7.0  # Increase for debugging

# VAD interrupt delay
UNINTERRUPTIBLE_BY_VAD_TIME_SEC = 3  # Increase to prevent accidental interrupts
Uncomment and modify these in the source code for local debugging.

Debugging Production

Docker Swarm Debugging

# List all services
docker service ls

# Check service status
docker service ps llm-wrapper_backend --no-trunc

# View service logs
docker service logs -f llm-wrapper_tts

# Inspect service configuration
docker service inspect llm-wrapper_llm

# Access running container
docker exec -it $(docker ps -q -f name=llm-wrapper_backend) /bin/bash

Scaling for Debugging

# Scale down to single replica for easier debugging
docker service scale llm-wrapper_backend=1

# Scale back up after debugging
docker service scale llm-wrapper_backend=16

Force Service Restart

# Force restart without changing config
docker service update --force llm-wrapper_tts

Additional Resources

Getting Help

If you encounter issues:
  1. Check existing GitHub issues
  2. Review logs and metrics before reporting
  3. Include reproducible steps and error messages
  4. Share relevant configuration (docker-compose.yml, etc.)
Note: From the README: “If something isn’t working for you, don’t hesitate to open an issue. We’ll do our best to help you figure out what’s wrong.”
