
Overview

Unmute includes several debugging tools to help diagnose issues with audio quality, latency, service connectivity, and system behavior. This guide covers both development and production debugging techniques.

Development Mode

Enabling Dev Mode

Unmute includes a hidden debug view for development. Enable it by modifying frontend/src/app/useKeyboardShortcuts.ts:
const ALLOW_DEV_MODE = true;  // Change from false to true
Restart the frontend, then press D to toggle the debug view.

Debug Information Available

The debug view displays real-time information from self.debug_dict in unmute_handler.py. You can add custom debug data:
class UnmuteHandler:
    def __init__(self):
        self.debug_dict = {}
    
    async def some_method(self):
        # Add debug information
        self.debug_dict["stt_latency"] = latency_ms
        self.debug_dict["tts_queue_size"] = queue.qsize()
        self.debug_dict["llm_tokens"] = total_tokens
        
        # Emit to frontend
        await self.output_queue.put(
            AdditionalOutputs(debug_dict=self.debug_dict)
        )

Subtitles Mode

Press S to enable subtitles showing:
  • User speech transcription (from STT)
  • Bot responses (text and audio)
Useful for verifying STT accuracy and response content.

Logging

Backend Logging

Unmute uses Python’s standard logging with structured output:
from logging import getLogger

logger = getLogger(__name__)

# Different log levels
logger.debug("Detailed diagnostic information")
logger.info("General informational messages")
logger.warning("Warning messages for unexpected behavior")
logger.error("Error messages for recoverable errors")
logger.critical("Critical errors requiring immediate attention")
View logs:
# Docker Compose
docker compose logs -f backend
docker compose logs -f tts
docker compose logs -f stt
docker compose logs -f llm

# Docker Swarm
docker service logs -f llm-wrapper_backend

Log Configuration

From loadtest_client.py example:
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(process)d %(name)s %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
Set level=logging.DEBUG for verbose output.

Service-Specific Logs

TTS and STT services write logs to dedicated volumes:
tts:
  volumes:
    - ./volumes/tts-logs:/tmp/unmute_logs

stt:
  volumes:
    - ./volumes/stt-logs:/tmp/unmute_logs
Access logs on the host machine:
tail -f ./volumes/tts-logs/*.log

Recording Sessions

Enabling Recordings

Set the KYUTAI_RECORDINGS_DIR environment variable:
backend:
  environment:
    - KYUTAI_RECORDINGS_DIR=/recordings
  volumes:
    - recordings:/recordings
Unmute records:
  • User audio input
  • Bot audio output
  • Session metadata
Access recordings:
ls -lh ./volumes/recordings/

Recording Format

Recordings use the Recorder class (from unmute_handler.py):
from unmute.recorder import Recorder

self.recorder = Recorder(RECORDINGS_DIR) if RECORDINGS_DIR else None
Audio is saved in WAV format at 24kHz sample rate.
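As a quick sanity check, you can inspect a recorded file with Python's standard wave module to confirm the expected 24 kHz sample rate. This is just an inspection sketch; the file path in the comment is a placeholder.

```python
import wave

def describe_recording(path: str) -> dict:
    """Return basic properties of a recorded WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,  # expected: 24000 for Unmute recordings
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }

# Example (path is hypothetical):
# print(describe_recording("./volumes/recordings/session.wav"))
```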

Load Testing

Running Load Tests

The loadtest_client.py script simulates realistic user conversations:
uv run unmute/loadtest/loadtest_client.py \
  --server-url ws://localhost:8000 \
  --n-workers 16 \
  --n-conversations 100 \
  --audio-dir ./unmute/loadtest/voices
Parameters:
  • --n-workers: Parallel connections (simulates concurrent users)
  • --n-conversations: Total conversations to run
  • --audio-dir: Directory with test audio files (MP3 format)
  • --listen: Play received audio for manual verification

Interpreting Results

Load test output includes detailed timing:
{
  "stt_latencies": {
    "count": 523,
    "mean": 0.034,
    "median": 0.031,
    "p90": 0.048,
    "p95": 0.056
  },
  "vad_latencies": {...},
  "llm_latencies": {...},
  "tts_start_latencies": {...},
  "tts_realtime_factors": {...}
}
Key metrics:
  • Mean/Median: Average performance
  • p90/p95: Tail latencies (worst-case scenarios)
  • Realtime factor: Less than 1.0 means TTS generates faster than playback (good)
  • OK fraction: Success rate (should be >0.95)
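For reference, the summary fields in the report can be reproduced from raw latency samples. The sketch below uses a simple nearest-rank percentile, which may differ slightly from the load test script's exact method:

```python
import statistics

def summarize(latencies: list[float]) -> dict:
    """Summary statistics in the same shape as the load test report."""
    ordered = sorted(latencies)

    def pct(p: float) -> float:
        # Nearest-rank percentile: value below which ~p of samples fall.
        idx = min(len(ordered) - 1, int(p * len(ordered)))
        return ordered[idx]

    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p90": pct(0.90),
        "p95": pct(0.95),
    }

# summarize([0.031, 0.034, 0.048, 0.056, 0.029])
```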

Debugging with Load Tests

# Test with verbose output and audio playback
uv run unmute/loadtest/loadtest_client.py \
  --server-url ws://localhost:8000 \
  --n-workers 1 \
  --listen
This plays back audio and shows detailed logs for manual inspection.

Health Checks

Backend Health Endpoint

curl http://localhost:8000/v1/health
Healthy response:
{
  "ok": true
}
Unhealthy response:
{
  "ok": false,
  "error": "STT service unavailable"
}

Service Discovery

Unmute uses service discovery to find available STT/TTS instances (from service_discovery.py):
from unmute.service_discovery import find_instance

stt_url = await find_instance("stt", base_url="ws://tasks.stt:8080")
Debug service discovery:
# Check Docker Swarm service discovery
docker exec <backend_container> nslookup tasks.tts

# Should return multiple IPs if replicas exist
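The same check can be scripted in Python from inside the backend container. This sketch uses standard DNS resolution; with Swarm's tasks.&lt;service&gt; names, each replica should contribute one distinct IP:

```python
import socket

def resolve_replicas(hostname: str, port: int = 8080) -> set[str]:
    """Return the distinct IPs a service name resolves to
    (one per Swarm replica when using tasks.<service> DNS)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}

# Inside the backend container:
# print(resolve_replicas("tasks.tts"))
```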

Common Issues

Issue: STT Not Transcribing

Symptoms: No subtitles appear; the silence timeout triggers
Debug steps:
  1. Check STT service logs: docker logs <stt_container>
  2. Verify microphone permissions in browser
  3. Enable subtitles (press S) to see if any text appears
  4. Check STT metrics: worker_stt_recv_words_total (should increase)
  5. Test with known audio file using load test
Common causes:
  • Microphone not connected/permitted
  • Echo cancellation consuming speech
  • STT service out of memory

Issue: High Latency

Symptoms: Delayed responses, choppy audio
Debug steps:
  1. Check Grafana dashboards for latency spikes
  2. Monitor GPU utilization: nvidia-smi -l 1
  3. Check service metrics:
    curl http://localhost:8000/metrics | grep ttft
    
  4. Run load test to isolate bottleneck
  5. Review self.debug_dict in dev mode
Common causes:
  • GPU shared between services (use multi-GPU setup)
  • High context length (--max-model-len too large)
  • Network latency between services
  • Insufficient GPU memory causing swapping

Issue: TTS Audio Choppy

Symptoms: Audio stutters or drops frames
Debug steps:
  1. Check output frame size in unmute_handler.py:
    output_frame_size=480  # IMPORTANT! Higher values cause choppy audio
    
  2. Monitor TTS realtime factor:
    rate(worker_tts_gen_duration_sum[1m]) / rate(worker_tts_audio_duration_sum[1m])
    
    Should be less than 1.0 (faster than realtime)
  3. Check TTS service logs for errors
  4. Verify GPU not overloaded
Common causes:
  • Output frame size too large
  • TTS generation slower than realtime
  • Network congestion
  • CPU throttling
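The realtime-factor check above amounts to a simple ratio over a time window. As a sketch (the Prometheus counter deltas would come from the query shown earlier):

```python
def realtime_factor(gen_seconds_delta: float, audio_seconds_delta: float) -> float:
    """TTS realtime factor over a window: generation time divided by the
    duration of audio produced. < 1.0 means faster than realtime (good)."""
    if audio_seconds_delta <= 0:
        raise ValueError("no audio produced in the window")
    return gen_seconds_delta / audio_seconds_delta

# 12 s of compute producing 30 s of audio → factor 0.4, comfortably realtime
```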

Issue: Service Connection Failures

Symptoms: worker_stt_misses or worker_tts_misses increasing
Debug steps:
  1. Check service health:
    docker ps  # All services should be "Up"
    
  2. Test connectivity from backend:
    docker exec <backend> curl http://stt:8080/health
    
  3. Check service discovery:
    docker exec <backend> nslookup tasks.stt
    
  4. Review service logs for crashes
Common causes:
  • Service crashed and restarting
  • GPU out of memory
  • Network misconfiguration
  • Too many concurrent requests

Issue: LLM Timeouts

Symptoms: worker_vllm_hard_errors increasing, responses cut off
Debug steps:
  1. Check LLM service logs:
    docker logs <llm_container> | grep -i error
    
  2. Monitor GPU memory:
    nvidia-smi
    
  3. Check request context length:
    histogram_quantile(0.95, rate(worker_vllm_request_length_bucket[5m]))
    
  4. Review LLM configuration in docker-compose.yml
Common causes:
  • Context window too large for GPU memory
  • --gpu-memory-utilization too high
  • Long conversation history exceeding --max-model-len
  • Model loading failure

Debugging Tools

1. Audio Debugging

From loadtest_client.py:
def preview_audio(audio: np.ndarray, playback_speed: float = 1.0):
    """Play audio for manual verification"""
    audio = audio_to_float32(audio)
    if playback_speed != 1.0:
        audio = librosa.effects.time_stretch(audio, rate=playback_speed)
    audio_segment = pydub.AudioSegment(
        data=audio_to_int16(audio),
        sample_width=2,
        frame_rate=SAMPLE_RATE,
        channels=1,
    )
    pydub.playback.play(audio_segment)
Use this to verify audio quality at each pipeline stage.

2. Timing Analysis

Unmute uses PhasesStopwatch for detailed timing:
from unmute.timer import PhasesStopwatch

stopwatch = PhasesStopwatch(["response_created", "text_start", "audio_start", "audio_end"])

# Mark phase transitions
stopwatch.time_phase_if_not_started("response_created")
# ... processing ...
stopwatch.time_phase_if_not_started("text_start")
# ... more processing ...
stopwatch.time_phase_if_not_started("audio_start")
# ... final processing ...
stopwatch.time_phase_if_not_started("audio_end")

# Get timing report
phases = stopwatch.phase_dict()
print(f"Total latency: {phases['audio_end'] - phases['response_created']}s")
Add custom stopwatches to measure specific operations.
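Conceptually, such a stopwatch just records the first time each phase is reached. A minimal re-implementation sketch (the real unmute.timer.PhasesStopwatch may differ in API details):

```python
import time

class SimplePhasesStopwatch:
    """Minimal sketch of a phase stopwatch for custom measurements."""

    def __init__(self, phases: list[str]):
        self.phases = phases
        self.times: dict[str, float] = {}

    def time_phase_if_not_started(self, phase: str) -> None:
        # Record the first time a phase is reached; later calls are no-ops.
        self.times.setdefault(phase, time.monotonic())

    def phase_dict(self) -> dict[str, float]:
        return dict(self.times)
```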

3. WebSocket Message Inspection

Log all WebSocket messages:
import logging

emit_logger = logging.getLogger("emit")
receive_logger = logging.getLogger("receive")

emit_logger.setLevel(logging.DEBUG)
receive_logger.setLevel(logging.DEBUG)
Inspect message payloads to debug protocol issues.

4. Metrics Endpoint

Query Prometheus metrics directly:
# Get all metrics
curl http://localhost:8000/metrics

# Filter specific metrics
curl http://localhost:8000/metrics | grep worker_stt

# Get current active sessions
curl -s http://localhost:8000/metrics | grep worker_active_sessions | grep -v '#'
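When grep isn't enough, the exposition text is easy to parse in Python. A sketch that extracts sample values (metric names keep any labels attached; HELP/TYPE metadata is skipped):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and HELP/TYPE metadata
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # skip lines without a numeric sample
    return samples

# Fetch and inspect, e.g.:
# import urllib.request
# text = urllib.request.urlopen("http://localhost:8000/metrics").read().decode()
# print(parse_metrics(text).get("worker_active_sessions"))
```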

Debug Environment Variables

From unmute_handler.py:
# Enable TTS text debugging (bypass STT/LLM)
TTS_DEBUGGING_TEXT = "What's 'Hello world'?"

# Override audio input with file (bypass microphone)
AUDIO_INPUT_OVERRIDE = Path.home() / "audio/test.mp3"

# User silence timeout
USER_SILENCE_TIMEOUT = 7.0  # Increase for debugging

# VAD interrupt delay
UNINTERRUPTIBLE_BY_VAD_TIME_SEC = 3  # Increase to prevent accidental interrupts
Uncomment and modify these in the source code for local debugging.

Debugging Production

Docker Swarm Debugging

# List all services
docker service ls

# Check service status
docker service ps llm-wrapper_backend --no-trunc

# View service logs
docker service logs -f llm-wrapper_tts

# Inspect service configuration
docker service inspect llm-wrapper_llm

# Access running container
docker exec -it $(docker ps -q -f name=llm-wrapper_backend) /bin/bash

Scaling for Debugging

# Scale down to single replica for easier debugging
docker service scale llm-wrapper_backend=1

# Scale back up after debugging
docker service scale llm-wrapper_backend=16

Force Service Restart

# Force restart without changing config
docker service update --force llm-wrapper_tts

Additional Resources

Getting Help

If you encounter issues:
  1. Check existing GitHub issues
  2. Review logs and metrics before reporting
  3. Include reproducible steps and error messages
  4. Share relevant configuration (docker-compose.yml, etc.)
Note: From the README: “If something isn’t working for you, don’t hesitate to open an issue. We’ll do our best to help you figure out what’s wrong.”
