Backend Architecture
FastAPI Application
File: unmute/main_websocket.py
The main backend server is a FastAPI application (see main_websocket.py:71).
UnmuteHandler
File: unmute/unmute_handler.py
The core orchestration class that manages a single conversation session.
Key Responsibilities:
- Receive audio from frontend
- Route audio to STT
- Manage conversation state
- Generate LLM responses
- Stream TTS audio back to frontend
- Handle interruptions
WebSocket Protocol Handler
Files:
- unmute/main_websocket.py:380-404 - Main route handler
- unmute/main_websocket.py:406-492 - Receive loop
- unmute/main_websocket.py:512-582 - Emit loop
The two loops:
- Receive Loop: handles incoming messages
- Emit Loop: sends messages to the frontend
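The split into a receive loop and an emit loop can be illustrated with two coroutines that run concurrently and communicate through queues. This is a minimal sketch, not the code in main_websocket.py: the queues stand in for the WebSocket, and the message shapes, the `None` end-of-stream sentinel, and all function names are assumptions made for the example.

```python
import asyncio
from typing import Any, Dict, List, Optional

Message = Optional[Dict[str, Any]]  # None is a hypothetical end-of-stream marker


async def receive_loop(incoming: "asyncio.Queue[Message]",
                       outgoing: "asyncio.Queue[Message]") -> None:
    """Consume client messages and queue a reply for each one."""
    while True:
        message = await incoming.get()
        if message is None:
            await outgoing.put(None)  # tell the emit loop to stop as well
            return
        # A real handler would route audio to STT here; this sketch only acknowledges.
        await outgoing.put({"type": "ack", "of": message["type"]})


async def emit_loop(outgoing: "asyncio.Queue[Message]",
                    sent: List[Dict[str, Any]]) -> None:
    """Forward queued messages to the client (here: append to a list)."""
    while True:
        message = await outgoing.get()
        if message is None:
            return
        sent.append(message)


async def run_session(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    incoming: "asyncio.Queue[Message]" = asyncio.Queue()
    outgoing: "asyncio.Queue[Message]" = asyncio.Queue()
    for message in messages:
        incoming.put_nowait(message)
    incoming.put_nowait(None)
    sent: List[Dict[str, Any]] = []
    # Both loops run at the same time, as the two-loop design implies.
    await asyncio.gather(receive_loop(incoming, outgoing), emit_loop(outgoing, sent))
    return sent
```

Keeping the two directions in separate coroutines means a slow client read never blocks outgoing audio, and vice versa.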
Quest Manager
File: unmute/quest_manager.py
Manages lifecycle of background services with clean cancellation.
- Async context manager for automatic cleanup
- Service initialization with retries
- Graceful shutdown on errors
- Quest removal (for interruptions)
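The cleanup and removal behavior listed above can be sketched with an async context manager that owns a set of background tasks. This is an illustration under stated assumptions, not the class in unmute/quest_manager.py: `MiniQuestManager`, `add`, and `remove` are hypothetical names, and retries are left out to keep the example short.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, List


class MiniQuestManager:
    """Hypothetical, simplified stand-in for a manager of background services."""

    def __init__(self) -> None:
        self._tasks: Dict[str, "asyncio.Task[None]"] = {}

    async def __aenter__(self) -> "MiniQuestManager":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and on errors, so services never outlive the session.
        for name in list(self._tasks):
            await self.remove(name)

    def add(self, name: str, run: Callable[[], Coroutine[Any, Any, None]]) -> None:
        """Start a service as a background task."""
        self._tasks[name] = asyncio.create_task(run(), name=name)

    async def remove(self, name: str) -> None:
        """Cancel one service and wait until its cleanup code has finished."""
        task = self._tasks.pop(name, None)
        if task is None:
            return
        task.cancel()
        # return_exceptions=True keeps the task's CancelledError from propagating.
        # Note that it also hides other errors raised by the task (sketch only).
        await asyncio.gather(task, return_exceptions=True)


async def demo() -> List[str]:
    events: List[str] = []

    async def service() -> None:
        try:
            events.append("started")
            await asyncio.sleep(3600)  # pretend to serve until cancelled
        finally:
            events.append("cleaned up")

    async with MiniQuestManager() as manager:
        manager.add("tts", service)
        await asyncio.sleep(0)  # yield once so the service can start
    return events
```

Calling `remove` directly, rather than leaving the context, is how a single service could be dropped on an interruption while the rest of the session keeps running.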
Chatbot State Manager
File: unmute/llm/chatbot.py
Manages conversation history and state transitions.
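A small state machine illustrates what tracking history together with state transitions can look like. The state names, the allowed transitions, and the `MiniChatState` class are assumptions chosen for this sketch; the actual definitions in unmute/llm/chatbot.py may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical states and transitions, for illustration only.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "waiting_for_user": {"user_speaking"},
    "user_speaking": {"bot_speaking", "waiting_for_user"},
    # Going from bot_speaking to user_speaking models an interruption.
    "bot_speaking": {"waiting_for_user", "user_speaking"},
}


@dataclass
class MiniChatState:
    state: str = "waiting_for_user"
    history: List[Dict[str, str]] = field(default_factory=list)

    def transition(self, new_state: str) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def add_message(self, role: str, content: str) -> None:
        """Append one turn to the conversation history."""
        self.history.append({"role": role, "content": content})
```

Making illegal transitions raise early tends to surface ordering bugs between the STT, LLM, and TTS stages sooner than silently overwriting the state would.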
Service Discovery
File: unmute/service_discovery.py
Finds available service instances with capacity.
- Services can reject with an Error message when at capacity
- Backend tries the next instance
- Raises MissingServiceAtCapacity if all instances are exhausted
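The try-next-instance behavior described above can be sketched as a loop over candidate instances. Only the exception name `MissingServiceAtCapacity` comes from the text; its definition here, the `AtCapacity` signal, and the `connect_first_available` helper are hypothetical stand-ins.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class AtCapacity(Exception):
    """Hypothetical signal that one instance rejected the connection."""


class MissingServiceAtCapacity(Exception):
    """Stand-in definition: every candidate instance was at capacity."""


def connect_first_available(urls: Iterable[str], connect: Callable[[str], T]) -> T:
    """Return the first successful connection, skipping instances at capacity."""
    for url in urls:
        try:
            return connect(url)
        except AtCapacity:
            continue  # this instance is full, try the next one
    raise MissingServiceAtCapacity("all instances are at capacity")
```

A caller would pass the list of instance addresses plus a function that opens the connection and raises `AtCapacity` when the service answers with its rejection message.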
Metrics Collection
File: unmute/metrics.py
Prometheus metrics using prometheus_client.
Counter Metrics:
Audio Processing
Opus Codec:
Voice Management
File: unmute/tts/voices.py
Loads and validates voices from voices.yaml.
- File: Pre-recorded audio on server
- Freesound: Creative Commons audio from Freesound.org
- Custom: User-uploaded voice cloning
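Validation of voice entries by source type can be sketched as follows. The example works on already-parsed entries (parsing voices.yaml itself is out of scope here), and the field names `name` and `source_type`, the lowercase type identifiers, and the `load_voices` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical identifiers for the three source types listed above.
SOURCE_TYPES = {"file", "freesound", "custom"}


@dataclass(frozen=True)
class Voice:
    name: str
    source_type: str


def load_voices(entries: List[Dict[str, Any]]) -> List[Voice]:
    """Validate parsed entries and convert them to Voice objects."""
    voices: List[Voice] = []
    for index, entry in enumerate(entries):
        source_type = entry.get("source_type")
        if source_type not in SOURCE_TYPES:
            raise ValueError(f"voice #{index}: unknown source type {source_type!r}")
        if not entry.get("name"):
            raise ValueError(f"voice #{index}: missing name")
        voices.append(Voice(name=entry["name"], source_type=source_type))
    return voices
```

Failing at load time with the entry index in the message makes a malformed configuration easier to locate than an error raised mid-conversation.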
Related code: main_websocket.py:200
Voice Cloning
File: unmute/tts/voice_cloning.py
Generates voice embeddings from uploaded audio.
Related code: main_websocket.py:240
Recording System
File: unmute/recorder.py
Optional conversation recording for debugging/analysis.
- Audio data anonymized (only sample counts, not PCM)
- User can opt out via allow_recording: false
- Recordings stored in RECORDINGS_DIR (configurable)
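The anonymization and opt-out rules above can be sketched as a filter applied to each event before it is written. The event shape (`type`, `pcm`, `n_samples`) and the `anonymize_event` helper are hypothetical; only the idea of keeping sample counts instead of PCM and the `allow_recording` flag come from the text.

```python
from typing import Any, Dict, Optional


def anonymize_event(event: Dict[str, Any],
                    allow_recording: bool = True) -> Optional[Dict[str, Any]]:
    """Return a recordable copy of the event, or None if the user opted out."""
    if not allow_recording:
        return None
    if event.get("type") == "audio":
        # Keep only how many samples were received, never the audio itself.
        return {"type": "audio", "n_samples": len(event.get("pcm", []))}
    return dict(event)  # non-audio events are copied unchanged in this sketch
```

For example, an audio event carrying 480 samples would be stored as `{"type": "audio", "n_samples": 480}`.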
Timer Utilities
File: unmute/timer.py
Stopwatch for accurate timing measurements.
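A stopwatch of this kind is usually built on a monotonic clock so that system clock adjustments cannot distort measurements. The class below is a minimal sketch using `time.perf_counter`; its method names are assumptions and may not match unmute/timer.py.

```python
import time
from typing import Optional


class Stopwatch:
    """Hypothetical minimal stopwatch based on a monotonic clock."""

    def __init__(self) -> None:
        self._start = time.perf_counter()
        self._stop: Optional[float] = None

    def stop(self) -> float:
        """Freeze the stopwatch on the first call and return the elapsed seconds."""
        if self._stop is None:
            self._stop = time.perf_counter()
        return self._stop - self._start

    def elapsed(self) -> float:
        """Seconds since start, or the frozen value once stopped."""
        end = self._stop if self._stop is not None else time.perf_counter()
        return end - self._start
```

Making `stop` idempotent lets several code paths report the same latency without each needing to know whether another one already stopped the timer.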
Exception Handling
File: unmute/exceptions.py
Custom exceptions for service failures.
Related code: main_websocket.py:334
CORS Configuration
File: unmute/main_websocket.py:84
CORS middleware for local development.
Middleware
Upload Size Limiting
File: unmute/main_websocket.py:213
Limits voice upload file size.
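One common way to enforce such a limit is to read the upload in chunks and abort as soon as the running total exceeds the maximum, so an oversized file is never fully buffered. The sketch below shows that technique on a plain binary stream; `read_limited` and `UploadTooLarge` are hypothetical names, and the actual middleware may work differently.

```python
import io
from typing import BinaryIO, List


class UploadTooLarge(Exception):
    """Hypothetical error raised when an upload exceeds the allowed size."""


def read_limited(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the whole stream, failing as soon as more than max_bytes were seen."""
    chunks: List[bytes] = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```

Checking the running total instead of a declared length header also covers clients that send no size or an incorrect one.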
Prometheus Instrumentation
File: unmute/main_websocket.py:74
Automatic HTTP request metrics.
Configuration
File: unmute/kyutai_constants.py
Centralized configuration from environment variables.
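Centralizing configuration typically means reading each environment variable in one place, with a default and a type conversion. The sketch below shows the pattern; `RECORDINGS_DIR` is mentioned elsewhere on this page, while `EXAMPLE_MAX_UPLOAD_MB`, the default values, and the `load_settings` helper are invented for the example and are not taken from kyutai_constants.py.

```python
import os
from typing import Any, Dict, Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect settings from the environment, applying defaults and types."""
    return {
        # Assumption for this sketch: an unset directory is reported as None.
        "recordings_dir": env.get("RECORDINGS_DIR"),
        # Hypothetical variable name and default, for illustration only.
        "max_upload_mb": int(env.get("EXAMPLE_MAX_UPLOAD_MB", "4")),
    }
```

Accepting the environment as a parameter keeps the function testable: a plain dict can be passed in place of `os.environ`.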
Next Steps
- Speech-to-Text - STT component details
- Text-to-Speech - TTS component details
- LLM Integration - LLM integration
- Frontend - Frontend implementation
- Backend - Backend implementation deep dive