Overview
Unmute integrates with any OpenAI-compatible LLM server to generate conversational responses. The system uses streaming completions to minimize latency and enable real-time text-to-speech synthesis.

Key Features:
- OpenAI-compatible API (works with vLLM, OpenAI, Ollama, etc.)
- Streaming completions for low latency
- Word-level chunking for TTS
- Dynamic system prompts and character personalities
- Context management and preprocessing
Architecture
LLM Server
Default: vLLM
- Technology: vLLM
- Default Model: Llama 3.2 1B Instruct
- API: OpenAI-compatible /v1/chat/completions
Docker Compose (docker-compose.yml:101):
- VRAM: ~6.1 GB (Llama 3.2 1B)
- Concurrent requests: Batched automatically by vLLM
Alternative models:
- Mistral Small 3.2 24B: Better quality, more VRAM
- Gemma 3 12B: Good balance
External LLM Servers
Unmute supports any OpenAI-compatible server, for example Ollama.
Python Client
File:unmute/llm/llm_utils.py
OpenAI Client
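The client code itself lives in llm_utils.py and is not reproduced on this page. As an illustration of the OpenAI-compatible request shape that all of these servers accept, here is a minimal standard-library sketch; the base URL and model name are assumptions, not values taken from the Unmute source:

```python
import json

# Hypothetical endpoint; vLLM, Ollama, and OpenAI all expose this path.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(messages, model="meta-llama/Llama-3.2-1B-Instruct",
                       temperature=1.0, stream=True):
    """Build the JSON body for POST <BASE_URL>/chat/completions."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,  # streaming keeps time-to-first-token low
    }

body = build_chat_request([{"role": "user", "content": "Hello!"}])
payload = json.dumps(body)
```

Because every backend speaks the same protocol, switching from vLLM to Ollama or OpenAI is only a matter of changing the base URL and model name.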
Model Selection
File:llm_utils.py:110
VLLMStream Class
File:llm_utils.py:126
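The VLLMStream implementation is not shown here; the sketch below illustrates the kind of work such a wrapper does, turning server-sent event lines from an OpenAI-style streaming completion into plain text deltas. The function name and the canned input are hypothetical:

```python
import json

def iter_deltas(sse_lines):
    """Yield text deltas from OpenAI-style streaming 'data:' lines."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Example with a canned stream:
lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
assert "".join(iter_deltas(lines)) == "Hello"
```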
Conversation Management
Chatbot Class
File:unmute/llm/chatbot.py
Message Format
Messages use the OpenAI chat completion format.
Adding Messages
File:chatbot.py:39
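chatbot.py is referenced but not inlined on this page. A minimal sketch of OpenAI-format message bookkeeping follows; the class and method names are illustrative, not the real Chatbot API:

```python
class ChatHistory:
    """Toy stand-in for the Chatbot message list (hypothetical)."""

    def __init__(self, system_prompt: str):
        # The system prompt is always the first message.
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

history = ChatHistory("You are a helpful voice assistant.")
history.add("user", "What's the weather like?")
history.add("assistant", "I don't have live weather data, sorry!")
```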
Conversation State
File:chatbot.py:21
- waiting_for_user: Empty user message, ready for input
- user_speaking: Non-empty user message, accumulating speech
- bot_speaking: Assistant message, TTS active
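The three states can be derived from the most recent message in the history. The following sketch is inferred from the descriptions above, not copied from chatbot.py:21:

```python
def conversation_state(messages):
    """Infer the conversation state from the last message (sketch)."""
    last = messages[-1]
    if last["role"] == "assistant":
        return "bot_speaking"      # TTS is active for this message
    if last["content"] == "":
        return "waiting_for_user"  # empty user slot, ready for input
    return "user_speaking"         # accumulating user speech

assert conversation_state([{"role": "user", "content": ""}]) == "waiting_for_user"
assert conversation_state([{"role": "user", "content": "hi"}]) == "user_speaking"
assert conversation_state([{"role": "assistant", "content": "hey"}]) == "bot_speaking"
```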
Message Preprocessing
File:llm_utils.py:16
Messages are preprocessed before sending to LLM:
- INTERRUPTION_CHAR = "—" (em-dash): Marks interrupted messages
- USER_SILENCE_MARKER = "...": User silent for >7s
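The preprocessing code at llm_utils.py:16 is not shown; a sketch of how these markers might be handled before the history is sent to the LLM (the exact behavior is an assumption):

```python
INTERRUPTION_CHAR = "—"        # marks messages cut off mid-sentence
USER_SILENCE_MARKER = "..."    # stands in for >7s of user silence

def preprocess(messages):
    """Strip trailing interruption markers before sending to the LLM."""
    cleaned = []
    for msg in messages:
        content = msg["content"].rstrip(INTERRUPTION_CHAR).rstrip()
        cleaned.append({**msg, "content": content})
    return cleaned

msgs = [{"role": "assistant", "content": "As I was saying—"}]
assert preprocess(msgs)[0]["content"] == "As I was saying"
```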
Word-Level Chunking
File:llm_utils.py:65
LLM output is rechunked to word boundaries for TTS:
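Token deltas from the LLM do not align with word boundaries, so they are rebuffered before being handed to TTS. The following sketch illustrates the idea; it is not the actual code at llm_utils.py:65:

```python
def rechunk_to_words(deltas):
    """Regroup arbitrary text deltas into whole words for TTS."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        # Emit every complete word; keep the trailing partial word buffered.
        while " " in buffer:
            word, buffer = buffer.split(" ", 1)
            if word:
                yield word + " "
    if buffer:  # flush the final word
        yield buffer

assert list(rechunk_to_words(["Hel", "lo wor", "ld"])) == ["Hello ", "world"]
```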
System Prompts
File:unmute/llm/system_prompt.py
Instructions Classes
Common Instructions
Dynamic Prompts
Quiz Show (system_prompt.py:200):
Another dynamic prompt (system_prompt.py:300):
Response Generation
File:unmute/unmute_handler.py:184
Full Pipeline
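The pipeline body from unmute_handler.py:184 is not included on this page. Conceptually, per the sections above, a response flows: preprocess history → stream completion → word-level rechunking → TTS. A toy end-to-end sketch with entirely hypothetical names:

```python
def generate_response(history, stream_fn, speak_fn):
    """Toy pipeline: clean history, stream the LLM, feed words to TTS."""
    cleaned = [m for m in history if m["content"]]  # drop empty placeholder turns
    text = []
    buffer = ""
    for delta in stream_fn(cleaned):      # streaming completion deltas
        buffer += delta
        while " " in buffer:              # word-level chunking for TTS
            word, buffer = buffer.split(" ", 1)
            speak_fn(word + " ")
            text.append(word + " ")
    if buffer:                            # flush the final word
        speak_fn(buffer)
        text.append(buffer)
    return "".join(text)

spoken = []
out = generate_response(
    [{"role": "user", "content": "hi"}],
    stream_fn=lambda msgs: iter(["Hi th", "ere!"]),  # fake LLM stream
    speak_fn=spoken.append,                          # fake TTS sink
)
assert out == "Hi there!"
assert spoken == ["Hi ", "there!"]
```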
Temperature Settings
File:unmute/unmute_handler.py:58
- First message: Variety in greetings
- Later messages: Consistent personality
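A schedule like this could be expressed as below; the threshold and temperature values are illustrative assumptions, not the constants from unmute_handler.py:58:

```python
def pick_temperature(n_assistant_messages: int) -> float:
    """Higher temperature for the greeting, lower afterwards (illustrative values)."""
    if n_assistant_messages == 0:
        return 1.0   # first reply: varied greetings
    return 0.7       # later replies: more consistent personality

assert pick_temperature(0) > pick_temperature(3)
```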
Interruption Handling
File:unmute/unmute_handler.py:583
- LLM stream cancelled (no more tokens)
- TTS connection closed (no more audio)
- Chat history contains partial response ending with —
- Next preprocessing will remove the —
Special Behaviors
Long Silence Detection
File:unmute/unmute_handler.py:626
"..." and can prompt user or check if they’re there.
Goodbye Detection
File:unmute/unmute_handler.py:609
Metrics
File:unmute/metrics.py
LLM-Specific Metrics
Metrics Recording
File:unmute/unmute_handler.py:220
Configuration
Environment Variables:
Next Steps
- Backend - Backend orchestration
- Speech-to-Text - User input processing
- Text-to-Speech - Audio synthesis
- Frontend - User interface