This is more difficult to set up than Docker Compose due to various dependencies. Consider using Docker Compose unless you specifically need a Dockerless setup.
## Requirements

### Hardware
- GPU: CUDA-compatible GPU with at least 16 GB VRAM
- Architecture: x86_64
- OS: Linux or Windows with WSL
### VRAM Usage by Service
- LLM: 6.1 GB
- TTS: 5.3 GB
- STT: 2.5 GB
- Total: ~14 GB minimum
### Software Dependencies

#### Install CUDA 12.1

Install CUDA 12.1 via conda or from the NVIDIA website. This is required for the Rust processes (TTS and STT).
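A minimal sketch of the conda route (verify the channel label and package name against NVIDIA's current install instructions, as they change between releases):

```bash
# Install the CUDA 12.1 toolkit from NVIDIA's conda channel:
conda install -c "nvidia/label/cuda-12.1.0" cuda-toolkit
```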
## Starting Services

Each service must be started in a separate terminal session or tmux window. The repository includes helper scripts in the `dockerless/` directory.
### Service Startup Order

While services can be started in any order, here's the recommended sequence.

### Start Frontend
- Installs Node.js dependencies with `pnpm install`
- Ensures the correct Node.js LTS version
- Starts the Next.js development server on port 3000

```bash
./dockerless/start_frontend.sh
```
### Start LLM Server

- Model: `google/gemma-3-1b-it`
- Port: 8091
- Max model length: 8192 tokens
- GPU memory utilization: 30%
- VRAM usage: ~6.1 GB

```bash
./dockerless/start_llm.sh
```
### Start STT Server

- Port: 8090
- VRAM usage: ~2.5 GB
- Creates a Python virtual environment for the libpython dependency
- Sets `LD_LIBRARY_PATH` for Python library linking
- Installs moshi-server with CUDA features
- Runs with an STT-specific config
The first run will take several minutes to compile the Rust binary.
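By analogy with the frontend and LLM helper scripts, the STT server would be launched like this (the filename is an assumption; check the `dockerless/` directory for the actual script name):

```bash
./dockerless/start_stt.sh
```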
### Start TTS Server

- Port: 8089
- VRAM usage: ~5.3 GB
- Sets `LD_LIBRARY_PATH` before `cargo install` so the Rust binary can find Python libraries

## Accessing Unmute
Once all services are running, access the web interface at `http://localhost:3000` (the port used by the frontend dev server).

## Environment Variables
The backend expects these default service URLs. If you've changed the ports, set the corresponding environment variables before starting the backend:
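A sketch of the overrides, assuming variable names like `KYUTAI_STT_URL`, `KYUTAI_TTS_URL`, and `KYUTAI_LLM_URL` (verify the exact names against the backend's settings; the values below use the default ports from this guide):

```bash
# Point the backend at the locally running services:
export KYUTAI_STT_URL="ws://localhost:8090"
export KYUTAI_TTS_URL="ws://localhost:8089"
export KYUTAI_LLM_URL="http://localhost:8091"
```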
## Using tmux for Session Management

To manage multiple services easily, run each service in its own tmux window:
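One way to lay this out (session and window names are arbitrary):

```bash
# One detached session with a window per service:
tmux new-session -d -s unmute -n frontend
tmux new-window -t unmute -n llm
tmux new-window -t unmute -n stt
tmux new-window -t unmute -n tts
tmux attach -t unmute   # then run each start script in its window
```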
## Customizing Services

### Change LLM Model

Edit `dockerless/start_llm.sh`:
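Assuming the script wraps a vLLM `serve` command (the settings listed earlier match vLLM's flags), changing the model is a one-line edit. This is an illustrative excerpt, not the script's actual contents:

```bash
# Replace the model ID with any vLLM-compatible checkpoint:
vllm serve google/gemma-3-1b-it \
  --port 8091 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.3
```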
### Adjust GPU Memory Usage

Modify `--gpu-memory-utilization` in the LLM script:

- Lower values: less VRAM reserved, leaving headroom for TTS and STT, but a smaller KV cache can limit conversation length
- Higher values: a larger KV cache for longer conversations, at the cost of VRAM for the other services
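For example, to reserve half of the GPU for the LLM instead of 30% (the value is a fraction of total VRAM; this is an excerpt of the flag, not the full script):

```bash
--gpu-memory-utilization 0.5
```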
### Use Multiple GPUs

Set `CUDA_VISIBLE_DEVICES` before starting each service:
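For example (the GPU indices and the TTS script name are illustrative):

```bash
# Pin each service to its own GPU:
CUDA_VISIBLE_DEVICES=0 ./dockerless/start_llm.sh
CUDA_VISIBLE_DEVICES=1 ./dockerless/start_tts.sh
```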
## Troubleshooting

### Compilation Errors

**Issue:** sentencepiece build fails

**Solution:** Set the `CXXFLAGS` environment variable:
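A common workaround when the failure is a missing `<cstdint>` include on newer GCC versions (set this before re-running the build):

```bash
# Force-include <cstdint> so sentencepiece compiles on newer GCC:
export CXXFLAGS="-include cstdint"
```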
### Missing Python Dependencies

**Issue:** `no module named 'huggingface_hub'` when running TTS/STT

**Solution:**
- Activate the virtual environment: `source .venv/bin/activate`
- Set `LD_LIBRARY_PATH` before running `cargo install`
- Force rebuild: `cargo install --force --features cuda moshi-server`
### Wrong moshi-server Binary

**Issue:** `moshi-server: error: unrecognized arguments: worker`

**Solution:** You're running the binary from the Python package instead of the Rust one. Update or remove the Python package so the cargo-installed binary is found.
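To confirm which binary is being picked up (the paths in the comment are the typical defaults, not guaranteed):

```bash
# If this resolves into a Python environment (e.g. .venv/bin/), it is
# shadowing the cargo-installed binary (usually ~/.cargo/bin/moshi-server):
command -v moshi-server || echo "moshi-server not on PATH"
```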
### Port Already in Use

Change the port in the relevant start script and update the matching environment variables in the backend start script.

## Stopping Services
To stop a service:

- Switch to its terminal or tmux pane
- Press `Ctrl+C`
## Development Workflow

The Dockerless setup is ideal for development because:

- **Frontend hot-reloading:** changes to frontend code reload automatically
- **Backend auto-reload:** the `--reload` flag restarts the backend on code changes
- **Easy debugging:** direct access to logs in each terminal
- **Fast iteration:** no Docker image rebuilding needed
## Next Steps
- Set up remote access to connect from another machine
- Learn about Docker Compose for easier deployment
- Explore production deployment with Docker Swarm