Unmute uses models from Hugging Face Hub, including the default LLM and the speech-to-text/text-to-speech models. To access these models, you need to set up a Hugging Face access token.

Why You Need a Token

The default docker-compose.yml configuration uses Llama 3.2 1B Instruct as the LLM, which requires only 16GB of GPU memory. This model is freely available but is gated, meaning you need to:
  1. Accept the model’s usage conditions
  2. Authenticate with a Hugging Face token
Alternative models such as Mistral Small 3.2 24B (~24GB VRAM) and Gemma 3 12B are gated in the same way and have the same requirements.

Setup Steps

1. Create a Hugging Face Account

If you don’t already have one, sign up for Hugging Face.
2. Accept Model Conditions

Visit the model page on Hugging Face and accept the usage conditions shown at the top of the page.
3. Create an Access Token

  1. Go to Settings → Access Tokens
  2. Click “New token”
  3. Select the Fine-grained token type
  4. Grant permission: “Read access to contents of all public gated repos you can access”
  5. Copy the token (starts with hf_)
Do not use tokens with write access when deploying publicly. If the server is compromised, an attacker would gain write access to your Hugging Face models and datasets.
4. Add Token to Environment

Add the token to your shell configuration file (~/.bashrc, ~/.zshrc, or equivalent):
export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
Reload your shell configuration:
source ~/.bashrc  # or ~/.zshrc
5. Verify the Token

Confirm the token is set correctly:
echo $HUGGING_FACE_HUB_TOKEN
This should print your token starting with hf_.
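Beyond checking that the variable is set, you can also sanity-check its format before starting any services. A minimal sketch (`check_hf_token` is a hypothetical helper, not part of Unmute; it only mirrors the hf_ prefix convention described above):

```shell
# Hypothetical helper: fail fast if the token is missing or malformed.
check_hf_token() {
  if [ -z "$HUGGING_FACE_HUB_TOKEN" ]; then
    echo "error: HUGGING_FACE_HUB_TOKEN is not set" >&2
    return 1
  fi
  case "$HUGGING_FACE_HUB_TOKEN" in
    hf_*) echo "token format looks OK" ;;
    *)
      echo "error: token should start with hf_" >&2
      return 1
      ;;
  esac
}
```

Note that this only checks the token's shape; it cannot tell you whether the token is actually valid on the Hub.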

Using the Token

Docker Compose

The docker-compose.yml file automatically passes the token to services that need it:
services:
  llm:
    environment:
      - HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN
  
  tts:
    environment:
      - HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN
  
  stt:
    environment:
      - HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN
Make sure the environment variable is set in your shell before running docker compose up.
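If you'd rather not rely on your shell environment, Docker Compose also reads variables from a .env file in the same directory as docker-compose.yml. A minimal sketch (keep this file out of version control, per the security notes below):

```shell
# Write the token to .env (Compose loads this file automatically)
printf 'HUGGING_FACE_HUB_TOKEN=%s\n' 'hf_your_token_here' > .env

# Restrict read access to the current user
chmod 600 .env

# Make sure git never picks it up
grep -qx '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```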

Dockerless Deployment

For dockerless setups, the token is automatically read from your environment when you run the startup scripts:
./dockerless/start_llm.sh   # Uses $HUGGING_FACE_HUB_TOKEN
./dockerless/start_stt.sh
./dockerless/start_tts.sh
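Because the scripts assume the token is already in the environment, a missing token can surface later as confusing download failures inside each service. A small hypothetical pre-flight guard, using standard POSIX parameter expansion:

```shell
# Hypothetical guard: abort with a clear message if the token is unset
# or empty. The ${var:?msg} expansion prints msg and exits with an error.
require_token() {
  : "${HUGGING_FACE_HUB_TOKEN:?set HUGGING_FACE_HUB_TOKEN before starting the services}"
}
```

You could call require_token at the top of a wrapper script, before invoking the start_llm.sh / start_stt.sh / start_tts.sh commands above.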

Troubleshooting

If you see errors about missing authentication:
  1. Verify the token is set: echo $HUGGING_FACE_HUB_TOKEN
  2. Make sure you’ve reloaded your shell after adding it to .bashrc
  3. For Docker, ensure the token was set before running docker compose up
If you get permission errors:
  1. Confirm you’ve accepted the model conditions on Hugging Face
  2. Wait a few minutes for the permissions to propagate
  3. Verify your token has “Read access to contents of all public gated repos”
If you accidentally created a token with write access:
  1. Go to Settings → Access Tokens
  2. Revoke the old token
  3. Create a new fine-grained token with read-only access
  4. Update your environment variable

Security Best Practices

Never commit your Hugging Face token to version control. Use environment variables only.
  • Use fine-grained tokens with minimal permissions
  • Create separate tokens for development and production
  • Rotate tokens periodically
  • Revoke tokens immediately if compromised
  • Never use write-access tokens for public deployments
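One way to follow these practices is to keep the token in a permission-restricted file rather than pasting it directly into a dotfile you might sync or publish. A sketch (the ~/.config/unmute/hf_token path is an arbitrary choice for this example; Unmute itself only sees the exported variable):

```shell
# Store the token in a file only your user can read.
mkdir -p "$HOME/.config/unmute"
printf '%s\n' 'hf_your_token_here' > "$HOME/.config/unmute/hf_token"
chmod 600 "$HOME/.config/unmute/hf_token"

# Then, in ~/.bashrc or ~/.zshrc, read it at shell startup:
export HUGGING_FACE_HUB_TOKEN="$(cat "$HOME/.config/unmute/hf_token")"
```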
