Unmute uses models from Hugging Face Hub, including the default LLM and the speech-to-text/text-to-speech models. To access these models, you need to set up a Hugging Face access token.

Why You Need a Token

The default docker-compose.yml configuration uses Llama 3.2 1B Instruct as the LLM, which requires only 16GB of GPU memory. This model is freely available but is gated, meaning you need to:
  1. Accept the model’s usage conditions
  2. Authenticate with a Hugging Face token
Alternative models such as Mistral Small 3.2 24B (~24GB VRAM) and Gemma 3 12B are gated in the same way and have the same requirements.

Setup Steps

1. Create a Hugging Face Account

If you don’t already have one, sign up for Hugging Face.
2. Accept Model Conditions

Visit the model page on Hugging Face and accept the usage conditions shown at the top of the page.
3. Create an Access Token

  1. Go to Settings → Access Tokens
  2. Click “New token”
  3. Select the Fine-grained token type
  4. Grant permission: “Read access to contents of all public gated repos you can access”
  5. Copy the token (starts with hf_)
Do not use tokens with write access when deploying publicly. If the server is compromised, an attacker would gain write access to your Hugging Face models and datasets.
4. Add Token to Environment

Add the token to your shell configuration file (~/.bashrc, ~/.zshrc, or equivalent):
export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
Reload your shell configuration:
source ~/.bashrc  # or ~/.zshrc
5. Verify the Token

Confirm the token is set correctly:
echo $HUGGING_FACE_HUB_TOKEN
This should print your token starting with hf_.
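Beyond checking that the variable is set, you can also sanity-check its format before starting any services. A minimal sketch (`check_hf_token` is a hypothetical helper, not part of Unmute; it only mirrors the hf_ prefix convention described above):

```shell
# Hypothetical helper: fail fast if the token is missing or malformed.
check_hf_token() {
  if [ -z "$HUGGING_FACE_HUB_TOKEN" ]; then
    echo "error: HUGGING_FACE_HUB_TOKEN is not set" >&2
    return 1
  fi
  case "$HUGGING_FACE_HUB_TOKEN" in
    hf_*) echo "token format looks OK" ;;
    *)
      echo "error: token should start with hf_" >&2
      return 1
      ;;
  esac
}
```

Note that this only checks the token's shape; it cannot tell you whether the token is actually valid on the Hub.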

Using the Token

Docker Compose

The docker-compose.yml file automatically passes the token to services that need it:
services:
  llm:
    environment:
      - HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN
  
  tts:
    environment:
      - HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN
  
  stt:
    environment:
      - HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN
Make sure the environment variable is set in your shell before running docker compose up.
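If you'd rather not rely on your shell environment, Docker Compose also reads variables from a .env file in the same directory as docker-compose.yml. A minimal sketch (keep this file out of version control, per the security notes below):

```shell
# Write the token to .env (Compose loads this file automatically)
printf 'HUGGING_FACE_HUB_TOKEN=%s\n' 'hf_your_token_here' > .env

# Restrict read access to the current user
chmod 600 .env

# Make sure git never picks it up
grep -qx '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```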

Dockerless Deployment

For dockerless setups, the token is automatically read from your environment when you run the startup scripts:
./dockerless/start_llm.sh   # Uses $HUGGING_FACE_HUB_TOKEN
./dockerless/start_stt.sh
./dockerless/start_tts.sh
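Because the scripts assume the token is already in the environment, a missing token can surface later as confusing download failures inside each service. A small hypothetical pre-flight guard, using standard POSIX parameter expansion:

```shell
# Hypothetical guard: abort with a clear message if the token is unset
# or empty. The ${var:?msg} expansion prints msg and exits with an error.
require_token() {
  : "${HUGGING_FACE_HUB_TOKEN:?set HUGGING_FACE_HUB_TOKEN before starting the services}"
}
```

You could call require_token at the top of a wrapper script, before invoking the start_llm.sh / start_stt.sh / start_tts.sh commands above.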

Troubleshooting

If you see errors about missing authentication:
  1. Verify the token is set: echo $HUGGING_FACE_HUB_TOKEN
  2. Make sure you’ve reloaded your shell after adding it to .bashrc
  3. For Docker, ensure the token was set before running docker compose up
If you get permission errors:
  1. Confirm you’ve accepted the model conditions on Hugging Face
  2. Wait a few minutes for the permissions to propagate
  3. Verify your token has “Read access to contents of all public gated repos”
If you accidentally created a token with write access:
  1. Go to Settings → Access Tokens
  2. Revoke the old token
  3. Create a new fine-grained token with read-only access
  4. Update your environment variable

Security Best Practices

Never commit your Hugging Face token to version control. Use environment variables only.
  • Use fine-grained tokens with minimal permissions
  • Create separate tokens for development and production
  • Rotate tokens periodically
  • Revoke tokens immediately if compromised
  • Never use write-access tokens for public deployments
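One way to follow these practices is to keep the token in a permission-restricted file rather than pasting it directly into a dotfile you might sync or publish. A sketch (the ~/.config/unmute/hf_token path is an arbitrary choice for this example; Unmute itself only sees the exported variable):

```shell
# Store the token in a file only your user can read.
mkdir -p "$HOME/.config/unmute"
printf '%s\n' 'hf_your_token_here' > "$HOME/.config/unmute/hf_token"
chmod 600 "$HOME/.config/unmute/hf_token"

# Then, in ~/.bashrc or ~/.zshrc, read it at shell startup:
export HUGGING_FACE_HUB_TOKEN="$(cat "$HOME/.config/unmute/hf_token")"
```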
