Overview

The Unmute API provides endpoints for voice cloning and voice donation. Voice cloning allows you to create custom voices from audio samples, while voice donation enables users to contribute their voices to improve the service.

Get Available Voices

Retrieve a list of pre-configured voices available for use.
curl https://api.unmute.example/v1/voices

Response

Returns an array of voice objects. Each voice object contains:
name
string
The unique identifier for the voice
source
object
Information about the voice source, including its type (source_type), its path on the server (path_on_server), and a description
instructions
object
Optional instructions for using this voice

Response Example

[
  {
    "name": "narrator-1",
    "source": {
      "source_type": "file",
      "path_on_server": "voices/narrator-1.wav",
      "description": "Professional narrator voice"
    }
  },
  {
    "name": "conversational-2",
    "source": {
      "source_type": "file",
      "path_on_server": "voices/conversational-2.wav",
      "description": "Casual conversational voice"
    }
  }
]
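
A minimal sketch of reading this response on the client, assuming the body has exactly the shape shown in the example above. The Voice class and parse_voices helper are illustrative names, not part of the API; instructions is treated as absent when the server omits it.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voice:
    """Illustrative client-side view of one voice object."""
    name: str
    description: str
    path_on_server: str
    instructions: Optional[dict] = None  # optional per the field list


def parse_voices(body: str) -> List[Voice]:
    """Turn the JSON array returned by GET /v1/voices into Voice objects."""
    voices = []
    for item in json.loads(body):
        source = item.get("source", {})
        voices.append(
            Voice(
                name=item["name"],
                description=source.get("description", ""),
                path_on_server=source.get("path_on_server", ""),
                instructions=item.get("instructions"),
            )
        )
    return voices


sample = """[
  {"name": "narrator-1",
   "source": {"source_type": "file",
              "path_on_server": "voices/narrator-1.wav",
              "description": "Professional narrator voice"}}
]"""

voices = parse_voices(sample)
print(voices[0].name, "-", voices[0].description)
```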

Clone a Voice

Upload an audio file to create a custom voice clone.
curl -X POST https://api.unmute.example/v1/voices \
  -F "file=@/path/to/voice-sample.wav"

Request

file
file
required
Audio file containing the voice sample to clone. Supported formats include WAV, MP3, and other common audio formats.

File Size Limits

  • The maximum file size is configurable; the default is defined by MAX_VOICE_FILE_SIZE_MB
  • Files exceeding the limit return a 413 Request Entity Too Large error
  • Recommended duration: 10 to 30 seconds of clear speech
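
The size limit can also be checked on the client before uploading, which avoids a round trip that ends in 413. This is only a sketch: the 4 MB value is a placeholder, since this page does not state the actual default of MAX_VOICE_FILE_SIZE_MB, and treating 1 MB as 1024 * 1024 bytes is likewise an assumption. The helper names are hypothetical.

```python
import os

# Assumption: placeholder value. The real limit is whatever the server's
# MAX_VOICE_FILE_SIZE_MB is set to, which this page does not state.
ASSUMED_MAX_VOICE_FILE_SIZE_MB = 4


def exceeds_limit(size_bytes: int, max_mb: float = ASSUMED_MAX_VOICE_FILE_SIZE_MB) -> bool:
    """Return True if size_bytes is larger than max_mb megabytes."""
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    return size_bytes > max_mb * 1024 * 1024


def check_sample(path: str) -> None:
    """Raise ValueError before uploading if the file is over the assumed limit."""
    size = os.path.getsize(path)
    if exceeds_limit(size):
        raise ValueError(f"{path} is {size} bytes; the server would likely answer 413")
```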

Response

name
string
required
The unique identifier for the cloned voice, in the format custom:<uuid>. Use this name when making TTS requests.

Response Example

{
  "name": "custom:a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
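
Because cloned voices use the custom:<uuid> format, a client can tell them apart from pre-configured voices by name alone. The helper below is a hypothetical sketch of that check; it relies on Python's uuid.UUID parser, which is somewhat more lenient than a strict canonical-form match.

```python
import uuid

CUSTOM_PREFIX = "custom:"


def is_custom_voice(name: str) -> bool:
    """Return True if name looks like custom:<uuid>."""
    if not name.startswith(CUSTOM_PREFIX):
        return False
    try:
        # uuid.UUID raises ValueError for strings it cannot parse.
        uuid.UUID(name[len(CUSTOM_PREFIX):])
    except ValueError:
        return False
    return True


print(is_custom_voice("custom:a1b2c3d4-e5f6-7890-abcd-ef1234567890"))
print(is_custom_voice("narrator-1"))
```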

Status Codes

200
status code
Voice successfully cloned
411
status code
Content-Length header is required
413
status code
Request entity too large - file exceeds maximum size limit
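
Both error codes concern the request body, so it can help to see how a client might assemble it. The sketch below builds the multipart upload with Python's standard library and sets Content-Length explicitly, which is what the 411 check asks for. The function name build_clone_request is hypothetical, and the part's application/octet-stream content type is an assumption; the request is only constructed here, not sent.

```python
import uuid
import urllib.request


def build_clone_request(url: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one 'file' part."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        # Assumption: a generic content type for the audio part.
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    # The length is known up front because the whole body is in memory.
    req.add_header("Content-Length", str(len(body)))
    return req


req = build_clone_request(
    "https://api.unmute.example/v1/voices", "voice-sample.wav", b"RIFF0000"
)
# Sending would be: urllib.request.urlopen(req)
```

Building the whole body in memory keeps the length trivially correct; for large recordings a streaming approach would be preferable, but then the client has to compute the length separately.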

Voice Donation

Contribute your voice to help improve the Unmute service.

Step 1: Request Verification

Initiate a voice donation by requesting a verification text to read.
curl https://api.unmute.example/v1/voice-donation

Response

id
string
required
Unique identifier for this verification request. Must be included when submitting the voice donation.
text
string
required
The verification text you must read aloud. Always begins with: “I consent to my voice being used for voice cloning.” followed by two randomly selected sentences.
created_at_timetamp
number
required
Unix timestamp (seconds since epoch) when the verification was created. Verifications expire after 5 minutes.

Response Example

{
  "id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "text": "I consent to my voice being used for voice cloning. The quick brown fox jumps over the lazy dog. Technology continues to advance at a rapid pace.",
  "created_at_timetamp": 1709251200.0
}
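
Since a verification expires 5 minutes after it is created, a client may want to show how much time is left before asking the user to record. A minimal sketch, assuming the client clock is reasonably close to the server clock; seconds_remaining is a hypothetical helper, and the field name is used exactly as it is spelled in the response above.

```python
import time
from typing import Optional

VERIFICATION_TTL_SECONDS = 5 * 60  # "Verifications expire after 5 minutes"


def seconds_remaining(verification: dict, now: Optional[float] = None) -> float:
    """Seconds left before the verification expires (negative once expired)."""
    if now is None:
        now = time.time()
    # Field name spelled as it appears in the response.
    created = verification["created_at_timetamp"]
    return created + VERIFICATION_TTL_SECONDS - now


verification = {
    "id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
    "created_at_timetamp": 1709251200.0,
}
# One minute after creation, four minutes remain.
print(seconds_remaining(verification, now=1709251260.0))
```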

Step 2: Submit Voice Donation

Submit your voice recording along with the required metadata.
curl -X POST https://api.unmute.example/v1/voice-donation \
  -F "file=@/path/to/recording.wav" \
  -F 'metadata={"email":"user@example.com","nickname":"VoiceContributor","verification_id":"b2c3d4e5-f6a7-8901-bcde-f12345678901","license":"CC0","transcription_from_client":"I consent to..."}'

Request

file
file
required
Audio file containing your recording of the verification text. Must be at least 0.1 MB and no more than the configured maximum size.
metadata
string (JSON)
required
JSON string containing the submission metadata with the following fields:

Metadata Fields

metadata.email
string
required
Your email address. It is kept private, never published, and used only if you ask to withdraw your donation.
metadata.nickname
string
required
Public nickname to associate with your voice donation. Maximum 30 characters.
metadata.verification_id
string (UUID)
required
The verification ID received from the GET /v1/voice-donation endpoint.
metadata.license
string
default:"CC0"
License for the voice donation. Currently only “CC0” (Creative Commons Zero) is accepted.
metadata.format_version
string
default:"1.1"
Metadata format version. Defaults to “1.1”.
metadata.transcription_from_client
string
Optional transcription of what you recorded. Note: the client sends this value, so it could have been manipulated.
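
The metadata field is a JSON string, so a client has to serialize it before attaching it to the form. The sketch below assembles that string and applies the documented constraints (nickname of at most 30 characters, license "CC0", format version "1.1") before anything is sent. build_metadata is a hypothetical helper; the server remains the authority on validation.

```python
import json
from typing import Optional

MAX_NICKNAME_LENGTH = 30


def build_metadata(
    email: str,
    nickname: str,
    verification_id: str,
    transcription: Optional[str] = None,
) -> str:
    """Return the JSON string for the 'metadata' form field."""
    if len(nickname) > MAX_NICKNAME_LENGTH:
        raise ValueError("nickname must be at most 30 characters")
    metadata = {
        "email": email,
        "nickname": nickname,
        "verification_id": verification_id,
        "license": "CC0",          # only accepted value per the field list
        "format_version": "1.1",   # documented default
    }
    if transcription is not None:
        metadata["transcription_from_client"] = transcription
    return json.dumps(metadata)


print(build_metadata(
    "user@example.com",
    "VoiceContributor",
    "b2c3d4e5-f6a7-8901-bcde-f12345678901",
))
```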

Response

{}
An empty object is returned on successful submission.

Status Codes

200
status code
Voice donation successfully submitted
400
status code
Invalid request. Possible reasons:
  • Invalid JSON in metadata field
  • Invalid submission data (validation error)
  • Audio file too small (< 0.1 MB)
  • Audio file too large (> maximum size)
  • Nickname too long (> 30 characters)
  • Verification ID not found or expired (> 5 minutes old)
411
status code
Content-Length header is required
413
status code
Request entity too large - file exceeds maximum size limit

Error Response Example

{
  "detail": "Verification expired after 5 minutes. Please request a new verification."
}
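
A client can surface the detail message to the user when a request fails. This sketch assumes error bodies follow the shape of the example above, and falls back to the status code when the body is not JSON or detail is not a plain string. error_detail is a hypothetical helper.

```python
import json


def error_detail(status: int, body: str) -> str:
    """Return the 'detail' message from an error body, or a generic fallback."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict) and isinstance(payload.get("detail"), str):
        return payload["detail"]
    return f"HTTP {status}"


body = '{"detail": "Verification expired after 5 minutes. Please request a new verification."}'
print(error_detail(400, body))
```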

CORS

All endpoints support CORS for the following origins:
  • http://localhost
  • http://localhost:3000
Credentials are allowed for cross-origin requests.

Notes

  • Voice cloning embeddings are cached for 1 hour
  • Verification requests are cached for 1 hour, but a submission is only accepted within 5 minutes of the verification being created
  • All audio processing is handled asynchronously to prevent blocking
  • Voice donations are saved with both audio and metadata files for future processing