Quick Start

This guide walks you through generating speech with all three Chatterbox models: Turbo, Original, and Multilingual.

Choose Your Model

Turbo (Recommended)
Original
Multilingual

Chatterbox-Turbo is the fastest and most efficient of the three models, which makes it a good fit for voice agents and production use. It features:

350M parameters for lower VRAM usage
Native paralinguistic tags ([laugh], [chuckle], [cough])
Single-step mel decoder for ultra-fast generation
English only

Generate Your First Audio

Import the library

Import the necessary modules based on which model you want to use:

import torchaudio as ta
import torch
from chatterbox.tts_turbo import ChatterboxTurboTTS
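Before running these imports, it can help to confirm that the packages are installed. The following is a minimal sketch that uses only the standard library. The helper name `missing_packages` is hypothetical, and the package names are simply the top-level import names used in this guide.

```python
import importlib.util

# Top-level import names used in this guide.
REQUIRED_PACKAGES = ("torch", "torchaudio", "chatterbox")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

The check only looks up each package without importing it, so it stays fast even when the heavy dependencies are present.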

Load the model

Initialize the model. This example loads it onto a CUDA GPU:

# Load the Turbo model
model = ChatterboxTurboTTS.from_pretrained(device="cuda")
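The snippet above hard-codes `"cuda"`. If you want the device chosen at runtime, something like the sketch below may work. The helper `pick_device` is hypothetical, and this guide does not confirm that `from_pretrained` accepts device strings other than `"cuda"`, so the final call is left commented out as an assumption.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Keeps the sketch runnable without PyTorch; Chatterbox itself needs it.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")

# Assumption: from_pretrained accepts whichever string pick_device returns.
# model = ChatterboxTurboTTS.from_pretrained(device=device)
```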

Generate speech

Create audio from text:

# Text containing a paralinguistic tag
text = "Oh, that's hilarious! [chuckle] Um anyway, we do have a new model in store."

# Generate audio
wav = model.generate(text)

Turbo supports paralinguistic tags like [laugh], [chuckle], [cough] to add natural expressions to speech.
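A misspelled tag is easy to miss in a long script. As a rough sketch, you could scan the text for bracketed words before generating. The helper `unknown_tags` is hypothetical, and `KNOWN_TAGS` contains only the three tags named in this guide; the model may accept others.

```python
import re

# Only the tags named in this guide; the model may accept others.
KNOWN_TAGS = {"laugh", "chuckle", "cough"}

# Matches bracketed lowercase words such as [chuckle].
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def unknown_tags(text):
    """Return bracketed tags in `text` that are not in KNOWN_TAGS, in order."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in KNOWN_TAGS]


text = "Oh, that's hilarious! [chuckle] Um anyway, we do have a new model in store."
print(unknown_tags(text))            # []
print(unknown_tags("Hello [lagh]"))  # ['lagh']
```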

Save the audio

Save the generated audio to a file:

ta.save("output-turbo.wav", wav, model.sr)
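If you save into a subdirectory, the write will typically fail when that directory does not exist yet. A small sketch of a guard follows. The helper `prepare_output_path` and the `outputs/` directory are hypothetical names chosen for illustration.

```python
from pathlib import Path


def prepare_output_path(path):
    """Create the parent directory of `path` if needed and return it as a string."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    return str(out)


out_path = prepare_output_path("outputs/output-turbo.wav")
print(out_path)

# Same save call as in this guide, using the prepared path:
# ta.save(out_path, wav, model.sr)
```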

Voice Cloning

All models support zero-shot voice cloning using a reference audio file. Provide an audio prompt to clone any voice:

# Generate with voice cloning
text = "Hi there, Sarah here from MochaFone calling you back [chuckle]"
wav = model.generate(text, audio_prompt_path="your_10s_ref_clip.wav")
ta.save("cloned-voice.wav", wav, model.sr)

For best results, use a reference audio clip that is 5-10 seconds long with clear speech and minimal background noise.
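To check a reference clip against the 5-10 second recommendation before using it, a sketch along these lines may help. It relies on the standard-library `wave` module, which reads only uncompressed PCM WAV files; the helper names are hypothetical, and the demo writes a synthetic silent clip purely so the example runs on its own.

```python
import os
import tempfile
import wave


def clip_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(path, min_seconds=5.0, max_seconds=10.0):
    """Check the clip against the 5-10 second recommendation."""
    return min_seconds <= clip_duration_seconds(path) <= max_seconds


# Demo: write a 7-second silent mono clip, then check it.
demo_path = os.path.join(tempfile.mkdtemp(), "ref_clip.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)  # 16-bit samples
    wav_file.setframerate(16000)
    wav_file.writeframes(b"\x00\x00" * 16000 * 7)

print(clip_duration_seconds(demo_path))
print(is_recommended_length(demo_path))
```

The check covers length only. Clarity of speech and background noise still need to be judged by ear.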

Complete Examples

Here is a complete working example for the Turbo model:

import torchaudio as ta
import torch
from chatterbox.tts_turbo import ChatterboxTurboTTS

# Load the Turbo model
model = ChatterboxTurboTTS.from_pretrained(device="cuda")

# Text containing a paralinguistic tag
text = "Oh, that's hilarious! [chuckle] Um anyway, we do have a new model in store. It's the SkyNet T-800 series and it's got basically everything. Including AI integration with ChatGPT and all that jazz. Would you like me to get some prices for you?"

# Generate audio
wav = model.generate(text)
ta.save("test-turbo.wav", wav, model.sr)

Supported Languages (Multilingual Model)

The Multilingual model supports 23+ languages: Arabic (ar) • Danish (da) • German (de) • Greek (el) • English (en) • Spanish (es) • Finnish (fi) • French (fr) • Hebrew (he) • Hindi (hi) • Italian (it) • Japanese (ja) • Korean (ko) • Malay (ms) • Dutch (nl) • Norwegian (no) • Polish (pl) • Portuguese (pt) • Russian (ru) • Swedish (sv) • Swahili (sw) • Turkish (tr) • Chinese (zh)
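If your application takes a language code from user input, you may want to validate it against this list first. The sketch below builds a lookup table from the codes listed above. The helper `language_name` is hypothetical, and how the code is then passed to the Multilingual model is outside the scope of this sketch.

```python
# Language codes and names copied from the list in this guide.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "da": "Danish", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "he": "Hebrew", "hi": "Hindi", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "sv": "Swedish",
    "sw": "Swahili", "tr": "Turkish", "zh": "Chinese",
}


def language_name(code):
    """Return the language name for a code, or raise ValueError if unsupported."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return SUPPORTED_LANGUAGES[key]


print(len(SUPPORTED_LANGUAGES))  # 23
print(language_name("FR"))       # French
```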

Next Steps

Learn about advanced features and tuning parameters
Explore paralinguistic tags for Turbo
Understand voice cloning best practices
Check out the watermarking features for responsible AI
