Import the necessary modules based on which model you want to use:
import torchaudio as ta
import torch
from chatterbox.tts_turbo import ChatterboxTurboTTS
2
Load the model
Initialize the model on your target device (the example below uses a CUDA GPU):
# Load the Turbo model
model = ChatterboxTurboTTS.from_pretrained(device="cuda")
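The loading call above hardcodes `device="cuda"`. If the device should be chosen at runtime instead, one possible approach is sketched below. The helper name `pick_device` is hypothetical, and whether Turbo runs well on `"mps"` or `"cpu"` is an assumption that this page does not confirm.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports.

    Hypothetical helper, not part of the Chatterbox API.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch nothing can run on an accelerator anyway.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # Apple Silicon backend; guarded because older PyTorch builds lack it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(f"Using device: {device}")

# Assumption: from_pretrained accepts any of the strings returned above.
# model = ChatterboxTurboTTS.from_pretrained(device=device)
```

The final line is left commented out so the sketch runs without the model installed; uncomment it after the imports from step 1.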
3
Generate speech
Create audio from text:
# Generate with Paralinguistic Tags
text = "Oh, that's hilarious! [chuckle] Um anyway, we do have a new model in store."

# Generate audio
wav = model.generate(text)
Turbo supports paralinguistic tags such as [laugh], [chuckle], and [cough], which add natural expressions to the generated speech.
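Because tags are plain bracketed words inside the prompt, a typo in a tag is easy to miss. The following is a minimal sketch of a pre-flight check that lists the tags in a prompt and flags any that are not among the three named on this page. `KNOWN_TAGS`, `find_tags`, and `unknown_tags` are hypothetical names, the `[sneeze]` tag is invented for the demonstration, and the real set of supported tags may be larger than the one assumed here.

```python
import re

# Only the tags named in this guide; the full supported set may be larger.
KNOWN_TAGS = {"laugh", "chuckle", "cough"}

# Matches a lowercase word (or words) wrapped in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z_ ]+)\]")


def find_tags(text: str) -> list[str]:
    """Return every bracketed tag in the prompt, in order of appearance."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list[str]:
    """Return the tags that are not in KNOWN_TAGS."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


prompt = "Oh, that's hilarious! [chuckle] Um anyway, [sneeze] we do have a new model in store."
print(find_tags(prompt))
print(unknown_tags(prompt))
```

A check like this only inspects the prompt text; it does not tell you how the model itself treats an unrecognized tag.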
All models support zero-shot voice cloning using a reference audio file. Provide an audio prompt to clone any voice:
# Generate with voice cloning
text = "Hi there, Sarah here from MochaFone calling you back [chuckle]"
wav = model.generate(text, audio_prompt_path="your_10s_ref_clip.wav")
ta.save("cloned-voice.wav", wav, model.sr)
For best results, use a reference audio clip that is 5-10 seconds long with clear speech and minimal background noise.
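The length recommendation above can be checked before the clip is passed to `audio_prompt_path`. The sketch below uses Python's standard `wave` module, so it assumes an uncompressed PCM `.wav` file; the function names and the 5 and 10 second bounds as hard limits are illustrative choices, not something the library enforces.

```python
import wave

# Bounds taken from the 5-10 second recommendation in this guide.
MIN_SECONDS = 5.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_good_reference_length(path: str) -> bool:
    """True if the clip falls inside the recommended duration range."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_good_reference_length("your_10s_ref_clip.wav")` could gate the cloning call. Duration is the only thing checked here; clarity of speech and background noise still need a listen.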
Here are complete working examples for each model:
import torchaudio as ta
import torch
from chatterbox.tts_turbo import ChatterboxTurboTTS

# Load the Turbo model
model = ChatterboxTurboTTS.from_pretrained(device="cuda")

# Generate with Paralinguistic Tags
text = "Oh, that's hilarious! [chuckle] Um anyway, we do have a new model in store. It's the SkyNet T-800 series and it's got basically everything. Including AI integration with ChatGPT and all that jazz. Would you like me to get some prices for you?"

# Generate audio
wav = model.generate(text)
ta.save("test-turbo.wav", wav, model.sr)