Configuration Options

Chatterbox provides several configuration parameters to customize your speech generation. These settings control expressiveness, voice characteristics, sampling behavior, and performance.

Device Options

Specify the computing device when loading the model:

from chatterbox.tts_turbo import ChatterboxTurboTTS
from chatterbox.tts import ChatterboxTTS
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# CUDA (NVIDIA GPU)
model = ChatterboxTurboTTS.from_pretrained(device="cuda")

# CPU
model = ChatterboxTTS.from_pretrained(device="cpu")

# MPS (Apple Silicon)
model = ChatterboxMultilingualTTS.from_pretrained(device="mps")

Device Selection Guide

Device	Best For	Performance
`cuda`	NVIDIA GPUs	Fastest - recommended for production
`mps`	Apple Silicon (M1/M2/M3)	Fast - good for Mac users
`cpu`	Any system	Slower - use when GPU unavailable

Auto-detection: The models automatically fall back to CPU if the requested device is unavailable. For Apple Silicon Macs without MPS support, the model will use CPU automatically.

Auto-Device Detection Example

import torch
from chatterbox.tts import ChatterboxTTS

# Automatically detect the best available device
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Using device: {device}")
model = ChatterboxTTS.from_pretrained(device=device)

From: example_tts.py (lines 6-14)
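The CPU fallback noted earlier can be sketched as a small pure function. This is only an illustration of the rule, not Chatterbox's own code: `choose_device` is a hypothetical helper, and in real use the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def choose_device(requested: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Return the requested device if it is usable, otherwise fall back to CPU.

    Hypothetical helper: the availability flags are passed in so the logic
    can be exercised without a GPU or a PyTorch install.
    """
    available = {"cuda": cuda_ok, "mps": mps_ok, "cpu": True}
    if requested not in available:
        raise ValueError(f"Unknown device: {requested!r}")
    return requested if available[requested] else "cpu"


# Example: asking for MPS on a machine without MPS support
print(choose_device("mps", cuda_ok=False, mps_ok=False))
```

Keeping the availability checks outside the function makes the fallback rule easy to test on any machine.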

Generation Parameters

All generation parameters are passed to the generate() method:

wav = model.generate(
    text="Your text here",
    audio_prompt_path="reference.wav",
    cfg_weight=0.5,
    exaggeration=0.5,
    temperature=0.8,
    repetition_penalty=1.2,
    min_p=0.05,
    top_p=0.95,
    top_k=1000,
    norm_loudness=True
)

cfg_weight Parameter

Range: 0.0 to 1.0 (typically)
Default: 0.5 (standard models), 0.0 (Turbo - ignored)

Controls how strongly the model follows the reference voice characteristics. Higher values make the output more similar to the reference audio.

# Light conditioning - more variation from reference
wav = model.generate(text, cfg_weight=0.3)

# Default - balanced
wav = model.generate(text, cfg_weight=0.5)

# Strong conditioning - closer to reference
wav = model.generate(text, cfg_weight=0.7)

When to Adjust cfg_weight

Fast-speaking reference audio

If your reference speaker talks very quickly, lower cfg_weight to improve pacing:

wav = model.generate(
    text,
    audio_prompt_path="fast_speaker.wav",
    cfg_weight=0.3  # Slows down pacing
)

From README: “If the reference speaker has a fast speaking style, lowering cfg_weight to around 0.3 can improve pacing.”

Cross-language synthesis

When using a voice from one language to speak another, set cfg_weight=0 to reduce accent transfer:

# English voice speaking French with minimal accent
wav = multilingual_model.generate(
    "Bonjour!",
    language_id="fr",
    audio_prompt_path="english_speaker.wav",
    cfg_weight=0.0  # Reduces English accent
)

From README: “To mitigate [accent transfer], set cfg_weight to 0.”

Expressive or dramatic speech

For more expressive output, combine lower cfg_weight with higher exaggeration:

wav = model.generate(
    text,
    cfg_weight=0.3,     # Lower for slower pacing
    exaggeration=0.7    # Higher for more expression
)

From README: “Try lower cfg_weight values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher.”

Turbo Model: Chatterbox Turbo ignores cfg_weight during generation. The parameter only applies to standard Chatterbox and multilingual models.

exaggeration Parameter

Range: 0.0 to 1.0+
Default: 0.5 (standard models), 0.0 (Turbo)

Controls the expressiveness and emotional intensity of the generated speech.

# Neutral, flat delivery
wav = model.generate(text, exaggeration=0.0)

# Moderate expressiveness (default)
wav = model.generate(text, exaggeration=0.5)

# High expressiveness
wav = model.generate(text, exaggeration=0.7)

# Very dramatic
wav = model.generate(text, exaggeration=1.0)

Effects of Exaggeration

Lower values (0.0-0.3): More neutral, professional tone
Medium values (0.4-0.6): Natural conversation, moderate emotion
Higher values (0.7-1.0): Dramatic, expressive, emotional delivery

Speed Impact: Higher exaggeration tends to speed up speech. Compensate by reducing cfg_weight for more deliberate pacing.

Exaggeration Tips from README

General Use:

“The default settings (exaggeration=0.5, cfg_weight=0.5) work well for most prompts across all languages.”

Expressive Speech:

“Try lower cfg_weight values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher. Higher exaggeration tends to speed up speech; reducing cfg_weight helps compensate with slower, more deliberate pacing.”

Example Configurations

# Professional narration
wav = model.generate(
    text,
    cfg_weight=0.5,
    exaggeration=0.3  # Controlled, professional
)

# Natural conversation
wav = model.generate(
    text,
    cfg_weight=0.5,
    exaggeration=0.5  # Balanced
)

# Dramatic storytelling
wav = model.generate(
    text,
    cfg_weight=0.3,   # Slower pacing
    exaggeration=0.8  # Very expressive
)
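The three configurations above can also be kept as named presets. The sketch below is one possible arrangement; `PRESETS` and `preset_kwargs` are hypothetical names, not part of Chatterbox, and the values are copied from the examples above.

```python
# Preset values taken from the example configurations above.
PRESETS = {
    "narration": {"cfg_weight": 0.5, "exaggeration": 0.3},
    "conversation": {"cfg_weight": 0.5, "exaggeration": 0.5},
    "dramatic": {"cfg_weight": 0.3, "exaggeration": 0.8},
}


def preset_kwargs(name, **overrides):
    """Return a copy of the named preset with optional per-call overrides."""
    if name not in PRESETS:
        raise KeyError(f"Unknown preset: {name!r}")
    return {**PRESETS[name], **overrides}


print(preset_kwargs("dramatic"))
print(preset_kwargs("narration", exaggeration=0.4))
```

With a loaded standard or multilingual model, the result could be passed along as `model.generate(text, **preset_kwargs("dramatic"))`.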

Turbo Model: Chatterbox Turbo ignores exaggeration during generate(). It only uses exaggeration when you explicitly call prepare_conditionals().

Sampling Parameters

These parameters control the randomness and diversity of speech generation.

temperature

Range: 0.0 to 2.0+
Default: 0.8

Controls randomness in token selection. Higher values produce more variation.

# More consistent, predictable output
wav = model.generate(text, temperature=0.6)

# Default balance
wav = model.generate(text, temperature=0.8)

# More variation and creativity
wav = model.generate(text, temperature=1.0)
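To see why higher temperatures produce more variation, the sketch below applies the standard temperature-scaled softmax to a toy set of logits. It assumes the usual formulation (divide logits by the temperature before the softmax) and is not taken from the Chatterbox source; the logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("this sketch requires a positive temperature")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # illustrative values only
for t in (0.6, 0.8, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, which is why output becomes more consistent; higher temperatures spread it across more tokens.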

repetition_penalty

Range: 1.0 to 2.5+
Default: 1.2 (Turbo and Standard), 2.0 (Multilingual)

Penalizes repeated tokens to reduce repetitive speech patterns.

# No penalty - may repeat more
wav = model.generate(text, repetition_penalty=1.0)

# Turbo and Standard default - light penalty
wav = model.generate(text, repetition_penalty=1.2)

# Multilingual default - stronger penalty
wav = model.generate(text, repetition_penalty=2.0)
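A minimal sketch of how a repetition penalty typically acts on logits, assuming the common formulation in which the score of every already-generated token is pushed down. Whether Chatterbox uses exactly this rule is an assumption; `apply_repetition_penalty` is an illustrative name.

```python
def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Lower the logits of tokens that already appeared in the output."""
    adjusted = list(logits)
    for token in set(previous_tokens):
        score = adjusted[token]
        # Dividing a positive logit or multiplying a negative one both lower it.
        adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


toy_logits = [3.0, -1.0, 0.5, 2.0]  # illustrative values only
history = [0, 1, 0]                 # token ids generated so far
print(apply_repetition_penalty(toy_logits, history, 1.2))
print(apply_repetition_penalty(toy_logits, history, 2.0))
```

A penalty of 1.0 leaves the logits unchanged, which matches the "no penalty" setting above.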

top_p (Nucleus Sampling)

Range: 0.0 to 1.0
Default: 0.95 (Turbo), 1.0 (others)

Keeps the smallest set of most probable tokens whose cumulative probability reaches top_p.

# More focused on likely tokens
wav = model.generate(text, top_p=0.9)

# Turbo default
wav = model.generate(text, top_p=0.95)

# No filtering
wav = model.generate(text, top_p=1.0)
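Nucleus filtering can be sketched in a few lines of plain Python. This is a generic illustration of the technique on a toy distribution, not the code Chatterbox runs; `top_p_filter` is an illustrative name.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p.

    Returns a dict mapping kept token ids to renormalized probabilities.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_probs = [0.5, 0.3, 0.15, 0.05]  # illustrative values only
print(top_p_filter(toy_probs, 0.9))
print(top_p_filter(toy_probs, 1.0))
```

With `top_p=1.0` every token survives, which is why that value corresponds to no filtering.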

top_k

Range: 1 to 10000+
Default: 1000 (Turbo only)

Keeps only the top K most probable tokens. Only used by the Turbo model.

# Turbo model only
wav = model.generate(text, top_k=500)   # More conservative
wav = model.generate(text, top_k=1000)  # Default
wav = model.generate(text, top_k=2000)  # More variety
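Top-k filtering is usually implemented by masking every logit below the k-th largest. The sketch below shows that generic approach on toy logits; it is an assumption that Turbo follows the same pattern, and `top_k_mask` is an illustrative name.

```python
def top_k_mask(logits, top_k):
    """Set every logit below the k-th largest to negative infinity."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if top_k >= len(logits):
        return list(logits)  # nothing to filter
    threshold = sorted(logits, reverse=True)[top_k - 1]
    # Ties at the threshold are all kept, so slightly more than k may survive.
    return [x if x >= threshold else float("-inf") for x in logits]


toy_logits = [1.0, 3.0, 2.0, 0.0]  # illustrative values only
print(top_k_mask(toy_logits, 2))
```

Masked tokens receive zero probability after the softmax, so a smaller `top_k` gives more conservative output.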

min_p

Range: 0.0 to 1.0
Default: 0.0 (Turbo - ignored), 0.05 (others)

Sets a minimum probability threshold for token selection.

# Standard/Multilingual models
wav = model.generate(text, min_p=0.05)  # Default
wav = model.generate(text, min_p=0.10)  # More conservative

Turbo Model: Chatterbox Turbo ignores min_p. It only applies to standard and multilingual models.
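For the standard and multilingual models, the effect of min_p can be sketched as follows. The sketch assumes the usual min-p definition, in which the threshold is min_p multiplied by the probability of the most likely token; the text above does not spell this out, so treat that scaling as an assumption. `min_p_filter` is an illustrative name.

```python
def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top probability.

    Returns a dict mapping kept token ids to renormalized probabilities.
    """
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}


toy_probs = [0.6, 0.3, 0.08, 0.02]  # illustrative values only
print(min_p_filter(toy_probs, 0.05))
print(min_p_filter(toy_probs, 0.20))
```

Because the threshold scales with the top token's probability, the filter is stricter when the model is confident and more permissive when it is not.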

Audio Processing Options

norm_loudness

Type: Boolean
Default: True (Turbo only)

Normalizes the loudness of the reference audio before processing.

# Normalize reference audio (default for Turbo)
wav = model.generate(
    text,
    audio_prompt_path="reference.wav",
    norm_loudness=True
)

# Skip normalization
wav = model.generate(
    text,
    audio_prompt_path="reference.wav",
    norm_loudness=False
)

Loudness normalization uses LUFS (Loudness Units relative to Full Scale) with a target of -27 LUFS, ensuring consistent volume levels across different reference audio files.

From: tts_turbo.py (lines 204-215)
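The gain needed to reach the -27 LUFS target follows from the difference between the target and the measured loudness. The sketch below shows only that arithmetic; measuring integrated loudness requires a loudness meter, which is not shown, and `loudness_gain` and `apply_gain` are illustrative names rather than Chatterbox functions.

```python
TARGET_LUFS = -27.0  # target stated in the text above


def loudness_gain(measured_lufs, target_lufs=TARGET_LUFS):
    """Return the linear gain that moves measured loudness to the target."""
    gain_db = target_lufs - measured_lufs
    return 10.0 ** (gain_db / 20.0)  # convert decibels to a linear factor


def apply_gain(samples, gain):
    """Scale every sample by the linear gain."""
    return [s * gain for s in samples]


# A reference measured at -21 LUFS is louder than the target, so gain < 1.
gain = loudness_gain(-21.0)
print(round(gain, 4))
print(apply_gain([0.5, -0.25], gain))
```

A reference already at -27 LUFS yields a gain of exactly 1.0 and passes through unchanged.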

Model-Specific Parameter Support

Parameter	Turbo	Standard	Multilingual
`cfg_weight`	❌ Ignored	✅ Supported	✅ Supported
`exaggeration`	⚠️ Only in `prepare_conditionals()`	✅ Supported	✅ Supported
`min_p`	❌ Ignored	✅ Supported	✅ Supported
`top_k`	✅ Supported	❌ Not used	❌ Not used
`norm_loudness`	✅ Supported	❌ Not used	❌ Not used
`temperature`	✅ Supported	✅ Supported	✅ Supported
`repetition_penalty`	✅ Supported	✅ Supported	✅ Supported
`top_p`	✅ Supported	✅ Supported	✅ Supported

When you pass ignored parameters to the Turbo model, you’ll see a warning, but generation will continue:

WARNING: CFG, min_p and exaggeration are not supported by Turbo version and will be ignored.
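If one set of settings is shared across models, the support table above can be encoded as data and used to separate the parameters a model will use from those it will ignore. This is a sketch built from the table; `SUPPORTED` and `split_params` are hypothetical names, and only the tunable parameters from the table are covered (not `text`, `audio_prompt_path`, or `language_id`).

```python
# Parameter support per model, transcribed from the table above.
_STANDARD = {"temperature", "repetition_penalty", "top_p",
             "min_p", "cfg_weight", "exaggeration"}
SUPPORTED = {
    "turbo": {"temperature", "repetition_penalty", "top_p",
              "top_k", "norm_loudness"},
    "standard": _STANDARD,
    "multilingual": _STANDARD,
}


def split_params(model_kind, **params):
    """Return (used, ignored): kwargs the model uses and names it ignores."""
    supported = SUPPORTED[model_kind]
    used = {k: v for k, v in params.items() if k in supported}
    ignored = sorted(k for k in params if k not in supported)
    return used, ignored


used, ignored = split_params("turbo", cfg_weight=0.3, top_k=500, temperature=0.7)
print(used)
print(ignored)
```

Passing only `used` to `generate()` avoids the warning, and `ignored` can be logged so that silently dropped settings are visible.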

Complete Configuration Examples

Turbo Model - Voice Agent

from chatterbox.tts_turbo import ChatterboxTurboTTS
import torchaudio as ta

model = ChatterboxTurboTTS.from_pretrained(device="cuda")

text = "Hi there! [chuckle] How can I help you today?"
wav = model.generate(
    text,
    audio_prompt_path="agent_voice.wav",
    temperature=0.8,
    repetition_penalty=1.2,
    top_p=0.95,
    top_k=1000,
    norm_loudness=True
)
ta.save("agent_output.wav", wav, model.sr)

Standard Model - Professional Narration

from chatterbox.tts import ChatterboxTTS
import torchaudio as ta

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Welcome to this comprehensive guide on audio synthesis technology."
wav = model.generate(
    text,
    audio_prompt_path="narrator.wav",
    cfg_weight=0.5,
    exaggeration=0.3,  # Professional, controlled
    temperature=0.7,   # Consistent output
    repetition_penalty=1.2,
    min_p=0.05,
    top_p=1.0
)
ta.save("narration.wav", wav, model.sr)

Multilingual Model - Dramatic Speech

from chatterbox.mtl_tts import ChatterboxMultilingualTTS
import torchaudio as ta

model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")

text = "¡Esto es absolutamente increíble!"
wav = model.generate(
    text,
    language_id="es",
    audio_prompt_path="expressive_speaker.wav",
    cfg_weight=0.3,    # Slower pacing
    exaggeration=0.8,  # Very expressive
    temperature=0.9,   # More variation
    repetition_penalty=2.0,
    min_p=0.05,
    top_p=1.0
)
ta.save("dramatic_spanish.wav", wav, model.sr)

Default Values Summary

Chatterbox Turbo

model.generate(
    text,
    audio_prompt_path=None,
    temperature=0.8,
    repetition_penalty=1.2,
    top_p=0.95,
    top_k=1000,
    min_p=0.0,          # Ignored
    cfg_weight=0.0,     # Ignored  
    exaggeration=0.0,   # Ignored
    norm_loudness=True
)

Standard Chatterbox

model.generate(
    text,
    audio_prompt_path=None,
    temperature=0.8,
    repetition_penalty=1.2,
    top_p=1.0,
    min_p=0.05,
    cfg_weight=0.5,
    exaggeration=0.5
)

Chatterbox Multilingual

model.generate(
    text,
    language_id="en",  # Required
    audio_prompt_path=None,
    temperature=0.8,
    repetition_penalty=2.0,  # Higher than others
    top_p=1.0,
    min_p=0.05,
    cfg_weight=0.5,
    exaggeration=0.5
)

Performance Optimization

For Maximum Speed

Use CUDA device with NVIDIA GPU
Use Chatterbox Turbo (350M params vs 500M)
Keep reference audio at 10 seconds or less
Reuse conditionals for the same voice

# Pre-compute conditionals once
model.prepare_conditionals("voice.wav")

# Generate multiple times without reprocessing the reference audio
text_list = ["First sentence.", "Second sentence."]  # Example inputs
for text in text_list:
    wav = model.generate(text)  # No audio_prompt_path needed
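When the voice can change between requests, the same idea can be wrapped so that `prepare_conditionals()` runs only when the reference path differs from the previous one. `VoiceCache` is a hypothetical wrapper, and `_StandInModel` exists only so the sketch runs without Chatterbox installed; a loaded Chatterbox model would take its place.

```python
class VoiceCache:
    """Calls prepare_conditionals only when the reference voice changes."""

    def __init__(self, model):
        self.model = model
        self.current_voice = None

    def generate(self, text, voice_path, **kwargs):
        if voice_path != self.current_voice:
            self.model.prepare_conditionals(voice_path)
            self.current_voice = voice_path
        return self.model.generate(text, **kwargs)


class _StandInModel:
    """Stand-in that counts calls; not a real Chatterbox model."""

    def __init__(self):
        self.prepare_calls = 0

    def prepare_conditionals(self, path):
        self.prepare_calls += 1

    def generate(self, text, **kwargs):
        return f"audio for {text!r}"


cache = VoiceCache(_StandInModel())
for line in ["Hello.", "How are you?", "Goodbye."]:
    cache.generate(line, "voice.wav")
print(cache.model.prepare_calls)
```

The wrapper compares paths only, so it will not notice if the file at the same path is replaced.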

For Best Quality

Use Standard Chatterbox or Multilingual, which support more tuning parameters (cfg_weight, exaggeration, min_p)
Tune cfg_weight and exaggeration for your use case
Use high-quality reference audio (sample rate of 22,050 Hz or higher)
Adjust temperature for consistency vs. variation

Troubleshooting Configuration Issues

Output too fast

# Lower cfg_weight
wav = model.generate(text, cfg_weight=0.3)

# Reduce exaggeration
wav = model.generate(text, exaggeration=0.3)

Output too monotone

# Increase exaggeration
wav = model.generate(text, exaggeration=0.7)

# Increase temperature for variation
wav = model.generate(text, temperature=1.0)

Repetitive speech

# Increase repetition penalty
wav = model.generate(text, repetition_penalty=2.5)

Voice doesn’t match reference

# Increase cfg_weight
wav = model.generate(text, cfg_weight=0.7)

# Check reference audio quality and duration
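A quick way to check a reference file's duration and sample rate is Python's standard `wave` module, which reads uncompressed PCM WAV files. The thresholds below come from the recommendations in this guide (10 seconds or less, 22,050 Hz or higher); `inspect_reference` and `reference_warnings` are hypothetical helper names.

```python
import wave


def inspect_reference(path):
    """Read the sample rate and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return {"sample_rate": sample_rate, "duration_s": duration}


def reference_warnings(info, max_seconds=10.0, min_rate=22050):
    """Return human-readable warnings based on this guide's recommendations."""
    warnings = []
    if info["duration_s"] > max_seconds:
        warnings.append(f"Reference is longer than {max_seconds:g} s")
    if info["sample_rate"] < min_rate:
        warnings.append(f"Sample rate is below {min_rate} Hz")
    return warnings
```

For example, `reference_warnings(inspect_reference("reference.wav"))` returns an empty list when the file meets both recommendations.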
