Overview

Chatterbox TTS provides four key parameters for controlling speech generation quality, consistency, and expressiveness. Understanding these parameters helps you achieve optimal results for different use cases.

Parameters Reference

Temperature

Controls the randomness and creativity of the generation.
temperature (number, default: 0.8, range: 0.0 to 2.0)

Effect:
  • 0.0 - 0.5: More consistent and predictable output. Use for formal content like news, announcements, or technical documentation.
  • 0.6 - 1.0: Balanced naturalness with some variation. Ideal for most conversational content.
  • 1.0 - 2.0: More expressive and varied output. Good for storytelling, character voices, or creative content.
const result = await trpc.generations.create.mutate({
  text: "The stock market opened at 9:30 AM Eastern Time.",
  voiceId: "narrator-voice",
  temperature: 0.3, // Consistent, professional tone
  topP: 0.95,
  topK: 1000,
  repetitionPenalty: 1.2,
});

Top P (Nucleus Sampling)

Controls diversity by considering only the top probability mass.
topP (number, default: 0.95, range: 0.0 to 1.0)

Effect:
  • 0.5 - 0.7: Very focused sampling. Consistent but may sound repetitive.
  • 0.8 - 0.95: Balanced diversity. Good for most use cases.
  • 0.95 - 1.0: Maximum diversity. More natural but less predictable.
Technical: Samples from the smallest set of tokens whose cumulative probability exceeds topP.

How It Works

Nucleus sampling dynamically adjusts the token pool based on probability distribution:
Token probabilities:
  "hello": 0.40
  "hi":    0.30
  "hey":   0.20
  "yo":    0.05
  "sup":   0.05

With topP=0.90:
  ✓ "hello" (0.40, cumulative: 0.40)
  ✓ "hi"    (0.30, cumulative: 0.70)
  ✓ "hey"   (0.20, cumulative: 0.90)
  ✗ "yo"    (would exceed 0.90)
  ✗ "sup"   (excluded)

Top K

Limits the sampling pool to the K most likely tokens.
topK (number, default: 1000, range: 1 to 10,000)

Effect:
  • 1 - 50: Very restricted vocabulary. Extremely consistent but unnatural.
  • 100 - 500: Moderate restriction. Good for technical or formal content.
  • 500 - 2000: Standard range. Balances naturalness and control.
  • 2000+: Minimal restriction. Maximum vocabulary diversity.
Technical: Only the K most probable tokens are considered for sampling, regardless of their probability values.
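Using the same token probabilities as the nucleus-sampling walkthrough above, the top-K rule can be sketched as follows (illustrative only; the function name is ours):

```typescript
// Sketch of top-K candidate selection: keep the K most probable tokens,
// ignoring how much probability mass they actually cover.
function topKCandidates(
  probs: Record<string, number>,
  topK: number,
): string[] {
  return Object.entries(probs)
    .sort((a, b) => b[1] - a[1]) // descending by probability
    .slice(0, topK)              // hard cut at K tokens
    .map(([token]) => token);
}

topKCandidates(
  { hello: 0.40, hi: 0.30, hey: 0.20, yo: 0.05, sup: 0.05 },
  2,
); // keeps "hello" and "hi", regardless of their probabilities
```

Unlike top-P, the pool size here is fixed: the cut stays at K tokens whether the model is confident or uncertain.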

Top-K vs Top-P

Use Top-K when:
  • You want a fixed vocabulary size regardless of confidence
  • You need very predictable output (low K values)
  • You’re generating technical or domain-specific content
Use Top-P when:
  • You want adaptive sampling based on model confidence
  • You prefer natural-sounding variation
  • You’re generating conversational or narrative content
Use both (recommended):
  • Top-P provides adaptive diversity
  • Top-K sets a hard upper limit on vocabulary
  • This combination works well for most use cases

Repetition Penalty

Penalizes tokens that have already been generated.
repetitionPenalty (number, default: 1.2, range: 1.0 to 2.0)

Effect:
  • 1.0: No penalty. May lead to repetitive phrases or words.
  • 1.1 - 1.3: Subtle penalty. Reduces repetition while maintaining naturalness.
  • 1.4 - 2.0: Strong penalty. Actively avoids repetition, may sound forced.
Technical: Divides the logits of previously generated tokens by this value, making them less likely to be selected again.
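The penalty described above can be sketched in TypeScript. This is an illustrative model (real implementations may treat negative logits differently; the function name is ours):

```typescript
// Sketch of a repetition penalty: divide the logit of each token that has
// already been generated by the penalty, so it is less likely to recur.
function applyRepetitionPenalty(
  logits: Record<string, number>,
  previousTokens: string[],
  penalty: number,
): Record<string, number> {
  const penalized = { ...logits };
  const seen = new Set<string>();
  for (const token of previousTokens) {
    // Penalize each distinct previous token once.
    if (seen.has(token) || !(token in penalized)) continue;
    seen.add(token);
    penalized[token] = penalized[token] / penalty;
  }
  return penalized;
}

applyRepetitionPenalty({ the: 3.0, cat: 1.5 }, ["the"], 1.5);
// "the" drops from 3.0 to 2.0; "cat" is untouched
```

At penalty 1.0 the division is a no-op, which is why that setting allows repetition.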

Example Impact

With repetitionPenalty: 1.0 (no penalty), output may repeat patterns:
"The cat sat on the mat. The cat was happy. The cat purred."
# Notice: "The cat" repeats frequently
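One mitigation, written in the preset style used elsewhere on this page (the values are illustrative starting points, not tuned recommendations):

```typescript
// Illustrative starting point when generated speech repeats phrases:
// raise repetitionPenalty above the 1.2 default and widen topK slightly.
const antiRepetitionPreset = {
  temperature: 0.8,
  topP: 0.95,
  topK: 1200,
  repetitionPenalty: 1.4,
};
```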

Use Case Presets

News & Announcements

const newsPreset = {
  temperature: 0.3,
  topP: 0.85,
  topK: 500,
  repetitionPenalty: 1.15,
};
Optimized for: Consistency, clarity, professional tone

Conversational Content

const conversationalPreset = {
  temperature: 0.8,
  topP: 0.95,
  topK: 1000,
  repetitionPenalty: 1.2,
};
Optimized for: Natural flow, variety, engagement

Storytelling & Characters

const storytellingPreset = {
  temperature: 1.2,
  topP: 0.95,
  topK: 1500,
  repetitionPenalty: 1.3,
};
Optimized for: Expressiveness, emotion, dramatic delivery

Technical Documentation

const technicalPreset = {
  temperature: 0.4,
  topP: 0.90,
  topK: 600,
  repetitionPenalty: 1.1,
};
Optimized for: Precision, consistency, minimal variation

Parameter Interactions

Temperature + Top-P

Config: temperature: 0.3, topP: 0.7
Result: Very consistent, conservative output. Good for formal content but may sound robotic.

Tuning Workflow

Step 1: Start with Defaults

Begin with the default values:
  • temperature: 0.8
  • topP: 0.95
  • topK: 1000
  • repetitionPenalty: 1.2
Step 2: Adjust Temperature

If output is too monotone → increase temperature
If output is too random → decrease temperature
Step 3: Fine-tune Repetition

If you notice repeated phrases → increase repetitionPenalty
If speech feels forced → decrease repetitionPenalty
Step 4: Tweak Sampling (Advanced)

For more control over diversity, adjust topP and topK:
  • Decrease both for more consistency
  • Increase both for more variety

Common Issues

Robotic or Monotone Output

Symptoms: Speech sounds flat and lacks emotion or variation.
Solutions:
  • Increase temperature to 1.0-1.3
  • Increase topP to 0.95-0.98
  • Check voice quality (some voices are more expressive than others)

Repetitive Phrases

Symptoms: The same words or patterns repeat frequently.
Solutions:
  • Increase repetitionPenalty to 1.3-1.5
  • Increase topK to allow more vocabulary diversity
  • Break long text into shorter segments

Inconsistent Delivery

Symptoms: Tone varies unpredictably between sentences.
Solutions:
  • Decrease temperature to 0.5-0.7
  • Decrease topP to 0.85-0.90
  • Use consistent punctuation and formatting

Unnatural Phrasing

Symptoms: Speech sounds forced or overly formal.
Solutions:
  • Decrease repetitionPenalty to 1.0-1.1
  • Increase temperature slightly (0.1-0.2 increments)
  • Ensure input text is naturally written

Best Practices

  • Change one parameter at a time by small increments (0.1-0.2) to understand its individual effect.
  • Use actual content samples from your use case, not generic test phrases.
  • Different voices respond differently to parameters. What works for a narrator may not work for a character voice.
  • Save successful parameter combinations for different content types in your application.
  • Perfect consistency isn’t always desirable. Some variation makes speech sound more human.

Technical Details

Sampling Algorithm

Chatterbox uses a combined sampling approach:
  1. Temperature scaling: Divides logits by temperature before softmax
  2. Repetition penalty: Divides the logits of previously generated tokens
  3. Top-K filtering: Removes all but the K most probable tokens
  4. Top-P filtering: Further filters to the nucleus based on cumulative probability
  5. Sampling: Randomly selects a token from the filtered distribution
# Simplified pseudocode
logits = model.forward(input)
logits = logits / temperature
logits = apply_repetition_penalty(logits, previous_tokens, repetition_penalty)
logits = top_k_filtering(logits, top_k)
logits = top_p_filtering(logits, top_p)
probs = softmax(logits)
token = sample(probs)

Performance Impact

Parameter changes have minimal performance impact:
  • Temperature: Negligible (simple division)
  • Top-P/Top-K: ~1-2ms overhead for filtering
  • Repetition Penalty: ~1ms per token for lookup
Total parameter processing adds less than 5ms to generation time.
