Every audio file generated by Chatterbox includes imperceptible neural watermarks using Resemble AI’s PerTh (Perceptual Threshold) watermarking technology. This ensures AI-generated content can be identified and traced while maintaining audio quality.
What is PerTh Watermarking?
PerTh is a neural watermarking system that embeds imperceptible markers into audio during generation. These watermarks:
- Are inaudible to human listeners
- Survive MP3 compression and format conversion
- Persist through audio editing and manipulation
- Maintain nearly 100% detection accuracy
- Don’t degrade audio quality
Automatic Application: Watermarking is applied automatically to all Chatterbox-generated audio. You don’t need to enable it—every output is watermarked by default.
Why Watermarking Matters
AI-generated audio can be misused for:
- Impersonation and fraud
- Spreading misinformation
- Creating deepfakes
- Unauthorized voice cloning
Watermarking enables:
- Detection of AI-generated content
- Verification of audio authenticity
- Attribution to Chatterbox TTS
- Accountability in content creation
Ethical Responsibility: Just because you can clone any voice doesn’t mean you should. Always obtain proper consent before cloning someone’s voice, and use the technology responsibly.
How Watermarking Works
The watermark is embedded during audio generation:
```python
from chatterbox.tts_turbo import ChatterboxTurboTTS
import torchaudio as ta

# Generate audio
model = ChatterboxTurboTTS.from_pretrained(device="cuda")
text = "This audio will be automatically watermarked."
wav = model.generate(text)

# The watermark is already embedded
ta.save("watermarked_output.wav", wav, model.sr)
```
The watermarking happens inside the generate() method:
```python
# From tts_turbo.py (line 295)
wav = wav.squeeze(0).detach().cpu().numpy()
watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)
return torch.from_numpy(watermarked_wav).unsqueeze(0)
```
Every model variant (Turbo, Standard, Multilingual) applies watermarking automatically.
Detecting Watermarks
You can check if audio contains a Chatterbox watermark using the PerTh library.
Install PerTh
PerTh is included as a dependency of Chatterbox, so no extra setup is needed, but you can also install it separately.
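A standalone install would look like this; the PyPI package name below is an assumption (verify it against the Perth repository before relying on it):

```shell
# Assumption: PerTh is published on PyPI as "resemble-perth"
pip install resemble-perth
```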
Load the audio file
Use librosa to load the audio you want to check:

```python
import perth
import librosa

AUDIO_PATH = "suspected_file.wav"
watermarked_audio, sr = librosa.load(AUDIO_PATH, sr=None)
```

Initialize the watermarker
Create a watermarker instance (the same class used for embedding):

```python
watermarker = perth.PerthImplicitWatermarker()
```

Extract the watermark
Check for watermark presence:

```python
watermark = watermarker.get_watermark(watermarked_audio, sample_rate=sr)
print(f"Extracted watermark: {watermark}")
# Output: 0.0 (no watermark) or 1.0 (watermarked)
```
Complete Detection Example
Here’s the full detection script from the README:
```python
import perth
import librosa

AUDIO_PATH = "YOUR_FILE.wav"

# Load the watermarked audio
watermarked_audio, sr = librosa.load(AUDIO_PATH, sr=None)

# Initialize watermarker (same as used for embedding)
watermarker = perth.PerthImplicitWatermarker()

# Extract watermark
watermark = watermarker.get_watermark(watermarked_audio, sample_rate=sr)
print(f"Extracted watermark: {watermark}")
# Output: 0.0 (no watermark) or 1.0 (watermarked)
```
From README.md (lines 126-142)
Detection Accuracy
PerTh watermarks maintain nearly 100% detection accuracy even after:
Format Conversion

```python
# Original watermarked WAV
import torchaudio as ta

wav = model.generate("Test audio")
ta.save("original.wav", wav, model.sr)

# Convert to MP3 and back - the watermark persists
import subprocess
subprocess.run(["ffmpeg", "-i", "original.wav", "compressed.mp3"])
subprocess.run(["ffmpeg", "-i", "compressed.mp3", "restored.wav"])

# Watermark still detectable in restored.wav
```
Audio Editing
- Volume adjustments
- Normalization
- Equalization
- Trimming/cutting
- Concatenation
Common Manipulations
- Speed changes (within reason)
- Pitch shifting
- Adding background music
- Adding effects (reverb, echo, etc.)
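You can stress-test detection against edits like these yourself by applying them to the raw sample array before re-running `watermarker.get_watermark`. A minimal numpy sketch (the detection call is omitted; the helper name is illustrative):

```python
import numpy as np

def make_variants(audio: np.ndarray, sr: int) -> dict:
    """Produce simple edited copies of an audio signal for robustness testing."""
    return {
        "volume_boost": np.clip(audio * 1.5, -1.0, 1.0),          # gain plus clipping
        "normalized": audio / (np.max(np.abs(audio)) + 1e-9),     # peak normalization
        "trimmed": audio[sr // 2 : -(sr // 2)] if len(audio) > sr else audio,  # cut 0.5 s per end
        "concatenated": np.concatenate([audio, audio]),           # self-concatenation
    }

# Example with a synthetic 2-second, 440 Hz tone at 24 kHz
sr = 24000
t = np.linspace(0, 2, 2 * sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t).astype(np.float32)
variants = make_variants(audio, sr)
for name, v in variants.items():
    print(name, len(v))
```

Each variant would then be passed to the detector to confirm the watermark survives that particular edit.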
Robustness: While PerTh watermarks are very robust, extreme audio degradation (heavy distortion, very low bitrates, multiple re-encodings) may reduce detection reliability.
Responsible AI Practices
Do:
✓ Obtain consent before cloning someone’s voice
✓ Disclose when audio is AI-generated
✓ Respect privacy and intellectual property rights
✓ Use for legitimate purposes (accessibility, localization, etc.)
✓ Verify authenticity of audio when in doubt
✓ Keep watermarking enabled (it’s automatic)
Don’t:
✗ Clone voices without permission
✗ Create misleading or deceptive content
✗ Impersonate others for fraud or harm
✗ Use for harassment or abuse
✗ Spread misinformation or disinformation
✗ Attempt to remove watermarks
Legal Implications: Unauthorized voice cloning and misuse of AI-generated audio may violate laws regarding fraud, impersonation, copyright, and privacy. Always consult legal counsel if you’re unsure about your use case.
Legitimate Use Cases
Chatterbox and voice cloning technology have many beneficial applications:
Accessibility
- Text-to-speech for visually impaired users
- Voice restoration for people who have lost their voice
- Personalized assistive technology
```python
# Example: Personalized screen reader
text = "You have 3 new messages."
wav = model.generate(text, audio_prompt_path="user_preferred_voice.wav")
```
Content Creation
- Audiobook narration
- Video voiceovers
- Podcast production
- Game character voices (with actor consent)
Localization & Translation
- Dubbing content into multiple languages
- Maintaining voice consistency across translations
- Global content accessibility
```python
# Example: Multilingual content with the same voice
for lang, text in translations.items():
    wav = multilingual_model.generate(
        text,
        language_id=lang,
        audio_prompt_path="original_speaker.wav",
    )
    save_output(f"output_{lang}.wav", wav)
```
Voice Agents & Assistants
- Customer service bots
- Virtual assistants
- Interactive voice response (IVR)
- Conversational AI
```python
# Example: Friendly customer service agent
text = "Hi there! [chuckle] How can I help you today?"
wav = model.generate(text, audio_prompt_path="agent_voice.wav")
```
Watermark Verification Service
For applications requiring verified detection, consider implementing a verification service:
```python
import perth
import librosa
from pathlib import Path

def verify_chatterbox_audio(audio_path: str) -> dict:
    """
    Verify if audio was generated by Chatterbox TTS.

    Returns:
        dict with keys: 'watermarked', 'confidence', 'path'
    """
    # Load audio
    audio, sr = librosa.load(audio_path, sr=None)

    # Check watermark
    watermarker = perth.PerthImplicitWatermarker()
    watermark = watermarker.get_watermark(audio, sample_rate=sr)

    return {
        'watermarked': bool(watermark > 0.5),
        'confidence': float(watermark),
        'path': Path(audio_path).name,
    }

# Usage
result = verify_chatterbox_audio("suspect_audio.wav")
if result['watermarked']:
    print(f"✓ Audio generated by Chatterbox (confidence: {result['confidence']})")
else:
    print("✗ No Chatterbox watermark detected")
```
Watermarking Across Models
All three Chatterbox models use the same watermarking system:
```python
# All models automatically watermark their output
from chatterbox.tts import ChatterboxTTS
from chatterbox.tts_turbo import ChatterboxTurboTTS
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Turbo
turbo_model = ChatterboxTurboTTS.from_pretrained(device="cuda")
wav = turbo_model.generate("Test")  # Watermarked

# Standard
standard_model = ChatterboxTTS.from_pretrained(device="cuda")
wav = standard_model.generate("Test")  # Watermarked

# Multilingual
mtl_model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")
wav = mtl_model.generate("Test", language_id="en")  # Watermarked
```
The watermarking implementation is identical across all models (from tts.py line 271, tts_turbo.py line 295, mtl_tts.py line 300).
Technical Details
Watermark Properties
- Type: Implicit neural watermark
- Perceptibility: Inaudible to humans
- Robustness: Survives compression and editing
- Output: Binary (0.0 = no watermark, 1.0 = watermarked)
- Detection: Real-time capable
Implementation
The watermarker is initialized once per model instance:
```python
# From tts_turbo.py (line 130)
self.watermarker = perth.PerthImplicitWatermarker()
```
And applied to every generated audio:
```python
# Convert to numpy for watermarking
wav = wav.squeeze(0).detach().cpu().numpy()

# Apply watermark
watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)

# Convert back to tensor
return torch.from_numpy(watermarked_wav).unsqueeze(0)
```
Disclaimer
From the Chatterbox README:
Don’t use this model to do bad things. Prompts are sourced from freely available data on the internet.
Important: The watermarking technology helps identify AI-generated audio but doesn’t prevent misuse. Users are solely responsible for ensuring their use of Chatterbox complies with applicable laws and respects others’ rights.
Detecting Non-Watermarked Audio
If you encounter audio that might be from Chatterbox but shows no watermark:
- Check for tampering - The audio may have been modified to remove watermarks
- Verify the source - Ensure the audio came from a Chatterbox model
- Consider degradation - Extreme processing might reduce detection accuracy
- Report suspicious activity - Contact Resemble AI if you suspect watermark removal attempts
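When triaging many files at once, the per-file check can be wrapped in a small batch scanner. A sketch with the detector injected as a callable, so it can wrap `perth.PerthImplicitWatermarker().get_watermark` in practice or a stub in tests; the function name, threshold, and glob pattern are illustrative, not part of any API:

```python
from pathlib import Path
from typing import Callable, Dict

def scan_for_watermarks(
    root: str,
    detect: Callable[[str], float],  # returns a watermark score for one file
    threshold: float = 0.5,
    pattern: str = "*.wav",
) -> Dict[str, bool]:
    """Run a watermark detector over every matching file under root."""
    return {
        str(path): detect(str(path)) > threshold
        for path in sorted(Path(root).rglob(pattern))
    }

# Usage with a stub detector; swap in a perth-based detector in practice
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    for name in ("marked.wav", "clean.wav"):
        open(os.path.join(d, name), "wb").close()
    results = scan_for_watermarks(
        d, detect=lambda p: 1.0 if Path(p).stem == "marked" else 0.0
    )
    print(results)
```

Injecting the detector keeps the scanning logic testable without audio files and makes it easy to swap in a stricter threshold for high-stakes verification.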
Summary
Chatterbox’s built-in PerTh watermarking provides:
- ✅ Automatic, imperceptible watermarks on all generated audio
- ✅ Nearly 100% detection accuracy
- ✅ Robustness against compression and editing
- ✅ Simple detection API
- ✅ Support for responsible AI practices
Use this technology ethically and responsibly. Always obtain consent, disclose AI generation, and respect others’ rights.