Every audio file generated by Chatterbox includes imperceptible neural watermarks using Resemble AI’s PerTh (Perceptual Threshold) watermarking technology. This ensures AI-generated content can be identified and traced while maintaining audio quality.
What is PerTh Watermarking?
PerTh is a neural watermarking system that embeds imperceptible markers into audio during generation. These watermarks:
- Are inaudible to human listeners
- Survive MP3 compression and format conversion
- Persist through audio editing and manipulation
- Maintain nearly 100% detection accuracy
- Don’t degrade audio quality
Automatic Application: Watermarking is applied automatically to all Chatterbox-generated audio. You don’t need to enable it—every output is watermarked by default.
Why Watermarking Matters
AI-generated audio can be misused for:
- Impersonation and fraud
- Spreading misinformation
- Creating deepfakes
- Unauthorized voice cloning
Watermarking enables:
- Detection of AI-generated content
- Verification of audio authenticity
- Attribution to Chatterbox TTS
- Accountability in content creation
Ethical Responsibility: Just because you can clone any voice doesn’t mean you should. Always obtain proper consent before cloning someone’s voice, and use the technology responsibly.
How Watermarking Works
The watermark is embedded during audio generation:
```python
from chatterbox.tts_turbo import ChatterboxTurboTTS
import torchaudio as ta

# Generate audio
model = ChatterboxTurboTTS.from_pretrained(device="cuda")
text = "This audio will be automatically watermarked."
wav = model.generate(text)

# The watermark is already embedded
ta.save("watermarked_output.wav", wav, model.sr)
```
The watermarking happens inside the generate() method:
```python
# From tts_turbo.py (line 295)
wav = wav.squeeze(0).detach().cpu().numpy()
watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)
return torch.from_numpy(watermarked_wav).unsqueeze(0)
```
Every model variant (Turbo, Standard, Multilingual) applies watermarking automatically.
Detecting Watermarks
You can check if audio contains a Chatterbox watermark using the PerTh library.
Install PerTh
PerTh is included as a dependency of Chatterbox, so no extra setup is needed, but you can also install it separately.
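A standalone install would look like this; the PyPI package name below is an assumption (verify it against the Perth repository before relying on it):

```shell
# Assumption: PerTh is published on PyPI as "resemble-perth"
pip install resemble-perth
```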
Load the audio file
Use librosa to load the audio you want to check:

```python
import perth
import librosa

AUDIO_PATH = "suspected_file.wav"
watermarked_audio, sr = librosa.load(AUDIO_PATH, sr=None)
```

Initialize the watermarker
Create a watermarker instance (the same class used for embedding):

```python
watermarker = perth.PerthImplicitWatermarker()
```

Extract the watermark
Check for watermark presence:

```python
watermark = watermarker.get_watermark(watermarked_audio, sample_rate=sr)
print(f"Extracted watermark: {watermark}")
# Output: 0.0 (no watermark) or 1.0 (watermarked)
```
Complete Detection Example
Here’s the full detection script from the README:
```python
import perth
import librosa

AUDIO_PATH = "YOUR_FILE.wav"

# Load the watermarked audio
watermarked_audio, sr = librosa.load(AUDIO_PATH, sr=None)

# Initialize watermarker (same as used for embedding)
watermarker = perth.PerthImplicitWatermarker()

# Extract watermark
watermark = watermarker.get_watermark(watermarked_audio, sample_rate=sr)
print(f"Extracted watermark: {watermark}")
# Output: 0.0 (no watermark) or 1.0 (watermarked)
```
From README.md (lines 126-142)
Detection Accuracy
PerTh watermarks maintain nearly 100% detection accuracy even after:
Format Conversion

```python
# Original watermarked WAV
import torchaudio as ta

wav = model.generate("Test audio")
ta.save("original.wav", wav, model.sr)

# Convert to MP3 and back - the watermark persists
import subprocess
subprocess.run(["ffmpeg", "-i", "original.wav", "compressed.mp3"])
subprocess.run(["ffmpeg", "-i", "compressed.mp3", "restored.wav"])

# Watermark still detectable in restored.wav
```
Audio Editing
- Volume adjustments
- Normalization
- Equalization
- Trimming/cutting
- Concatenation
Common Manipulations
- Speed changes (within reason)
- Pitch shifting
- Adding background music
- Adding effects (reverb, echo, etc.)
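You can stress-test detection against edits like these yourself by applying them to the raw sample array before re-running `watermarker.get_watermark`. A minimal numpy sketch (the detection call is omitted; the helper name is illustrative):

```python
import numpy as np

def make_variants(audio: np.ndarray, sr: int) -> dict:
    """Produce simple edited copies of an audio signal for robustness testing."""
    return {
        "volume_boost": np.clip(audio * 1.5, -1.0, 1.0),          # gain plus clipping
        "normalized": audio / (np.max(np.abs(audio)) + 1e-9),     # peak normalization
        "trimmed": audio[sr // 2 : -(sr // 2)] if len(audio) > sr else audio,  # cut 0.5 s per end
        "concatenated": np.concatenate([audio, audio]),           # self-concatenation
    }

# Example with a synthetic 2-second, 440 Hz tone at 24 kHz
sr = 24000
t = np.linspace(0, 2, 2 * sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t).astype(np.float32)
variants = make_variants(audio, sr)
for name, v in variants.items():
    print(name, len(v))
```

Each variant would then be passed to the detector to confirm the watermark survives that particular edit.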
Robustness: While PerTh watermarks are very robust, extreme audio degradation (heavy distortion, very low bitrates, multiple re-encodings) may reduce detection reliability.
Responsible AI Practices
Do:
✓ Obtain consent before cloning someone’s voice
✓ Disclose when audio is AI-generated
✓ Respect privacy and intellectual property rights
✓ Use for legitimate purposes (accessibility, localization, etc.)
✓ Verify authenticity of audio when in doubt
✓ Keep watermarking enabled (it’s automatic)
Don’t:
✗ Clone voices without permission
✗ Create misleading or deceptive content
✗ Impersonate others for fraud or harm
✗ Use for harassment or abuse
✗ Spread misinformation or disinformation
✗ Attempt to remove watermarks
Legal Implications: Unauthorized voice cloning and misuse of AI-generated audio may violate laws regarding fraud, impersonation, copyright, and privacy. Always consult legal counsel if you’re unsure about your use case.
Legitimate Use Cases
Chatterbox and voice cloning technology have many beneficial applications:
Accessibility
- Text-to-speech for visually impaired users
- Voice restoration for people who have lost their voice
- Personalized assistive technology
```python
# Example: Personalized screen reader
text = "You have 3 new messages."
wav = model.generate(text, audio_prompt_path="user_preferred_voice.wav")
```
Content Creation
- Audiobook narration
- Video voiceovers
- Podcast production
- Game character voices (with actor consent)
Localization & Translation
- Dubbing content into multiple languages
- Maintaining voice consistency across translations
- Global content accessibility
```python
# Example: Multilingual content with the same voice
for lang, text in translations.items():
    wav = multilingual_model.generate(
        text,
        language_id=lang,
        audio_prompt_path="original_speaker.wav",
    )
    save_output(f"output_{lang}.wav", wav)
```
Voice Agents & Assistants
- Customer service bots
- Virtual assistants
- Interactive voice response (IVR)
- Conversational AI
```python
# Example: Friendly customer service agent
text = "Hi there! [chuckle] How can I help you today?"
wav = model.generate(text, audio_prompt_path="agent_voice.wav")
```
Watermark Verification Service
For applications requiring verified detection, consider implementing a verification service:
```python
import perth
import librosa
from pathlib import Path

def verify_chatterbox_audio(audio_path: str) -> dict:
    """
    Verify if audio was generated by Chatterbox TTS.

    Returns:
        dict with keys: 'watermarked', 'confidence', 'path'
    """
    # Load audio
    audio, sr = librosa.load(audio_path, sr=None)

    # Check watermark
    watermarker = perth.PerthImplicitWatermarker()
    watermark = watermarker.get_watermark(audio, sample_rate=sr)

    return {
        'watermarked': bool(watermark > 0.5),
        'confidence': float(watermark),
        'path': Path(audio_path).name,
    }

# Usage
result = verify_chatterbox_audio("suspect_audio.wav")
if result['watermarked']:
    print(f"✓ Audio generated by Chatterbox (confidence: {result['confidence']})")
else:
    print("✗ No Chatterbox watermark detected")
```
Watermarking Across Models
All three Chatterbox models use the same watermarking system:
```python
# All models automatically watermark their output
from chatterbox.tts import ChatterboxTTS
from chatterbox.tts_turbo import ChatterboxTurboTTS
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Turbo
turbo_model = ChatterboxTurboTTS.from_pretrained(device="cuda")
wav = turbo_model.generate("Test")  # Watermarked

# Standard
standard_model = ChatterboxTTS.from_pretrained(device="cuda")
wav = standard_model.generate("Test")  # Watermarked

# Multilingual
mtl_model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")
wav = mtl_model.generate("Test", language_id="en")  # Watermarked
```
The watermarking implementation is identical across all models (from tts.py line 271, tts_turbo.py line 295, mtl_tts.py line 300).
Technical Details
Watermark Properties
- Type: Implicit neural watermark
- Perceptibility: Inaudible to humans
- Robustness: Survives compression and editing
- Output: Binary (0.0 = no watermark, 1.0 = watermarked)
- Detection: Real-time capable
Implementation
The watermarker is initialized once per model instance:
```python
# From tts_turbo.py (line 130)
self.watermarker = perth.PerthImplicitWatermarker()
```
And applied to every generated audio:
```python
# Convert to numpy for watermarking
wav = wav.squeeze(0).detach().cpu().numpy()

# Apply watermark
watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)

# Convert back to tensor
return torch.from_numpy(watermarked_wav).unsqueeze(0)
```
Disclaimer
From the Chatterbox README:
Don’t use this model to do bad things. Prompts are sourced from freely available data on the internet.
Important: The watermarking technology helps identify AI-generated audio but doesn’t prevent misuse. Users are solely responsible for ensuring their use of Chatterbox complies with applicable laws and respects others’ rights.
Detecting Non-Watermarked Audio
If you encounter audio that might be from Chatterbox but shows no watermark:
- Check for tampering - The audio may have been modified to remove watermarks
- Verify the source - Ensure the audio came from a Chatterbox model
- Consider degradation - Extreme processing might reduce detection accuracy
- Report suspicious activity - Contact Resemble AI if you suspect watermark removal attempts
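When triaging many files at once, the per-file check can be wrapped in a small batch scanner. A sketch with the detector injected as a callable, so it can wrap `perth.PerthImplicitWatermarker().get_watermark` in practice or a stub in tests; the function name, threshold, and glob pattern are illustrative, not part of any API:

```python
from pathlib import Path
from typing import Callable, Dict

def scan_for_watermarks(
    root: str,
    detect: Callable[[str], float],  # returns a watermark score for one file
    threshold: float = 0.5,
    pattern: str = "*.wav",
) -> Dict[str, bool]:
    """Run a watermark detector over every matching file under root."""
    return {
        str(path): detect(str(path)) > threshold
        for path in sorted(Path(root).rglob(pattern))
    }

# Usage with a stub detector; swap in a perth-based detector in practice
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    for name in ("marked.wav", "clean.wav"):
        open(os.path.join(d, name), "wb").close()
    results = scan_for_watermarks(
        d, detect=lambda p: 1.0 if Path(p).stem == "marked" else 0.0
    )
    print(results)
```

Injecting the detector keeps the scanning logic testable without audio files and makes it easy to swap in a stricter threshold for high-stakes verification.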
Summary
Chatterbox’s built-in PerTh watermarking provides:
- ✅ Automatic, imperceptible watermarks on all generated audio
- ✅ Nearly 100% detection accuracy
- ✅ Robustness against compression and editing
- ✅ Simple detection API
- ✅ Support for responsible AI practices
Use this technology ethically and responsibly. Always obtain consent, disclose AI generation, and respect others’ rights.