Overview

ChatterboxVC enables high-quality voice conversion, transforming the voice characteristics of input audio while preserving the linguistic content. This allows you to change the speaker identity of existing recordings.

Class Signature

class ChatterboxVC:
    def __init__(
        self,
        s3gen: S3Gen,
        device: str,
        ref_dict: dict = None,
    )

Parameters

s3gen (S3Gen, required)
The S3Gen vocoder model instance used for audio conversion.

device (str, required)
Device to run inference on ("cuda", "cpu", or "mps").

ref_dict (dict, optional)
Pre-computed reference voice embedding dictionary.

Class Methods

from_pretrained()

Load the pre-trained ChatterboxVC model from Hugging Face.
@classmethod
def from_pretrained(cls, device: str) -> 'ChatterboxVC'

Parameters

device (str, required)
Device to load the model on ("cuda", "cpu", or "mps"). Automatically falls back to "cpu" if MPS is requested but unavailable.

Returns

model (ChatterboxVC)
Initialized ChatterboxVC model with pre-trained weights from ResembleAI/chatterbox.

Example

from chatterbox import ChatterboxVC
import torch

# Load on GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
vc_model = ChatterboxVC.from_pretrained(device)
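The fallback behaviour described above can also be approximated in user code before calling from_pretrained(). A minimal sketch in plain Python; the availability checks are passed in as flags so the function stays framework-agnostic, and the helper name pick_device is our own, not part of the chatterbox API:

```python
def pick_device(requested: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Resolve a requested device string, falling back to "cpu" when the
    requested backend is unavailable (mirrors the documented MPS fallback)."""
    if requested == "cuda" and cuda_ok:
        return "cuda"
    if requested == "mps" and mps_ok:
        return "mps"
    return "cpu"

# With torch installed, pass torch.cuda.is_available() and
# torch.backends.mps.is_available() for the two flags.
print(pick_device("mps", cuda_ok=False, mps_ok=False))  # prints "cpu"
```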

from_local()

Load the model from a local checkpoint directory.
@classmethod
def from_local(cls, ckpt_dir: str, device: str) -> 'ChatterboxVC'

Parameters

ckpt_dir (str, required)
Path to the directory containing the model checkpoint files.

device (str, required)
Device to load the model on ("cuda", "cpu", or "mps").

Returns

model (ChatterboxVC)
Initialized ChatterboxVC model with weights loaded from the local directory.
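Before calling from_local(), it can be useful to sanity-check that the checkpoint directory exists and contains weight files, since a bad path otherwise surfaces as a load error. A minimal sketch; the .pt and .safetensors extensions are an assumption about the checkpoint layout, not a guarantee of what from_local() requires:

```python
from pathlib import Path

def looks_like_checkpoint_dir(ckpt_dir: str) -> bool:
    """Heuristic pre-check: the directory exists and holds at least one
    weight file. The extensions checked here are an assumption."""
    path = Path(ckpt_dir)
    if not path.is_dir():
        return False
    return any(path.glob("*.pt")) or any(path.glob("*.safetensors"))

# Example guard around the real load:
# if looks_like_checkpoint_dir("checkpoints/chatterbox"):
#     vc_model = ChatterboxVC.from_local("checkpoints/chatterbox", device)
```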

Instance Methods

set_target_voice()

Set the target voice for conversion from an audio file.
def set_target_voice(self, wav_fpath: str)

Parameters

wav_fpath (str, required)
Path to an audio file containing the target voice to convert to.

Example

# Set the target voice
vc_model.set_target_voice("target_speaker.wav")

generate()

Convert the voice in the input audio to the target voice.
def generate(
    self,
    audio: str,
    target_voice_path: str = None,
) -> torch.Tensor

Parameters

audio (str, required)
Path to the source audio file to convert.

target_voice_path (str, optional)
Path to a target voice audio file. If provided, it overrides any previously set target voice.

Returns

audio (torch.Tensor)
Converted audio waveform as a PyTorch tensor with shape [1, samples]. The sample rate is 44100 Hz (accessible via vc_model.sr), and the output includes perceptual watermarking.

Example

import torch
import torchaudio
from chatterbox import ChatterboxVC

device = "cuda" if torch.cuda.is_available() else "cpu"
vc_model = ChatterboxVC.from_pretrained(device)

# Method 1: Set target voice, then convert
vc_model.set_target_voice("target_speaker.wav")
converted_audio = vc_model.generate("source_audio.wav")
torchaudio.save("converted_output.wav", converted_audio, vc_model.sr)

# Method 2: Convert with target voice in one call
converted_audio = vc_model.generate(
    audio="source_audio.wav",
    target_voice_path="target_speaker.wav"
)
torchaudio.save("converted_output.wav", converted_audio, vc_model.sr)
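A common pattern is converting every recording in a directory against one target voice: set the voice once, then loop. A sketch of that flow with the conversion call injected as a callable so the structure is testable without the model; in real code the callable would wrap vc_model.generate() plus torchaudio.save():

```python
from pathlib import Path

def convert_directory(src_dir, out_dir, convert):
    """Call `convert(path)` for every .wav under src_dir and return the
    output paths that would be written under out_dir."""
    out = Path(out_dir)
    written = []
    for wav in sorted(Path(src_dir).glob("*.wav")):
        # Real code: audio = vc_model.generate(str(wav))
        #            torchaudio.save(str(out / wav.name), audio, vc_model.sr)
        convert(str(wav))
        written.append(out / wav.name)
    return written
```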

Attributes

sr (int)
Sample rate of generated audio (44100 Hz).

device (str)
Device the model is running on.

ref_dict (dict)
Current target voice embeddings used for conversion.
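The sr attribute pairs with the returned tensor's [1, samples] shape to give clip duration. A small sketch of the arithmetic; the sample counts below are illustrative:

```python
def clip_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of a mono clip given its sample count and sample rate,
    e.g. num_samples = converted_audio.shape[1], sample_rate = vc_model.sr."""
    return num_samples / sample_rate

# A [1, 88200] tensor at 44100 Hz is two seconds of audio:
print(clip_duration_seconds(88_200, 44_100))  # prints 2.0
```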

Notes

  • Voice conversion preserves the linguistic content and prosody while changing voice characteristics
  • The model internally tokenizes the source audio at 16kHz before conversion
  • Generated audio is automatically watermarked using the Perth implicit watermarker
  • Both source and target audio are automatically resampled to the correct sample rates
  • You must either call set_target_voice() first or provide target_voice_path to generate()
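The automatic resampling mentioned above can be illustrated with a naive linear-interpolation resampler. This is a sketch only; the library itself uses proper DSP resampling rather than this simple interpolation:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive resampler: maps each output index back to a fractional
    position in the source and linearly interpolates its neighbours.
    Illustrative only; real pipelines use band-limited resampling."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Downsampling 8 samples from 32 kHz to 16 kHz yields 4 samples:
print(len(resample_linear([0.0] * 8, 32_000, 16_000)))  # prints 4
```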
