Overview
Speech-to-speech conversion transforms audio from one voice to another while keeping full control over emotion, timing, and delivery. This makes it well suited to voice changing, dubbing, and other voice conversion applications.
Basic Conversion
Convert audio to a different voice.
Streaming Conversion
Stream the converted audio in real time.
Parameters
ID of the voice to be used. Use the Get voices endpoint to list all available voices.
The input audio file to convert.
Identifier of the model to use. The model must support speech-to-speech (check its can_do_voice_conversion property).
Output format of the generated audio, formatted as codec_sample_rate_bitrate (e.g., mp3_44100_128).
Latency optimization level (0-4):
- 0: Default mode (no optimizations)
- 1: Normal optimizations (about 50% of the latency improvement of level 3)
- 2: Strong optimizations (about 75% of the latency improvement of level 3)
- 3: Max optimizations
- 4: Max optimizations with text normalizer off
JSON-encoded string of voice settings to override stored settings.
Seed for deterministic generation (0-4294967295).
Remove background noise from input audio using the audio isolation model.
Format of the input audio. Options: pcm_s16le_16 or other. PCM input offers lower latency.
When set to false, zero retention mode is used (enterprise feature).
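The parameters above all travel in one multipart/form-data POST request. The following is a minimal sketch that builds (but does not send) such a request using only Python's standard library. It assumes the REST endpoint `https://api.elevenlabs.io/v1/speech-to-speech/{voice_id}` (with a `/stream` suffix for streaming), the `xi-api-key` header, the form fields `audio`, `model_id`, `voice_settings`, `seed`, `remove_background_noise`, and `file_format`, and the query parameters `output_format`, `optimize_streaming_latency`, and `enable_logging`. None of these names appear in the text above, so verify them against the current API reference. It also assumes that `pcm_s16le_16` means 16-bit, 16 kHz, mono, little-endian PCM. The helpers `build_sts_request` and `read_pcm_s16le_16` are hypothetical.

```python
import json
import uuid
import wave
from urllib.parse import urlencode
from urllib.request import Request

# Assumed REST endpoint; confirm it against the current API reference.
API_BASE = "https://api.elevenlabs.io/v1/speech-to-speech"


def read_pcm_s16le_16(source):
    """Return the raw frames of a WAV file that is already 16-bit, 16 kHz, mono.

    WAV stores 16-bit samples as signed little-endian integers, so the frames
    can be sent unchanged with file_format="pcm_s16le_16". No resampling here.
    """
    with wave.open(source, "rb") as wav:
        params = (wav.getsampwidth(), wav.getframerate(), wav.getnchannels())
        if params != (2, 16000, 1):
            raise ValueError(f"expected 16-bit/16 kHz/mono WAV, got {params}")
        return wav.readframes(wav.getnframes())


def _multipart(fields, audio_bytes, boundary):
    """Encode text fields plus the audio file as multipart/form-data."""
    parts = [
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
         f"{value}\r\n").encode()
        for name, value in fields.items()
    ]
    parts.append(
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="audio"; filename="input"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts)


def build_sts_request(api_key, voice_id, audio_bytes, model_id, *,
                      stream=False, output_format="mp3_44100_128",
                      latency=None, voice_settings=None, seed=None,
                      remove_background_noise=False, file_format=None,
                      enable_logging=True):
    """Build, but do not send, a speech-to-speech request (hypothetical helper)."""
    if latency is not None and latency not in range(5):
        raise ValueError("latency must be an integer from 0 to 4")
    if seed is not None and not 0 <= seed <= 4294967295:
        raise ValueError("seed must be between 0 and 4294967295")

    # Query parameters (names assumed).
    query = {"output_format": output_format}
    if latency is not None:
        query["optimize_streaming_latency"] = latency
    if not enable_logging:
        query["enable_logging"] = "false"  # zero retention mode (enterprise)

    # Form fields (names assumed); voice settings are sent as a JSON string.
    fields = {"model_id": model_id}
    if voice_settings is not None:
        fields["voice_settings"] = json.dumps(voice_settings)
    if seed is not None:
        fields["seed"] = str(seed)
    if remove_background_noise:
        fields["remove_background_noise"] = "true"
    if file_format is not None:
        fields["file_format"] = file_format

    boundary = uuid.uuid4().hex
    url = f"{API_BASE}/{voice_id}" + ("/stream" if stream else "")
    return Request(
        f"{url}?{urlencode(query)}",
        data=_multipart(fields, audio_bytes, boundary),
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen` and write `resp.read()` to a file. With `stream=True`, you can instead read the response in fixed-size chunks (for example `resp.read(4096)` in a loop) and play audio as it arrives. Keeping construction separate from sending makes the parameter handling testable without network access.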
With Background Noise Removal
Clean up noisy input audio.
Low Latency Mode
Optimize for minimal latency.
Custom Voice Settings
Override voice settings for the conversion.
Deterministic Generation
Use a seed for reproducible results.
PCM Input for Lower Latency
Use PCM input for the lowest latency.
Async Conversion
Convert audio asynchronously.
Output Formats
Supported output formats:
- mp3_44100_32: MP3 at 44.1 kHz, 32 kbps
- mp3_44100_64: MP3 at 44.1 kHz, 64 kbps
- mp3_44100_96: MP3 at 44.1 kHz, 96 kbps
- mp3_44100_128: MP3 at 44.1 kHz, 128 kbps (recommended)
- mp3_44100_192: MP3 at 44.1 kHz, 192 kbps (requires Creator tier or above)
- pcm_16000: PCM at 16 kHz
- pcm_22050: PCM at 22.05 kHz
- pcm_24000: PCM at 24 kHz
- pcm_44100: PCM at 44.1 kHz (requires Pro tier or above)
- ulaw_8000: μ-law at 8 kHz (Twilio compatible)
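Each format name follows the codec_sample_rate_bitrate pattern, with the bitrate omitted for PCM and μ-law. The following is a minimal sketch of how a client might validate a format string against the list above and split it into its parts before sending a request. `parse_output_format` and `pcm_duration_seconds` are hypothetical helpers. The duration helper additionally assumes that raw PCM output is 16-bit mono, which the list does not state.

```python
from typing import NamedTuple, Optional

# Copied from the list of supported output formats above.
SUPPORTED_FORMATS = {
    "mp3_44100_32", "mp3_44100_64", "mp3_44100_96", "mp3_44100_128",
    "mp3_44100_192", "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
    "ulaw_8000",
}


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for PCM and mu-law, which omit it


def parse_output_format(name: str) -> OutputFormat:
    """Split a codec_sample_rate[_bitrate] string into its parts."""
    if name not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {name!r}")
    parts = name.split("_")
    codec, rate = parts[0], int(parts[1])
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(codec, rate, bitrate)


def pcm_duration_seconds(fmt: OutputFormat, num_bytes: int) -> float:
    """Duration of a raw PCM response, ASSUMING 16-bit (2-byte) mono samples."""
    if fmt.codec != "pcm":
        raise ValueError("byte-count duration only applies to raw PCM")
    return num_bytes / (2 * fmt.sample_rate_hz)
```

Rejecting unknown names locally catches typos such as `mp3_22050_32` before a request is made. Tier restrictions (Creator, Pro) are still enforced by the service and are not checked here.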
Use Cases
Voice Changing
Transform your voice in real time or in recordings
Content Localization
Maintain speaker identity across languages
Voice Preservation
Preserve vocal characteristics while changing content
Accessibility
Convert voices for better accessibility
Best Practices
Related Features
- Voice Cloning - Create custom voices
- Audio Isolation - Remove background noise
- Text to Speech - Generate speech from text