Overview
The Audio Isolation API removes background noise from audio files, isolating the speech or primary audio signal. This is useful for cleaning up recordings, improving audio quality, and preparing audio for further processing.
Methods
convert()
Remove background noise from an audio file.
The audio file to process. Can be a file path, file object, or bytes. Supports common audio formats including MP3, WAV, M4A, FLAC, and more.
The format of input audio. Options:
- pcm_s16le_16: 16-bit PCM at 16kHz sample rate, single channel (mono), little-endian byte order. Provides lower latency compared to encoded formats.
- other: Any other encoded audio format (default)
When using pcm_s16le_16, the input audio must match the exact specifications: 16-bit PCM, 16kHz sample rate, mono, little-endian.
Optional preview image as base64-encoded string. Used for tracking this generation in analytics and history.
Request-specific configuration. You can pass in configuration such as chunk_size to customize the request and response behavior.
Returns an iterator yielding audio data chunks. Iterate over this to get the complete isolated audio file.
stream()
Stream background noise removal from an audio file.
The audio file to process.
The format of input audio:
- pcm_s16le_16: 16-bit PCM at 16kHz (lower latency)
- other: Any other encoded format (default)
Request-specific configuration including chunk_size customization.
Returns an iterator yielding streaming audio data chunks with background noise removed.
Usage Examples
Basic Noise Removal
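A minimal sketch of the basic flow, assuming an already-constructed SDK client is passed in. The `client.audio_isolation.convert(audio=...)` call path and the `audio` keyword are assumptions that this page does not confirm, and `remove_noise` is a hypothetical helper name. It joins the chunks yielded by convert() and writes the result to disk.

```python
from pathlib import Path


def remove_noise(client, src_path, dst_path):
    """Run src_path through convert() and write the isolated audio to dst_path.

    Assumption: the method is reached as client.audio_isolation.convert and
    takes the input as an `audio` keyword. Adjust both to match your SDK.
    """
    with open(src_path, "rb") as source:
        # convert() returns an iterator of byte chunks. Consume it while the
        # source file is still open, then join the chunks into one file body.
        chunks = client.audio_isolation.convert(audio=source)
        isolated = b"".join(chunks)
    Path(dst_path).write_bytes(isolated)
    return len(isolated)  # number of bytes written
```

Joining every chunk keeps the whole result in memory, which is usually fine for short recordings. For long files, writing each chunk as it arrives is the safer choice.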
Streaming Processing
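A sketch of chunk-by-chunk handling with stream(), under the same caveat: the `client.audio_isolation.stream(audio=...)` call path and the `audio` keyword are assumptions, and `stream_isolated` is a hypothetical helper. Each chunk is forwarded to a sink as soon as it arrives instead of being buffered.

```python
def stream_isolated(client, source, sink):
    """Copy isolated audio from stream() into sink, one chunk at a time.

    Assumption: the method is reached as client.audio_isolation.stream and
    takes the input as an `audio` keyword.
    `source` is a binary file object; `sink` is any object with write().
    """
    total = 0
    for chunk in client.audio_isolation.stream(audio=source):
        if not chunk:
            continue  # defensively skip empty chunks
        sink.write(chunk)
        total += len(chunk)
    return total  # number of bytes forwarded to the sink
```

The sink can be an open file, a socket wrapper, or an in-memory buffer such as `io.BytesIO`, so the same loop serves both file output and live playback pipelines.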
Low-Latency PCM Processing
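Because pcm_s16le_16 input must be exactly 16-bit, 16kHz, mono, little-endian PCM, it can help to validate the audio before sending it. The sketch below uses Python's standard `wave` module to check an uncompressed WAV file and extract its raw frames. The `file_format` keyword and the `client.audio_isolation.stream` call path are assumptions, and both helper names are hypothetical.

```python
import wave

PCM_FORMAT = "pcm_s16le_16"


def read_pcm_s16le_16(wav_path):
    """Return raw PCM frames from a WAV file, or raise if the format differs."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sample rate")
        # 16-bit WAV sample data is stored little-endian, so the frames
        # can be passed on unchanged.
        return wav.readframes(wav.getnframes())


def isolate_pcm(client, wav_path):
    """Send validated raw PCM through stream() with the PCM format selected."""
    pcm = read_pcm_s16le_16(wav_path)
    # Assumption: the input format is selected with a `file_format` keyword.
    return client.audio_isolation.stream(audio=pcm, file_format=PCM_FORMAT)
```

Audio that fails these checks needs to be resampled or downmixed first, or sent with the default `other` format instead.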
Async Methods
All methods have async equivalents.
Integration with Speech-to-Speech
Audio isolation can be integrated with speech-to-speech conversion.
Use Cases
- Podcast production: Remove background noise from recordings
- Call center quality: Clean up customer service recordings
- Interview cleanup: Improve audio quality of recorded interviews
- Content creation: Prepare audio for further processing or editing
- Voice conversion prep: Clean audio before applying speech-to-speech
- Transcription improvement: Remove noise before speech-to-text processing
Technical Details
Supported Input Formats
- MP3, WAV, M4A, FLAC, OGG, OPUS
- PCM (16-bit, 16kHz, mono) for lowest latency
- Most common audio codecs and containers
Processing Notes
- The model is optimized for speech isolation
- Works best with recordings containing human speech
- Background music and ambient sounds are removed
- Processing time depends on audio length
- For real-time applications, use the stream() method
- Use pcm_s16le_16 format for lowest latency