Overview
Source Separation (also called audio source separation or blind source separation) extracts individual audio sources from a mixed recording. This is useful for:
- Separating vocals from music
- Isolating speech from background music
- Extracting individual instruments from a mix
- Removing background sounds from recordings
- Audio post-production and remixing
- Enhanced transcription in noisy environments
Installation
Source Separation will be included in the main package.
Basic Usage
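The following is a minimal sketch of the initialize, separate, unload lifecycle, based only on the three functions listed in the API Reference. The `SeparationApi` interface, the `label` and `filePath` fields, and `createStubApi` are assumptions made for illustration; in real use the three functions would come from the package itself, and the actual type definitions may differ.

```typescript
// Option fields are not shown in this document, so none are assumed here.
type SeparationInitializeOptions = Record<string, unknown>;

// Assumed shape of one result; the real SeparatedSource fields may differ.
interface SeparatedSource {
  label: string;    // hypothetical, e.g. "speech" or "music"
  filePath: string; // hypothetical path of the written source file
}

// Mirrors the three functions named in the API Reference.
interface SeparationApi {
  initializeSeparation(options?: SeparationInitializeOptions): Promise<void>;
  separateSources(audioPath: string): Promise<SeparatedSource[]>;
  unloadSeparation(): Promise<void>;
}

// Basic lifecycle: initialize once, separate a file, then release the model.
async function runBasicSeparation(
  api: SeparationApi,
  audioPath: string,
): Promise<SeparatedSource[]> {
  await api.initializeSeparation();
  const sources = await api.separateSources(audioPath);
  await api.unloadSeparation();
  return sources;
}

// Stub standing in for the real package so the sketch runs on its own.
function createStubApi(log: string[]): SeparationApi {
  return {
    async initializeSeparation() {
      log.push("init");
    },
    async separateSources(audioPath) {
      log.push(`separate:${audioPath}`);
      return [
        { label: "speech", filePath: "out/speech.wav" },
        { label: "music", filePath: "out/music.wav" },
      ];
    },
    async unloadSeparation() {
      log.push("unload");
    },
  };
}

// Usage with the stub:
//   const sources = await runBasicSeparation(createStubApi([]), "mix.wav");
```

Passing the API in as a parameter keeps the sketch independent of the package name, which this page does not state.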
API Reference
initializeSeparation()
Initialize the Source Separation model.
Parameters
Configuration options for separation initialization
Returns
Promise that resolves when separation is initialized.
Example
separateSources()
Separate audio sources from a mixed audio file.
Parameters
Path to the mixed audio file to separate
Returns
Promise that resolves to an array of separated audio sources.
Example
unloadSeparation()
Release separation model resources.
Returns
Promise that resolves when resources are released.
Example
Types
SeparationInitializeOptions
SeparatedSource
ModelPathConfig
Best Practices
Choose the right separation model
Different models specialize in different separation tasks:
- Speech/Music separation: Separates speech from background music
- Vocal isolation: Extracts vocals from music tracks
- Multi-instrument: Separates individual instruments (drums, bass, etc.)
- General-purpose: Attempts to separate any audio sources
Understand quality limitations
Source separation is an approximation:
- Perfect separation is impossible; expect some artifacts
- Quality depends on source overlap in frequency/time
- Similar-sounding sources are harder to separate
- Processing may introduce “phasey” or “underwater” sounds
Use for specific workflows
Source separation works best in targeted workflows:
- Speech extraction: Isolate speech before transcription
- Noise removal: Separate and discard unwanted sounds
- Karaoke creation: Remove vocals from music
- Stem creation: Extract individual instruments
Common Use Cases
Speech Extraction for Transcription
Vocal Removal (Karaoke)
Multi-Source Analysis
Batch Source Separation
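One way to approach batch processing is sketched below. It handles files one at a time, since memory use is described as high, and records a per-file error instead of aborting the whole batch. `separateBatch` and `BatchEntry` are hypothetical names; the `separate` argument stands in for `separateSources()` so the sketch does not depend on the package's exact types.

```typescript
// Result for one input file; `error` is set only when separation failed.
interface BatchEntry<T> {
  audioPath: string;
  sources: T[];
  error?: string;
}

// Separate files sequentially so only one file is in memory at a time.
async function separateBatch<T>(
  audioPaths: string[],
  separate: (audioPath: string) => Promise<T[]>,
): Promise<BatchEntry<T>[]> {
  const results: BatchEntry<T>[] = [];
  for (const audioPath of audioPaths) {
    try {
      const sources = await separate(audioPath);
      results.push({ audioPath, sources });
    } catch (err) {
      // Keep going: one unreadable file should not stop the rest of the batch.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ audioPath, sources: [], error: message });
    }
  }
  return results;
}
```

Running files in parallel with `Promise.all` would be shorter, but it multiplies peak memory by the number of files in flight, which is why this sketch stays sequential.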
Error Handling
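A reasonable pattern, sketched here under assumptions, is to wrap the whole lifecycle so that the model is released even when separation fails. `separateSafely`, `SeparationResult`, and `SeparationLifecycle` are hypothetical helpers; the error types the package actually throws are not documented on this page, so the sketch only reads the error message.

```typescript
// Either the separated sources or a readable error message.
type SeparationResult<T> =
  | { ok: true; sources: T[] }
  | { ok: false; error: string };

// The three documented functions, with the source type left generic.
interface SeparationLifecycle<T> {
  initializeSeparation(): Promise<void>;
  separateSources(audioPath: string): Promise<T[]>;
  unloadSeparation(): Promise<void>;
}

async function separateSafely<T>(
  api: SeparationLifecycle<T>,
  audioPath: string,
): Promise<SeparationResult<T>> {
  let initialized = false;
  try {
    await api.initializeSeparation();
    initialized = true;
    const sources = await api.separateSources(audioPath);
    return { ok: true, sources };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  } finally {
    // Only unload if initialization succeeded (an assumption about the API).
    if (initialized) {
      await api.unloadSeparation();
    }
  }
}
```

Note that an error thrown by `unloadSeparation()` itself would still propagate to the caller from the `finally` block.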
Performance Considerations
Source separation is computationally intensive:
- Processing time: 0.1x to 2.0x real-time, depending on the model
- Memory usage: High, scales with audio length and number of sources
- GPU acceleration strongly recommended for practical use
- Consider processing in chunks for very long files
- Output files multiply storage requirements (one per source)
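The figures above can be turned into rough planning numbers. The sketch below assumes that the real-time factor means processing takes between 0.1 and 2.0 times the audio duration, and that outputs are uncompressed PCM. All three helper names, the chunk length, and the overlap are hypothetical choices, not package settings.

```typescript
// Assumption: the real-time factor multiplies the audio duration.
function estimateProcessingSeconds(audioSeconds: number): { min: number; max: number } {
  return { min: audioSeconds * 0.1, max: audioSeconds * 2.0 };
}

// Uncompressed PCM size of all output files, ignoring container headers.
function estimateOutputBytes(
  sourceCount: number,
  audioSeconds: number,
  sampleRate: number,
  channels: number,
  bytesPerSample: number,
): number {
  return sourceCount * audioSeconds * sampleRate * channels * bytesPerSample;
}

interface Chunk {
  startSeconds: number;
  endSeconds: number;
}

// Split [0, totalSeconds) into chunks that overlap, so seams can be
// cross-faded when the separated chunks are joined again.
function planChunks(totalSeconds: number, chunkSeconds: number, overlapSeconds: number): Chunk[] {
  if (chunkSeconds <= 0 || overlapSeconds < 0 || overlapSeconds >= chunkSeconds) {
    throw new RangeError("need chunkSeconds > overlapSeconds >= 0");
  }
  const chunks: Chunk[] = [];
  const step = chunkSeconds - overlapSeconds;
  for (let start = 0; start < totalSeconds; start += step) {
    const end = Math.min(start + chunkSeconds, totalSeconds);
    chunks.push({ startSeconds: start, endSeconds: end });
    if (end >= totalSeconds) break; // last chunk reached the end of the file
  }
  return chunks;
}
```

For example, under these assumptions a 10-minute file would take somewhere between 1 and 20 minutes to process, and four 16-bit stereo outputs at 44.1 kHz would need roughly 423 MB of storage.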
Quality Factors
Separation quality depends on several factors:
Source independence
- Better: Speech and music (different spectral characteristics)
- Harder: Two similar instruments (similar frequency ranges)
- Hardest: Overlapping speakers (same frequency, same time)
Audio quality
- Clean, high-quality input produces better separation
- Compressed audio (MP3, AAC) may limit separation quality
- Sample rate affects separation resolution
Model selection
- Specialized models (speech/music) outperform general models
- Newer models generally have better quality
- Larger models are slower but more accurate
Limitations
Technical Details
Supported Audio Formats
Separation will support common audio formats:
- WAV (PCM, 16-bit or 24-bit, 44.1kHz or 48kHz recommended)
- Stereo input generally produces better results than mono
- Additional formats may be added in v0.6.0
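To check an input against the recommendation above before separating it, the WAV header can be inspected directly. This is a standalone sketch that reads the standard RIFF `fmt ` chunk; `parseWavFormat` and `meetsRecommendation` are hypothetical helpers, not part of the package, and whether the package performs a similar check internally is not stated here.

```typescript
interface WavFormat {
  audioFormat: number;   // 1 means integer PCM
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

function readAscii(view: DataView, offset: number, length: number): string {
  let text = "";
  for (let i = 0; i < length; i++) {
    text += String.fromCharCode(view.getUint8(offset + i));
  }
  return text;
}

// Walk the RIFF chunks until the "fmt " chunk is found, then read its fields.
function parseWavFormat(buffer: ArrayBuffer): WavFormat {
  const view = new DataView(buffer);
  if (
    view.byteLength < 12 ||
    readAscii(view, 0, 4) !== "RIFF" ||
    readAscii(view, 8, 4) !== "WAVE"
  ) {
    throw new Error("not a RIFF/WAVE file");
  }
  let offset = 12;
  while (offset + 8 <= view.byteLength) {
    const chunkId = readAscii(view, offset, 4);
    const chunkSize = view.getUint32(offset + 4, true); // little-endian
    if (chunkId === "fmt ") {
      if (offset + 8 + 16 > view.byteLength) throw new Error("truncated fmt chunk");
      return {
        audioFormat: view.getUint16(offset + 8, true),
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + chunkSize + (chunkSize % 2); // chunks are padded to even length
  }
  throw new Error("fmt chunk not found");
}

// PCM, 16-bit or 24-bit, 44.1 kHz or 48 kHz. Files that use the extensible
// format code (65534) are reported as not matching and need extra handling.
function meetsRecommendation(fmt: WavFormat): boolean {
  const isPcm = fmt.audioFormat === 1;
  const depthOk = fmt.bitsPerSample === 16 || fmt.bitsPerSample === 24;
  const rateOk = fmt.sampleRate === 44100 || fmt.sampleRate === 48000;
  return isPcm && depthOk && rateOk;
}
```

The `channels` field can also be used to warn when the input is mono, since stereo input is said to give better results.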
Processing Pipeline
- Load audio: Read mixed audio file
- Preprocess: Resample and normalize
- Separation: Apply deep learning source separation
- Postprocess: Normalize and balance output levels
- Save sources: Write each separated source to individual files
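The Preprocess and Postprocess steps both mention normalization, but the method is not specified here. Peak normalization is one common choice, and the sketch below shows it on raw samples as an illustration only; `peakNormalize` and the default target of 0.95 are assumptions, not the package's actual behavior.

```typescript
// Scale samples so the largest absolute value equals targetPeak.
// Samples are assumed to be floating-point values in the range [-1, 1].
function peakNormalize(samples: Float32Array, targetPeak: number = 0.95): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  if (peak === 0) {
    return samples.slice(); // pure silence: nothing to scale
  }
  const gain = targetPeak / peak;
  // Float32Array.map returns a new Float32Array, so the input is not modified.
  return samples.map((s) => s * gain);
}
```

Applying the same gain to every sample preserves the relative balance within a track; balancing levels between separated sources, as the Postprocess step describes, would need a shared gain computed across all outputs.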
Related
Speech Enhancement
Remove noise without full separation
Speech Recognition
Transcribe separated speech
Speaker Diarization
Identify speakers after separation
Voice Activity Detection
Detect speech in separated audio