Function Signature
Parameters
Predicted/enhanced audio signal. Accepted shapes:
- 1D array: (samples,), automatically expanded to (1, samples)
- 2D array: (channels, samples)
Ground truth/reference audio signal. Accepted shapes:
- 1D array: (samples,), automatically expanded to (1, samples)
- 2D array: (channels, samples)
Length Handling: If signals have different lengths, both are automatically truncated to the shorter length.
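The shape and length rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's own code: `prepare_signals` is a hypothetical helper name, and the conversion to float64 is an assumption.

```python
import numpy as np

def prepare_signals(predicted, reference):
    """Hypothetical helper mirroring the documented shape and length handling."""
    # 1D inputs of shape (samples,) become (1, samples); 2D inputs are left unchanged.
    pred = np.atleast_2d(np.asarray(predicted, dtype=np.float64))
    ref = np.atleast_2d(np.asarray(reference, dtype=np.float64))
    if pred.ndim != 2 or ref.ndim != 2:
        raise ValueError("expected arrays of shape (samples,) or (channels, samples)")
    # Truncate both signals to the shorter length along the sample axis.
    n = min(pred.shape[-1], ref.shape[-1])
    return pred[:, :n], ref[:, :n]

pred, ref = prepare_signals(np.zeros(16000), np.zeros((1, 15000)))
print(pred.shape, ref.shape)  # (1, 15000) (1, 15000)
```

Truncation drops the tail of the longer signal, so it only gives meaningful scores when both signals start at the same time instant.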
Returns
Signal-to-Distortion Ratio in dB (higher is better)
- Measures overall signal quality
- Combines artifacts, interference, and noise
- Typical range: -10 to 30 dB
Signal-to-Interference Ratio in dB (higher is better)
- Measures separation from interfering sources
- Important for source separation tasks
- Typical range: 0 to 40 dB
Signal-to-Artifacts Ratio in dB (higher is better)
- Measures processing artifacts introduced
- Important for enhancement quality
- Typical range: 0 to 30 dB
Scale-Invariant Signal-to-Noise Ratio in dB (higher is better)
- Scale-independent signal quality measure
- Robust to amplitude differences
- Widely used in speech enhancement
Convolutive-Invariant Signal-to-Distortion Ratio in dB (higher is better)
- Invariant to filtering/convolution operations
- Uses 512-tap filter by default
- Better for reverberant conditions
Usage Examples
Basic Usage
Multi-Channel Audio
Different Length Signals
Speech Enhancement Evaluation
Source Separation Evaluation
Metric Details
SDR (Signal-to-Distortion Ratio)
Formula: Ratio of target signal power to distortion power
Interpretation:
- Measures overall quality including all types of errors
- Higher values indicate better quality
- Combines SIR and SAR information
Use Cases:
- Overall system performance
- General quality assessment
- Benchmark comparisons
SIR (Signal-to-Interference Ratio)
Formula: Ratio of target signal to interference from other sources
Interpretation:
- Specifically measures separation performance
- High SIR means good source isolation
- Low SIR indicates source leakage
Use Cases:
- Source separation evaluation
- Multi-speaker scenarios
- Music source separation
SAR (Signal-to-Artifacts Ratio)
Formula: Ratio of target signal to processing artifacts
Interpretation:
- Measures algorithm-introduced distortions
- High SAR means clean processing
- Low SAR indicates processing artifacts
Use Cases:
- Enhancement algorithm quality
- Codec evaluation
- Processing artifact detection
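For readers who want the definitions behind the SDR, SIR, and SAR descriptions above, the standard BSS Eval formulation can be written as follows. This is a sketch that assumes the backend follows the usual decomposition of each estimate into a target component plus interference and artifact errors; the symbols are illustrative and do not appear in the original text.

```latex
\begin{align*}
\hat{s} &= s_{\text{target}} + e_{\text{interf}} + e_{\text{artif}} \\
\mathrm{SDR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^2}{\lVert e_{\text{interf}} + e_{\text{artif}} \rVert^2} \\
\mathrm{SIR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^2}{\lVert e_{\text{interf}} \rVert^2} \\
\mathrm{SAR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} + e_{\text{interf}} \rVert^2}{\lVert e_{\text{artif}} \rVert^2}
\end{align*}
```

In this formulation the SAR numerator includes the interference term, so it is slightly broader than a plain target-to-artifact ratio.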
SI-SNR (Scale-Invariant SNR)
Formula: SNR computed after optimal scaling
Interpretation:
- Invariant to signal amplitude
- Focuses on waveform shape similarity
- Popular in neural network training
Advantages:
- No sensitivity to volume differences
- Differentiable (good for training)
- Robust metric for speech tasks
Use Cases:
- Speech enhancement
- Neural network training loss
- Speaker separation
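The "optimal scaling" idea can be illustrated with a short NumPy sketch. It is not the `fast_bss_eval` backend: the function name `si_snr`, the mean removal, and the `eps` stabiliser are assumptions made for this example.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Illustrative single-channel SI-SNR in dB (assumes zero-mean normalisation)."""
    est = np.asarray(estimate, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    est = est - est.mean()
    ref = ref - ref.mean()
    # Optimal scaling: project the estimate onto the reference.
    scale = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = scale * ref
    noise = est - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
# Rescaling the estimate leaves the score essentially unchanged.
print(si_snr(est, ref), si_snr(3.0 * est, ref))
```

Because the target is obtained by projection, multiplying the estimate by any non-zero gain scales both the target and the residual by the same factor, which is why the ratio does not change.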
CI-SDR (Convolutive-Invariant SDR)
Formula: SDR after optimal filtering (512-tap filter)
Interpretation:
- Invariant to linear filtering/convolution
- Handles room acoustics effects
- More robust than SDR in reverb
Use Cases:
- Reverberant environments
- Far-field speech processing
- Acoustic echo cancellation
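To make "SDR after optimal filtering" concrete, here is a rough time-domain sketch: fit a short FIR filter to the reference by least squares, treat the filtered reference as the target, and count everything else as distortion. The function `ci_sdr_sketch` is hypothetical and is not the `ci_sdr` package implementation; it only illustrates the idea for a single channel.

```python
import numpy as np

def ci_sdr_sketch(estimate, reference, filter_length=512, eps=1e-12):
    """Illustrative filter-invariant SDR in dB for one channel (least-squares FIR fit)."""
    est = np.asarray(estimate, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    n = len(ref)
    # Column k holds the reference delayed by k samples (zeros shifted in at the start).
    padded = np.concatenate([np.zeros(filter_length - 1), ref])
    start = filter_length - 1
    delayed = np.stack(
        [padded[start - k : start - k + n] for k in range(filter_length)], axis=1
    )
    # Filter taps that best explain the estimate as a filtered reference.
    taps, *_ = np.linalg.lstsq(delayed, est, rcond=None)
    target = delayed @ taps
    distortion = est - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(distortion**2) + eps))

rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)
# A mildly "reverberant" estimate: the reference passed through a 3-tap filter plus noise.
est = np.convolve(ref, [1.0, 0.5, 0.25])[:4000] + 0.01 * rng.standard_normal(4000)
print(ci_sdr_sketch(est, ref, filter_length=8))  # filtering is forgiven
print(ci_sdr_sketch(est, ref, filter_length=1))  # only a gain is forgiven
```

With a single tap the fit reduces to an optimal gain, so the second call penalises the filtering that the first call absorbs into the target.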
Interpretation Guidelines
Quality Ranges
| Metric | Excellent | Good | Fair | Poor |
|---|---|---|---|---|
| SDR | > 18 dB | 12-18 dB | 6-12 dB | < 6 dB |
| SIR | > 20 dB | 15-20 dB | 10-15 dB | < 10 dB |
| SAR | > 15 dB | 10-15 dB | 5-10 dB | < 5 dB |
| SI-SNR | > 15 dB | 10-15 dB | 5-10 dB | < 5 dB |
| CI-SDR | > 15 dB | 10-15 dB | 5-10 dB | < 5 dB |
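The table above can be turned into a small lookup when reporting results. The sketch below is illustrative: `THRESHOLDS` and `quality_label` are hypothetical names, and the treatment of values that fall exactly on a boundary is an assumption, since the table does not specify it.

```python
# Lower bounds (dB) of the Fair, Good and Excellent bands, copied from the table.
THRESHOLDS = {
    "SDR": (6.0, 12.0, 18.0),
    "SIR": (10.0, 15.0, 20.0),
    "SAR": (5.0, 10.0, 15.0),
    "SI-SNR": (5.0, 10.0, 15.0),
    "CI-SDR": (5.0, 10.0, 15.0),
}

def quality_label(metric, value_db):
    """Map a metric value in dB to the band listed in the quality table."""
    fair, good, excellent = THRESHOLDS[metric]
    if value_db > excellent:
        return "Excellent"
    # Assumption: a value exactly on a boundary is assigned to the higher band.
    if value_db >= good:
        return "Good"
    if value_db >= fair:
        return "Fair"
    return "Poor"

print(quality_label("SDR", 13.2))  # Good
```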
Task-Specific Benchmarks
Speech Enhancement:
- Target SI-SNR improvement: 10-20 dB
- Typical SDR: 10-18 dB
- SAR importance: High
Music Source Separation:
- Good SDR: 6-12 dB (vocals), 8-14 dB (accompaniment)
- SIR importance: Critical
- State-of-the-art: 8-10 dB SDR
Speech Separation:
- Target SI-SNR: 12-18 dB
- SIR importance: Very high
- CI-SDR useful for reverberant conditions
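The "SI-SNR improvement" figure quoted for speech enhancement is the score of the processed signal minus the score of the unprocessed mixture, both measured against the same reference. A minimal sketch, with hypothetical helper names and an assumed zero-mean SI-SNR definition:

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Illustrative SI-SNR in dB (assumes zero-mean normalisation)."""
    est = estimate - np.mean(estimate)
    ref = reference - np.mean(reference)
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def si_snr_improvement(estimate, mixture, reference):
    """Gain in dB of the processed estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
mixture = clean + noise          # input at roughly 0 dB
enhanced = clean + 0.1 * noise   # noise attenuated by a factor of 10
print(si_snr_improvement(enhanced, mixture, clean))
```

Reporting the improvement rather than the raw score makes results comparable across inputs with different noise levels.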
Implementation Details
Backends
- SDR, SIR, SAR: Uses mir_eval.separation.bss_eval_sources
- SI-SNR: Uses fast_bss_eval.si_sdr_loss
- CI-SDR: Uses ci_sdr.pt.ci_sdr_loss
Computation Parameters
- Permutation: compute_permutation=False (assumes correct channel order)
- CI-SDR filter length: 512 taps (default)
- SI-SNR zero mean: Controlled by backend default
Dependencies
Use Cases
- Speech Enhancement: Noise reduction, dereverberation
- Source Separation: Music, speech, environmental sounds
- Echo Cancellation: Acoustic echo removal
- Beamforming: Multi-microphone processing
- Codec Evaluation: Audio compression quality
- Model Training: Loss functions and validation
Notes
Channel Ordering: Assumes predicted and reference channels are aligned. Enable permutation if channel order is unknown.
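When the channel order is unknown, one common approach is to score every assignment of predicted channels to reference channels and keep the best one. The sketch below is a brute-force illustration under stated assumptions: `best_permutation` and `snr_db` are hypothetical names, plain SNR stands in for the real metrics, and the search cost grows factorially with the number of channels.

```python
from itertools import permutations

import numpy as np

def snr_db(estimate, reference, eps=1e-12):
    """Plain SNR in dB, used here only as a stand-in scoring function."""
    noise = estimate - reference
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(noise**2) + eps))

def best_permutation(predicted, reference, metric=snr_db):
    """Return (order, score), where order[r] is the predicted channel matched to reference r."""
    n_channels = reference.shape[0]
    best_order, best_score = None, -np.inf
    for order in permutations(range(n_channels)):
        score = np.mean([metric(predicted[p], reference[r]) for r, p in enumerate(order)])
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score

rng = np.random.default_rng(0)
ref = rng.standard_normal((2, 8000))
pred = ref[::-1] + 0.05 * rng.standard_normal((2, 8000))  # channels swapped
order, score = best_permutation(pred, ref)
print(order)  # (1, 0)
```

Reordering the prediction with `pred[list(order)]` then aligns it with the reference before computing the metrics.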
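Negative Values: A negative dB value means the distortion, interference, or artifact energy exceeds the target signal energy in the prediction. To judge whether the prediction is worse than the input mixture, compare against the same metric computed on the mixture.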