Metrics marked with ✓ in the Auto-Install column are installed automatically with VERSA. The others require manual installation from their respective code sources.
Voice Conversion Metrics
Metrics for evaluating voice conversion and voice cloning systems.
Signal Quality Metrics
Standard signal-to-noise and distortion metrics for speech enhancement and separation.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 4 | ✓ | Signal-to-interference Ratio (SIR) | signal_metric | sir | espnet | - |
| 5 | ✓ | Signal-to-artifact Ratio (SAR) | signal_metric | sar | espnet | - |
| 6 | ✓ | Signal-to-distortion Ratio (SDR) | signal_metric | sdr | espnet | - |
| 7 | ✓ | Convolutional scale-invariant signal-to-distortion ratio (CI-SDR) | signal_metric | ci-sdr | ci_sdr | paper |
| 8 | ✓ | Scale-invariant signal-to-noise ratio (SI-SNR) | signal_metric | si-snr | espnet | paper |
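To make the scale-invariant idea concrete, here is a minimal NumPy sketch of SI-SNR. It is not VERSA's or ESPnet's implementation; the function name `si_snr`, the mean removal, and the `eps` stabilizer are assumptions made for illustration. The estimate is projected onto the reference, and the ratio of the projected energy to the residual energy is reported in dB.

```python
import numpy as np


def si_snr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB for two 1-D signals of equal length (illustrative sketch)."""
    # Remove the DC offset so the projection is not biased by a constant shift.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything that is not explained by the scaled reference counts as noise.
    noise = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return float(10.0 * np.log10(ratio))
```

Because of the projection step, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is the property that distinguishes SI-SNR from a plain SNR.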
Perceptual Quality Metrics
Standard perceptual metrics for speech quality and intelligibility.
Discrete Speech Metrics
Metrics for evaluating discrete speech representations and codec quality.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 11 | ✓ | Speech BERT Score | discrete_speech | speech_bert | discrete speech metric | paper |
| 12 | ✓ | Discrete Speech BLEU Score | discrete_speech | speech_bleu | discrete speech metric | paper |
| 13 | ✓ | Discrete Speech Token Edit Distance | discrete_speech | speech_token_distance | discrete speech metric | paper |
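As a rough illustration of what a token edit distance measures, the sketch below computes the Levenshtein distance between two sequences of discrete unit IDs. It is not the code behind the `speech_token_distance` key; the function names, the integer token representation, and the length normalization are all assumptions.

```python
from typing import Sequence


def token_edit_distance(ref: Sequence[int], hyp: Sequence[int]) -> int:
    """Levenshtein distance between two token sequences (illustrative sketch)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def normalized_token_distance(ref: Sequence[int], hyp: Sequence[int]) -> float:
    """Edit distance divided by the reference length (hypothetical normalization)."""
    return token_edit_distance(ref, hyp) / max(len(ref), 1)
```

In practice the token sequences would come from a speech tokenizer applied to the reference and the generated audio; that step is outside the scope of this sketch.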
Advanced Comparison Metrics
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 14 | | Dynamic Time Warping Cost Metric | warpq | warpq | WARP-Q | paper |
| 15 | | Speech Contrastive Regression for Quality Assessment with reference (ScoreQ) | scoreq_ref | scoreq_ref | ScoreQ | paper |
| 17 | ✓ | Log-Weighted Mean Square Error | log_wmse | log_wmse | log_wmse | - |
| 18 | ✓ | ASR-oriented Mismatch Error Rate (ASR-Mismatch) | asr_match | asr_match_error_rate | - | - |
PYSEPM Metrics
Metrics from the Python Speech Enhancement Performance Measures (PYSEPM) library.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 20 | | Frequency-Weighted SEGmental SNR (FWSEGSNR) | pysepm | pysepm_fwsegsnr | pysepm | Paper |
| 21 | | Weighted Spectral Slope (WSS) | pysepm | pysepm_wss | pysepm | Paper |
| 22 | | Cepstrum Distance Objective Speech Quality Measure (CD) | pysepm | pysepm_cd | pysepm | Paper |
| 23 | | Composite Objective Speech Quality | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | pysepm | Paper |
| 24 | | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | pysepm | Paper |
| 25 | | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | pysepm | Paper |
Unified Models with Audio Reference
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 26 | | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Audio Reference | universa_audioref | universa_score | Uni-VERSA | paper |
| 27 | | ARECHO (Autoregressive Evaluation via Chain-based Hypothesis Optimization) with Audio Reference | arecho_audioref | arecho_score | ARECHO | paper |
Music & Audio Alignment
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 28 | ✓ | Chroma-related Alignment | chroma_alignment | chroma_stft_cosine_dtw, chroma_cqt_cosine_dtw, chroma_cens_cosine_dtw, etc. | - | - |
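The report keys above combine a chroma feature type, a cosine distance, and dynamic time warping (DTW). The following NumPy sketch shows how such a score could be formed from two precomputed chroma matrices. It is an illustration under assumptions, not VERSA's implementation: the function names are hypothetical, chroma extraction is left to an external library, and the accumulated cost is returned without any path-length normalization.

```python
import numpy as np


def cosine_distance_matrix(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pairwise cosine distance between frames of a (n_a, bins) and b (n_b, bins)."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return 1.0 - a_unit @ b_unit.T


def dtw_cost(dist: np.ndarray) -> float:
    """Accumulated cost of the cheapest monotonic alignment path through dist."""
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the cheapest of the three allowed predecessor steps.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return float(acc[n, m])
```

With this formulation, a time-stretched copy of the reference aligns at near-zero cost, while harmonically different material accumulates cost along the whole path.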
Perceptual Audio Metrics
Deep learning-based perceptual similarity metrics.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 29 | ✓ | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | PerceptualAudio_Pytorch | paper |
| 30 | ✓ | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | PerceptualAudio | paper |
All metrics listed above require a reference audio signal for comparison. Choose the appropriate metric based on your evaluation scenario: voice conversion, speech enhancement, or general audio quality assessment.
Usage Guidelines
Speech Enhancement
For speech enhancement tasks, use:
- PESQ and STOI for perceptual quality and intelligibility
- SI-SNR, SDR, SAR for signal-level metrics
- VISQOL for high-quality perceptual assessment
Voice Conversion
For voice conversion tasks, use:
- MCD to measure spectral distortion
- F0 Correlation and F0 RMSE to evaluate pitch accuracy
Codec Evaluation
For audio codec quality, use:
- Discrete Speech BERT Score and BLEU Score
- Speech Token Edit Distance
- Log-Weighted Mean Square Error
Music & Singing
For music and singing voice synthesis, use:
- Chroma Alignment metrics for pitch and harmony
- DPAM and CDPAM for perceptual similarity
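The guidelines above can be turned into a small lookup from scenario to the "Key in Config" values listed in the tables. The sketch below is only an illustration: the scenario names, the `build_metric_entries` helper, and the `{"name": ...}` entry layout are assumptions, so check the layout against the VERSA configuration files you actually use. Metrics whose config keys do not appear in the tables on this page (PESQ, STOI, VISQOL, MCD, F0) are left out rather than guessed.

```python
from typing import Dict, List

# Config keys copied from the "Key in Config" column of the tables on this page.
SCENARIO_CONFIG_KEYS: Dict[str, List[str]] = {
    "speech_enhancement": ["signal_metric"],
    "codec": ["discrete_speech", "log_wmse"],
    "music_singing": ["chroma_alignment", "dpam", "cdpam"],
}


def build_metric_entries(scenario: str) -> List[Dict[str, str]]:
    """Return one entry per metric group for the scenario (hypothetical layout)."""
    if scenario not in SCENARIO_CONFIG_KEYS:
        raise KeyError(f"Unknown scenario: {scenario}")
    return [{"name": key} for key in SCENARIO_CONFIG_KEYS[scenario]]


if __name__ == "__main__":
    for entry in build_metric_entries("codec"):
        print(entry)
```

Keeping the mapping in one place makes it easy to see which metric groups a given evaluation run will request and to extend the table as more keys are confirmed.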