Dependent metrics require a reference audio signal to evaluate quality. These metrics compare the generated or processed audio against a clean reference to measure distortion, similarity, and reconstruction quality.
Metrics marked with Auto-Install are automatically installed with VERSA. Others require manual installation from their respective code sources.

Voice Conversion Metrics

Metrics for evaluating voice conversion and voice cloning systems.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | | Mel Cepstral Distortion (MCD) | mcd_f0 | mcd | espnet and s3prl-vc | paper |
| 2 | | F0 Correlation | mcd_f0 | f0_corr | espnet and s3prl-vc | paper |
| 3 | | F0 Root Mean Square Error | mcd_f0 | f0_rmse | espnet and s3prl-vc | paper |
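
As a rough illustration of what the three report keys (mcd, f0_corr, f0_rmse) measure, the sketch below computes them on frames that are already time-aligned. It assumes the common MCD definition, (10 / ln 10) · sqrt(2 · Σ (c_d − ĉ_d)²) averaged over frames with the 0th coefficient excluded, and it compares F0 only on frames voiced in both signals. The function names are hypothetical, and VERSA's own implementation (feature extraction, alignment, thresholds) may differ.

```python
import math


def mcd_db(ref_mcep, gen_mcep):
    """Mean MCD in dB over aligned frames.

    Each frame is a list of mel-cepstral coefficients with the 0th
    (energy) coefficient already removed. Assumes at least one frame.
    """
    const = 10.0 / math.log(10.0)
    per_frame = [
        const * math.sqrt(2.0 * sum((r - g) ** 2 for r, g in zip(rf, gf)))
        for rf, gf in zip(ref_mcep, gen_mcep)
    ]
    return sum(per_frame) / len(per_frame)


def f0_stats(ref_f0, gen_f0):
    """Return (RMSE in Hz, Pearson correlation) over frames voiced in both.

    A frame counts as voiced when its F0 is greater than zero. Assumes at
    least two such frames and a non-constant F0 contour in each signal.
    """
    pairs = [(r, g) for r, g in zip(ref_f0, gen_f0) if r > 0 and g > 0]
    n = len(pairs)
    rmse = math.sqrt(sum((r - g) ** 2 for r, g in pairs) / n)
    mean_r = sum(r for r, _ in pairs) / n
    mean_g = sum(g for _, g in pairs) / n
    cov = sum((r - mean_r) * (g - mean_g) for r, g in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_g = sum((g - mean_g) ** 2 for _, g in pairs)
    return rmse, cov / math.sqrt(var_r * var_g)
```

Lower MCD and F0 RMSE indicate a closer match to the reference, while an F0 correlation near 1 indicates that the pitch contour follows the reference even if it is offset.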

Signal Quality Metrics

Standard signal-to-noise and distortion metrics for speech enhancement and separation.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 4 | | Signal-to-interference Ratio (SIR) | signal_metric | sir | espnet | - |
| 5 | | Signal-to-artifact Ratio (SAR) | signal_metric | sar | espnet | - |
| 6 | | Signal-to-distortion Ratio (SDR) | signal_metric | sdr | espnet | - |
| 7 | | Convolutional scale-invariant signal-to-distortion ratio (CI-SDR) | signal_metric | ci-sdr | ci_sdr | paper |
| 8 | | Scale-invariant signal-to-noise ratio (SI-SNR) | signal_metric | si-snr | espnet | paper |
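
To make the "scale-invariant" part of SI-SNR concrete, here is a minimal sketch of the standard definition: remove the mean, project the estimate onto the reference, and compare the energy of that projection with the energy of the residual. The function name and the `eps` guard are illustrative assumptions, not the espnet code that VERSA calls.

```python
import math


def si_snr_db(reference, estimate, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample lists."""
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [x - ref_mean for x in reference]
    est = [x - est_mean for x in estimate]
    # Project the estimate onto the reference to get the target component.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    # Whatever the projection does not explain is treated as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    # eps keeps the ratio finite when the estimate is (nearly) perfect.
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the target is obtained by projection, multiplying the estimate by any non-zero gain leaves the score essentially unchanged, which is why SI-SNR is preferred when the system's output level is arbitrary.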

Perceptual Quality Metrics

Standard perceptual metrics for speech quality and intelligibility.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 9 | | Perceptual Evaluation of Speech Quality (PESQ) | pesq | pesq | pesq | paper |
| 10 | | Short-Time Objective Intelligibility (STOI) | stoi | stoi | pystoi | paper |
| 19 | | Virtual Speech Quality Objective Listener (VISQOL) | visqol | visqol | google-visqol | paper |

Discrete Speech Metrics

Metrics for evaluating discrete speech representations and codec quality.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 11 | | Speech BERT Score | discrete_speech | speech_bert | discrete speech metric | paper |
| 12 | | Discrete Speech BLEU Score | discrete_speech | speech_belu | discrete speech metric | paper |
| 13 | | Discrete Speech Token Edit Distance | discrete_speech | speech_token_distance | discrete speech metric | paper |
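
The idea behind a token edit distance can be sketched with a plain Levenshtein distance over two sequences of discrete unit IDs. This is only an illustration: how the tokens are extracted from audio, and whether the reported speech_token_distance is normalized or converted to a similarity, is not stated here and is left to the discrete speech metric package. The function name is hypothetical.

```python
def token_edit_distance(ref_tokens, gen_tokens):
    """Levenshtein distance between two sequences of discrete token IDs."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j generated tokens.
    prev = list(range(len(gen_tokens) + 1))
    for i, r in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, g in enumerate(gen_tokens, start=1):
            cost = 0 if r == g else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference token
                    curr[j - 1] + 1,     # insert a generated token
                    prev[j - 1] + cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]
```

Dividing the result by the reference length would give a length-normalized error rate, which is one common way to compare utterances of different durations.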

Advanced Comparison Metrics

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 14 | | Dynamic Time Warping Cost Metric | warpq | warpq | WARP-Q | paper |
| 15 | | Speech Contrastive Regression for Quality Assessment with reference (ScoreQ) | scoreq_ref | scoreq_ref | ScoreQ | paper |
| 17 | | Log-Weighted Mean Square Error | log_wmse | log_wmse | log_wmse | |
| 18 | | ASR-oriented Mismatch Error Rate (ASR-Mismatch) | asr_match | asr_match_error_rate | - | - |

PYSEPM Metrics

Metrics from the Python Speech Enhancement Performance Measures (PYSEPM) library.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 20 | | Frequency-Weighted SEGmental SNR (FWSEGSNR) | pysepm | pysepm_fwsegsnr | pysepm | paper |
| 21 | | Weighted Spectral Slope (WSS) | pysepm | pysepm_wss | pysepm | paper |
| 22 | | Cepstrum Distance Objective Speech Quality Measure (CD) | pysepm | pysepm_cd | pysepm | paper |
| 23 | | Composite Objective Speech Quality | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | pysepm | paper |
| 24 | | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | pysepm | paper |
| 25 | | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | pysepm | paper |

Unified Models with Audio Reference

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 26 | | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) with Audio Reference | universa_audioref | universa_score | Uni-VERSA | paper |
| 27 | | ARECHO (Autoregressive Evaluation via Chain-based Hypothesis Optimization) with Audio Reference | arecho_audioref | arecho_score | ARECHO | paper |

Music & Audio Alignment

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 28 | | Chroma-related Alignment | chroma_alignment | chroma_stft_cosine_dtw, chroma_cqt_cosine_dtw, chroma_cens_cosine_dtw, etc. | - | - |

Perceptual Audio Metrics

Deep learning-based perceptual similarity metrics.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 29 | | Deep Perceptual Audio Metric (DPAM) | dpam | dpam_distance | PerceptualAudio_Pytorch | paper |
| 30 | | Contrastive learning-based Deep Perceptual Audio Metric (CDPAM) | cdpam | cdpam_distance | PerceptualAudio | paper |

All metrics listed above require a reference audio signal for comparison. Choose the appropriate metric based on your evaluation scenario: voice conversion, speech enhancement, or general audio quality assessment.

Usage Guidelines

For speech enhancement tasks, use:
  • PESQ and STOI for perceptual quality and intelligibility
  • SI-SNR, SDR, SAR for signal-level metrics
  • VISQOL for high-quality perceptual assessment
For voice conversion tasks, use:
  • MCD to measure spectral distortion
  • F0 Correlation and F0 RMSE to evaluate pitch accuracy
For audio codec quality, use:
  • Discrete Speech BERT Score and BLEU Score
  • Speech Token Edit Distance
  • Log-Weighted Mean Square Error
For music and singing voice synthesis, use:
  • Chroma Alignment metrics for pitch and harmony
  • DPAM and CDPAM for perceptual similarity
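
The recommendations above can be turned into a small lookup from task to config keys. The sketch below is a convenience helper, not part of VERSA: the task names and the function are hypothetical, the keys are copied from the "Key in Config" column of the tables on this page, and the assumption that a config is a list of entries each carrying a `name` field should be checked against the VERSA repository before use.

```python
import json

# Config keys copied from the "Key in Config" column of the metric tables.
# The task names on the left are made up for this example.
TASK_TO_CONFIG_KEYS = {
    "speech_enhancement": ["pesq", "stoi", "signal_metric", "visqol"],
    "voice_conversion": ["mcd_f0"],
    "audio_codec": ["discrete_speech", "log_wmse"],
    "music_singing": ["chroma_alignment", "dpam", "cdpam"],
}


def build_metric_config(task):
    """Return one config entry per recommended metric for the given task.

    Assumption: each entry is a mapping with a "name" field; metric-specific
    options are omitted here.
    """
    return [{"name": key} for key in TASK_TO_CONFIG_KEYS[task]]


if __name__ == "__main__":
    print(json.dumps(build_metric_config("voice_conversion"), indent=2))
```

Note that one config key can produce several report keys: mcd_f0 yields mcd, f0_corr, and f0_rmse, and signal_metric yields sir, sar, sdr, ci-sdr, and si-snr.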
