Independent metrics evaluate audio quality without requiring a reference signal. These metrics are ideal for real-time assessment, live monitoring, and scenarios where clean reference audio is unavailable.
Metrics marked with Auto-Install are automatically installed with VERSA. Others require manual installation from their respective code sources.

MOS Prediction Metrics

Mean Opinion Score (MOS) prediction metrics estimate subjective quality ratings without reference audio.

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 1 |  | Deep Noise Suppression MOS Score of P.835 (DNSMOS) | pseudo_mos | dnsmos_overall | speechmos (MS) | paper |
| 2 |  | Deep Noise Suppression MOS Score of P.808 (DNSMOS) | pseudo_mos | dnsmos_p808 | speechmos (MS) | paper |
| 3 |  | Non-intrusive Speech Quality and Naturalness Assessment (NISQA) | nisqa | nisqa_mos_pred, nisqa_noi_pred, nisqa_dis_pred, nisqa_col_pred, nisqa_loud_pred | NISQA | paper |
| 4 |  | UTokyo-SaruLab System for VoiceMOS Challenge 2022 (UTMOS) | pseudo_mos | utmos | speechmos | paper |
| 5 |  | Packet Loss Concealment-related MOS Score (PLCMOS) | pseudo_mos | plcmos | speechmos (MS) | paper |
| 10 |  | Sheet SSQA MOS Models | sheet_ssqa | sheet_ssqa | Sheet | paper |
| 11 |  | UTMOSv2: UTokyo-SaruLab MOS Prediction System | utmosv2 | utmosv2 | UTMOSv2 | paper |
| 49 |  | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech | pseudo_mos | dnsmos_pro_bvcc | DNSMOSPro | paper |
| 50 |  | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech | pseudo_mos | dnsmos_pro_nisqa | DNSMOSPro | paper |
| 51 |  | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech | pseudo_mos | dnsmos_pro_vcc2018 | DNSMOSPro | paper |
| 52 |  | WV-MOS (MOS score prediction by fine-tuned wav2vec2.0 model) | wvmos | wvmos | wvmos | paper |
| 53 |  | SIG-MOS | sigmos | SIGMOS_COL, SIGMOS_DISC, SIGMOS_LOUD, SIGMOS_REVERB, SIGMOS_SIG, SIGMOS_OVRL | sigmos | paper |
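
Because one config key such as pseudo_mos can produce several report keys, it can help to summarize the MOS columns across a whole evaluation set. The sketch below assumes the per-utterance results have already been loaded into a list of dicts keyed by the "Key in Report" names; how VERSA stores results on disk is not covered here, and `summarize_mos` is a hypothetical helper, not part of VERSA.

```python
from statistics import mean, stdev

# Report keys taken from the "Key in Report" column above (a subset).
MOS_REPORT_KEYS = [
    "dnsmos_overall", "dnsmos_p808", "utmos", "plcmos", "utmosv2",
    "nisqa_mos_pred", "wvmos", "SIGMOS_OVRL",
]


def summarize_mos(records: list[dict]) -> dict[str, tuple[float, float]]:
    """Return (mean, sample std) per MOS key over the utterances reporting it.

    Assumes each record holds one utterance's results keyed by report name.
    Keys missing from a record (metric not enabled) are skipped.
    """
    summary = {}
    for key in MOS_REPORT_KEYS:
        values = [float(r[key]) for r in records if key in r]
        if not values:
            continue  # this predictor was not run on any utterance
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean(values), spread)
    return summary
```

Missing keys are simply skipped, so the same helper works whichever subset of MOS predictors was enabled.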

TorchAudio SQUIM Metrics

No-reference metrics from TorchAudio’s SQUIM (Speech Quality and Intelligibility Measures).

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 6 |  | PESQ in TorchAudio-Squim | squim_no_ref | torch_squim_pesq | torch_squim | paper |
| 7 |  | STOI in TorchAudio-Squim | squim_no_ref | torch_squim_stoi | torch_squim | paper |
| 8 |  | SI-SDR in TorchAudio-Squim | squim_no_ref | torch_squim_si_sdr | torch_squim | paper |
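
SQUIM predicts PESQ, STOI, and SI-SDR from the degraded signal alone, so its outputs are estimates of metrics that are normally intrusive. For reference, the standard SI-SDR that torch_squim_si_sdr approximates compares an estimate ŝ against a clean reference s (both typically zero-mean) as follows:

```latex
\mathbf{s}_{\text{target}} = \frac{\langle \hat{\mathbf{s}}, \mathbf{s} \rangle}{\lVert \mathbf{s} \rVert^{2}}\,\mathbf{s},
\qquad
\mathbf{e} = \hat{\mathbf{s}} - \mathbf{s}_{\text{target}},
\qquad
\mathrm{SI\text{-}SDR} = 10 \log_{10} \frac{\lVert \mathbf{s}_{\text{target}} \rVert^{2}}{\lVert \mathbf{e} \rVert^{2}} \ \text{dB}
```

Scaling ŝ by any nonzero constant scales both the target and the error by the same factor and leaves the ratio unchanged, which is what makes the measure scale-invariant. The SQUIM value is a prediction of this quantity, not a computation of it.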

Singing Voice Metrics

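| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 9 |  | Singing voice MOS | pseudo_mos | singmos_v1 | singmos | paper |
| 55 |  | Singing voice MOS | pseudo_mos | singmos_pro | singmos | paper |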

Speech Enhancement Metrics

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 12 |  | Speech Contrastive Regression for Quality Assessment without reference (ScoreQ) | scoreq_nr | scoreq_nr | ScoreQ | paper |
| 13 |  | Speech enhancement-based SI-SNR | se_snr | se_si_snr | ESPnet |  |
| 14 |  | Speech enhancement-based CI-SDR | se_snr | se_ci_sdr | ESPnet |  |
| 15 |  | Speech enhancement-based SAR | se_snr | se_sar | ESPnet |  |
| 16 |  | Speech enhancement-based SDR | se_snr | se_sdr | ESPnet |  |
| 54 |  | VQScore (Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech) | vqscore | vqscore | VQScore | paper |
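
The se_snr metrics appear here as no-reference even though SI-SNR, CI-SDR, SAR, and SDR normally compare against a clean signal. A minimal sketch of how that can work, assuming the output of a speech enhancement model is used as a pseudo-reference for the input: `toy_enhance` is a hypothetical stand-in (a moving-average filter), not the ESPnet model VERSA uses, and which signal plays the reference role is likewise an assumption.

```python
import math


def si_snr(estimate: list[float], reference: list[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between two equal-length signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the DC offset so a constant bias does not count as signal.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


def toy_enhance(signal: list[float], width: int = 3) -> list[float]:
    """Hypothetical stand-in for an SE model: a centered moving average."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def pseudo_reference_si_snr(noisy: list[float]) -> float:
    """Score the input against its own enhanced version (no clean reference)."""
    enhanced = toy_enhance(noisy)
    return si_snr(noisy, enhanced)
```

The idea is that the more an enhancement model has to change the input, the lower the score, so a clean input that the model leaves nearly untouched scores high.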

Audio-Language Models

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 17 |  | PAM: Prompting Audio-Language Models for Audio Quality Assessment | pam | pam | PAM | paper |

Acoustic Quality Metrics

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 18 |  | Speech-to-Reverberation Modulation energy Ratio (SRMR) | srmr | srmr | SRMRpy | paper |
| 24 |  | Audiobox Aesthetics | audiobox_aesthetics | audiobox_aesthetics_CE, audiobox_aesthetics_CU, audiobox_aesthetics_PC, audiobox_aesthetics_PQ | Audiobox-Aesthetics | paper |
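
The four Audiobox Aesthetics report keys end in CE, CU, PC, and PQ, which the Audiobox Aesthetics work defines as Content Enjoyment, Content Usefulness, Production Complexity, and Production Quality. A small hypothetical helper (not part of VERSA) can rename them when building a human-readable report, assuming results are available as a dict keyed by report name:

```python
# Descriptive axis names, keyed by the suffix used in the report keys.
AESTHETIC_AXES = {
    "CE": "content_enjoyment",
    "CU": "content_usefulness",
    "PC": "production_complexity",
    "PQ": "production_quality",
}

PREFIX = "audiobox_aesthetics_"


def readable_aesthetics(record: dict) -> dict[str, float]:
    """Pick the audiobox_aesthetics_* fields out of one result record and
    rename them to descriptive axis names; other keys are ignored."""
    out = {}
    for suffix, name in AESTHETIC_AXES.items():
        key = PREFIX + suffix
        if key in record:
            out[name] = float(record[key])
    return out
```

Production Complexity describes how complex the audio is rather than how good it sounds, so it should not be averaged with the other three axes as if it were a quality score.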

Voice Activity & Speech Rate

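| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 19 |  | Voice Activity Detection (VAD) | vad | vad_info | SileroVAD |  |
| 21 |  | Speaking Word/Character Rate (SWR) | speaking_rate | speaking_rate | - | - |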
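
Speaking rate is a count divided by a duration. The sketch below assumes a transcript is already available (for example, from an ASR system; how VERSA obtains it is not shown here) and optionally uses VAD segments, given as hypothetical (start, end) pairs in seconds, to measure only voiced time. Both helpers are illustrative, not VERSA's implementation.

```python
def speaking_rate(transcript: str, duration_s: float) -> dict[str, float]:
    """Words and characters per second for one utterance.

    Characters exclude whitespace, so the count does not depend on how
    many spaces separate the words.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    words = transcript.split()
    chars = sum(len(w) for w in words)
    return {
        "words_per_second": len(words) / duration_s,
        "chars_per_second": chars / duration_s,
    }


def voiced_duration(segments: list[tuple[float, float]]) -> float:
    """Total speech time in seconds from (start, end) VAD segments."""
    return sum(max(0.0, end - start) for start, end in segments)
```

Dividing by `voiced_duration(...)` rather than the full clip length keeps leading and trailing silence from lowering the rate.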

Anti-Spoofing & Language ID

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 22 |  | Anti-spoofing Score (SpoofS) with AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks | asvspoof_score | asvspoof_score | AASIST | paper |
| 23 |  | Language Identification | lid | language | ESPnet | paper |

Qwen2 Audio Metrics

Comprehensive speech analysis across multiple dimensions using the Qwen2 Audio model.

Speaker Characteristics

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 25 |  | Qwen2 Speaker Characteristics - Count | qwen2_speaker_count_metric | qwen2_speaker_count_metric | Qwen2 Audio | paper |
| 26 |  | Qwen2 Speaker Characteristics - Gender | qwen2_speaker_gender_metric | qwen2_speaker_gender_metric | Qwen2 Audio | paper |
| 27 |  | Qwen2 Speaker Characteristics - Age | qwen2_speaker_age_metric | qwen2_speaker_age_metric | Qwen2 Audio | paper |
| 28 |  | Qwen2 Speaker Characteristics - Speech Impairment | qwen2_speech_impairment_metric | qwen2_speech_impairment_metric | Qwen2 Audio | paper |

Voice Properties

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 29 |  | Qwen2 Voice Properties - Pitch | qwen2_voice_pitch_metric | qwen2_voice_pitch_metric | Qwen2 Audio | paper |
| 30 |  | Qwen2 Voice Properties - Pitch Range | qwen2_pitch_range_metric | qwen2_pitch_range_metric | Qwen2 Audio | paper |
| 31 |  | Qwen2 Voice Properties - Voice Type | qwen2_voice_type_metric | qwen2_voice_type_metric | Qwen2 Audio | paper |
| 32 |  | Qwen2 Voice Properties - Volume Level | qwen2_speech_volume_level_metric | qwen2_speech_volume_level_metric | Qwen2 Audio | paper |

Speech Content

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 33 |  | Qwen2 Speech Content - Language | qwen2_language_metric | qwen2_language_metric | Qwen2 Audio | paper |
| 34 |  | Qwen2 Speech Content - Register | qwen2_speech_register_metric | qwen2_speech_register_metric | Qwen2 Audio | paper |
| 35 |  | Qwen2 Speech Content - Vocabulary Complexity | qwen2_vocabulary_complexity_metric | qwen2_vocabulary_complexity_metric | Qwen2 Audio | paper |
| 36 |  | Qwen2 Speech Content - Purpose | qwen2_speech_purpose_metric | qwen2_speech_purpose_metric | Qwen2 Audio | paper |

Speech Delivery

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 37 |  | Qwen2 Speech Delivery - Emotion | qwen2_speech_emotion_metric | qwen2_speech_emotion_metric | Qwen2 Audio | paper |
| 38 |  | Qwen2 Speech Delivery - Clarity | qwen2_speech_clarity_metric | qwen2_speech_clarity_metric | Qwen2 Audio | paper |
| 39 |  | Qwen2 Speech Delivery - Rate | qwen2_speech_rate_metric | qwen2_speech_rate_metric | Qwen2 Audio | paper |
| 40 |  | Qwen2 Speech Delivery - Style | qwen2_speaking_style_metric | qwen2_speaking_style_metric | Qwen2 Audio | paper |
| 41 |  | Qwen2 Speech Delivery - Emotional Vocalizations | qwen2_laughter_crying_metric | qwen2_laughter_crying_metric | Qwen2 Audio | paper |

Interaction Patterns

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 42 |  | Qwen2 Interaction Patterns - Overlapping Speech | qwen2_overlapping_speech_metric | qwen2_overlapping_speech_metric | Qwen2 Audio | paper |

Recording Environment

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 43 |  | Qwen2 Recording Environment - Background | qwen2_speech_background_environment_metric | qwen2_speech_background_environment_metric | Qwen2 Audio | paper |
| 44 |  | Qwen2 Recording Environment - Quality | qwen2_recording_quality_metric | qwen2_recording_quality_metric | Qwen2 Audio | paper |
| 45 |  | Qwen2 Recording Environment - Channel Type | qwen2_channel_type_metric | qwen2_channel_type_metric | Qwen2 Audio | paper |
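
Several of the Qwen2 Audio metrics read as categorical descriptions (gender, voice type, background, and so on) rather than numeric scores. Assuming the report values for such a key are text labels (check the actual values before relying on this), a hypothetical helper like the one below turns them into a label distribution for profiling a dataset:

```python
from collections import Counter


def label_distribution(records: list[dict], key: str) -> dict[str, float]:
    """Fraction of utterances per predicted label for one report key.

    Labels are stripped and lower-cased so that minor formatting
    differences in free-text model answers fall into the same bucket.
    Records without the key are ignored.
    """
    labels = [str(r[key]).strip().lower() for r in records if key in r]
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    # most_common() orders labels from most to least frequent.
    return {label: n / total for label, n in counts.most_common()}
```

Normalizing the labels is a deliberate choice: a generative model may phrase the same answer slightly differently across utterances, and exact string matching would split one category into several.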

Emotion Recognition

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 46 |  | Dimensional Emotion | w2v2_dimensional_emotion | w2v2_dimensional_emotion | w2v2-how-to | paper |

Unified Assessment Models

| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
| --- | --- | --- | --- | --- | --- | --- |
| 47 |  | Uni-VERSA (Versatile Speech Assessment with a Unified Framework) - No Reference | universa_noref | universa_score | Uni-VERSA | paper |
| 48 |  | ARECHO (Audio Reference Echo Cancellation and Codec Quality Assessment) - No Reference | arecho_noref | arecho_score | ARECHO | paper |

All metrics listed above are independent (no-reference) metrics. They evaluate audio quality without requiring a reference signal.
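
Since these scores need only the audio under test, a common use is screening a corpus before training or analysis. The sketch below keeps utterances whose scores meet minimum thresholds. The record layout (utterance ID mapped to a dict keyed by "Key in Report" names) is an assumption, the threshold values in the example comment are hypothetical, and `select_utterances` is not part of VERSA.

```python
def select_utterances(
    records: dict[str, dict], thresholds: dict[str, float]
) -> list[str]:
    """Return IDs of utterances whose scores meet every minimum threshold.

    An utterance missing any thresholded key is rejected rather than passed.
    """
    kept = []
    for utt_id, scores in records.items():
        passes = all(
            key in scores and float(scores[key]) >= minimum
            for key, minimum in thresholds.items()
        )
        if passes:
            kept.append(utt_id)
    return kept


# Example (hypothetical thresholds):
# select_utterances(results, {"utmos": 3.5, "dnsmos_overall": 3.0})
```

Minimum thresholds only make sense for metrics where higher means better; categorical outputs such as the Qwen2 labels need a different filter.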
