Independent metrics evaluate audio quality without requiring a reference signal. They are suited to real-time assessment, live monitoring, and scenarios where clean reference audio is unavailable.
Metrics marked with ✓ in the Auto-Install column are installed automatically with VERSA. All others must be installed manually from their respective code sources.
MOS Prediction Metrics
Mean Opinion Score (MOS) prediction metrics estimate subjective quality ratings without reference audio.
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | ✓ | Deep Noise Suppression MOS Score of P.835 (DNSMOS) | pseudo_mos | dnsmos_overall | speechmos (MS) | paper |
| 2 | ✓ | Deep Noise Suppression MOS Score of P.808 (DNSMOS) | pseudo_mos | dnsmos_p808 | speechmos (MS) | paper |
| 3 | ✓ | Non-intrusive Speech Quality and Naturalness Assessment (NISQA) | nisqa | nisqa_mos_pred, nisqa_noi_pred, nisqa_dis_pred, nisqa_col_pred, nisqa_loud_pred | NISQA | paper |
| 4 | ✓ | UTokyo-SaruLab System for VoiceMOS Challenge 2022 (UTMOS) | pseudo_mos | utmos | speechmos | paper |
| 5 | ✓ | Packet Loss Concealment-related MOS Score (PLCMOS) | pseudo_mos | plcmos | speechmos (MS) | paper |
| 10 | ✓ | Sheet SSQA MOS Models | sheet_ssqa | sheet_ssqa | Sheet | paper |
| 11 | | UTMOSv2: UTokyo-SaruLab MOS Prediction System | utmosv2 | utmosv2 | UTMOSv2 | paper |
| 49 | ✓ | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech (BVCC model) | pseudo_mos | dnsmos_pro_bvcc | DNSMOSPro | paper |
| 50 | ✓ | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech (NISQA model) | pseudo_mos | dnsmos_pro_nisqa | DNSMOSPro | paper |
| 51 | ✓ | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech (VCC2018 model) | pseudo_mos | dnsmos_pro_vcc2018 | DNSMOSPro | paper |
| 52 | | WV-MOS (MOS score prediction by fine-tuned wav2vec2.0 model) | wvmos | wvmos | wvmos | paper |
| 53 | | SIG-MOS | sigmos | SIGMOS_COL, SIGMOS_DISC, SIGMOS_LOUD, SIGMOS_REVERB, SIGMOS_SIG, SIGMOS_OVRL | sigmos | paper |
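
Several rows above share the `pseudo_mos` config key but appear under different report keys. The following is a minimal Python sketch of that relationship, built only from the keys in the table. The names `MOS_REPORT_KEYS` and `expected_report_keys` are hypothetical and not part of VERSA, and which `pseudo_mos` keys actually show up in a report is assumed to depend on the predictors you enable.

```python
# Config key -> report keys, copied from the MOS prediction table above.
# This mapping is illustrative; VERSA itself does not expose it.
MOS_REPORT_KEYS = {
    "pseudo_mos": [
        "dnsmos_overall", "dnsmos_p808", "utmos", "plcmos",
        "dnsmos_pro_bvcc", "dnsmos_pro_nisqa", "dnsmos_pro_vcc2018",
    ],
    "nisqa": [
        "nisqa_mos_pred", "nisqa_noi_pred", "nisqa_dis_pred",
        "nisqa_col_pred", "nisqa_loud_pred",
    ],
    "sheet_ssqa": ["sheet_ssqa"],
    "utmosv2": ["utmosv2"],
    "wvmos": ["wvmos"],
    "sigmos": [
        "SIGMOS_COL", "SIGMOS_DISC", "SIGMOS_LOUD",
        "SIGMOS_REVERB", "SIGMOS_SIG", "SIGMOS_OVRL",
    ],
}


def expected_report_keys(config_keys):
    """Return the report keys the given config keys can produce, in order."""
    keys = []
    for name in config_keys:
        if name not in MOS_REPORT_KEYS:
            raise KeyError(f"unknown MOS config key: {name}")
        keys.extend(MOS_REPORT_KEYS[name])
    return keys
```

A lookup like this can help when post-processing reports, since a single config entry such as `nisqa` fans out into five separate report columns.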
TorchAudio SQUIM Metrics
No-reference metrics from TorchAudio’s SQUIM (Speech Quality and Intelligibility Measures).
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 6 | ✓ | PESQ in TorchAudio-Squim | squim_no_ref | torch_squim_pesq | torch_squim | paper |
| 7 | ✓ | STOI in TorchAudio-Squim | squim_no_ref | torch_squim_stoi | torch_squim | paper |
| 8 | ✓ | SI-SDR in TorchAudio-Squim | squim_no_ref | torch_squim_si_sdr | torch_squim | paper |
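
The single `squim_no_ref` config key yields three report keys. As a rough sketch of how those keys might be summarized, the Python below averages each one over a set of utterances. It assumes per-utterance results are available as dictionaries keyed by report key, which is an assumption about the report layout rather than a documented format. The helper name `average_squim` and the scores are made up for illustration.

```python
from statistics import mean

# Report keys from the SQUIM table above.
SQUIM_KEYS = ("torch_squim_pesq", "torch_squim_stoi", "torch_squim_si_sdr")


def average_squim(results):
    """Average each SQUIM report key over the utterances that contain it."""
    summary = {}
    for key in SQUIM_KEYS:
        values = [r[key] for r in results if key in r]
        if values:  # skip keys that no utterance reported
            summary[key] = mean(values)
    return summary


# Made-up scores for two utterances, for illustration only.
results = [
    {"key": "utt1", "torch_squim_pesq": 3.0,
     "torch_squim_stoi": 0.90, "torch_squim_si_sdr": 20.0},
    {"key": "utt2", "torch_squim_pesq": 2.0,
     "torch_squim_stoi": 0.80, "torch_squim_si_sdr": 10.0},
]
summary = average_squim(results)
```

Keys missing from every utterance are left out of the summary instead of being reported as zero, so a disabled metric is not mistaken for a poor score.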
Singing Voice Metrics
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 9 | ✓ | Singing voice MOS | pseudo_mos | singmos_v1 | singmos | paper |
| 55 | ✓ | Singing voice MOS | pseudo_mos | singmos_pro | singmos | paper |
Speech Enhancement Metrics
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 12 | | Speech Contrastive Regression for Quality Assessment without reference (ScoreQ) | scoreq_nr | scoreq_nr | ScoreQ | paper |
| 13 | ✓ | Speech enhancement-based SI-SNR | se_snr | se_si_snr | ESPnet | |
| 14 | ✓ | Speech enhancement-based CI-SDR | se_snr | se_ci_sdr | ESPnet | |
| 15 | ✓ | Speech enhancement-based SAR | se_snr | se_sar | ESPnet | |
| 16 | ✓ | Speech enhancement-based SDR | se_snr | se_sdr | ESPnet | |
| 54 | ✓ | VQScore (Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech) | vqscore | vqscore | VQScore | paper |
Audio-Language Models
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 17 | ✓ | PAM: Prompting Audio-Language Models for Audio Quality Assessment | pam | pam | PAM | paper |
Acoustic Quality Metrics
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 18 | | Speech-to-Reverberation Modulation energy Ratio (SRMR) | srmr | srmr | SRMRpy | paper |
| 24 | | Audiobox Aesthetics | audiobox_aesthetics | audiobox_aesthetics_CE, audiobox_aesthetics_CU, audiobox_aesthetics_PC, audiobox_aesthetics_PQ | Audiobox-Aesthetics | paper |
Voice Activity & Speech Rate
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 19 | ✓ | Voice Activity Detection (VAD) | vad | vad_info | SileroVAD | |
| 21 | ✓ | Speaking Word/Character Rate (SWR) | speaking_rate | speaking_rate | | |
Anti-Spoofing & Language ID
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 22 | ✓ | Anti-spoofing Score (SpoofS) with AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks | asvspoof_score | asvspoof_score | AASIST | paper |
| 23 | ✓ | Language Identification | lid | language | ESPnet | paper |
Qwen2 Audio Metrics
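These metrics use the Qwen2 Audio model to analyze speech along several dimensions: speaker characteristics, voice properties, content, delivery, interaction patterns, and recording environment.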
Speaker Characteristics
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 25 | ✓ | Qwen2 Speaker Characteristics - Count | qwen2_speaker_count_metric | qwen2_speaker_count_metric | Qwen2 Audio | paper |
| 26 | ✓ | Qwen2 Speaker Characteristics - Gender | qwen2_speaker_gender_metric | qwen2_speaker_gender_metric | Qwen2 Audio | paper |
| 27 | ✓ | Qwen2 Speaker Characteristics - Age | qwen2_speaker_age_metric | qwen2_speaker_age_metric | Qwen2 Audio | paper |
| 28 | ✓ | Qwen2 Speaker Characteristics - Speech Impairment | qwen2_speech_impairment_metric | qwen2_speech_impairment_metric | Qwen2 Audio | paper |
Voice Properties
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 29 | ✓ | Qwen2 Voice Properties - Pitch | qwen2_voice_pitch_metric | qwen2_voice_pitch_metric | Qwen2 Audio | paper |
| 30 | ✓ | Qwen2 Voice Properties - Pitch Range | qwen2_pitch_range_metric | qwen2_pitch_range_metric | Qwen2 Audio | paper |
| 31 | ✓ | Qwen2 Voice Properties - Voice Type | qwen2_voice_type_metric | qwen2_voice_type_metric | Qwen2 Audio | paper |
| 32 | ✓ | Qwen2 Voice Properties - Volume Level | qwen2_speech_volume_level_metric | qwen2_speech_volume_level_metric | Qwen2 Audio | paper |
Speech Content
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 33 | ✓ | Qwen2 Speech Content - Language | qwen2_language_metric | qwen2_language_metric | Qwen2 Audio | paper |
| 34 | ✓ | Qwen2 Speech Content - Register | qwen2_speech_register_metric | qwen2_speech_register_metric | Qwen2 Audio | paper |
| 35 | ✓ | Qwen2 Speech Content - Vocabulary Complexity | qwen2_vocabulary_complexity_metric | qwen2_vocabulary_complexity_metric | Qwen2 Audio | paper |
| 36 | ✓ | Qwen2 Speech Content - Purpose | qwen2_speech_purpose_metric | qwen2_speech_purpose_metric | Qwen2 Audio | paper |
Speech Delivery
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 37 | ✓ | Qwen2 Speech Delivery - Emotion | qwen2_speech_emotion_metric | qwen2_speech_emotion_metric | Qwen2 Audio | paper |
| 38 | ✓ | Qwen2 Speech Delivery - Clarity | qwen2_speech_clarity_metric | qwen2_speech_clarity_metric | Qwen2 Audio | paper |
| 39 | ✓ | Qwen2 Speech Delivery - Rate | qwen2_speech_rate_metric | qwen2_speech_rate_metric | Qwen2 Audio | paper |
| 40 | ✓ | Qwen2 Speech Delivery - Style | qwen2_speaking_style_metric | qwen2_speaking_style_metric | Qwen2 Audio | paper |
| 41 | ✓ | Qwen2 Speech Delivery - Emotional Vocalizations | qwen2_laughter_crying_metric | qwen2_laughter_crying_metric | Qwen2 Audio | paper |
Interaction Patterns
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 42 | ✓ | Qwen2 Interaction Patterns - Overlapping Speech | qwen2_overlapping_speech_metric | qwen2_overlapping_speech_metric | Qwen2 Audio | paper |
Recording Environment
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 43 | ✓ | Qwen2 Recording Environment - Background | qwen2_speech_background_environment_metric | qwen2_speech_background_environment_metric | Qwen2 Audio | paper |
| 44 | ✓ | Qwen2 Recording Environment - Quality | qwen2_recording_quality_metric | qwen2_recording_quality_metric | Qwen2 Audio | paper |
| 45 | ✓ | Qwen2 Recording Environment - Channel Type | qwen2_channel_type_metric | qwen2_channel_type_metric | Qwen2 Audio | paper |
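
For every Qwen2 Audio metric the config key and the report key are identical, so a selection can be generated by category. The Python sketch below groups the keys from the Qwen2 tables above and prints one entry per metric. It assumes the config file is a YAML list of `- name: <config key>` entries; check that layout against your VERSA version. The names `QWEN2_METRICS`, `config_entries`, and `to_yaml` are hypothetical helpers, not VERSA APIs.

```python
# Qwen2 Audio config keys grouped by the categories used in the tables above.
QWEN2_METRICS = {
    "speaker_characteristics": [
        "qwen2_speaker_count_metric",
        "qwen2_speaker_gender_metric",
        "qwen2_speaker_age_metric",
        "qwen2_speech_impairment_metric",
    ],
    "voice_properties": [
        "qwen2_voice_pitch_metric",
        "qwen2_pitch_range_metric",
        "qwen2_voice_type_metric",
        "qwen2_speech_volume_level_metric",
    ],
    "speech_content": [
        "qwen2_language_metric",
        "qwen2_speech_register_metric",
        "qwen2_vocabulary_complexity_metric",
        "qwen2_speech_purpose_metric",
    ],
    "speech_delivery": [
        "qwen2_speech_emotion_metric",
        "qwen2_speech_clarity_metric",
        "qwen2_speech_rate_metric",
        "qwen2_speaking_style_metric",
        "qwen2_laughter_crying_metric",
    ],
    "interaction_patterns": ["qwen2_overlapping_speech_metric"],
    "recording_environment": [
        "qwen2_speech_background_environment_metric",
        "qwen2_recording_quality_metric",
        "qwen2_channel_type_metric",
    ],
}


def config_entries(categories):
    """Build one {'name': key} entry per metric in the chosen categories."""
    return [{"name": key} for cat in categories for key in QWEN2_METRICS[cat]]


def to_yaml(entries):
    """Render entries as a YAML list (assumed layout: one '- name:' per metric)."""
    return "\n".join(f"- name: {entry['name']}" for entry in entries)


print(to_yaml(config_entries(["interaction_patterns", "recording_environment"])))
```

Selecting by category keeps a run limited to the dimensions of interest, which may matter because each entry is a separate query to the Qwen2 Audio model.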
Emotion Recognition
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 46 | ✓ | Dimensional Emotion | w2v2_dimensional_emotion | w2v2_dimensional_emotion | w2v2-how-to | paper |
Unified Assessment Models
| Number | Auto-Install | Metric Name | Key in Config | Key in Report | Code Source | References |
|---|---|---|---|---|---|---|
| 47 | | Uni-VERSA (Versatile Speech Assessment with a Unified Network) - No Reference | universa_noref | universa_score | Uni-VERSA | paper |
| 48 | | ARECHO (Autoregressive Evaluation via Chain-based Hypothesis Optimization for Speech Multi-Metric Estimation) - No Reference | arecho_noref | arecho_score | ARECHO | paper |
All metrics listed above are independent (no-reference) metrics. They evaluate audio quality without requiring a reference signal.
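
Before a run, it can help to check which of the selected metrics still need a manual install. The Python sketch below collects the config keys whose rows have no ✓ in the Auto-Install column of the tables above. The set `MANUAL_INSTALL` and the helper `needs_manual_install` are hypothetical names for illustration, not part of VERSA.

```python
# Config keys whose rows lack a check mark in the Auto-Install column.
MANUAL_INSTALL = {
    "utmosv2",
    "wvmos",
    "sigmos",
    "scoreq_nr",
    "srmr",
    "audiobox_aesthetics",
    "universa_noref",
    "arecho_noref",
}


def needs_manual_install(config_keys):
    """Return the selected config keys that are not auto-installed, sorted."""
    return sorted(set(config_keys) & MANUAL_INSTALL)


# Example: a selection mixing auto-installed and manual metrics.
selected = ["pseudo_mos", "squim_no_ref", "srmr", "wvmos"]
print(needs_manual_install(selected))
```

For the selection shown, only `srmr` and `wvmos` are returned; `pseudo_mos` and `squim_no_ref` are installed with VERSA.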