Metric Categories
VERSA organizes metrics into four categories based on their reference requirements:
Independent Metrics
No-reference metrics that evaluate audio quality without requiring a reference signal. Perfect for real-time assessment and scenarios where reference audio is unavailable.
Dependent Metrics
Full-reference metrics that compare generated audio against a reference signal. Ideal for measuring distortion, similarity, and reconstruction quality.
Non-Match Metrics
Metrics that use non-matching references such as text transcripts or different audio samples. Useful for ASR evaluation and semantic similarity.
Distributional Metrics
Metrics that evaluate statistical properties across audio distributions. Essential for evaluating generative models and dataset quality.
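The four categories differ in what a metric is compared against. As a rough illustration, the sketch below picks the most reference-demanding category that a given set of inputs supports. The names `MetricCategory` and `categorize` are hypothetical and are not part of VERSA's API.

```python
from enum import Enum


class MetricCategory(Enum):
    INDEPENDENT = "independent"        # no reference signal needed
    DEPENDENT = "dependent"            # matching reference audio
    NON_MATCH = "non-match"            # transcript or different audio sample
    DISTRIBUTIONAL = "distributional"  # statistics over sets of audio


def categorize(has_matching_reference, has_non_matching_reference,
               compares_distributions):
    """Illustrative only: map the available inputs to a metric category."""
    if compares_distributions:
        return MetricCategory.DISTRIBUTIONAL
    if has_matching_reference:
        return MetricCategory.DEPENDENT
    if has_non_matching_reference:
        return MetricCategory.NON_MATCH
    return MetricCategory.INDEPENDENT
```

Independent metrics remain usable in every case, since they need nothing beyond the audio under test.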
Auto-Installation
Many metrics in VERSA are auto-installed for immediate use. Metrics marked with “x” in the Auto-Install column are available out-of-the-box without additional setup.
Metrics without auto-installation require manual setup. Refer to each metric’s documentation and code source for installation instructions.
Understanding Metric Keys
Each metric has two important identifiers:
- Key in config: Used when configuring metrics in your VERSA configuration file
- Key in report: The field name that appears in evaluation reports and results
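Because the two identifiers can differ, post-processing code typically needs a lookup from one to the other. The sketch below shows that idea with placeholder identifiers: `my_metric` and `my_metric_score` are invented for illustration and are not real VERSA keys, and the report is assumed to be a flat dictionary per utterance. Take the actual keys from each metric's table entry.

```python
# Placeholder identifiers for illustration only, NOT real VERSA keys.
# Left: key used in the configuration file. Right: field name in the report.
CONFIG_TO_REPORT = {"my_metric": "my_metric_score"}


def extract_scores(report, config_keys):
    """Collect the report values that belong to the configured metrics."""
    scores = {}
    for config_key in config_keys:
        report_key = CONFIG_TO_REPORT[config_key]
        if report_key in report:  # skip metrics that produced no value
            scores[config_key] = report[report_key]
    return scores


# Assumed report shape: one dictionary per utterance.
report = {"key": "utt_001", "my_metric_score": 3.7}
scores = extract_scores(report, ["my_metric"])
```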
Choosing the Right Metrics
Select metrics based on your evaluation scenario:
Speech Enhancement & Denoising
- Independent: DNSMOS, NISQA, PLCMOS, VQScore
- Dependent: PESQ, STOI, SI-SNR, SDR, SAR
Voice Conversion & TTS
- Independent: UTMOS, Sheet SSQA, PAM
- Dependent: MCD, F0 Correlation, Speaker Similarity
- Non-Match: Speaker Embedding Similarity
Singing Voice Synthesis
- Independent: SingMOS, SingMOS Pro
- Dependent: Chroma Alignment
- Non-Match: Singer Similarity
Speech Recognition Quality
- Non-Match: ESPnet WER, OWSM WER, Whisper WER, ASR-Mismatch
Audio Generation & GAN Evaluation
- Distributional: FAD, KL Divergence, Audio Density, Audio Coverage
Speaker & Prosody Analysis
- Independent: Qwen2 Speaker/Voice Properties, VAD, Speaking Rate
- Dependent: F0 RMSE
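The recommendations above can also be kept as a small lookup table, for example to pre-select metrics in a script. This is a minimal sketch: `RECOMMENDED` and `recommend` are hypothetical helpers, only three of the scenarios are copied in for brevity, and the strings are the display names from the lists above rather than config keys.

```python
# Display names copied from the scenario lists; these are not config keys.
RECOMMENDED = {
    "Speech Enhancement & Denoising": {
        "Independent": ["DNSMOS", "NISQA", "PLCMOS", "VQScore"],
        "Dependent": ["PESQ", "STOI", "SI-SNR", "SDR", "SAR"],
    },
    "Singing Voice Synthesis": {
        "Independent": ["SingMOS", "SingMOS Pro"],
        "Dependent": ["Chroma Alignment"],
        "Non-Match": ["Singer Similarity"],
    },
    "Speech Recognition Quality": {
        "Non-Match": ["ESPnet WER", "OWSM WER", "Whisper WER", "ASR-Mismatch"],
    },
}


def recommend(scenario, available_categories):
    """List suggested metrics for a scenario, limited to usable categories."""
    by_category = RECOMMENDED[scenario]
    return [
        name
        for category, names in by_category.items()
        if category in available_categories
        for name in names
    ]


# With no reference audio at hand, only independent metrics apply.
no_reference = recommend("Singing Voice Synthesis", {"Independent"})
```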
References and Sources
Each metric includes:
- Links to research papers describing the methodology
- Code source repositories for implementation details
- Installation instructions where applicable
VERSA acknowledges all open-source implementations listed in the official metrics documentation.
Next Steps
- Independent Metrics: Explore 55 no-reference metrics
- Dependent Metrics: Explore 30 full-reference metrics
- Non-Match Metrics: Explore 16 non-matching reference metrics
- Distributional Metrics: Explore 5 distributional metrics