VERSA provides a comprehensive suite of speech quality assessment metrics across four main categories. This reference guide helps you understand and choose the right metrics for your evaluation needs.

Metric Categories

VERSA organizes metrics into four distinct categories based on their reference requirements:

Independent Metrics

No-reference metrics that evaluate audio quality without requiring a reference signal. Perfect for real-time assessment and scenarios where reference audio is unavailable.

Dependent Metrics

Full-reference metrics that compare generated audio against a reference signal. Ideal for measuring distortion, similarity, and reconstruction quality.

Non-Match Metrics

Metrics that use non-matching references such as text transcripts or different audio samples. Useful for ASR evaluation and semantic similarity.

Distributional Metrics

Metrics that evaluate statistical properties across audio distributions. Essential for evaluating generative models and dataset quality.
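The four categories above come down to what reference material you have on hand. A minimal sketch of that decision follows. The helper `applicable_categories` and its flag names are hypothetical and are not part of VERSA:

```python
from typing import List


def applicable_categories(
    has_matching_reference: bool,
    has_non_matching_reference: bool,
    evaluating_a_collection: bool,
) -> List[str]:
    """Return the metric categories usable with the given inputs.

    Hypothetical helper for illustration only; not a VERSA function.
    """
    # No-reference metrics only need the audio under test.
    categories = ["independent"]
    if has_matching_reference:
        # Full-reference metrics compare against a reference signal.
        categories.append("dependent")
    if has_non_matching_reference:
        # Text transcripts or different audio samples act as the reference.
        categories.append("non-match")
    if evaluating_a_collection:
        # Distributional metrics look at statistics across sets of audio.
        categories.append("distributional")
    return categories


print(applicable_categories(True, False, False))
```

With only a matching reference available, the sketch returns the independent and dependent categories, which mirrors the descriptions above.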

Auto-Installation

Many metrics in VERSA are auto-installed and ready for immediate use. Metrics marked with “x” in the Auto-Install column of the metric tables work out of the box without additional setup.
Metrics without that mark require manual setup. Refer to each metric’s documentation and code source for installation instructions.
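For metrics that need manual setup, it can help to check whether the required package is importable before enabling the metric. The sketch below uses only the Python standard library. `is_installed` is a hypothetical helper, not a VERSA function, and the module name to check is whatever the metric's own documentation tells you to install:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when no importable module has that name.
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so it serves as a stand-in for a real dependency.
print(is_installed("json"))
```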

Understanding Metric Keys

Each metric has two important identifiers:
  • Key in config: Used when configuring metrics in your VERSA configuration file
  • Key in report: The field name that appears in evaluation reports and results
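One way to picture the two identifiers: the config key selects a metric to run, and the report key names the field where its score appears. The sketch below assumes a hypothetical mapping with placeholder key names and made-up scores, and it assumes one configured metric may write more than one report field. Check each metric's documentation for the real keys.

```python
from typing import Dict, List

# Hypothetical mapping from "key in config" to "key in report".
# The names are placeholders, not real VERSA keys.
CONFIG_TO_REPORT: Dict[str, List[str]] = {
    "example_metric_a": ["example_a_score"],
    "example_metric_b": ["example_b_score", "example_b_aux"],
}


def select_scores(
    report_row: Dict[str, float], config_keys: List[str]
) -> Dict[str, float]:
    """Pick the report fields that belong to the given config keys."""
    wanted = [field for key in config_keys for field in CONFIG_TO_REPORT[key]]
    # Skip fields that are missing from this row instead of raising.
    return {field: report_row[field] for field in wanted if field in report_row}


# Made-up report row for illustration.
row = {"example_a_score": 3.5, "example_b_score": 0.75, "example_b_aux": 12.0}
print(select_scores(row, ["example_metric_b"]))
```

Keeping the mapping explicit makes it easier to trace a field in a results file back to the configuration entry that produced it.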

Choosing the Right Metrics

Select metrics based on your evaluation scenario. Each group below covers one scenario, with its metrics listed by category:

  • Independent: DNSMOS, NISQA, PLCMOS, VQScore
  • Dependent: PESQ, STOI, SI-SNR, SDR, SAR

  • Independent: UTMOS, Sheet SSQA, PAM
  • Dependent: MCD, F0 Correlation, Speaker Similarity
  • Non-Match: Speaker Embedding Similarity

  • Independent: SingMOS, SingMOS Pro
  • Dependent: Chroma Alignment
  • Non-Match: Singer Similarity

  • Non-Match: ESPnet WER, OWSM WER, Whisper WER, ASR-Mismatch

  • Distributional: FAD, KL Divergence, Audio Density, Audio Coverage

  • Independent: Qwen2 Speaker/Voice Properties, VAD, Speaking Rate
  • Dependent: F0 RMSE

References and Sources

Each metric includes:
  • Links to research papers describing the methodology
  • Code source repositories for implementation details
  • Installation instructions where applicable
VERSA acknowledges all open-source implementations listed in the official metrics documentation.

Next Steps

Independent Metrics

Explore 55 no-reference metrics

Dependent Metrics

Explore 30 full-reference metrics

Non-Match Metrics

Explore 16 non-matching reference metrics

Distributional Metrics

Explore 5 distributional metrics
