VERSA (Versatile Evaluation of Speech and Audio) is a comprehensive toolkit for evaluating speech and audio quality. It provides unified access to more than 90 evaluation metrics, many with multiple variants, so you can assess audio quality along several dimensions.

Why VERSA?

Evaluating audio quality requires multiple perspectives. A single metric cannot capture perceptual quality, intelligibility, technical accuracy, and statistical properties simultaneously. VERSA solves this by providing:

Comprehensive Coverage

Access 90+ metrics covering perceptual quality, intelligibility, technical measurements, and statistical properties in one unified toolkit.

Production Ready

Widely used in speech toolkits and challenges including ESPnet, with built-in support for distributed evaluation using Slurm.

Flexible Inputs

Support for various input formats including file paths, SCP files, and Kaldi-style ARKs for seamless integration.

Interactive Visualization

Built-in visualization tools to analyze and compare evaluation results across multiple metrics.

Key Features

Multiple Metric Categories

VERSA organizes metrics into four intuitive categories:
1. Independent Metrics

Standalone metrics that don’t require reference audio. Examples include DNSMOS, UTMOS, NISQA, and voice activity detection.
2. Dependent Metrics

Metrics that compare predicted audio against reference audio. Examples include PESQ, STOI, MCD, and signal-to-noise ratios.
3. Non-match Metrics

Metrics that work with non-matching references or information from other modalities, such as ASR-based metrics and speaker similarity.
4. Distributional Metrics

Metrics that evaluate statistical properties of audio collections, including FAD (Fréchet Audio Distance) and KID.
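
To make the categories concrete, a single config could draw one metric from each. This is a hypothetical sketch assuming the YAML list-of-metrics layout used by configs such as egs/speech.yaml; the exact metric names and option keys may differ, so check the shipped egs/*.yaml files:

```yaml
# Hypothetical config mixing the four categories (names illustrative).
- name: dnsmos              # independent: needs no reference audio
- name: pesq                # dependent: compares prediction to reference
- name: speaker_similarity  # non-match: uses a pretrained speaker model
- name: fad                 # distributional: compares audio collections
```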

Real-World Applications

Speech Synthesis

Evaluate TTS and voice conversion systems with metrics for naturalness, similarity, and intelligibility.

Speech Enhancement

Assess denoising and enhancement quality with signal-based and perceptual metrics.

Audio Codecs

Measure codec quality with MCD, PESQ, STOI, and perceptual MOS predictors.

Singing Voice

Specialized metrics for singing voice synthesis and conversion, including SingMOS and chroma alignment.

Quick Example

Evaluate audio quality in just a few lines:
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt reference_audio/ \
    --pred generated_audio/ \
    --output_file results \
    --io dir
This command evaluates all metrics defined in speech.yaml and saves detailed results to results.txt.
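Once you have a results file, you will typically want per-metric summaries across utterances. The sketch below assumes one JSON object per line with a `key` field naming the utterance and numeric fields for each metric; this layout is illustrative and may not match VERSA's exact output format:

```python
import json
from collections import defaultdict

def summarize(lines):
    """Aggregate per-utterance scores into per-metric means.

    Assumes one JSON object per line, e.g.
    {"key": "utt1", "pesq": 3.2, "stoi": 0.91} -- an illustrative
    format, not necessarily VERSA's exact output layout.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for line in lines:
        record = json.loads(line)
        for name, value in record.items():
            # Skip the utterance identifier and any non-numeric fields.
            if name == "key" or not isinstance(value, (int, float)):
                continue
            sums[name] += value
            counts[name] += 1
    return {name: sums[name] / counts[name] for name in sums}

demo = [
    '{"key": "utt1", "pesq": 3.2, "stoi": 0.91}',
    '{"key": "utt2", "pesq": 2.8, "stoi": 0.87}',
]
print(summarize(demo))
```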
New to audio evaluation? Start with the Quickstart to run your first evaluation in minutes.

What’s Inside

Core Metrics (Auto-Installed)

  • Perceptual Quality: UTMOS, DNSMOS, NISQA, Sheet-SSQA
  • Intelligibility: PESQ, STOI, TorchAudio-SQUIM
  • Signal Metrics: SDR, SAR, SIR, SI-SNR, CI-SDR
  • Spectral Distance: MCD (Mel Cepstral Distortion), F0 metrics
  • Speaker Similarity: Cosine similarity using ESPnet-SPK models
  • Discrete Speech: Speech BERT Score, Speech BLEU
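
To illustrate what a signal metric computes, here is a textbook formulation of SI-SNR in plain NumPy. This is a sketch for intuition only; VERSA's own implementation may differ in details such as mean removal or epsilon handling:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (textbook form).

    Projects the estimate onto the reference to get the target
    component, then measures target power against residual power.
    """
    est = est - est.mean()
    ref = ref - ref.mean()
    # Target is the projection of the estimate onto the reference,
    # which makes the measure invariant to rescaling the estimate.
    target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    noise = est - target
    return 10 * np.log10((np.dot(target, target) + eps)
                         / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(si_snr(noisy, clean))        # mildly noisy: moderate SI-SNR
print(si_snr(0.5 * clean, clean))  # rescaled copy: very high SI-SNR
```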

Advanced Metrics (Optional Installation)

  • LLM-Based Profiling: Qwen2-Audio for 20+ speech characteristics
  • Perceptual Audio: DPAM, CDPAM distance metrics
  • ASR-Based: Word error rate (WER) with Whisper, ESPnet, OWSM
  • Distributional: FAD, KID for generative model evaluation
  • Music-Specific: Chroma alignment, singing technique assessment
See the full metrics documentation for a complete list with references.

Integration & Scalability

VERSA is designed for both local experimentation and large-scale evaluation:
# Single-process evaluation
python versa/bin/scorer.py \
    --score_config config.yaml \
    --pred audio.scp \
    --gt reference.scp \
    --output_file results
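
For larger jobs, one common pattern is to split the prediction list into chunks and score each chunk concurrently; VERSA's Slurm support handles this orchestration for you. The sketch below is purely illustrative (hypothetical file names and chunk count), with the scorer invocation echoed rather than executed:

```shell
# Illustrative data-parallel evaluation: split the prediction list
# into 4 chunks and score each chunk concurrently.
mkdir -p chunks
# Toy SCP file (utterance-id and path per line) for demonstration.
for i in $(seq 1 8); do echo "utt$i /data/audio/utt$i.wav"; done > pred.scp
split -n l/4 -d pred.scp chunks/pred.
for part in chunks/pred.*; do
  # In a real run, replace 'echo' with the actual command (or srun).
  echo python versa/bin/scorer.py \
      --score_config config.yaml \
      --pred "$part" \
      --gt reference.scp \
      --output_file "$part.result" &
done
wait
```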

Open Source & Widely Adopted

VERSA is used in production by leading speech research groups and has been integrated into major toolkits including ESPnet. See the (necessarily incomplete) list of toolkits and challenges using VERSA.

Get Started

Installation

Install VERSA and configure metric-specific dependencies

Quickstart

Run your first evaluation in minutes with real examples

GitHub

Explore the source code and contribute to the project

Research & Citation

VERSA was presented at NAACL 2025 and has been featured in multiple publications. If you use VERSA in your research, please cite:
@inproceedings{shi2025versa,
  title={{VERSA}: A Versatile Evaluation Toolkit for Speech, Audio, and Music},
  author={Jiatong Shi and Hye-jin Shim and Jinchuan Tian and Siddhant Arora and others},
  booktitle={2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
  year={2025},
  url={https://openreview.net/forum?id=zU0hmbnyQm}
}
Want to learn more? Check out the presentation video from NAACL 2025 or try the interactive Colab demo.
