VERSA provides a powerful command-line interface through scorer.py for evaluating speech and audio quality. This guide covers all CLI options and usage patterns.
After installation, you can use either versa-score (installed command) or python versa/bin/scorer.py (direct script). This guide uses the direct script syntax for clarity.

Basic Usage

The basic syntax for running VERSA from the command line:
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --pred test/test_samples/test2.scp \
    --gt test/test_samples/test1.scp \
    --output_file results/test_result
Or using the installed command:
versa-score \
    --score_config egs/speech.yaml \
    --pred test/test_samples/test2.scp \
    --gt test/test_samples/test1.scp \
    --output_file results/test_result

CLI Arguments Reference

Required Arguments

--pred
string
required
Path to the generated/predicted waveforms. Supports:
  • SCP files (Kaldi-style): pred.scp
  • Direct audio files: audio.wav
  • Directory paths when using --io dir
--score_config
string
required
Path to the YAML configuration file specifying which metrics to compute. Examples:
  • egs/speech.yaml - Speech evaluation metrics
  • egs/singing.yaml - Singing voice metrics
  • egs/general.yaml - General audio metrics

Optional Arguments

--gt
string
Path to ground truth/reference waveforms. Use None for reference-free evaluation.
Default: None
--text
string
Path to ground truth transcriptions for ASR-based metrics (WER/CER). Format:
utt_001 This is the first transcription
utt_002 This is the second transcription
--output_file
string
Path for writing evaluation results. Results are saved in JSONL format.
Default: None (prints to stdout only)
--cache_folder
string
Directory for caching intermediate results and model outputs.
Default: None
--use_gpu
boolean
Whether to use GPU acceleration for neural network-based metrics.
Default: False
--io
string
I/O interface for loading audio files. Choices:
  • kaldi - Kaldi-style SCP/ARK files (compatible with ESPnet)
  • soundfile - Direct audio file reading with soundfile
  • dir - Directory-based audio loading
Default: kaldi
--verbose
integer
Verbosity level for logging output. Levels:
  • 0 - Warnings only
  • 1 - Info messages (default)
  • 2+ - Debug messages
Default: 1
--rank
integer
Overall rank in batch processing, used to select the GPU device ID. Useful in distributed runs to assign a specific GPU to each process.
Default: 0
--no_match
boolean
Flag to disable matching between ground truth and generated files. Use when files are pre-aligned or for independent evaluation.
Default: False
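The --rank option above pairs naturally with sharded input lists: split the prediction SCP into one chunk per GPU, then launch one scorer process per chunk. The sketch below only demonstrates the sharding step with stand-in data; the shard paths, the 2-GPU count, and the commented launch commands are illustrative assumptions, not a fixed VERSA workflow.

```shell
# Shard a prediction list into 2 chunks (one per GPU) without splitting lines.
mkdir -p shards
printf 'utt_001 a.wav\nutt_002 b.wav\nutt_003 c.wav\nutt_004 d.wav\n' > shards/pred.scp
split -n l/2 -d shards/pred.scp shards/pred.part.   # -> pred.part.00, pred.part.01

# Each shard then gets its own process and GPU via --rank, e.g.:
#   python versa/bin/scorer.py --score_config egs/speech_gpu.yaml \
#       --pred shards/pred.part.00 --use_gpu true --rank 0 \
#       --output_file results/eval.00
```

Note that `split -n l/2` (GNU coreutils) divides by line count rather than byte count, so no utterance entry is ever cut in half.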

Usage Examples

Basic Evaluation with Reference

Evaluate predicted speech against ground truth:
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --pred data/predicted.scp \
    --gt data/reference.scp \
    --output_file results/evaluation \
    --io soundfile

Reference-Free Evaluation

Evaluate without ground truth (using only independent metrics):
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --pred data/predicted.scp \
    --gt None \
    --output_file results/noref_evaluation \
    --io soundfile

Evaluation with Transcriptions

Include ASR-based metrics (WER/CER) using text transcriptions:
python versa/bin/scorer.py \
    --score_config egs/separate_metrics/wer.yaml \
    --pred data/predicted.scp \
    --gt data/reference.scp \
    --text data/transcripts.txt \
    --output_file results/with_wer \
    --io soundfile

GPU-Accelerated Evaluation

Use GPU for faster processing of neural metrics:
python versa/bin/scorer.py \
    --score_config egs/speech_gpu.yaml \
    --pred data/predicted.scp \
    --gt data/reference.scp \
    --output_file results/gpu_evaluation \
    --use_gpu true \
    --io soundfile

Directory-Based Evaluation

Evaluate all audio files in a directory:
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --pred test/test_samples/test2 \
    --gt test/test_samples/test1 \
    --output_file results/dir_evaluation \
    --io dir

Kaldi/ESPnet Compatible Evaluation

Use Kaldi-style ARK files (compatible with ESPnet workflows):
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --pred exp/model/wav.scp \
    --gt data/test/wav.scp \
    --output_file results/kaldi_evaluation \
    --io kaldi

Input File Formats

SCP Format (Kaldi-style)

SCP files list utterance IDs and paths:
utt_001 /path/to/audio1.wav
utt_002 /path/to/audio2.wav
utt_003 /path/to/audio3.wav
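An SCP file like the one above is easy to generate from a directory of audio files. This is a minimal sketch assuming WAV files whose basenames serve as utterance IDs; the `demo/` paths are placeholders for your own data layout.

```shell
# Build a Kaldi-style SCP by pairing each file's basename with its path.
mkdir -p demo/wav
touch demo/wav/utt_001.wav demo/wav/utt_002.wav   # stand-in audio files
for f in demo/wav/*.wav; do
  printf '%s %s\n' "$(basename "$f" .wav)" "$f"
done | sort > demo/pred.scp
cat demo/pred.scp
# utt_001 demo/wav/utt_001.wav
# utt_002 demo/wav/utt_002.wav
```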

Text Transcription Format

Text files map utterance IDs to transcriptions:
utt_001 The quick brown fox jumps over the lazy dog
utt_002 VERSA is a versatile evaluation toolkit
utt_003 Speech and audio quality assessment

Output Format

Results are saved in JSONL format (one JSON object per line):
{"key": "utt_001", "pesq": 3.456, "stoi": 0.923, "utmos": 4.12, "mcd": 5.34}
{"key": "utt_002", "pesq": 3.789, "stoi": 0.945, "utmos": 4.35, "mcd": 4.87}
Use python scripts/show_result.py results.txt to analyze and visualize the JSONL output.
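Because each line is a standalone JSON object, the output can also be summarized with ordinary shell tools. The sketch below averages the pesq field; the file path and the two-line sample are illustrative, and the grep pattern assumes the exact `"metric": value` spacing shown above.

```shell
# Create a small sample results file in the JSONL format shown above.
mkdir -p results
cat > results/example.jsonl <<'EOF'
{"key": "utt_001", "pesq": 3.456, "stoi": 0.923}
{"key": "utt_002", "pesq": 3.789, "stoi": 0.945}
EOF

# Extract every pesq value and print the mean.
grep -o '"pesq": [0-9.]*' results/example.jsonl \
  | awk '{s += $2; n++} END {printf "%.2f\n", s/n}'
# 3.62
```

For anything beyond a quick check, a JSON-aware tool is more robust than pattern matching.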

Verbosity Levels


Quiet Mode (--verbose 0)

Shows only warnings and errors:
python versa/bin/scorer.py --score_config egs/speech.yaml \
    --pred data/pred.scp --gt data/gt.scp \
    --output_file results.txt --verbose 0

Normal Mode (--verbose 1)

Shows info messages and progress (default):
python versa/bin/scorer.py --score_config egs/speech.yaml \
    --pred data/pred.scp --gt data/gt.scp \
    --output_file results.txt --verbose 1

Debug Mode (--verbose 2)

Shows detailed debugging information:
python versa/bin/scorer.py --score_config egs/speech.yaml \
    --pred data/pred.scp --gt data/gt.scp \
    --output_file results.txt --verbose 2

Common Workflows

Evaluate TTS/speech synthesis outputs:
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --pred exp/tts_model/generated.scp \
    --gt data/test/reference.scp \
    --text data/test/text \
    --output_file results/tts_evaluation \
    --use_gpu true \
    --io soundfile

Best Practices

Use Appropriate Config

Select the configuration file that matches your use case:
  • speech.yaml for general speech
  • speech_gpu.yaml for GPU-accelerated speech metrics
  • singing.yaml for singing voice

Enable GPU

Use --use_gpu true when evaluating neural metrics like UTMOS, DNS-MOS, or speaker similarity for significantly faster processing.

Set Verbosity

Use --verbose 1 for progress tracking during long evaluations, or --verbose 0 when running in batch scripts.

Cache Results

Use --cache_folder to store intermediate results when re-running evaluations with different metric configurations.
When using --use_gpu true with distributed processing, always set the --rank parameter to assign specific GPU devices and avoid conflicts.
