
Installation Issues

Metrics Requiring Manual Installation

Some metrics are not included in the default VERSA installation and require additional setup. Check the Auto-Install column in the Supported Metrics documentation.

UTMOSv2

Error: utmosv2 is not installed

Solution:
./tools/install_utmosv2.sh
Additional requirements:
  • Git LFS must be installed and configured
  • If you get _pickle.UnpicklingError: invalid load key, 'v'., install Git LFS:
# Ubuntu/Debian
sudo apt-get install git-lfs
git lfs install

# macOS
brew install git-lfs
git lfs install

# Then re-clone or re-pull the model
rm -rf ~/.cache/torch/hub/checkpoints/utmosv2*
./tools/install_utmosv2.sh
Reference: UTMOSv2 GitHub

SCOREQ

No Auto-Install: Requires manual installation for both reference and no-reference versions.

Solution:
git clone https://github.com/ftshijt/scoreq.git
cd scoreq
pip install -e .
Config keys:
  • scoreq_nr - No-reference version
  • scoreq_ref - With-reference version
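A config sketch using these keys, assuming the standard score: list format used elsewhere in this guide (how the reference audio is wired in is an assumption; check the Supported Metrics page for your version):

```yaml
score:
  # No-reference SCOREQ: needs only the test audio
  - name: scoreq_nr

  # With-reference SCOREQ: additionally expects ground-truth audio
  # (supplied through VERSA's usual ground-truth input)
  - name: scoreq_ref
```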
Reference: ScoreQ Paper

SRMR

No Auto-Install

Solution:
git clone https://github.com/shimhz/SRMRpy.git
cd SRMRpy
pip install -e .
Reference: SRMR Paper

Audiobox Aesthetics

No Auto-Install

Solution:
git clone https://github.com/facebookresearch/audiobox-aesthetics.git
cd audiobox-aesthetics
pip install -e .
Returns multiple scores:
  • audiobox_aesthetics_CE - Content Enjoyment
  • audiobox_aesthetics_CU - Content Usefulness
  • audiobox_aesthetics_PC - Production Complexity
  • audiobox_aesthetics_PQ - Production Quality
Reference: Audiobox Paper

Uni-VERSA / ARECHO

No Auto-Install: Advanced multi-modal assessment models.

Solution:
# Install from HuggingFace
pip install transformers torch
Available variants:
  • universa_noref - No reference
  • universa_audioref - With audio reference
  • universa_textref - With text reference
  • universa_fullref - With both audio and text reference
  • arecho_noref - Echo cancellation and codec quality (no reference)
  • arecho_audioref - With audio reference
  • arecho_textref - With text reference
  • arecho_fullref - With full reference
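The variant you pick determines which references the scorer expects. A config sketch (assumes the standard score: list format; any reference-wiring fields beyond name are assumptions):

```yaml
score:
  # No-reference: scores the test audio alone
  - name: universa_noref

  # Full-reference: assumes both an audio reference and a text
  # reference are provided alongside the test audio
  - name: universa_fullref
```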
Reference: Uni-VERSA Collection

WV-MOS

No Auto-Install

Solution:
pip install git+https://github.com/AndreevP/wvmos.git
Reference: WV-MOS Paper

SIGMOS

No Auto-Install

Solution:
git clone https://github.com/microsoft/SIG-Challenge.git
cd SIG-Challenge/ICASSP2024/sigmos
pip install -e .
Returns multiple scores:
  • SIGMOS_COL - Coloration
  • SIGMOS_DISC - Discontinuity
  • SIGMOS_LOUD - Loudness
  • SIGMOS_REVERB - Reverberation
  • SIGMOS_SIG - Signal quality
  • SIGMOS_OVRL - Overall quality
Reference: SIG-MOS Paper

Emotion2vec

No Auto-Install

Solution:
pip install git+https://github.com/ftshijt/emotion2vec.git
Reference: Emotion2vec Paper

NOMAD

No Auto-Install

Solution:
git clone https://github.com/shimhz/nomad.git
cd nomad
pip install -e .
Reference: NOMAD Paper

CLAP Score

No Auto-Install

Solution:
pip install git+https://github.com/gudgud96/frechet-audio-distance.git
Reference: CLAP Paper

APA

No Auto-Install

Solution:
git clone https://github.com/SonyCSLParis/audio-metrics.git
cd audio-metrics
pip install -e .
Reference: APA Paper

VISQOL

No Auto-Install

Solution:
# Requires Google's VISQOL implementation
git clone https://github.com/google/visqol.git
cd visqol
# Follow build instructions in repository
Reference: VISQOL Paper

WARP-Q

No Auto-Install

Solution:
git clone https://github.com/wjassim/WARP-Q.git
cd WARP-Q
pip install -e .
Reference: WARP-Q Paper

NORESQA

No Auto-Install

Solution:
git clone https://github.com/shimhz/Noresqa.git
cd Noresqa
pip install -e .
Reference: NORESQA Paper

Distributional Metrics

No Auto-Install: These require a full corpus for computation.

Solution:
# FAD and related metrics
pip install git+https://github.com/microsoft/fadtk.git

# Audio density and coverage
git clone https://github.com/SonyCSLParis/audio-metrics.git
cd audio-metrics
pip install -e .

# KL divergence
pip install git+https://github.com/Stability-AI/stable-audio-metrics.git
Available metrics:
  • fad - Frechet Audio Distance
  • kl_embedding - KL Divergence on embeddings
  • audio_density - Audio Density Score
  • audio_coverage - Audio Coverage Score
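Unlike per-utterance metrics, these compare the distribution of a whole generated set against a reference set, so both inputs should point at full corpora rather than single files. A config sketch (assumes the standard score: list format; corpus-level wiring may require extra fields not shown here):

```yaml
score:
  # Corpus-level distributional metrics: each compares the full
  # generated set against the full reference set
  - name: fad
  - name: kl_embedding
  - name: audio_density
  - name: audio_coverage
```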

PySepm

No Auto-Install

Solution:
git clone https://github.com/shimhz/pysepm.git
cd pysepm
pip install -e .
Available metrics:
  • pysepm_fwsegsnr - Frequency-Weighted Segmental SNR
  • pysepm_wss - Weighted Spectral Slope
  • pysepm_cd - Cepstrum Distance
  • pysepm_Csig, pysepm_Cbak, pysepm_Covl - Composite measures
  • pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low - CSII
  • pysepm_ncm - Normalized-covariance measure
  • pysepm_llr - Log Likelihood Ratio
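All of these are reference-based measures (they compare the test signal against clean ground truth), and you can enable just the subset you need. A sketch, assuming the standard score: list format:

```yaml
score:
  # Reference-based pysepm measures; each needs matching ground-truth audio
  - name: pysepm_fwsegsnr
  - name: pysepm_llr
```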
Reference: PySepm Paper

Dependency Conflicts

Some metrics require specific package versions that may conflict with other dependencies.
Create separate environments for conflicting metrics:
# Main VERSA environment
conda create -n versa python=3.9
conda activate versa
pip install versa-speech-eval

# Separate environment for specific metrics
conda create -n versa-utmosv2 python=3.9
conda activate versa-utmosv2
pip install versa-speech-eval
./tools/install_utmosv2.sh

ONNX Runtime Issues

If metrics using ONNX Runtime (DNSMOS, PLCMOS) fail:
pip uninstall onnxruntime onnxruntime-gpu
pip install onnxruntime==1.15.0  # Try specific version

Runtime Errors

GPU Out of Memory

RuntimeError: CUDA out of memory
Solutions:

1. Reduce batch size: Process fewer files at once or use single-file processing mode.

2. Use CPU fallback: Set use_gpu: false in your configuration:

score:
  - name: pseudo_mos
    predictor: utmos
    use_gpu: false

3. Clear GPU cache:

import torch
torch.cuda.empty_cache()

4. Separate GPU and CPU metrics: Run GPU-intensive metrics separately from CPU metrics to avoid loading all models simultaneously.

Sample Rate Mismatches

ValueError: Sample rate mismatch
Most metrics automatically resample, but some require specific sample rates:
  • 16 kHz: UTMOS, speaker similarity, Whisper WER, OWSM WER
  • 48 kHz: Some professional audio metrics
  • Flexible: PESQ, STOI (but performance varies)
Solution: Pre-resample your audio files:
import librosa
import soundfile as sf

audio, sr = librosa.load('input.wav', sr=None)
audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=16000)
sf.write('output_16k.wav', audio_16k, 16000)
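Before resampling everything, it can help to find which files are actually off-spec. A stdlib-only checker for wav.scp entries (a sketch, not part of VERSA; it handles uncompressed WAV only, since the wave module cannot read FLAC/MP3):

```python
import wave

def check_sample_rates(scp_path, expected_sr=16000):
    """Return {utterance_id: actual_sr} for entries whose WAV rate differs."""
    mismatched = {}
    with open(scp_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Kaldi-style: first field is the ID, the rest is the path
            utt_id, path = line.split(maxsplit=1)
            with wave.open(path, "rb") as w:
                rate = w.getframerate()
            if rate != expected_sr:
                mismatched[utt_id] = rate
    return mismatched
```

Only the files it reports then need the resampling step above.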

Missing Model Checkpoints

FileNotFoundError: Model checkpoint not found
Solution: Clear cache and re-download:
rm -rf ~/.cache/torch/hub
rm -rf versa_cache/

# Re-run your evaluation to trigger download
python -m versa.bin.score --config config.yaml ...

Import Errors

ModuleNotFoundError: No module named 'X'
Solution: Verify installation:
pip list | grep versa
pip install --upgrade versa-speech-eval
For development installation:
cd versa
pip install -e .

Data Format Issues

wav.scp Format

VERSA expects Kaldi-style wav.scp format:
utterance_001 /path/to/audio1.wav
utterance_002 /path/to/audio2.wav
utterance_003 /path/to/audio3.flac
  • Each line: <utterance_id> <absolute_or_relative_path>
  • Supported formats: WAV, FLAC, MP3, OGG
  • Paths should not end with pipe | unless using Kaldi I/O
Common mistakes:
# Wrong: No utterance ID
/path/to/audio1.wav

# Wrong: Multiple spaces or tabs (use single space)
utterance_001    /path/to/audio1.wav

# Correct
utterance_001 /path/to/audio1.wav
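A quick validator for the rules above (a sketch, not part of VERSA; the pipe exemption mirrors the Kaldi I/O note):

```python
import os

def validate_wav_scp(scp_path, check_files=True):
    """Return a list of 'line N: problem' strings; an empty list means OK."""
    errors, seen = [], set()
    with open(scp_path) as f:
        for n, raw in enumerate(f, 1):
            line = raw.rstrip("\n")
            if not line:
                continue
            if "\t" in line or "  " in line:
                errors.append(f"line {n}: tab or multiple spaces (use one space)")
                continue
            parts = line.split(" ")
            if len(parts) != 2:
                errors.append(f"line {n}: expected '<utterance_id> <path>'")
                continue
            utt_id, path = parts
            if utt_id in seen:
                errors.append(f"line {n}: duplicate utterance ID '{utt_id}'")
            seen.add(utt_id)
            # Paths ending in '|' are Kaldi I/O pipes, not plain files
            if check_files and not path.endswith("|") and not os.path.exists(path):
                errors.append(f"line {n}: file not found: {path}")
    return errors
```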

Text File Format

For text-dependent metrics (WER, text-based assessment):
utterance_001 This is the transcription for audio one.
utterance_002 Another transcription here.
utterance_003 Make sure utterance IDs match wav.scp exactly.

Mismatched Utterance IDs

KeyError: 'utterance_001'
Ensure utterance IDs match across all files:
# Check IDs in prediction file
cut -d' ' -f1 pred.scp | sort > pred_ids.txt

# Check IDs in ground truth file
cut -d' ' -f1 gt.scp | sort > gt_ids.txt

# Check IDs in text file
cut -d' ' -f1 text.txt | sort > text_ids.txt

# Compare
diff pred_ids.txt gt_ids.txt
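The same comparison as a small Python helper (a sketch; it reads the first whitespace-separated field of each non-empty line):

```python
def read_ids(path):
    """Collect utterance IDs (first field) from a Kaldi-style scp/text file."""
    with open(path) as f:
        return {line.split(maxsplit=1)[0] for line in f if line.strip()}

def id_mismatch(path_a, path_b):
    """Return (only_in_a, only_in_b) as sorted lists of utterance IDs."""
    a, b = read_ids(path_a), read_ids(path_b)
    return sorted(a - b), sorted(b - a)
```

Two empty lists mean the files cover exactly the same utterances.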

Performance Issues

Slow Processing

1. Enable GPU acceleration:

score:
  - name: pseudo_mos
    use_gpu: true

2. Use parallel processing: See Slurm Integration for cluster-based parallelization.

3. Reduce metric count: Run only essential metrics in your initial evaluation:

score:
  # Start with fast, essential metrics
  - name: pseudo_mos
    predictor: utmos
  - name: pesq
  # Add more later as needed

4. Cache models: Set the cache directory to a persistent location:

export TORCH_HOME=/path/to/persistent/cache

Configuration Issues

Invalid YAML Syntax

YAMLError: mapping values are not allowed here
Common issues:
# Wrong: Missing space after colon
score:
  - name:pseudo_mos

# Correct
score:
  - name: pseudo_mos
# Wrong: Inconsistent indentation
score:
  - name: pseudo_mos
   predictor: utmos  # Should align with 'name'

# Correct
score:
  - name: pseudo_mos
    predictor: utmos
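Parsing the config up front surfaces these errors, with line and column information, before any models load. A sketch assuming PyYAML (which YAML configs like these imply):

```python
import yaml

def check_config(path):
    """Return None if the YAML file parses cleanly, else the parser's error message."""
    try:
        with open(path) as f:
            yaml.safe_load(f)
        return None
    except yaml.YAMLError as e:
        return str(e)
```

Note that some mistakes (like a missing space after a colon) still parse as valid YAML, just not as the mapping you intended, so also eyeball the loaded structure.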

Metric Not Found

KeyError: 'my_metric' not found in available metrics
Verify metric name in versa/scorer_shared.py and that it’s properly registered. Available metrics are listed in Supported Metrics.

Getting Help

  • GitHub Issues: Report bugs and request features
  • Documentation: Review guides and API reference
  • Example Configs: Browse working configuration examples
  • Contributing Guide: Learn how to contribute fixes
When reporting issues, include:
  • VERSA version: pip show versa-speech-eval
  • Python version: python --version
  • Operating system and GPU type (if using GPU)
  • Complete error traceback
  • Minimal configuration file that reproduces the issue
