VERSA supports three different input formats for specifying audio files. Choose the format that best fits your workflow and data organization.

Overview

The input format is controlled by the --io flag:

  • kaldi - Kaldi-style SCP files (default)
  • soundfile - Simple file path lists
  • dir - Directory scanning
versa-scorer \
  --pred outputs.scp \
  --io kaldi  # or soundfile, or dir

Kaldi Format (Default)

The Kaldi format uses .scp (script) files that map utterance IDs to audio file paths or processing pipelines.

Basic File Paths

Simple ID-to-path mapping:
utterance_001 /path/to/audio/file1.wav
utterance_002 /path/to/audio/file2.wav
utterance_003 /path/to/audio/file3.flac
This is the default format. If you don’t specify --io, VERSA assumes kaldi.
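Before scoring, it can help to confirm that every path in a plain SCP file exists. The following is a minimal sketch, not part of VERSA: the function name check_scp_paths is hypothetical, and it assumes plain "ID path" lines with no pipelines or archive entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of VERSA): report SCP entries whose file is missing.
# Assumes each line is "<utterance_id> <path>" with no pipes or archive offsets.
check_scp_paths() {
  scp_file=$1
  missing=0
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline
  while read -r utt_id path || [ -n "$utt_id" ]; do
    # Skip blank lines
    [ -z "$utt_id" ] && continue
    if [ ! -f "$path" ]; then
      echo "missing: $utt_id $path"
      missing=$((missing + 1))
    fi
  done < "$scp_file"
  # Return non-zero when at least one file is missing
  [ "$missing" -eq 0 ]
}
```

Calling check_scp_paths wav.scp prints one line per missing file and returns a non-zero status if any were found, so it can gate a scoring run in a script.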

Pipeline Format

Kaldi supports processing pipelines with pipes (|):
utterance_001 sox /path/to/file1.wav -t wav - |
utterance_002 ffmpeg -i /path/to/file2.mp3 -f wav - |
utterance_003 cat /path/to/file3.raw | sox -t raw -r 16000 -b 16 -e signed - -t wav - |
Pipeline support requires the kaldi IO interface. The soundfile interface does not support pipes.

Archive Format

Point to Kaldi archives:
utterance_001 ark:/path/to/archive.ark:12345
utterance_002 ark:/path/to/archive.ark:67890
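
The three entry styles above can be told apart by the shape of the value that follows the utterance ID. The sketch below is an illustration only, not VERSA's parser: scp_entry_kind and classify_scp are hypothetical names, and the rules are inferred from the examples in this section (a trailing | means a pipeline, a leading ark: means an archive).

```shell
#!/bin/sh
# Hypothetical helper (not part of VERSA): classify the value part of an SCP line.
# The three categories follow the examples in this section.
scp_entry_kind() {
  case $1 in
    *"|")  echo pipeline ;;  # command whose output is read as audio
    ark:*) echo archive ;;   # archive path plus offset
    *)     echo path ;;      # plain audio file path
  esac
}

# Print "<utterance_id> <kind>" for each line of an SCP file.
classify_scp() {
  while read -r utt_id value || [ -n "$utt_id" ]; do
    # Skip blank lines
    [ -z "$utt_id" ] && continue
    echo "$utt_id $(scp_entry_kind "$value")"
  done < "$1"
}
```

Any line reported as pipeline needs --io kaldi, since the soundfile interface does not support pipes.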

When to Use Kaldi Format

If you’re already using Kaldi for speech processing, use this format to maintain compatibility:
# Use existing Kaldi wav.scp files
versa-scorer \
  --pred exp/generated/wav.scp \
  --gt data/test/wav.scp \
  --score_config configs/speech.yaml \
  --output_file results.json
Apply audio processing during evaluation:
# Convert sample rate on the fly
utt1 sox input.wav -r 16000 -t wav - |

# Downmix stereo to mono
utt2 sox stereo.wav -c 1 -t wav - |

# Apply volume normalization
utt3 sox input.wav -t wav - norm |
Access files stored in Kaldi archives efficiently:
utt_001 ark:data/train.ark:0
utt_002 ark:data/train.ark:44100
utt_003 ark:data/train.ark:88200

Creating Kaldi SCP Files

Generate SCP from a directory of audio files:
# Simple script to create wav.scp
for file in /path/to/audio/*.wav; do
  id=$(basename "$file" .wav)
  echo "${id} ${file}"
done > wav.scp
Result:
sample1 /path/to/audio/sample1.wav
sample2 /path/to/audio/sample2.wav
sample3 /path/to/audio/sample3.wav

Soundfile Format

A simplified format using plain text file lists without pipeline support.

Format Specification

utterance_001 /path/to/audio/file1.wav
utterance_002 /path/to/audio/file2.wav
utterance_003 /path/to/audio/file3.flac
The soundfile format does not support pipes (|). If your file path ends with |, you’ll get an error:
ValueError: Not supported wav.scp format. Set IO interface to kaldi

Usage

versa-scorer \
  --pred outputs.txt \
  --gt references.txt \
  --io soundfile \
  --score_config configs/speech.yaml \
  --output_file results.json

When to Use Soundfile Format

  • Simple File Lists - When you have straightforward file paths without processing
  • No Kaldi Dependency - When you want to avoid Kaldi-specific features
  • Quick Evaluation - For rapid testing without pipeline overhead
  • Compatibility - When integrating with non-Kaldi tools

Creating Soundfile Lists

# Simple file list, using line numbers as IDs
ls /path/to/audio/*.wav | awk '{print NR, $0}' > files.txt

# With custom IDs
for file in /path/to/audio/*.wav; do
  id=$(basename "$file" .wav)
  echo "${id} ${file}"
done > files.txt
The format is identical to basic Kaldi SCP, but processed differently internally. Use soundfile if you don’t need pipeline features.

Directory Format

Automatically discover and process all audio files in a directory.

Usage

versa-scorer \
  --pred /path/to/generated_audio/ \
  --gt /path/to/reference_audio/ \
  --io dir \
  --score_config configs/speech.yaml \
  --output_file results.json
When using --io dir, the --pred and --gt arguments should be directory paths, not file paths.

How It Works

VERSA will:
  1. Scan the directory recursively
  2. Find all audio files (wav, flac, mp3, etc.)
  3. Generate utterance IDs from filenames
  4. Create an in-memory file mapping

File Discovery

VERSA searches for common audio extensions:
  • .wav - Waveform Audio
  • .flac - Free Lossless Audio Codec
  • .mp3 - MPEG Audio Layer 3
  • .ogg - Ogg Vorbis
  • .opus - Opus Interactive Audio Codec
  • .m4a - MPEG-4 Audio
  • And more via soundfile/librosa
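
To preview which files a recursive scan is likely to pick up, a find command restricted to the extensions listed above is a reasonable approximation. This is a sketch under assumptions, not VERSA's discovery code: list_audio_files is a hypothetical name, the match is case-sensitive, and VERSA may accept further formats that this list omits.

```shell
#!/bin/sh
# Hypothetical helper (not part of VERSA): list files under a directory whose
# extension is one of those named above, searching subdirectories as well.
list_audio_files() {
  find "${1%/}" -type f \( \
    -name '*.wav' -o -name '*.flac' -o -name '*.mp3' -o \
    -name '*.ogg' -o -name '*.opus' -o -name '*.m4a' \) | sort
}
```

An empty result from list_audio_files /path/to/generated_audio/ suggests a wrong path or unexpected extensions before any scoring is attempted.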

Utterance ID Generation

IDs are derived from file paths:
# File: /path/to/audio/speaker1/utt001.wav
# ID: speaker1_utt001

# File: /path/to/audio/test.flac  
# ID: test
Ensure filenames are unique across all subdirectories, or IDs may collide.
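
ID collisions can be checked for ahead of time. The sketch below assumes the ID rule suggested by the two examples above (path relative to the scanned directory, extension dropped, / replaced by _). That rule is an assumption rather than VERSA's confirmed behavior, and path_to_id and duplicate_ids are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch (not VERSA's implementation) of the ID rule illustrated
# above: path relative to the scanned root, extension dropped, "/" -> "_".
path_to_id() {
  root=${1%/}
  rel=${2#"$root"/}
  printf '%s\n' "${rel%.*}" | tr '/' '_'
}

# Print any ID that occurs more than once in "<id> <path>" lines on stdin.
duplicate_ids() {
  cut -d' ' -f1 | sort | uniq -d
}
```

For example, path_to_id /path/to/audio /path/to/audio/speaker1/utt001.wav prints speaker1_utt001. Under this assumed rule, a/b.wav and a_b.wav would map to the same ID, which is the kind of collision duplicate_ids reports.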

When to Use Directory Format

Evaluate all files in a directory without creating SCP files:
# Evaluate entire output directory
versa-scorer \
  --pred experiments/run_001/outputs/ \
  --gt data/references/ \
  --io dir \
  --score_config configs/quick.yaml \
  --output_file results.json
Quickly analyze a collection of audio files:
# Analyze a dataset
versa-scorer \
  --pred dataset/generated/ \
  --io dir \
  --score_config configs/independent_metrics.yaml \
  --output_file analysis.json
During development when file lists aren’t established:
# Test new model outputs
versa-scorer \
  --pred model_v2/samples/ \
  --io dir \
  --score_config configs/tts.yaml \
  --output_file model_v2_results.json

Matching Reference Files

How VERSA Matches Files

For dependent metrics, VERSA matches predicted and reference files by utterance ID:
# pred.scp
utterance_001 /pred/file1.wav
utterance_002 /pred/file2.wav

# gt.scp  
utterance_001 /gt/file1.wav
utterance_002 /gt/file2.wav
utterance_003 /gt/file3.wav  # This won't cause an error
VERSA only evaluates IDs present in the prediction file. Extra IDs in ground truth are ignored.
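
Because matching is driven by the prediction IDs, it is worth checking that each of them has a reference before running dependent metrics. The following is a minimal sketch, not part of VERSA: missing_in_gt is a hypothetical name, and both inputs are assumed to be "ID value" lists as shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of VERSA): list utterance IDs that appear in
# the prediction list but not in the ground-truth list.
missing_in_gt() {
  pred=$1
  gt=$2
  # While reading the first file (ground truth), record its IDs; while reading
  # the second (predictions), print IDs that were never recorded.
  awk 'NF == 0 { next }
       FILENAME == ARGV[1] { seen[$1] = 1; next }
       !($1 in seen) { print $1 }' "$gt" "$pred"
}
```

Running missing_in_gt pred.scp gt.scp on the example above prints nothing, since both prediction IDs exist in gt.scp and the extra utterance_003 is on the reference side only.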

Disabling Matching

For non-match metrics only:
versa-scorer \
  --pred outputs.scp \
  --gt references.scp \
  --no_match \
  --score_config configs/nonmatch_only.yaml \
  --output_file results.json
Using --no_match with dependent metrics will cause those metrics to be skipped.

Comparison Table

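| Feature | kaldi | soundfile | dir |
|---|---|---|---|
| Default format | Yes | No | No |
| Requires SCP file | Yes | Yes | No |
| Pipeline support | Yes | No | No |
| Archive support | Yes | No | No |
| Auto-discovery | No | No | Yes |
| Processing overhead | Low | Low | Medium |
| Setup complexity | Medium | Low | Minimal |
| Use case | Production | Simple lists | Quick tests |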

Best Practices

Use kaldi format for production:
✅ Consistent with speech processing tools
✅ Supports preprocessing pipelines
✅ Efficient with large datasets
✅ Well-documented format
versa-scorer \
  --pred exp/system_a/wav.scp \
  --gt data/test_clean/wav.scp \
  --score_config configs/production.yaml \
  --output_file results/system_a.json
Use dir format for development:
✅ No file preparation needed
✅ Quick iteration
✅ Easy to add/remove samples
versa-scorer \
  --pred debug/outputs/ \
  --io dir \
  --score_config configs/debug.yaml \
  --output_file debug_results.json
Use soundfile format when:
✅ Integrating with non-Kaldi tools
✅ You have simple file lists
✅ No preprocessing needed
# Add line-number IDs to a plain list of paths
awk '{print NR, $0}' file_list.txt > files.txt

versa-scorer \
  --pred files.txt \
  --io soundfile \
  --score_config configs/simple.yaml \
  --output_file results.json
For very large datasets, use kaldi with archives:
# Create archive (reduces file count); wav-copy is Kaldi's tool for wave data
wav-copy scp:wav.scp ark:audio.ark

# Reference archive in SCP (the offset shown is illustrative)
echo "utt1 ark:audio.ark:0" > archive.scp

versa-scorer \
  --pred archive.scp \
  --io kaldi \
  --score_config configs/large.yaml \
  --output_file results.json

Examples

Example 1: Kaldi Format with Processing

# Create SCP with sox processing
cat > pred.scp << EOF
utt1 sox samples/raw1.wav -r 16000 -c 1 -t wav - |
utt2 sox samples/raw2.wav -r 16000 -c 1 -t wav - |
EOF

# Run evaluation
versa-scorer \
  --pred pred.scp \
  --io kaldi \
  --score_config egs/demo/tts.yaml \
  --output_file results.json

Example 2: Soundfile Format

# Create simple file list
cat > pred.txt << EOF
utt1 /data/outputs/sample1.wav
utt2 /data/outputs/sample2.wav
utt3 /data/outputs/sample3.wav
EOF

cat > gt.txt << EOF
utt1 /data/references/sample1.wav
utt2 /data/references/sample2.wav
utt3 /data/references/sample3.wav
EOF

# Run evaluation
versa-scorer \
  --pred pred.txt \
  --gt gt.txt \
  --io soundfile \
  --score_config egs/speech.yaml \
  --output_file results.json

Example 3: Directory Format

# Organize files in directories
mkdir -p outputs references
cp generated/*.wav outputs/
cp ground_truth/*.wav references/

# Run evaluation
versa-scorer \
  --pred outputs/ \
  --gt references/ \
  --io dir \
  --score_config egs/demo/se.yaml \
  --output_file results.json

Troubleshooting

Problem: Used pipe (|) with soundfile format
Solution: Switch to kaldi format:
versa-scorer --pred files.scp --io kaldi ...

Problem: Directory is empty or contains unsupported formats
Solution:
  • Check directory path
  • Verify audio file extensions
  • Try using SCP format instead

Problem: Missing reference files for some predictions
Solution:
  • Ensure all predicted IDs have matching reference IDs
  • Check for typos in utterance IDs
  • Use --no_match if references aren’t needed

Problem: VERSA isn’t finding your audio files
Solution:
  • Verify file extensions are standard audio formats
  • Check file permissions
  • Try creating an explicit SCP file instead

