Input Formats

VERSA supports three different input formats for specifying audio files. Choose the format that best fits your workflow and data organization.

Overview

The input format is controlled by the --io flag:

kaldi: Kaldi-style SCP files (default)
soundfile: Simple file path lists
dir: Directory scanning

versa-scorer \
  --pred outputs.scp \
  --io kaldi  # or soundfile, or dir

Kaldi Format (Default)

The Kaldi format uses .scp (script) files that map utterance IDs to audio file paths or processing pipelines.

Basic File Paths

Simple ID-to-path mapping:

utterance_001 /path/to/audio/file1.wav
utterance_002 /path/to/audio/file2.wav
utterance_003 /path/to/audio/file3.flac

This is the default format. If you don’t specify --io, VERSA assumes kaldi.

Pipeline Format

Kaldi supports processing pipelines with pipes (|):

utterance_001 sox /path/to/file1.wav -t wav - |
utterance_002 ffmpeg -i /path/to/file2.mp3 -f wav - |
utterance_003 cat /path/to/file3.raw | sox -t raw -r 16000 -b 16 -e signed - -t wav - |

Pipeline support requires the kaldi IO interface. The soundfile interface does not support pipes.

Archive Format

Point to Kaldi archives:

utterance_001 ark:/path/to/archive.ark:12345
utterance_002 ark:/path/to/archive.ark:67890
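
To see which of the three entry styles an existing SCP file uses, the lines can be classified with a little awk. This is only a rough sketch, not part of VERSA: the function name classify_scp and the sample file are made up for illustration, and the rules (a trailing | means pipeline, a trailing :offset means archive) are a simplification of the formats shown above.

```shell
# Hypothetical helper (not part of VERSA): label each SCP entry by style.
classify_scp() {
  awk '{
    if ($0 ~ /[|][ \t]*$/)               kind = "pipeline"  # command ending in a pipe
    else if (NF == 2 && $2 ~ /:[0-9]+$/) kind = "archive"   # file.ark:byte_offset
    else                                 kind = "path"      # plain audio path
    print $1, kind
  }' "$1"
}

# Sample file mixing the three styles described above
workdir=$(mktemp -d)
cat > "$workdir/mixed.scp" << 'EOF'
utterance_001 /path/to/audio/file1.wav
utterance_002 sox /path/to/file2.wav -t wav - |
utterance_003 ark:/path/to/archive.ark:12345
EOF

classify_scp "$workdir/mixed.scp"
```

Any entry labelled pipeline or archive needs the kaldi interface; a file containing only path entries can be read by either kaldi or soundfile.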

When to Use Kaldi Format

Integration with Kaldi Workflows

If you’re already using Kaldi for speech processing, use this format to maintain compatibility:

# Use existing Kaldi wav.scp files
versa-scorer \
  --pred exp/generated/wav.scp \
  --gt data/test/wav.scp \
  --score_config configs/speech.yaml \
  --output_file results.json

On-the-fly Processing

Apply audio processing during evaluation:

# Convert sample rate on the fly
utt1 sox input.wav -r 16000 -t wav - |

# Downmix to mono (-c 1 on the output mixes all channels; use sox's remix effect to keep a single channel)
utt2 sox stereo.wav -c 1 -t wav - |

# Apply volume normalization
utt3 sox input.wav -t wav - norm |

Archive Storage

Access files stored in Kaldi archives efficiently:

utt_001 ark:data/train.ark:0
utt_002 ark:data/train.ark:44100
utt_003 ark:data/train.ark:88200

Creating Kaldi SCP Files

From Directory

Generate an SCP file from a directory of audio files:

# Simple script to create wav.scp
for file in /path/to/audio/*.wav; do
  id=$(basename "$file" .wav)
  echo "${id} ${file}"
done > wav.scp

Result:

sample1 /path/to/audio/sample1.wav
sample2 /path/to/audio/sample2.wav
sample3 /path/to/audio/sample3.wav

With Processing

Include sox processing in each entry:

for file in /path/to/audio/*.mp3; do
  id=$(basename "$file" .mp3)
  echo "${id} sox ${file} -r 16000 -c 1 -t wav - |"
done > wav.scp

Result:

sample1 sox /path/to/audio/sample1.mp3 -r 16000 -c 1 -t wav - |
sample2 sox /path/to/audio/sample2.mp3 -r 16000 -c 1 -t wav - |

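From File List

Convert a file list to SCP format:

# input: files.txt with one path per line
# output: wav.scp

# IFS= and -r keep leading spaces and backslashes in paths intact
while IFS= read -r file; do
  # Strip the directory and the final extension to form the ID
  id=$(basename "$file" | sed 's/\.[^.]*$//')
  echo "${id} ${file}"
done < files.txt > wav.scp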
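
A generated SCP file is worth a quick sanity check before it is passed to versa-scorer. The sketch below is not part of VERSA; check_scp is a hypothetical helper that reports duplicate IDs and plain-path entries whose file does not exist. It skips pipeline entries and makes no attempt to understand archive entries, which it would report as missing files.

```shell
# Hypothetical helper (not part of VERSA): sanity-check a generated SCP file.
# Prints one line per problem and returns non-zero if any are found.
check_scp() {
  scp_file=$1
  status=0

  # IDs must be unique: report every ID that appears more than once.
  dups=$(awk '{print $1}' "$scp_file" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicate IDs: $(echo "$dups" | tr '\n' ' ')"
    status=1
  fi

  # Plain-path entries should point at existing files.
  # Pipeline entries (ending in |) are commands, so they are skipped.
  while read -r id rest; do
    [ -z "$id" ] && continue
    case $rest in
      *"|") continue ;;
    esac
    if [ ! -f "$rest" ]; then
      echo "missing file for $id: $rest"
      status=1
    fi
  done < "$scp_file"

  return "$status"
}

# Demo: one duplicated ID and one path that does not exist
workdir=$(mktemp -d)
touch "$workdir/a.wav"
printf '%s\n' "a $workdir/a.wav" "a $workdir/a.wav" "b $workdir/b.wav" > "$workdir/wav.scp"
check_scp "$workdir/wav.scp" || echo "wav.scp needs fixing"
```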

Soundfile Format

A simplified format using plain text file lists without pipeline support.

Format Specification

utterance_001 /path/to/audio/file1.wav
utterance_002 /path/to/audio/file2.wav
utterance_003 /path/to/audio/file3.flac

The soundfile format does not support pipes (|). If your file path ends with |, you’ll get an error:

ValueError: Not supported wav.scp format. Set IO interface to kaldi
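
Whether a list will trigger this error can be checked up front. The snippet below is a small sketch, not a VERSA feature: find_pipe_entries is a hypothetical helper that prints the line numbers of entries ending in a pipe.

```shell
# Hypothetical pre-flight check (not part of VERSA): list entries that end in a
# pipe, which the soundfile interface rejects.
find_pipe_entries() {
  grep -n '|[[:space:]]*$' "$1"
}

workdir=$(mktemp -d)
cat > "$workdir/outputs.txt" << 'EOF'
utterance_001 /path/to/audio/file1.wav
utterance_002 sox /path/to/audio/file2.wav -t wav - |
EOF

# grep exits 0 when at least one line matched
if find_pipe_entries "$workdir/outputs.txt"; then
  echo "pipe entries found: use --io kaldi"
else
  echo "no pipes: --io soundfile is fine"
fi
```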

Usage

versa-scorer \
  --pred outputs.txt \
  --gt references.txt \
  --io soundfile \
  --score_config configs/speech.yaml \
  --output_file results.json

When to Use Soundfile Format

Simple File Lists: When you have straightforward file paths without processing
No Kaldi Dependency: When you want to avoid Kaldi-specific features
Quick Evaluation: For rapid testing without pipeline overhead
Compatibility: When integrating with non-Kaldi tools

Creating Soundfile Lists

# Simple file list
ls /path/to/audio/*.wav | awk '{print NR, $0}' > files.txt

# With custom IDs
for file in /path/to/audio/*.wav; do
  id=$(basename "$file" .wav)
  echo "${id} ${file}"
done > files.txt

The format is identical to basic Kaldi SCP, but processed differently internally. Use soundfile if you don’t need pipeline features.

Directory Format

Automatically discover and process all audio files in a directory.

Usage

versa-scorer \
  --pred /path/to/generated_audio/ \
  --gt /path/to/reference_audio/ \
  --io dir \
  --score_config configs/speech.yaml \
  --output_file results.json

When using --io dir, the --pred and --gt arguments should be directory paths, not file paths.

How It Works

VERSA will:

1. Scan the directory recursively
2. Find all audio files (wav, flac, mp3, etc.)
3. Generate utterance IDs from filenames
4. Create an in-memory file mapping

File Discovery

VERSA searches for common audio extensions:

Supported Formats

.wav - Waveform Audio
.flac - Free Lossless Audio Codec
.mp3 - MPEG Audio Layer 3
.ogg - Ogg Vorbis
.opus - Opus Interactive Audio Codec
.m4a - MPEG-4 Audio
And more via soundfile/librosa

Directory Structure

/path/to/audio/
├── speaker1/
│   ├── utt001.wav
│   └── utt002.wav
├── speaker2/
│   ├── utt001.wav
│   └── utt002.wav
└── test_samples/
    └── sample.flac

All files are discovered regardless of subdirectory depth.

Utterance ID Generation

IDs are derived from file paths:

# File: /path/to/audio/speaker1/utt001.wav
# ID: speaker1_utt001

# File: /path/to/audio/test.flac  
# ID: test

Ensure the resulting IDs are unique across all subdirectories. If two files map to the same ID, the IDs collide and one of the files may be dropped from the mapping.
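
Possible collisions can be spotted before running an evaluation by previewing the IDs a directory would produce. The sketch below is not VERSA's implementation: preview_ids is a hypothetical helper that assumes the rule illustrated in the example above (path relative to the root, extension removed, "/" replaced by "_") and only looks at .wav and .flac files. VERSA's actual rule may differ.

```shell
# Hypothetical helper (not part of VERSA): preview "ID path" pairs for a directory
# under the assumed rule: relative path, extension removed, "/" replaced by "_".
preview_ids() {
  root=${1%/}
  find "$root" -type f \( -name '*.wav' -o -name '*.flac' \) | sort |
  while IFS= read -r path; do
    rel=${path#"$root"/}     # e.g. speaker1/utt001.wav
    id=${rel%.*}             # e.g. speaker1/utt001
    echo "$(echo "$id" | tr '/' '_') $path"
  done
}

# Demo tree: a/b.wav and a_b.wav map to the same ID under this rule
workdir=$(mktemp -d)
mkdir -p "$workdir/audio/speaker1" "$workdir/audio/a"
touch "$workdir/audio/speaker1/utt001.wav" "$workdir/audio/test.flac"
touch "$workdir/audio/a/b.wav" "$workdir/audio/a_b.wav"

preview_ids "$workdir/audio"
echo "colliding IDs:"
preview_ids "$workdir/audio" | awk '{print $1}' | sort | uniq -d
```

If the final command prints anything, renaming the affected files or switching to an explicit SCP file avoids the ambiguity.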

When to Use Directory Format

Quick Evaluation

Evaluate all files in a directory without creating SCP files:

# Evaluate entire output directory
versa-scorer \
  --pred experiments/run_001/outputs/ \
  --gt data/references/ \
  --io dir \
  --score_config configs/quick.yaml \
  --output_file results.json

Exploratory Analysis

Quickly analyze a collection of audio files:

# Analyze a dataset
versa-scorer \
  --pred dataset/generated/ \
  --io dir \
  --score_config configs/independent_metrics.yaml \
  --output_file analysis.json

Prototyping

During development when file lists aren’t established:

# Test new model outputs
versa-scorer \
  --pred model_v2/samples/ \
  --io dir \
  --score_config configs/tts.yaml \
  --output_file model_v2_results.json

Matching Reference Files

How VERSA Matches Files

For dependent metrics, VERSA matches predicted and reference files by utterance ID:

# pred.scp
utterance_001 /pred/file1.wav
utterance_002 /pred/file2.wav

# gt.scp  
utterance_001 /gt/file1.wav
utterance_002 /gt/file2.wav
utterance_003 /gt/file3.wav  # This won't cause an error

VERSA only evaluates IDs present in the prediction file. Extra IDs in ground truth are ignored.
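
Because matching is by ID, it can help to confirm that every prediction ID has a reference before running dependent metrics. A minimal sketch follows; missing_refs is a hypothetical helper, not a VERSA command, and it compares only the first column of the two files.

```shell
# Hypothetical helper (not part of VERSA): print prediction IDs that have no
# entry in the ground-truth file. Empty output means every prediction is covered.
missing_refs() {
  pred_ids=$(mktemp)
  gt_ids=$(mktemp)
  awk '{print $1}' "$1" | sort -u > "$pred_ids"
  awk '{print $1}' "$2" | sort -u > "$gt_ids"
  comm -23 "$pred_ids" "$gt_ids"   # lines that appear only in the first file
  rm -f "$pred_ids" "$gt_ids"
}

workdir=$(mktemp -d)
cat > "$workdir/pred.scp" << 'EOF'
utterance_001 /pred/file1.wav
utterance_002 /pred/file2.wav
utterance_004 /pred/file4.wav
EOF
cat > "$workdir/gt.scp" << 'EOF'
utterance_001 /gt/file1.wav
utterance_002 /gt/file2.wav
utterance_003 /gt/file3.wav
EOF

missing_refs "$workdir/pred.scp" "$workdir/gt.scp"
```

Here utterance_004 is reported because it has no reference, while the extra utterance_003 in the ground truth is not reported.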

Disabling Matching

To evaluate only metrics that do not need matched references, pass --no_match:

versa-scorer \
  --pred outputs.scp \
  --gt references.scp \
  --no_match \
  --score_config configs/nonmatch_only.yaml \
  --output_file results.json

Using --no_match with dependent metrics will cause those metrics to be skipped.

Comparison Table

Feature	kaldi	soundfile	dir
Default format	✅	❌	❌
Requires SCP file	✅	✅	❌
Pipeline support	✅	❌	❌
Archive support	✅	❌	❌
Auto-discovery	❌	❌	✅
Processing overhead	Low	Low	Medium
Setup complexity	Medium	Low	Minimal
Use case	Production	Simple lists	Quick tests

Best Practices

Production Workflows

Use kaldi format for production:

✅ Consistent with speech processing tools
✅ Supports preprocessing pipelines
✅ Efficient with large datasets
✅ Well-documented format

versa-scorer \
  --pred exp/system_a/wav.scp \
  --gt data/test_clean/wav.scp \
  --score_config configs/production.yaml \
  --output_file results/system_a.json

Development and Testing

Use dir format for development:

✅ No file preparation needed
✅ Quick iteration
✅ Easy to add/remove samples

versa-scorer \
  --pred debug/outputs/ \
  --io dir \
  --score_config configs/debug.yaml \
  --output_file debug_results.json

Integration with Other Tools

Use soundfile format when:

✅ Integrating with non-Kaldi tools
✅ You have simple file lists
✅ No preprocessing needed

# Convert a plain path list to soundfile format, using the line number as the ID
awk '{print NR, $0}' file_list.txt > files.txt

versa-scorer \
  --pred files.txt \
  --io soundfile \
  --score_config configs/simple.yaml \
  --output_file results.json

Handling Large Datasets

For very large datasets, use kaldi with archives:

# Pack the audio into one archive (reduces file count) and write a matching SCP.
# wav-copy is a Kaldi binary; the generated entries look like "utt1 audio.ark:<offset>".
wav-copy scp:wav.scp ark,scp:audio.ark,archive.scp

versa-scorer \
  --pred archive.scp \
  --io kaldi \
  --score_config configs/large.yaml \
  --output_file results.json

Examples

Example 1: Kaldi Format with Processing

# Create SCP with sox processing
cat > pred.scp << EOF
utt1 sox samples/raw1.wav -r 16000 -c 1 -t wav - |
utt2 sox samples/raw2.wav -r 16000 -c 1 -t wav - |
EOF

# Run evaluation
versa-scorer \
  --pred pred.scp \
  --io kaldi \
  --score_config egs/demo/tts.yaml \
  --output_file results.json

Example 2: Soundfile Format

# Create simple file list
cat > pred.txt << EOF
utt1 /data/outputs/sample1.wav
utt2 /data/outputs/sample2.wav
utt3 /data/outputs/sample3.wav
EOF

cat > gt.txt << EOF
utt1 /data/references/sample1.wav
utt2 /data/references/sample2.wav
utt3 /data/references/sample3.wav
EOF

# Run evaluation
versa-scorer \
  --pred pred.txt \
  --gt gt.txt \
  --io soundfile \
  --score_config egs/speech.yaml \
  --output_file results.json

Example 3: Directory Format

# Organize files in directories
mkdir -p outputs references
cp generated/*.wav outputs/
cp ground_truth/*.wav references/

# Run evaluation
versa-scorer \
  --pred outputs/ \
  --gt references/ \
  --io dir \
  --score_config egs/demo/se.yaml \
  --output_file results.json

Troubleshooting

Error: Not supported wav.scp format

Problem: Used pipe (|) with soundfile format
Solution: Switch to kaldi format:

versa-scorer --pred files.scp --io kaldi ...

Error: Not found any generated audio files

Problem: Directory is empty or contains unsupported formats
Solution:

- Check directory path
- Verify audio file extensions
- Try using SCP format instead

Warning: Groundtruth files less than generated files

Problem: Missing reference files for some predictions
Solution:

- Ensure all predicted IDs have matching reference IDs
- Check for typos in utterance IDs
- Use --no_match if references aren’t needed

Files not being discovered in directory mode

Problem: VERSA isn’t finding your audio files
Solution:

- Verify file extensions are standard audio formats
- Check file permissions
- Try creating an explicit SCP file instead

Next Steps

Metric Types: Learn about the four metric categories
Configuration: Configure metrics in YAML
Quickstart: Run your first evaluation
CLI Usage: Complete CLI documentation
