Overview
The input format is controlled by the --io flag:
| Value | Input |
|---|---|
| kaldi | Kaldi-style SCP files (default) |
| soundfile | Simple file path lists |
| dir | Directory scanning |
Kaldi Format (Default)
The Kaldi format uses .scp (script) files that map utterance IDs to audio file paths or processing pipelines.
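To make the ID-to-value mapping concrete, here is a minimal sketch of reading such a file. It is not VERSA's loader: parse_scp and ScpEntry are hypothetical names, and the sketch assumes one entry per line, the utterance ID separated from the rest by whitespace, and a trailing | marking a pipeline command. Archive references (such as file.ark:offset) are kept as plain values and not interpreted.

```python
from dataclasses import dataclass


@dataclass
class ScpEntry:
    utt_id: str
    value: str      # a file path, or a shell command when is_pipe is True
    is_pipe: bool


def parse_scp(text: str) -> dict[str, ScpEntry]:
    """Parse Kaldi-style script lines of the form '<utt_id> <path-or-command>'."""
    entries: dict[str, ScpEntry] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<utt_id> <path>', got {raw!r}")
        utt_id, value = parts
        if utt_id in entries:
            raise ValueError(f"line {lineno}: duplicate utterance ID {utt_id!r}")
        # A trailing '|' marks a command whose stdout is the audio stream.
        is_pipe = value.endswith("|")
        if is_pipe:
            value = value[:-1].rstrip()
        entries[utt_id] = ScpEntry(utt_id, value, is_pipe)
    return entries


example = """\
utt1 /data/audio/utt1.wav
utt2 sox /data/audio/utt2.flac -t wav - |
"""
for entry in parse_scp(example).values():
    print(entry.utt_id, entry.is_pipe, entry.value)
```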
Basic File Paths
Simple ID-to-path mapping:
This is the default format. If you don’t specify --io, VERSA assumes kaldi.
Pipeline Format
Kaldi supports processing pipelines with pipes (|):
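As an illustration of what a pipe entry means in practice: everything before the trailing | is run as a shell command, and its standard output is decoded as a WAV stream. This is a sketch of the general Kaldi mechanism under stated assumptions, not VERSA's code. read_pipe_entry is a hypothetical name, and it uses Python's built-in wave module, which only decodes PCM WAV and assumes the command writes a WAV header it can parse.

```python
import io
import subprocess
import wave


def read_pipe_entry(entry: str) -> tuple[int, int, bytes]:
    """Run a Kaldi-style pipe entry and decode the WAV it writes to stdout.

    Returns (sample_rate, num_channels, raw PCM frames).
    """
    command = entry.rstrip()
    if not command.endswith("|"):
        raise ValueError("not a pipe entry: expected a trailing '|'")
    command = command[:-1]
    # shell=True because SCP commands are written as shell pipelines.
    # This executes arbitrary commands: only use SCP files you trust.
    result = subprocess.run(command, shell=True, check=True, capture_output=True)
    with wave.open(io.BytesIO(result.stdout), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        return wav.getframerate(), wav.getnchannels(), frames
```

For example, read_pipe_entry("sox utt2.flac -t wav - |") would return the converted audio, assuming sox is installed and utt2.flac exists.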
Pipeline support requires the kaldi IO interface. The soundfile interface does not support pipes.
Archive Format
Point to Kaldi archives:
When to Use Kaldi Format
Integration with Kaldi Workflows
If you’re already using Kaldi for speech processing, use this format to maintain compatibility:
On-the-fly Processing
Apply audio processing during evaluation:
Archive Storage
Access files stored in Kaldi archives efficiently:
Creating Kaldi SCP Files
- From Directory
- With Processing
- From File List
Generate SCP from a directory of audio files:
Result:
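A minimal sketch of generating an SCP file that covers the three cases above: scanning a directory, wrapping each path in a processing command, and building entries from an existing list of paths. The helper names, the use of the file stem as utterance ID, and the sox command template are illustrative assumptions, not VERSA conventions. Paths containing whitespace are rejected because SCP entries are whitespace-separated.

```python
from pathlib import Path
from typing import Iterable, Optional


def scp_lines(paths: Iterable[Path], pipeline: Optional[str] = None) -> list[str]:
    """Build '<utt_id> <value>' lines, using each file's stem as the utterance ID.

    If `pipeline` is given, it is a template such as
    'sox {path} -r 16000 -t wav - |' and each entry becomes a pipe command.
    """
    lines = []
    seen = set()
    for path in sorted(paths):
        if any(ch.isspace() for ch in str(path)):
            raise ValueError(f"whitespace in path is not allowed in SCP: {path}")
        utt_id = path.stem
        if utt_id in seen:
            raise ValueError(f"duplicate utterance ID {utt_id!r} from {path}")
        seen.add(utt_id)
        value = pipeline.format(path=path) if pipeline else str(path)
        lines.append(f"{utt_id} {value}")
    return lines


def write_scp_from_directory(root: str, out_file: str, suffix: str = ".wav",
                             pipeline: Optional[str] = None) -> int:
    """Scan `root` recursively for files ending in `suffix` and write an SCP file."""
    paths = [p for p in Path(root).rglob(f"*{suffix}") if p.is_file()]
    lines = scp_lines(paths, pipeline)
    Path(out_file).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For an existing file list, pass the paths directly, e.g. scp_lines(Path(p) for p in Path("files.txt").read_text().splitlines()), where files.txt is a hypothetical list with one path per line. Entries produced with a pipeline template can only be read through the kaldi interface.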
Soundfile Format
A simplified format using plain text file lists without pipeline support.
Format Specification
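Because the soundfile interface accepts only plain paths, a pre-flight check can catch entries it cannot read before a long run starts. This is a hypothetical helper, not part of VERSA. It flags pipe entries and Kaldi archive references (assumed to look like path.ark:offset), both of which need --io kaldi, plus duplicate utterance IDs.

```python
def check_soundfile_list(text: str) -> list[str]:
    """Return problems found in a soundfile-style '<utt_id> <path>' list."""
    problems: list[str] = []
    seen: set[str] = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.endswith("|"):
            problems.append(f"line {lineno}: pipe entry needs --io kaldi")
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            problems.append(f"line {lineno}: expected '<utt_id> <path>'")
            continue
        utt_id, path = parts
        # Assumed archive syntax: an .ark file followed by a byte offset.
        if ".ark:" in path:
            problems.append(f"line {lineno}: archive reference needs --io kaldi")
        if utt_id in seen:
            problems.append(f"line {lineno}: duplicate utterance ID {utt_id!r}")
        seen.add(utt_id)
    return problems
```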
Usage
When to Use Soundfile Format
- Simple File Lists: when you have straightforward file paths without processing
- No Kaldi Dependency: when you want to avoid Kaldi-specific features
- Quick Evaluation: for rapid testing without pipeline overhead
- Compatibility: when integrating with non-Kaldi tools
Creating Soundfile Lists
The format is identical to basic Kaldi SCP, but it is processed differently internally. Use soundfile if you don’t need pipeline features.
Directory Format
Automatically discover and process all audio files in a directory.
Usage
When using --io dir, the --pred and --gt arguments should be directory paths, not file paths.
How It Works
VERSA will:
- Scan the directory recursively
- Find all audio files (wav, flac, mp3, etc.)
- Generate utterance IDs from filenames
- Create an in-memory file mapping
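The four steps above can be approximated in a few lines of Python. This is a sketch for illustration, not VERSA's implementation: the extension set is taken from the list under File Discovery (VERSA may accept more), and the utterance ID is assumed to be the filename without its extension.

```python
from pathlib import Path

# Assumed extension set, mirroring the File Discovery list; not exhaustive.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a"}


def scan_audio_dir(root: str) -> dict[str, str]:
    """Map utterance IDs (file stems) to paths for all audio files under root."""
    mapping: dict[str, str] = {}
    for path in sorted(Path(root).rglob("*")):  # recursive scan
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        utt_id = path.stem  # assumption: ID = filename without extension
        if utt_id in mapping:
            raise ValueError(f"ID {utt_id!r} found twice: {mapping[utt_id]} and {path}")
        mapping[utt_id] = str(path)
    return mapping
```

Because IDs come from filenames alone in this sketch, spk1/utt1.wav and spk2/utt1.wav would collide; the sketch raises an error rather than silently keeping one of them.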
File Discovery
VERSA searches for common audio extensions:
- .wav (Waveform Audio)
- .flac (Free Lossless Audio Codec)
- .mp3 (MPEG Audio Layer 3)
- .ogg (Ogg Vorbis)
- .opus (Opus Interactive Audio Codec)
- .m4a (MPEG-4 Audio)
- And more via soundfile/librosa
Utterance ID Generation
IDs are derived from file paths:
When to Use Directory Format
Quick Evaluation
Evaluate all files in a directory without creating SCP files:
Exploratory Analysis
Quickly analyze a collection of audio files:
Prototyping
During development when file lists aren’t established:
Matching Reference Files
How VERSA Matches Files
For dependent metrics, VERSA matches predicted and reference files by utterance ID:
VERSA only evaluates IDs present in the prediction file. Extra IDs in ground truth are ignored.
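The matching rule can be expressed as a short function. It is a sketch of the behavior described above, not VERSA's code: match_by_id is a hypothetical name, and skipping predictions that lack a reference (with a warning) is an assumption about how such gaps are handled.

```python
import warnings


def match_by_id(pred: dict[str, str], gt: dict[str, str]) -> list[tuple[str, str, str]]:
    """Pair prediction and reference entries by utterance ID.

    Only IDs present in `pred` are considered; extra IDs in `gt` are ignored.
    """
    missing = [utt_id for utt_id in pred if utt_id not in gt]
    if missing:
        warnings.warn(f"{len(missing)} predicted ID(s) have no reference: {missing[:5]}")
    # Preserve the order of the prediction file.
    return [(utt_id, pred[utt_id], gt[utt_id]) for utt_id in pred if utt_id in gt]
```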
Disabling Matching
For non-match metrics only (they need no reference), disable matching with --no_match:
Comparison Table
| Feature | kaldi | soundfile | dir |
|---|---|---|---|
| Default format | ✅ | ❌ | ❌ |
| Requires SCP file | ✅ | ✅ | ❌ |
| Pipeline support | ✅ | ❌ | ❌ |
| Archive support | ✅ | ❌ | ❌ |
| Auto-discovery | ❌ | ❌ | ✅ |
| Processing overhead | Low | Low | Medium |
| Setup complexity | Medium | Low | Minimal |
| Use case | Production | Simple lists | Quick tests |
Best Practices
Production Workflows
Use the kaldi format for production:
✅ Consistent with speech processing tools
✅ Supports preprocessing pipelines
✅ Efficient with large datasets
✅ Well-documented format
Development and Testing
Use the dir format for development:
✅ No file preparation needed
✅ Quick iteration
✅ Easy to add/remove samples
Integration with Other Tools
Use the soundfile format when:
✅ Integrating with non-Kaldi tools
✅ You have simple file lists
✅ No preprocessing needed
Handling Large Datasets
For very large datasets, use kaldi with archives:
Examples
Example 1: Kaldi Format with Processing
Example 2: Soundfile Format
Example 3: Directory Format
Troubleshooting
Error: Not supported wav.scp format
Problem: Used a pipe (|) with the soundfile format.
Solution: Switch to the kaldi format:
Error: Not found any generated audio files
Problem: The directory is empty or contains unsupported formats.
Solution:
- Check directory path
- Verify audio file extensions
- Try using SCP format instead
Warning: Groundtruth files less than generated files
Problem: Missing reference files for some predictions.
Solution:
- Ensure all predicted IDs have matching reference IDs
- Check for typos in utterance IDs
- Use --no_match if references aren’t needed
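To look for typos in utterance IDs, a quick check can list every predicted ID without a reference and suggest similar reference IDs that no prediction uses. This hypothetical helper is not part of VERSA; it relies on difflib.get_close_matches from the Python standard library, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib
from typing import Iterable


def report_missing_references(pred_ids: Iterable[str], gt_ids: Iterable[str],
                              cutoff: float = 0.8) -> dict[str, list[str]]:
    """Map each predicted ID that has no reference to similar unmatched reference IDs."""
    pred_set, gt_set = set(pred_ids), set(gt_ids)
    # Reference IDs that no prediction uses are the likely typo candidates.
    unmatched_gt = sorted(gt_set - pred_set)
    return {
        utt_id: difflib.get_close_matches(utt_id, unmatched_gt, n=3, cutoff=cutoff)
        for utt_id in sorted(pred_set - gt_set)
    }
```

For example, report_missing_references(["utt_001", "utt_002"], ["utt_001", "utt-002"]) returns {"utt_002": ["utt-002"]}, pointing at the underscore/hyphen mismatch.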
Files not being discovered in directory mode
Problem: VERSA isn’t finding your audio files.
Solution:
- Verify file extensions are standard audio formats
- Check file permissions
- Try creating an explicit SCP file instead
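When files are not discovered, a small diagnostic can show which files might be skipped and why. This is a hypothetical helper, not a VERSA command. The extension set is assumed from the File Discovery list, so a file reported under other_extension may still be readable by VERSA through soundfile/librosa.

```python
import os
from pathlib import Path

# Assumed to mirror the File Discovery list; VERSA may accept more formats.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a"}


def diagnose_audio_dir(root: str) -> dict[str, list[str]]:
    """Group files under `root` by why directory scanning might skip them."""
    base = Path(root)
    if not base.is_dir():
        raise NotADirectoryError(root)  # wrong or missing directory path
    report: dict[str, list[str]] = {"audio": [], "other_extension": [], "unreadable": []}
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            report["other_extension"].append(str(path))
        elif not os.access(path, os.R_OK):  # checks the current user's permissions
            report["unreadable"].append(str(path))
        else:
            report["audio"].append(str(path))
    return report
```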
Next Steps
- Metric Types: Learn about the four metric categories
- Configuration: Configure metrics in YAML
- Quickstart: Run your first evaluation
- CLI Usage: Complete CLI documentation