## Overview

VERSA provides built-in support for parallel processing on HPC clusters using Slurm. The `launch_slurm.sh` script automatically splits your dataset and distributes evaluation jobs across GPU and CPU nodes.
## Prerequisites
- Access to a Slurm-managed compute cluster
- VERSA installed on all compute nodes
- Shared filesystem accessible from all nodes
- Configured Slurm partitions for GPU/CPU jobs
## Quick Start
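A minimal invocation sketch, assuming the script takes the four required arguments described under Script Parameters, in that order (all paths here are placeholders):

```shell
# Sketch only: argument order (predictions, ground truth, score dir,
# number of splits) is assumed from the Required Arguments section.
./launch_slurm.sh \
    data/predictions.scp \
    data/ground_truth.scp \
    exp/scores \
    10
```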
## Script Parameters

### Required Arguments
1. **Prediction `wav.scp`**: path to the prediction `wav.scp` file containing the audio files to evaluate. Format: each line contains `<utterance_id> <audio_path>`.
2. **Ground-truth `wav.scp`**: path to the ground-truth `wav.scp` file for reference-based metrics. Use `"None"` (as a string) if only computing reference-free metrics.
3. **Score directory**: directory where results and logs will be stored. The script automatically creates subdirectories for organization.
4. **Number of splits**: number of chunks to split the dataset into. Each chunk is processed as a separate Slurm job; choose a value based on dataset size and available resources.
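For reference, a `wav.scp` is a plain-text, two-column mapping from utterance IDs to audio paths, for example:

```
utt_0001 /data/audio/utt_0001.wav
utt_0002 /data/audio/utt_0002.wav
```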
### Optional Arguments
- **CPU-only mode**: run only CPU-based metrics. Disables GPU job submission; useful when GPU resources are unavailable or when only CPU metrics are needed.
- **GPU-only mode**: run only GPU-based metrics. Disables CPU job submission; use when evaluating only GPU-accelerated metrics.
- **Text file**: path to a text file with transcriptions or descriptions. Format: each line contains `<utterance_id> <text_content>`. Required for text-dependent metrics like WER.

### Environment Variables
Customize resource allocation and cluster configuration through environment variables, grouped into:

- Partition configuration
- Time limits
- Resource allocation
- Additional options
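As an illustration, assuming variable names such as `GPU_PARTITION` and `CPU_TIME` (all placeholders; the authoritative names are in `launch_slurm.sh` itself):

```shell
# Placeholder variable names -- verify against launch_slurm.sh.
export GPU_PARTITION=gpu      # Slurm partition for GPU jobs
export CPU_PARTITION=cpu      # Slurm partition for CPU jobs
export GPU_TIME=12:00:00      # time limit for GPU jobs
export CPU_TIME=8:00:00       # time limit for CPU jobs
```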
## Usage Examples

### Basic Evaluation
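A hedged example of a basic run, assuming the positional argument order from Script Parameters (paths are placeholders):

```shell
# Evaluate predictions against references, split across 8 Slurm jobs.
./launch_slurm.sh pred/wav.scp ref/wav.scp exp/basic_eval 8
```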
Evaluate both prediction and reference audio.

### Reference-Free Evaluation
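For the reference-free case, the guide specifies passing the string "None" in place of the ground-truth file; a sketch with placeholder paths:

```shell
# "None" skips reference-based metrics; only reference-free metrics run.
./launch_slurm.sh pred/wav.scp None exp/ref_free_eval 8
```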
Evaluate only prediction audio without a reference.

### CPU-Only Processing
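A sketch of CPU-only mode; the flag name `--cpu-only` and its position are assumptions, so check `launch_slurm.sh` for the actual option:

```shell
# Hypothetical flag name -- verify in launch_slurm.sh.
./launch_slurm.sh pred/wav.scp ref/wav.scp exp/cpu_eval 8 --cpu-only
```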
Run metrics that don’t require a GPU.

### GPU-Only Processing
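A sketch of GPU-only mode; as above, the flag name `--gpu-only` is an assumption to verify against the script:

```shell
# Hypothetical flag name -- verify in launch_slurm.sh.
./launch_slurm.sh pred/wav.scp ref/wav.scp exp/gpu_eval 8 --gpu-only
```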
Run only GPU-accelerated metrics.

### With Text References
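A sketch supplying a text file for WER-style metrics; the option name `--text` and its position are assumptions to verify against the script:

```shell
# data/text holds lines of "<utterance_id> <text_content>".
# Hypothetical option name -- verify in launch_slurm.sh.
./launch_slurm.sh pred/wav.scp ref/wav.scp exp/wer_eval 8 --text data/text
```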
Include transcriptions for WER and other text-based metrics.

### Custom Resource Configuration
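A sketch combining environment variables with a launch; `GPU_TYPE` and `GPU_MEM` are placeholder names, not confirmed variables:

```shell
# Placeholder variable names -- verify against launch_slurm.sh.
export GPU_TYPE=a100    # hypothetical: request a specific GPU type
export GPU_MEM=32G      # hypothetical: memory per GPU job
./launch_slurm.sh pred/wav.scp ref/wav.scp exp/big_eval 16
```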
Specify the GPU type and increase resources as needed.

## Workflow Details
### Data Splitting

The script splits the input files into equal-sized chunks, creating files such as `predictions.scp_000`, `predictions.scp_001`, …, `predictions.scp_009` (for 10 splits).

### Directory Structure
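The behavior is equivalent to GNU `split` with numeric suffixes; a self-contained illustration using dummy data (not the script's actual code):

```shell
# Build a dummy 100-line wav.scp, then split it into 10 line-based chunks
# with 3-digit numeric suffixes: predictions.scp_000 ... predictions.scp_009.
seq 1 100 | sed 's/^/utt_/; s/$/ audio.wav/' > predictions.scp
split -n l/10 -d -a 3 predictions.scp predictions.scp_
ls predictions.scp_*
```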
The script organizes its output under `score_dir`, including a `logs/` subdirectory for per-job log files.
## Monitoring Jobs

Typical tasks include checking job status, viewing logs in real time, scanning for errors, and cancelling all jobs.
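Standard Slurm commands cover these tasks; the log filenames under `score_dir/logs/` depend on how the script names them, so the globs below are illustrative:

```shell
# Check the status of your jobs in the queue
squeue -u "$USER"

# Follow logs in real time (file names depend on the script's setup)
tail -f score_dir/logs/*.log

# Scan all logs for errors
grep -i error score_dir/logs/*

# Cancel all of your queued and running jobs
scancel -u "$USER"
```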
## Troubleshooting

### Jobs fail immediately

Check the error logs in `score_dir/logs/`. Common issues:

- Incorrect partition names
- Insufficient resources requested
- Missing dependencies on compute nodes
- Incorrect file paths (must be absolute or relative to job working directory)
### Out of memory errors

Increase the memory allocation, or reduce the number of concurrent metrics in your config files.
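For example, assuming memory is controlled by variables like `GPU_MEM`/`CPU_MEM` (placeholder names; check `launch_slurm.sh` for the real ones):

```shell
# Placeholder variable names -- verify against launch_slurm.sh.
export GPU_MEM=64G
export CPU_MEM=32G
```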
### Jobs timeout

Increase the time limits, or split the data into more chunks to reduce per-job processing time.
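For example, assuming time limits are controlled by variables like `GPU_TIME`/`CPU_TIME` (placeholder names; check `launch_slurm.sh` for the real ones):

```shell
# Placeholder variable names -- verify against launch_slurm.sh.
export GPU_TIME=24:00:00
export CPU_TIME=24:00:00
```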
### GPU not detected

Verify that GPU resources are available: check that the partition has GPUs and that your account has access.
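Standard Slurm commands can confirm GPU availability; `gpu` below is a placeholder partition name:

```shell
# List the GPU resources (GRES) advertised by the partition
sinfo -p gpu -o "%P %G"

# Run a trivial GPU job to confirm a device is actually visible
srun -p gpu --gres=gpu:1 nvidia-smi
```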
### File not found errors

Ensure all paths are accessible from the compute nodes. Use absolute paths, or make sure relative paths resolve from the job submission directory.
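One simple safeguard is resolving every input to an absolute path before submission, e.g. with coreutils `realpath`:

```shell
# Resolve a possibly-relative path so compute nodes see the same path.
touch predictions.scp                 # demo file for illustration
PRED=$(realpath predictions.scp)
echo "$PRED"
```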
## Performance Optimization

### Balancing GPU vs CPU
- GPU metrics: UTMOS, NISQA, speaker similarity, neural network-based metrics
- CPU metrics: PESQ, STOI, signal processing metrics
Which metrics run on which resource is controlled by the metric subset configs `egs/universa_prepare/gpu_subset.yaml` and `egs/universa_prepare/cpu_subset.yaml`.
## Advanced Configuration

### Custom Slurm Scripts
Modify `egs/run_gpu.sh` and `egs/run_cpu.sh` to customize the evaluation command each Slurm job runs.
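Purely as an illustration (not the actual file contents), the per-chunk command inside `egs/run_gpu.sh` might resemble VERSA's scorer entry point; every flag and variable below is an assumption to verify against your checkout:

```shell
# Hypothetical sketch of the per-chunk evaluation command.
# PRED_CHUNK, GT_CHUNK, and SCORE_DIR are illustrative variables.
python versa/bin/scorer.py \
    --score_config egs/universa_prepare/gpu_subset.yaml \
    --pred "${PRED_CHUNK}" \
    --gt "${GT_CHUNK}" \
    --output_file "${SCORE_DIR}/result.$(basename "${PRED_CHUNK}")"
```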
### Multiple Metric Configurations

Different metric sets can be run in parallel as separate launches.

The script is designed for flexibility. Modify the environment variables and Slurm parameters to match your cluster’s specific configuration and policies.