Installation
First, install the required visualization dependencies:
- pandas: Data manipulation
- matplotlib: Static plotting
- seaborn: Statistical visualization
- plotly: Interactive charts
- numpy: Numerical operations
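To confirm that the packages listed above are importable before running any analysis, a small check like the following can help. This is a minimal sketch, not part of VERSA; the helper name missing_packages is hypothetical.

```python
import importlib.util

# Package names taken from the dependency list above.
REQUIRED = ["pandas", "matplotlib", "seaborn", "plotly", "numpy"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All visualization dependencies are available.")
```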
Quick Start
Basic Result Analysis
Analyze VERSA output files. The analysis reports:
- Per-category metric summaries
- Statistical analysis (mean, std, min, max)
- Automatically discovered metrics
- Category organization
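The kind of summary described above can be sketched with the standard library alone. The sketch assumes each JSONL line is a flat JSON object whose numeric fields are metrics; the field names (key, pesq, stoi) and the use of the sample standard deviation are illustrative assumptions, not a description of how show_result.py computes its numbers.

```python
import json
import statistics
from collections import defaultdict


def load_numeric_metrics(lines):
    """Collect numeric values per metric key from an iterable of JSONL lines."""
    values = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values[key].append(float(value))
    return dict(values)


def summarize(values):
    """Compute count, mean, std, min, and max for each metric."""
    summary = {}
    for metric, xs in values.items():
        summary[metric] = {
            "count": len(xs),
            "mean": statistics.fmean(xs),
            # Sample standard deviation; defined as 0.0 for a single value.
            "std": statistics.stdev(xs) if len(xs) > 1 else 0.0,
            "min": min(xs),
            "max": max(xs),
        }
    return summary


# Hypothetical two-utterance result file.
sample = [
    '{"key": "utt1", "pesq": 2.0, "stoi": 0.80}',
    '{"key": "utt2", "pesq": 3.0, "stoi": 0.90}',
]
for metric, stats in summarize(load_numeric_metrics(sample)).items():
    print(metric, stats)
```

Non-numeric fields such as the utterance key are ignored, which mirrors the idea of metrics being discovered automatically from the data.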
Export to CSV
Export results for further analysis. This creates metrics_analysis.csv with comprehensive metric statistics.
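For readers who want to produce a comparable file from their own statistics, here is a minimal sketch using the csv module. The column names are assumptions chosen for illustration; the actual layout of metrics_analysis.csv may differ.

```python
import csv
import io


def write_metric_stats(stats, handle):
    """Write one CSV row per metric with its summary statistics."""
    # Assumed column layout, not necessarily the one VERSA uses.
    fieldnames = ["metric", "count", "mean", "std", "min", "max"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for metric in sorted(stats):
        writer.writerow({"metric": metric, **stats[metric]})


# Hypothetical statistics for two metrics.
stats = {
    "pesq": {"count": 2, "mean": 2.5, "std": 0.71, "min": 2.0, "max": 3.0},
    "stoi": {"count": 2, "mean": 0.85, "std": 0.07, "min": 0.8, "max": 0.9},
}
buffer = io.StringIO()
write_metric_stats(stats, buffer)
print(buffer.getvalue())
```

Passing an open file handle instead of the in-memory buffer writes the same rows to disk.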
show_result.py Options
Input JSONL file or directory containing JSONL files with evaluation results.
Display per-utterance results in the console.
Save per-utterance results to a new JSONL file.
Analyze only specific metrics.
--visualize: Create visualization plots organized by metric categories.
--output-dir: Directory to save plots and exported files.
Export comprehensive metric analysis to CSV.
--show-discovery: Display which metrics were discovered in the data.
Show detailed quality analysis and characteristics of each metric.
--categories: Analyze only specific metric categories.
Complete Visualization Workflow
Aggregate Results
Export evaluation results to CSV format. This creates metrics_analysis.csv with all metric statistics.
Convert to Tree Format
Transform the CSV into a hierarchical tree structure. The tree format organizes metrics by category for visualization.
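Conceptually, the tree step groups flat metric rows under their category. The sketch below illustrates that idea with plain dictionaries; the row fields (metric, mean), the category lookup table, and the function name build_tree are all hypothetical and do not reflect VERSA's actual tree file format.

```python
def build_tree(rows, category_of):
    """Group flat metric rows into a {category: {metric: value}} tree."""
    tree = {}
    for row in rows:
        # Metrics without a known category fall back to "Other".
        category = category_of.get(row["metric"], "Other")
        tree.setdefault(category, {})[row["metric"]] = row["mean"]
    return tree


# Hypothetical rows, as they might be read from a metrics CSV.
rows = [
    {"metric": "pesq", "mean": 2.5},
    {"metric": "stoi", "mean": 0.85},
    {"metric": "utmos", "mean": 3.9},
]
# Assumed mapping from metric key to category name.
category_of = {
    "pesq": "Speech Enhancement",
    "stoi": "Speech Enhancement",
    "utmos": "Audio Quality",
}
tree = build_tree(rows, category_of)
print(tree)
```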
Create Sunburst Chart
Generate an interactive sunburst visualization. This opens an interactive chart in your browser showing the metric hierarchy and values. The chart can also be saved as HTML.
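A sunburst chart of this kind can be built directly with Plotly from a category tree. The following is a sketch under stated assumptions: the tree is a {category: {metric: value}} dictionary, metric names are unique across categories, and absolute values are used because sunburst sector sizes must be non-negative. The helper names are hypothetical and this is not the script VERSA ships.

```python
def sunburst_arrays(tree, root="Metrics"):
    """Flatten a {category: {metric: value}} tree into Plotly sunburst arrays."""
    labels, parents, values = [root], [""], [0.0]
    for category, metrics in tree.items():
        labels.append(category)
        parents.append(root)
        values.append(0.0)  # branch size comes from its children
        for metric, value in metrics.items():
            labels.append(metric)
            parents.append(category)
            values.append(abs(value))  # sector sizes must be non-negative
    return labels, parents, values


def save_sunburst(tree, path):
    """Write the sunburst chart to a standalone HTML file."""
    import plotly.graph_objects as go  # imported lazily; requires plotly

    labels, parents, values = sunburst_arrays(tree)
    fig = go.Figure(go.Sunburst(labels=labels, parents=parents, values=values))
    fig.write_html(path)


if __name__ == "__main__":
    example = {"Audio Quality": {"utmos": 3.9}, "Speech Enhancement": {"pesq": 2.5}}
    save_sunburst(example, "sunburst.html")
```

Because metrics live on different scales, sector sizes in such a chart are only a rough visual cue; hover values remain the reliable reading.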
Visualization Examples
Comprehensive Analysis
Analyze all aspects of your results. This produces:
- Console output with metric discovery and analysis
- analysis/metrics_analysis.csv: Statistical summary
- analysis/*_metrics_analysis.png: Category-wise plots
- analysis/correlation_heatmap.png: Metric correlation matrix
Category-Specific Analysis
Focus on specific metric categories:
- Audio Quality
- Speech Enhancement
- Similarity Metrics
- ASR Metrics
Model Comparison
Compare multiple models side by side.
Metric Categories
VERSA automatically organizes metrics into categories:
Audio Quality
MOS prediction and overall quality metrics:
- UTMOS, DNSMOS, PLCMOS
- NISQA, Sheet-SSQA
- NoResQA, WVMOS, SigMOS
- ScoreQ (reference and no-reference)
Speech Enhancement
Signal improvement and intelligibility:
- PESQ, STOI, ESTOI
- SDR, SIR, SAR
- SI-SNR, CI-SDR
- ViSQOL, SQUIM
Psychoacoustic
Perceptual and psychoacoustic measures:
- FWSEGSNR, WSS, CD
- CSIG, CBAK, COVL
- CSII (high/mid/low)
- NCM, LLR, SRMR
Similarity
Identity and characteristic matching:
- Speaker similarity (cosine)
- Singer similarity
- Emotion similarity
Pitch & F0
Fundamental frequency analysis:
- F0 correlation
- F0 RMSE
- MCD (Mel-Cepstral Distortion)
ASR (WER/CER)
Automatic speech recognition metrics:
- WER, CER (Whisper, ESPnet, OWSM)
- Error type breakdown (insert, delete, replace)
- ASR match error rate
Semantic
Content and semantic metrics:
- Speech BERT score
- Speech BLEU
- Speech token distance
- CLAP score
Aesthetics
AudioBox aesthetic attributes:
- Clarity (CE)
- Crispness (CU)
- Pleasantness (PC)
- Quality (PQ)
Advanced Analysis
Statistical Summary
Get detailed statistical insights:
- Sample count
- Mean and median
- Standard deviation and range
- Coefficient of variation
- Variability assessment
- Quality notes
- Interpretation (higher/lower is better)
- Best and worst values
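The coefficient of variation and variability assessment listed above can be illustrated as follows. The coefficient of variation is the standard deviation divided by the absolute mean; the thresholds (0.1 and 0.3) and the labels are assumptions made for this sketch, not the cut-offs VERSA applies.

```python
import statistics


def describe_variability(xs, low=0.1, high=0.3):
    """Return the coefficient of variation and a coarse variability label."""
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs) if len(xs) > 1 else 0.0
    if mean == 0:
        # The ratio is undefined when the mean is zero.
        return {"cv": None, "variability": "undefined"}
    cv = std / abs(mean)
    if cv < low:
        label = "low"
    elif cv < high:
        label = "moderate"
    else:
        label = "high"
    return {"cv": cv, "variability": label}


print(describe_variability([2.0, 2.0, 2.0]))
print(describe_variability([1.0, 3.0]))
```

The coefficient of variation is mainly meaningful for metrics on a ratio scale; for metrics that can be negative or centered near zero, the standard deviation alone is usually the safer indicator.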
Correlation Analysis
Understand metric relationships. The analysis generates correlation_heatmap.png showing:
- Metric-to-metric correlations
- Redundant metrics (high correlation)
- Complementary metrics (low correlation)
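The notion of redundant metrics can be made concrete with a Pearson correlation over per-utterance values. This sketch assumes every metric has one value per utterance in the same order; the 0.9 threshold and the function names are illustrative choices, not VERSA settings.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def redundant_pairs(columns, threshold=0.9):
    """List metric pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if not math.isnan(r) and abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical per-utterance values for three metrics.
columns = {
    "pesq": [1.0, 2.0, 3.0, 4.0],
    "stoi": [0.5, 0.6, 0.7, 0.8],
    "wer": [0.3, 0.1, 0.4, 0.2],
}
for a, b, r in redundant_pairs(columns):
    print(f"{a} and {b} are highly correlated (r = {r:.2f})")
```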
Directory Processing
Process multiple JSONL files at once. The following file patterns are picked up:
- *.jsonl
- *.json
- *.jl
- *.txt (with JSONL format)
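Collecting input files by these patterns can be sketched with pathlib. The sketch searches only the top level of the directory, which is an assumption; whether show_result.py recurses into subdirectories is not stated here. The function name collect_result_files is hypothetical.

```python
from pathlib import Path

# File patterns named in the list above.
PATTERNS = ("*.jsonl", "*.json", "*.jl", "*.txt")


def collect_result_files(directory):
    """Return sorted paths in the directory that match any supported pattern."""
    root = Path(directory)
    found = set()
    for pattern in PATTERNS:
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


if __name__ == "__main__":
    for path in collect_result_files("."):
        print(path)
```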
Output Formats
Console Output
Summary table showing all metrics.
CSV Export
Structured data for analysis.
JSONL Output
Per-utterance detailed results.
Visualization Gallery
Sunburst Chart
Hierarchical view of all metrics by category:
Features:
- Interactive drill-down by category
- Hover for detailed values
- Visual comparison of metric magnitudes
- Category-based color coding
Radar Chart
Multi-model comparison:
Features:
- Compare multiple models simultaneously
- Category or metric-specific views
- Normalized scales for fair comparison
- Interactive legend
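Normalized scales matter because metrics such as PESQ and WER live on different ranges and point in different directions. One common approach is min-max scaling per metric across models, flipping metrics where lower is better. The sketch below illustrates that approach; it is an assumption about how such normalization could work, not a description of VERSA's radar chart code, and the model names are made up.

```python
def normalize_for_radar(scores, lower_is_better=()):
    """Min-max scale each metric across models to [0, 1], with 1 as best.

    scores maps model name to {metric: value}.
    """
    metrics = sorted({m for per_model in scores.values() for m in per_model})
    normalized = {model: {} for model in scores}
    for metric in metrics:
        vals = [scores[m][metric] for m in scores if metric in scores[m]]
        lo, hi = min(vals), max(vals)
        for model in scores:
            if metric not in scores[model]:
                continue
            if hi == lo:
                scaled = 0.5  # all models tie on this metric
            else:
                scaled = (scores[model][metric] - lo) / (hi - lo)
            if metric in lower_is_better:
                scaled = 1.0 - scaled  # flip so that larger is always better
            normalized[model][metric] = scaled
    return normalized


# Hypothetical scores for two models.
scores = {
    "model_a": {"pesq": 2.0, "wer": 0.10},
    "model_b": {"pesq": 3.0, "wer": 0.20},
}
result = normalize_for_radar(scores, lower_is_better={"wer"})
print(result)
```

Min-max scaling exaggerates small gaps when only a few models are compared, so the raw values should be reported alongside the chart.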
Distribution Plots
Generated with --visualize:
- Histograms showing value distributions
- Box plots for outlier detection
- Category-organized subplots
- Statistical overlays (mean, std)
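Box plots conventionally flag points that lie more than 1.5 times the interquartile range beyond the quartiles. The same rule can be applied numerically to list suspicious utterances. This is a sketch of that conventional rule, not code taken from VERSA, and the sample values are invented.

```python
import statistics


def iqr_outliers(xs, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper]


# Hypothetical metric values with one obvious outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
print(iqr_outliers(values))
```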
Correlation Heatmap
Metric relationship visualization:
- Upper-triangular matrix layout
- Color-coded correlation strength
- Helps identify redundant metrics
- Discovers metric relationships
Best Practices
Regular Analysis
Analyze results after each experiment to track improvements and identify issues early.
Category Focus
Use --categories to focus on relevant metrics for your use case rather than analyzing all metrics at once.
Save Visualizations
Always use --output-dir to save plots for presentations and reports.
Compare Models
Use radar charts to compare multiple models on the same metric set for clear insights.
Troubleshooting
Missing visualization dependencies
Install the required packages listed in the Installation section, or use the requirements file.
Empty plots or no metrics found
Verify your JSONL format. Each line should be valid JSON with numeric metric values.
Use --show-discovery to see which metrics were detected.
Port forwarding for remote servers
If running on a remote server without a display:
- Save the chart as HTML
- Download the HTML file to your local machine
- Open it in a browser
Next Steps
- Learn about CLI usage to generate evaluation results
- Explore distributed evaluation for large-scale analysis
- Check metric configuration to understand available metrics