Overview
The data statistics utility computes the mean and standard deviation of the mel-spectrograms in your dataset. These statistics are essential for proper normalization during training and inference.
Command Line Interface
matcha-data-stats
Arguments
Name of the YAML configuration file under configs/data/. Example: ljspeech.yaml, vctk.yaml
Batch size for processing. Higher values are faster but use more memory.
Force overwrite existing statistics file
Python API
compute_data_statistics()
Compute mean and standard deviation for mel-spectrograms.
Parameters
DataLoader containing mel-spectrograms in batch["y"]
Number of mel-spectrogram channels (typically 80)
Returns
Mean value across all mel-spectrogram frames and channels
Standard deviation across all mel-spectrogram frames and channels
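A minimal, framework-free sketch of how such statistics can be accumulated batch by batch using a running sum and a running sum of squares. It is not the implementation of compute_data_statistics(). It assumes each batch also carries per-utterance frame counts under a `y_lengths` key so that padded frames can be skipped, that results are returned under the keys `mel_mean` and `mel_std`, and that plain Python lists stand in for tensors; all three are assumptions made for illustration.

```python
import math


def accumulate_mel_statistics(batches, n_channels):
    """Return one scalar mean and std over all valid frames and channels.

    `batches` is any iterable of dicts. Each dict holds:
      "y":         list of mels, each shaped [n_channels][max_frames]
      "y_lengths": list of valid frame counts, one per mel (assumed key)
    """
    total_sum = 0.0
    total_sq_sum = 0.0
    total_frames = 0

    for batch in batches:
        for mel, length in zip(batch["y"], batch["y_lengths"]):
            for channel in mel:
                # Only the first `length` frames are real; the rest is padding.
                for value in channel[:length]:
                    total_sum += value
                    total_sq_sum += value * value
            total_frames += length

    count = total_frames * n_channels
    mean = total_sum / count
    # Population variance via E[x^2] - (E[x])^2.
    std = math.sqrt(total_sq_sum / count - mean * mean)
    # Key names are hypothetical.
    return {"mel_mean": mean, "mel_std": std}


# One utterance, 2 channels, 2 valid frames plus 1 padded frame.
toy_batches = [{"y": [[[1.0, 2.0, 0.0], [3.0, 4.0, 0.0]]], "y_lengths": [2]}]
toy_stats = accumulate_mel_statistics(toy_batches, n_channels=2)
```

Accumulating sums rather than storing every frame keeps memory use constant regardless of dataset size, which is why the batch size mainly affects speed.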
Output Format
The command generates a JSON file named <config_name>.json containing the computed mean and standard deviation.
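As a rough illustration of that file, the sketch below writes a statistics dictionary to `<config_name>.json` and reads it back. The helper names, the `out_dir` parameter, the key names `mel_mean` and `mel_std`, and the numeric values are all hypothetical placeholders, and the assumption that the `.yaml` suffix is replaced by `.json` is not confirmed by this page.

```python
import json
from pathlib import Path


def write_stats(config_name, stats, out_dir="."):
    """Write `stats` as JSON, naming the file after the config (assumed rule)."""
    # "ljspeech.yaml" -> "ljspeech.json"
    file_name = Path(config_name).with_suffix(".json").name
    output_file = Path(out_dir) / file_name
    output_file.write_text(json.dumps(stats, indent=4))
    return output_file


def read_stats(path):
    """Load a statistics file written by write_stats()."""
    return json.loads(Path(path).read_text())
```

A typical call would be `write_stats("ljspeech.yaml", {"mel_mean": -5.5, "mel_std": 2.1})`, where the numbers are placeholders rather than real dataset statistics.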
Usage in Training
The computed statistics are used in the model configuration.
Examples
Basic Usage
High-Speed Processing
Force Overwrite
Python Script Example
Mathematical Details
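One way to write the computation down, given that both statistics are taken across all mel-spectrogram frames and channels. The notation is introduced here for illustration: $x_{n,c,t}$ is the value at channel $c$ and frame $t$ of utterance $n$, $C$ is the number of channels, $T_n$ is the number of frames in utterance $n$, and $N$ is the number of utterances. The population form of the standard deviation (no Bessel correction), computed from the mean of squares, is an assumption.

```latex
M = C \sum_{n=1}^{N} T_n
\qquad
\mu = \frac{1}{M} \sum_{n=1}^{N} \sum_{c=1}^{C} \sum_{t=1}^{T_n} x_{n,c,t}
\qquad
\sigma = \sqrt{\frac{1}{M} \sum_{n=1}^{N} \sum_{c=1}^{C} \sum_{t=1}^{T_n} x_{n,c,t}^{2} \;-\; \mu^{2}}
```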
Both statistics are single scalar values computed over all mel-spectrogram frames and channels in the dataset.
Configuration Requirements
Your data config file must include:
Filelist Format
The training filelist should contain:
Performance Considerations
Memory Usage
- Batch size 256: ~8 GB GPU memory
- Batch size 512: ~16 GB GPU memory
- Batch size 128: ~4 GB GPU memory
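The figures above can be turned into a small lookup for choosing a batch size. This is only a sketch: the function name is hypothetical, the table holds the approximate values listed above, and no extrapolation to other batch sizes is attempted.

```python
# Approximate figures copied from the list above: batch size -> GPU memory (GB).
APPROX_GPU_GB = {128: 4, 256: 8, 512: 16}


def pick_batch_size(available_gb):
    """Return the largest listed batch size that fits, or None if none fits."""
    fitting = [size for size, gb in APPROX_GPU_GB.items() if gb <= available_gb]
    return max(fitting) if fitting else None
```

Because the listed numbers are rough, leaving some headroom below the card's total memory is a sensible precaution.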
Processing Time
For LJSpeech (~13,100 samples):
- Batch size 256: ~2-3 minutes
- Batch size 512: ~1-2 minutes
- Batch size 128: ~4-5 minutes
Normalization in Model
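A sketch of the standard mean/std normalization and its inverse, assuming the model applies the usual (x - mean) / std form. The function names are hypothetical and plain nested lists stand in for tensors; the model's own code is not reproduced here.

```python
def normalize(mel, mel_mean, mel_std):
    """Shift and scale every value so the dataset has ~zero mean, unit std."""
    return [[(value - mel_mean) / mel_std for value in channel] for channel in mel]


def denormalize(mel, mel_mean, mel_std):
    """Invert normalize(), e.g. before passing model output to a vocoder."""
    return [[value * mel_std + mel_mean for value in channel] for channel in mel]
```

The same mean and standard deviation must be used in both directions, which is why the statistics need to match the data the model was trained on.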
The model uses these statistics for normalization.
Error Handling
File Already Exists
Use the -f flag or delete the existing file.
Config Not Found
Invalid Batch Size
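The three error conditions named in this section can be checked up front before any data is loaded. The sketch below is hypothetical: the function name, parameters, and messages are invented for illustration and do not reproduce the utility's actual checks or output.

```python
from pathlib import Path


def check_inputs(config_path, output_path, batch_size, force=False):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Config Not Found
    if not Path(config_path).is_file():
        problems.append(f"Config not found: {config_path}")

    # File Already Exists (skipped when overwriting is forced)
    if Path(output_path).exists() and not force:
        problems.append(f"Statistics file already exists: {output_path}")

    # Invalid Batch Size
    if not isinstance(batch_size, int) or batch_size <= 0:
        problems.append(f"Batch size must be a positive integer, got {batch_size!r}")

    return problems
```

Collecting every problem before exiting lets the user fix all of them in one pass instead of rerunning after each failure.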
Best Practices
- Use training set only: Compute statistics on training data, not validation/test
- Consistent preprocessing: Ensure mel-spectrogram extraction matches training
- Sufficient data: Use full training set for accurate statistics
- Save statistics: Store in version control with model configs
- Recompute when needed: Recalculate if you change mel-spectrogram parameters
Related Commands
- matcha-tts: Main synthesis command
- matcha-tts-get-durations: Extract phoneme durations
Source Reference
Implementation: matcha/utils/generate_data_statistics.py:25
Entry Point: setup.py:45