Model Setup
Parameters
Path to the NISQA model checkpoint (.tar file). The checkpoint must contain:
- Model state dictionary
- Model arguments and configuration
- Architecture specification (NISQA, NISQA_DIM, or NISQA_DE)
Whether to use the GPU for computation. Raises RuntimeError if use_gpu=True but no GPU is available.
Returns
Loaded NISQA model with the following attributes:
- model.args: Model configuration arguments
- model.device: Device ("cuda" or "cpu")
- Model can be NISQA (basic), NISQA_DIM (dimensional), or NISQA_DE (double-ended)
Raises
- ValueError: If model path is not provided or checkpoint is invalid
- RuntimeError: If GPU requested but not available
- NotImplementedError: If model type is not recognized
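The error contract above suggests a simple caller-side pattern: try the GPU first and retry on the CPU if a RuntimeError is raised. The sketch below is one hedged way to do that. The helper name `load_with_cpu_fallback` and its `setup_fn` argument are hypothetical and not part of the library. It also assumes the setup function takes the checkpoint path as its first positional argument, which this page does not confirm.

```python
def load_with_cpu_fallback(setup_fn, model_path, prefer_gpu=True):
    """Call setup_fn with use_gpu=True first, retrying on CPU if no GPU exists.

    setup_fn is assumed (not confirmed) to accept the checkpoint path as its
    first positional argument and a use_gpu keyword argument.
    """
    if not prefer_gpu:
        return setup_fn(model_path, use_gpu=False)
    try:
        return setup_fn(model_path, use_gpu=True)
    except NotImplementedError:
        # NotImplementedError subclasses RuntimeError in Python, so re-raise it
        # explicitly: an unrecognized model type is not fixed by changing device.
        raise
    except RuntimeError:
        # Documented case: GPU requested but not available. Retry on CPU.
        return setup_fn(model_path, use_gpu=False)
```

With the real library, `setup_fn` would be `nisqa_model_setup`. A ValueError for a missing path or invalid checkpoint is deliberately left to propagate, since retrying on another device would not help.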
Metric Calculation
Parameters
The NISQA model loaded from nisqa_model_setup()
Audio signal to evaluate (1D array)
Sampling rate of the input audio in Hz. Audio will be resampled to 48 kHz if needed
Returns
Dictionary containing NISQA scores with keys prefixed by nisqa_. Typical metrics include:
- nisqa_mos: Overall MOS prediction
- nisqa_noi: Noisiness dimension (if NISQA_DIM model)
- nisqa_dis: Distortion dimension (if NISQA_DIM model)
- nisqa_col: Coloration dimension (if NISQA_DIM model)
- nisqa_loud: Loudness dimension (if NISQA_DIM model)
Usage Example
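A minimal sketch of working with the returned scores, assuming the result is a plain dictionary whose nisqa_-prefixed values are scalar-like (convertible with `float`). The helper `summarize_nisqa_scores` and the sample values are hypothetical and used only for illustration. The dimension keys are treated as optional because they are only present for a NISQA_DIM model.

```python
# Key suffixes mapped to the dimension names used in this document.
DIMENSION_NAMES = {
    "noi": "Noisiness",
    "dis": "Distortion",
    "col": "Coloration",
    "loud": "Loudness",
}


def summarize_nisqa_scores(scores):
    """Split a NISQA result dictionary into overall MOS and named dimensions."""
    prefix = "nisqa_"
    # Keep only keys carrying the documented prefix and strip it.
    nisqa = {
        key[len(prefix):]: float(value)
        for key, value in scores.items()
        if key.startswith(prefix)
    }
    summary = {"mos": nisqa.get("mos"), "dimensions": {}}
    # Dimension scores are optional: include only the ones that are present.
    for suffix, name in DIMENSION_NAMES.items():
        if suffix in nisqa:
            summary["dimensions"][name] = nisqa[suffix]
    return summary


# Placeholder values for illustration only; these are not real model output.
example_scores = {
    "nisqa_mos": 3.5,
    "nisqa_noi": 4.0,
    "nisqa_dis": 3.0,
    "nisqa_col": 3.25,
    "nisqa_loud": 4.5,
}

summary = summarize_nisqa_scores(example_scores)
print("MOS:", summary["mos"])
for name, value in summary["dimensions"].items():
    print(f"{name}: {value:.2f}")
```

In practice, `example_scores` would be replaced by the dictionary returned from the metric calculation described above.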
Model Variants
NISQA (Basic)
Predicts overall MOS score for speech quality.
NISQA_DIM (Dimensional)
Predicts MOS along with perceptual dimensions:
- Noisiness: Amount of background noise
- Distortion: Signal distortion level
- Coloration: Frequency response coloration
- Loudness: Perceived loudness
NISQA_DE (Double-Ended)
Double-ended variant that can use a reference signal if available.
Technical Details
Model Input: NISQA expects audio at 48 kHz sampling rate. Audio is automatically resampled if provided at a different rate.
Model Architecture: NISQA uses CNN features with temporal dependencies captured via self-attention or LSTM layers.
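To make the 48 kHz resampling step concrete, the sketch below computes the integer up/down factors and the resulting signal length for a given input rate. It only illustrates the arithmetic. How the library resamples internally is not stated here, and the helper name `resample_plan` is hypothetical. It assumes integer sampling rates in Hz.

```python
from fractions import Fraction

TARGET_SR = 48_000  # Input rate NISQA expects, per this document.


def resample_plan(fs, num_samples, target_sr=TARGET_SR):
    """Return (up, down, output_length) for converting fs to target_sr."""
    fs = int(fs)
    if fs <= 0 or num_samples < 0:
        raise ValueError("fs must be positive and num_samples non-negative")
    # Reduce target_sr / fs to lowest terms to get the up/down factors.
    ratio = Fraction(target_sr, fs)
    up, down = ratio.numerator, ratio.denominator
    # Ceiling division in pure integer arithmetic: ceil(num_samples * up / down).
    output_length = -(-num_samples * up // down)
    return up, down, output_length


print(resample_plan(16_000, 16_000))  # one second of 16 kHz audio
print(resample_plan(44_100, 44_100))  # one second of 44.1 kHz audio
```

The `up` and `down` values are the kind of factors a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)` takes. Since resampling happens automatically, this is mainly useful for estimating memory and compute before evaluating long recordings.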
Model Download
Download pre-trained NISQA models from:
Performance Characteristics
| Aspect | Details |
|---|---|
| Sampling Rate | 48 kHz (auto-resampled) |
| Input Type | Single-channel audio |
| Output Range | 1-5 (MOS scale) |
| Reference Required | No (non-intrusive) |
| Model Type | Deep neural network |