PESQ (Perceptual Evaluation of Speech Quality)
Parameters
Predicted/degraded audio signal (1D array)
Ground truth/reference audio signal (1D array)
Sampling rate in Hz. Supported rates:
- 8000 Hz: Narrowband (NB) mode
- 16000 Hz: Wideband (WB) mode
- Other rates: Automatically resampled to 8 kHz or 16 kHz
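The rate-to-mode rule in the list above could be sketched as a small helper. The name `select_pesq_mode` is hypothetical, and the choice of target for other rates (below 16 kHz goes to 8 kHz, anything higher goes to 16 kHz) is an assumption, since the list does not say which target is picked.

```python
def select_pesq_mode(sample_rate):
    """Return (mode, target_rate) for an input sampling rate in Hz."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if sample_rate == 8000:
        return "nb", 8000    # narrowband, no resampling needed
    if sample_rate == 16000:
        return "wb", 16000   # wideband, no resampling needed
    # Assumption: rates below 16 kHz fall back to narrowband,
    # all higher rates are resampled down to wideband.
    if sample_rate < 16000:
        return "nb", 8000
    return "wb", 16000
```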
Returns
PESQ score ranging from -0.5 to 4.5 (higher is better)
- Returns 0.0 if the calculation fails (e.g., when a signal is silent); since 0.0 also lies inside the valid range, it does not by itself indicate failure
Usage Example
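A minimal sketch of how such a call might look, assuming the third-party `pesq` package as the backend. The wrapper name `pesq_score`, its argument order (taken from the parameter list above), and the injectable `backend` argument are hypothetical. That package documents its output as MOS-LQO, so its numeric range may differ from the raw PESQ range listed under Returns. Resampling of other rates is left out.

```python
def pesq_score(pred, target, sr, backend=None):
    """Score `pred` against `target`; return 0.0 if the backend fails.

    `backend` is any callable taking (sr, reference, degraded, mode).
    """
    if sr not in (8000, 16000):
        # Resampling is omitted from this sketch.
        raise ValueError("resample to 8000 or 16000 Hz first")
    if backend is None:
        # Assumption: the `pesq` package is installed.
        from pesq import pesq as backend
    mode = "nb" if sr == 8000 else "wb"
    n = min(len(pred), len(target))  # compare over the shorter length
    try:
        return float(backend(sr, target[:n], pred[:n], mode))
    except Exception:
        # Mirrors the documented behaviour: failures map to 0.0.
        return 0.0

# Typical call with two 1D NumPy arrays sampled at 16 kHz:
# score = pesq_score(enhanced, clean, 16000)
```

Passing the backend as an argument keeps the wrapper testable without the audio library installed.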
STOI (Short-Time Objective Intelligibility)
Parameters
Predicted/degraded audio signal (1D array)
Ground truth/reference audio signal (1D array)
Sampling rate in Hz (any standard audio sampling rate)
Returns
STOI score ranging from 0 to 1 (higher is better)
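- Based on the correlation between short-time temporal envelopes of the reference and degraded signals; higher values predict higher intelligibility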
Usage Example
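A minimal sketch of a STOI call, assuming the third-party `pystoi` package as the backend, whose `stoi` function takes the clean signal first, then the processed signal, then the sampling rate. The wrapper name `stoi_score` and the injectable `backend` argument are hypothetical.

```python
def stoi_score(pred, target, sr, backend=None):
    """Return the STOI of `pred` measured against `target`."""
    if backend is None:
        # Assumption: the `pystoi` package is installed.
        from pystoi import stoi as backend
    n = min(len(pred), len(target))  # compare over the shorter length
    # The backend expects the clean reference first, then the degraded signal.
    return float(backend(target[:n], pred[:n], sr, extended=False))

# Typical call with two 1D NumPy arrays:
# score = stoi_score(enhanced, clean, 16000)
```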
Extended STOI (ESTOI)
Parameters
Predicted/degraded audio signal (1D array)
Ground truth/reference audio signal (1D array)
Sampling rate in Hz (any standard audio sampling rate)
Returns
Extended STOI score ranging from 0 to 1 (higher is better)
- Predicts intelligibility more accurately than STOI when the interfering noise is strongly modulated in time
Usage Example
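A minimal sketch of an ESTOI call, assuming the third-party `pystoi` package, where the extended variant is selected with `extended=True` on the same `stoi` function. The wrapper name `estoi_score` and the injectable `backend` argument are hypothetical.

```python
def estoi_score(pred, target, sr, backend=None):
    """Return the extended STOI of `pred` measured against `target`."""
    if backend is None:
        # Assumption: the `pystoi` package is installed.
        from pystoi import stoi as backend
    n = min(len(pred), len(target))  # compare over the shorter length
    # extended=True switches the backend from STOI to ESTOI.
    return float(backend(target[:n], pred[:n], sr, extended=True))

# Typical call with two 1D NumPy arrays:
# score = estoi_score(enhanced, clean, 16000)
```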
Installation
Metric Comparison
| Metric | Range | Focus | Best For |
|---|---|---|---|
| PESQ | -0.5 to 4.5 | Perceptual quality | VoIP, codec evaluation |
| STOI | 0 to 1 | Intelligibility | Speech enhancement, hearing aids |
| ESTOI | 0 to 1 | Extended intelligibility | Modulated noise scenarios |
Technical Notes
Length Handling: If the two signals differ in length, every metric truncates them to the shorter length before comparison.
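The truncation described here can be sketched as follows. The helper name `align_lengths` is hypothetical; it works on lists and on 1D NumPy arrays alike, because it only uses `len` and slicing.

```python
def align_lengths(pred, target):
    """Trim both signals to the length of the shorter one."""
    n = min(len(pred), len(target))
    return pred[:n], target[:n]
```

Truncation keeps the leading samples of each signal, so it assumes both signals are already aligned at the start.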
PESQ Modes:
- Narrowband (8 kHz): For telephone bandwidth (300-3400 Hz)
- Wideband (16 kHz): For wideband speech (50-7000 Hz), as used by modern VoIP and codecs
STOI vs ESTOI: Extended STOI correlates better with measured intelligibility when the masking noise has strong temporal envelope modulation (for example, a competing talker).
Use Cases
PESQ
- VoIP quality assessment
- Audio codec evaluation
- Network transmission quality
- Speech enhancement validation
STOI/ESTOI
- Speech intelligibility prediction
- Hearing aid performance
- Noise suppression evaluation
- Room acoustics assessment