PESQ and STOI are reference-based metrics for evaluating speech quality and intelligibility. Both require a clean reference signal for comparison.

PESQ (Perceptual Evaluation of Speech Quality)

pesq_metric(
    pred_x,
    gt_x,
    fs
)

Parameters

pred_x
numpy.ndarray
required
Predicted/degraded audio signal (1D array)
gt_x
numpy.ndarray
required
Ground truth/reference audio signal (1D array)
fs
int
required
Sampling rate in Hz. Supported rates:
  • 8000 Hz: Narrowband (NB) mode
  • 16000 Hz: Wideband (WB) mode
  • Other rates: Automatically resampled to 8 kHz or 16 kHz

Returns

pesq
float
PESQ score ranging from -0.5 to 4.5 (higher is better)
  • Returns 0.0 if calculation fails (e.g., due to silence)
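Because 0.0 is also a value inside the valid score range, a failed calculation can be mistaken for a real (poor) score. The following is a minimal sketch of one way to guard against that, assuming the result is the dictionary with a `pesq` key shown in the usage example. The helper names `is_effectively_silent` and `pesq_or_none` and the -60 dB threshold are hypothetical choices for illustration, not part of the library.

```python
import numpy as np

def is_effectively_silent(x, threshold_db=-60.0):
    """Return True if the RMS level of x is below threshold_db (relative to 1.0)."""
    x = np.asarray(x, dtype=np.float64)
    if x.size == 0:
        return True
    rms = np.sqrt(np.mean(x ** 2))
    if rms == 0.0:
        return True
    return bool(20.0 * np.log10(rms) < threshold_db)

def pesq_or_none(result, pred_x, gt_x):
    """Map a 0.0 score on (near-)silent input to None so it is not averaged as a real score."""
    score = result["pesq"]
    if score == 0.0 and (is_effectively_silent(pred_x) or is_effectively_silent(gt_x)):
        return None
    return score
```

Filtering out `None` values before averaging keeps failed utterances from pulling a corpus-level mean toward zero.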

Usage Example

import numpy as np
from versa import pesq_metric

# Load audio signals
reference = np.random.random(16000)  # Replace with actual reference
degraded = np.random.random(16000)   # Replace with actual degraded audio
fs = 16000

# Calculate PESQ
result = pesq_metric(
    pred_x=degraded,
    gt_x=reference,
    fs=fs
)

print(f"PESQ Score: {result['pesq']:.2f}")
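The random arrays in the example above are unrelated to each other, so they only exercise the call signature. For a quick sanity check, a degraded signal derived from the reference is more informative. Here is a minimal sketch that adds white noise at a chosen signal-to-noise ratio; the helper `add_noise_at_snr` is hypothetical, and the test tone stands in for real speech, which PESQ needs in order to produce a meaningful score.

```python
import numpy as np

def add_noise_at_snr(reference, snr_db, seed=0):
    """Return reference plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = np.random.default_rng(seed)
    reference = np.asarray(reference, dtype=np.float64)
    noise = rng.standard_normal(reference.shape)
    signal_power = np.mean(reference ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reference + scale * noise

fs = 16000
t = np.arange(fs) / fs
reference = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s test tone, placeholder for speech
degraded = add_noise_at_snr(reference, snr_db=10.0)
```

The resulting `reference` and `degraded` arrays can be passed as `gt_x` and `pred_x` in place of the random placeholders.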

STOI (Short-Time Objective Intelligibility)

stoi_metric(
    pred_x,
    gt_x,
    fs
)

Parameters

pred_x
numpy.ndarray
required
Predicted/degraded audio signal (1D array)
gt_x
numpy.ndarray
required
Ground truth/reference audio signal (1D array)
fs
int
required
Sampling rate in Hz (any standard audio sampling rate)

Returns

stoi
float
STOI score ranging from 0 to 1 (higher is better)
  • Based on the correlation between short-time temporal envelopes of the reference and degraded signals

Usage Example

import numpy as np
from versa import stoi_metric

# Load audio signals
reference = np.random.random(16000)  # Replace with actual reference
degraded = np.random.random(16000)   # Replace with actual degraded audio
fs = 16000

# Calculate STOI
result = stoi_metric(
    pred_x=degraded,
    gt_x=reference,
    fs=fs
)

print(f"STOI Score: {result['stoi']:.3f}")

Extended STOI (ESTOI)

estoi_metric(
    pred_x,
    gt_x,
    fs
)

Parameters

pred_x
numpy.ndarray
required
Predicted/degraded audio signal (1D array)
gt_x
numpy.ndarray
required
Ground truth/reference audio signal (1D array)
fs
int
required
Sampling rate in Hz (any standard audio sampling rate)

Returns

estoi
float
Extended STOI score ranging from 0 to 1 (higher is better)
  • Tracks intelligibility more reliably than STOI when the noise or distortion is strongly modulated over time

Usage Example

import numpy as np
from versa import estoi_metric

# Load audio signals
reference = np.random.random(16000)  # Replace with actual reference
degraded = np.random.random(16000)   # Replace with actual degraded audio
fs = 16000

# Calculate Extended STOI
result = estoi_metric(
    pred_x=degraded,
    gt_x=reference,
    fs=fs
)

print(f"ESTOI Score: {result['estoi']:.3f}")
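The three functions documented above share the same `(pred_x, gt_x, fs)` signature and each returns a dictionary, so a corpus-level average can be computed with one small loop. This is a sketch under that assumption; `average_score` is a hypothetical helper, not a library function.

```python
import numpy as np

def average_score(pairs, metric_fn, key, fs):
    """Average result[key] over an iterable of (pred_x, gt_x) pairs."""
    scores = []
    for pred_x, gt_x in pairs:
        result = metric_fn(pred_x=pred_x, gt_x=gt_x, fs=fs)
        scores.append(result[key])
    # An empty corpus has no meaningful average, so return NaN instead of raising
    return float(np.mean(scores)) if scores else float("nan")

# Example call, assuming `pairs` is a list of (degraded, reference) arrays:
# mean_estoi = average_score(pairs, estoi_metric, "estoi", fs=16000)
```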

Installation

pip install pesq

Metric Comparison

| Metric | Range | Focus | Best For |
|--------|-------|-------|----------|
| PESQ | -0.5 to 4.5 | Perceptual quality | VoIP, codec evaluation |
| STOI | 0 to 1 | Intelligibility | Speech enhancement, hearing aids |
| ESTOI | 0 to 1 | Extended intelligibility | Modulated noise scenarios |

Technical Notes

Length Handling: If the two signals differ in length, each metric truncates both to the shorter length before comparison.
PESQ Modes:
  • Narrowband (8 kHz): For telephone bandwidth (300-3400 Hz)
  • Wideband (16 kHz): For modern VoIP and codecs
STOI vs ESTOI: Extended STOI provides better correlation with intelligibility in scenarios with temporal envelope modulations.
PESQ Error Handling: Returns 0.0 if calculation fails, typically due to silence or invalid audio. Always check for warnings in logs.
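The length handling described in these notes amounts to trimming both signals to a common length. A minimal sketch of that behavior follows; `truncate_to_shorter` is a hypothetical name and this is an illustration of the documented behavior, not the library's own code.

```python
import numpy as np

def truncate_to_shorter(pred_x, gt_x):
    """Trim both 1D signals to the length of the shorter one."""
    pred_x = np.asarray(pred_x)
    gt_x = np.asarray(gt_x)
    n = min(len(pred_x), len(gt_x))
    return pred_x[:n], gt_x[:n]
```

Truncation only equalizes the lengths. It does not correct a time offset between the two signals, so they should already be aligned before scoring.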

Use Cases

PESQ

  • VoIP quality assessment
  • Audio codec evaluation
  • Network transmission quality
  • Speech enhancement validation

STOI/ESTOI

  • Speech intelligibility prediction
  • Hearing aid performance
  • Noise suppression evaluation
  • Room acoustics assessment
