Fréchet Audio Distance (FAD) evaluates the quality and diversity of generated audio by comparing embedding distributions between baseline and evaluation datasets.

Overview

FAD measures the distance between two audio distributions using the Fréchet distance in a learned embedding space. A lower FAD score means the evaluated audio matches the baseline distribution more closely in both quality and diversity.
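For reference, when each set of embeddings is summarized by a Gaussian with mean μ and covariance Σ, the Fréchet distance has the closed form below. This is the standard textbook definition; the subscripts b (baseline) and e (evaluation) are notation chosen for this sketch, and FADTK's estimator may differ in numerical details.

```latex
\mathrm{FAD} = \lVert \mu_b - \mu_e \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_b + \Sigma_e - 2 \left( \Sigma_b \Sigma_e \right)^{1/2} \right)
```

The first term penalizes a shift in the average embedding, and the trace term penalizes a mismatch in spread, which is why the score is sensitive to both quality and diversity.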

fad_setup

Initialize FAD scoring system with baseline dataset and embedding model.
from versa import fad_setup

fad_info = fad_setup(
    baseline="path/to/baseline.scp",
    fad_embedding="default",
    cache_dir="versa_cache/fad",
    use_inf=True,
    io="kaldi"
)
  • baseline (str, required): Path to the baseline audio dataset. Either a Kaldi-style SCP file or a directory path, depending on the io parameter.
  • fad_embedding (str, default: "default"): Embedding model used for feature extraction. See the FADTK documentation for available models.
  • cache_dir (str, default: "versa_cache/fad"): Directory where computed embeddings are cached for faster repeated evaluation.
  • use_inf (bool, default: True): Whether to use the infinite FAD calculation (recommended when the datasets differ in size).
  • io (str, default: "kaldi"): Input format for the audio files. Options: "kaldi" (SCP format) or other supported formats.
Returns fad_info (dict), a dictionary containing:
  • module: FAD calculation module
  • baseline: Baseline dataset path
  • cache_dir: Cache directory path
  • use_inf: Infinite FAD flag
  • io: I/O format
  • embedding: Embedding model name
Requires FADTK. Install it with tools/install_fadtk.sh or follow the FADTK documentation.
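Before calling fad_setup, it can help to check whether FADTK is importable at all. The sketch below uses only the standard library; the helper name fadtk_available is hypothetical, and the module name "fadtk" is an assumption about how FADTK is packaged in your environment.

```python
import importlib.util


def fadtk_available(module_name: str = "fadtk") -> bool:
    """Return True if the named top-level module can be found.

    The default name "fadtk" is an assumption; adjust it if your
    installation exposes FADTK under a different module name.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if fadtk_available():
        print("FADTK found.")
    else:
        print("FADTK not found; run tools/install_fadtk.sh first.")
```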

fad_scoring

Calculate FAD score between baseline and evaluation datasets.
from versa import fad_setup, fad_scoring

# Setup FAD
fad_info = fad_setup(
    baseline="baseline_audio.scp",
    cache_dir="versa_cache/fad"
)

# Calculate score
result = fad_scoring(
    pred_x="generated_audio.scp",
    fad_info=fad_info,
    key_info="fad"
)

print(f"FAD Score: {result['fad_overall']}")
  • pred_x (str, required): Path to the evaluation (generated) audio dataset, in the same format as the baseline.
  • fad_info (dict, required): FAD configuration dictionary returned by fad_setup().
  • key_info (str, default: "fad"): Prefix for the keys of the result dictionary.
Returns a dict containing:
  • {key_info}_overall (float): FAD score
  • {key_info}_r2 (float): R² value (only when use_inf=True)
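Because the R² key is only present when use_inf=True and the key names depend on key_info, reading the result defensively avoids a KeyError. The helper read_fad_result below is a hypothetical convenience, not part of VERSA, and the dictionaries are made-up stand-ins shaped like the documented return value.

```python
from typing import Dict, Optional, Tuple


def read_fad_result(
    result: Dict[str, float], key_info: str = "fad"
) -> Tuple[float, Optional[float]]:
    """Return (overall score, R² or None) from a fad_scoring-style dict."""
    overall = result[f"{key_info}_overall"]
    r2 = result.get(f"{key_info}_r2")  # missing when use_inf=False
    return overall, r2


# Stand-in results with invented numbers, for illustration only.
with_inf = {"fad_overall": 2.5, "fad_r2": 0.98}
without_inf = {"music_fad_overall": 3.0}

print(read_fad_result(with_inf))
print(read_fad_result(without_inf, key_info="music_fad"))
```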

Usage Examples

from versa import fad_setup, fad_scoring

# Initialize with baseline dataset
fad_info = fad_setup(
    baseline="data/baseline_audio.scp",
    fad_embedding="default",
    cache_dir="cache/fad"
)

# Evaluate generated audio
result = fad_scoring(
    pred_x="data/generated_audio.scp",
    fad_info=fad_info
)

print(f"FAD Score: {result['fad_overall']:.4f}")
print(f"R² Value: {result['fad_r2']:.4f}")

Understanding FAD Scores

  • Lower is better: FAD measures distribution distance
  • FAD = 0: Perfect match (identical distributions)
  • FAD < 1: Excellent quality and diversity
  • FAD 1-5: Good quality
  • FAD > 10: Significant distribution mismatch
Treat these ranges as rough guides only: absolute scores depend on the domain, the embedding model, and the baseline dataset.
  • Infinite FAD (use_inf=True): Recommended when baseline and evaluation have different sizes. Provides more stable estimates.
  • Standard FAD (use_inf=False): Requires equal-sized datasets. Faster but less robust.
The function automatically uses infinite FAD when dataset sizes differ.
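The R² value reported with infinite FAD suggests a regression step. One common way to define such a score is to compute FAD on subsets of several sizes N, fit a line to the scores as a function of 1/N, and take the intercept as the estimate for N → ∞. The pure-Python sketch below illustrates that idea under this assumption; the function name, sample sizes, and scores are all made up, and this is not FADTK's code.

```python
from typing import Sequence, Tuple


def extrapolate_to_infinite(
    sizes: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of score = slope * (1/N) + intercept.

    Returns (intercept, r2). The intercept is the extrapolated score as N
    grows without bound; r2 measures how well the line fits the points.
    """
    xs = [1.0 / n for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(scores) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, scores))
    ss_tot = sum((y - mean_y) ** 2 for y in scores)
    r2 = 1.0 - ss_res / ss_tot
    return intercept, r2


# Invented scores that follow score = 2.0 + 100 / N exactly.
sizes = [100, 200, 400, 800]
scores = [2.0 + 100.0 / n for n in sizes]
fad_inf, r2 = extrapolate_to_infinite(sizes, scores)
print(f"extrapolated score: {fad_inf:.4f}, R²: {r2:.4f}")
```

Under this reading, an R² close to 1 means the scores fall close to a straight line in 1/N, so the extrapolated value is more trustworthy.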
FAD uses pre-trained models to extract audio features:
  • Different models may give different absolute scores
  • Use the same embedding model for fair comparison
  • See FADTK models for options
Embeddings are cached to disk:
  • First run: Computes and caches embeddings for both datasets
  • Subsequent runs: Loads cached embeddings (much faster)
  • Cache structure:
    • {cache_dir}/baseline/: Baseline embeddings
    • {cache_dir}/eval/: Evaluation embeddings
Delete the cache directory to force the embeddings to be recomputed.
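Clearing the cache can be scripted with the standard library. The helper clear_fad_cache below is hypothetical (VERSA may or may not ship an equivalent), and the demonstration runs on a throwaway temporary directory rather than a real cache.

```python
import shutil
import tempfile
from pathlib import Path


def clear_fad_cache(cache_dir: str) -> bool:
    """Delete cache_dir and everything under it; return True if it existed."""
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Demonstrate on a temporary directory that mimics the cache layout.
demo_dir = Path(tempfile.mkdtemp()) / "fad"
(demo_dir / "baseline").mkdir(parents=True)
(demo_dir / "eval").mkdir()

removed_first = clear_fad_cache(str(demo_dir))   # directory existed
removed_again = clear_fad_cache(str(demo_dir))   # already gone
print(removed_first, removed_again)
```

For a real run, pass the same cache_dir that was given to fad_setup, for example clear_fad_cache("versa_cache/fad").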

File Format

Kaldi SCP Format

When using io="kaldi", provide SCP (script) files:
utt001 /path/to/audio001.wav
utt002 /path/to/audio002.wav
utt003 /path/to/audio003.wav
Each line contains:
  • Utterance ID: Unique identifier
  • File Path: Absolute or relative path to audio file
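A malformed SCP file is an easy mistake to make, so it can be worth validating one before scoring. The parser below is a minimal sketch based only on the two-field layout described above; parse_scp is a hypothetical helper, not VERSA's own loader, and the duplicate-ID check is an extra assumption drawn from the "unique identifier" requirement.

```python
from typing import Dict, Iterable


def parse_scp(lines: Iterable[str]) -> Dict[str, str]:
    """Map utterance IDs to file paths from SCP-style lines."""
    entries: Dict[str, str] = {}
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(maxsplit=1)  # keep any spaces inside the path
        if len(parts) != 2:
            raise ValueError(
                f"line {line_number}: expected '<id> <path>', got {line!r}"
            )
        utt_id, path = parts
        if utt_id in entries:
            raise ValueError(
                f"line {line_number}: duplicate utterance ID {utt_id!r}"
            )
        entries[utt_id] = path
    return entries


example = """utt001 /path/to/audio001.wav
utt002 /path/to/audio002.wav
"""
entries = parse_scp(example.splitlines())
print(entries)
```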

Technical Details

def fad_setup(
    baseline,
    fad_embedding="default",
    cache_dir="versa_cache/fad",
    use_inf=True,
    io="kaldi",
):
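    # get_model and FrechetAudioDistance come from FADTK; the check below
    # implies they are set to None at module level when FADTK is missing.
    if get_model is None or FrechetAudioDistance is None:
        raise ModuleNotFoundError(
            "FADTK is not installed. Please install it following `tools/install_fadtk.sh`"
        )
    # Load the embedding model selected by name
    model = get_model(fad_embedding)

    # Build the FAD calculator around that model
    fad = FrechetAudioDistance(ml=model, load_model=True)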

    return {
        "module": fad,
        "baseline": baseline,
        "cache_dir": cache_dir,
        "use_inf": use_inf,
        "io": io,
        "embedding": fad_embedding,
    }
FAD calculation requires significant computational resources for large datasets. Use caching to avoid recomputing embeddings.
