Noise reduction is a critical preprocessing step that can dramatically improve transcription accuracy. This guide explains each noise reduction technique in detail and helps you choose the right settings for your audio.

Overview

Tafrigh uses multiple noise reduction techniques that work together:
  • High-pass filtering: Removes low-frequency noise
  • Low-pass filtering: Removes high-frequency noise
  • FFT-based denoising: Learns and removes specific noise profiles
  • Dialogue enhancement: Boosts speech frequencies
You can enable, disable, or tune each technique independently.

High-pass filtering

A high-pass filter attenuates frequencies below a specified cutoff, removing low-frequency noise while preserving speech.

When to use it

  • Rumble: Microphone handling noise, traffic, machinery
  • Hum: Electrical interference (50/60 Hz AC hum)
  • Wind noise: Outdoor recordings
  • HVAC noise: Air conditioning, ventilation systems

How it works

Frequencies below the cutoff are progressively attenuated:
  • Fundamental frequencies of male voices: ~85-180 Hz
  • Fundamental frequencies of female voices: ~165-255 Hz
  • Speech intelligibility: mostly above 250 Hz
Setting the cutoff at 300 Hz (the default) removes most low-frequency noise while preserving the frequencies that carry speech intelligibility, at the cost of attenuating some voice fundamentals.

Configuration

highpass
number | null
default:"300"
Cutoff frequency in Hz. Set to null to disable.

Examples

const transcript = await transcribe('audio.mp3', {
  preprocessOptions: {
    noiseReduction: {
      highpass: 100,  // Gentle filtering
    },
  },
});
Setting highpass too high (>500 Hz) may make voices sound thin or tinny by removing too much low-frequency content.

Low-pass filtering

A low-pass filter attenuates frequencies above a specified cutoff, removing high-frequency noise while preserving speech.

When to use it

  • Hiss: Tape hiss, analog noise
  • Electronic interference: Digital artifacts, radio interference
  • Sibilance: Excessive “s” and “sh” sounds
  • High-frequency artifacts: Compression artifacts, clipping

How it works

Frequencies above the cutoff are progressively attenuated:
  • Most speech intelligibility: below 3500 Hz
  • Consonant clarity: 2000-4000 Hz
  • Sibilants (s, sh, f): 4000-8000 Hz
Setting the cutoff at 3000 Hz preserves speech intelligibility while removing most high-frequency noise.

Configuration

lowpass
number | null
default:"3000"
Cutoff frequency in Hz. Set to null to disable.

Examples

const transcript = await transcribe('audio.mp3', {
  preprocessOptions: {
    noiseReduction: {
      lowpass: 4000,  // Gentle filtering
    },
  },
});
Setting lowpass too low (below 2000 Hz) may reduce speech clarity by removing important consonant frequencies.

FFT-based denoising

FFT (Fast Fourier Transform) denoising learns a noise profile from a sample of your audio and removes it from the entire file.

When to use it

  • Consistent background noise: AC units, fans, computers
  • Room tone: Ambient noise in a recording space
  • Electrical hum: Constant 50/60 Hz interference
  • White/pink noise: Analog recording noise

How it works

  1. Sample the noise: You specify a time range (afftdnStart to afftdnStop) that contains only background noise
  2. Learn the profile: The denoiser analyzes the frequency spectrum of the noise
  3. Remove the noise: The learned profile is subtracted from the entire audio file
  4. Threshold control: afftdn_nf sets the noise floor (how aggressive the removal is)
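
Steps 2-4 amount to a form of spectral subtraction. The sketch below is a conceptual illustration only, not Tafrigh's actual denoiser: it assumes the noise profile and each audio frame have already been converted to magnitude spectra, and shows how a dB noise floor (analogous to afftdn_nf) limits how much is removed per frequency bin.

```typescript
// Conceptual sketch of spectral subtraction (not Tafrigh's implementation).
// `noiseProfile` is the average magnitude spectrum learned from the noise
// sample; `frame` is the magnitude spectrum of one audio frame.
function denoiseFrame(
  frame: number[],
  noiseProfile: number[],
  floorDb: number, // analogous to afftdn_nf: lower = more aggressive
): number[] {
  // Convert the dB floor to a linear gain floor (e.g. -20 dB -> 0.1).
  const floorGain = Math.pow(10, floorDb / 20);
  return frame.map((mag, bin) => {
    // Subtract the learned noise energy in this frequency bin...
    const cleaned = mag - noiseProfile[bin];
    // ...but never drop below a fraction of the original magnitude,
    // which limits artifacts at the cost of leaving some residual noise.
    return Math.max(cleaned, mag * floorGain);
  });
}
```

A lower floorDb lets the subtraction cut deeper into each bin, which is why more negative afftdn_nf values remove more noise but risk damaging speech.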

Finding a noise sample

You need a segment of your audio that contains only background noise, no speech:
  1. Open your audio in a media player
  2. Find a section before the speaker begins, during a long pause, or after they finish
  3. Note the start and end timestamps (aim for 0.5-3 seconds)
  4. Use these timestamps for afftdnStart and afftdnStop
The beginning of most recordings has a few seconds of room tone before anyone speaks. This is ideal for noise sampling.
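
Before passing your timestamps to the options, it can help to sanity-check them against the 0.5-3 second guideline above. This helper is hypothetical (not part of Tafrigh's API), shown purely to encode the rules of thumb from the steps above:

```typescript
// Hypothetical helper (not part of Tafrigh's API): sanity-check the
// timestamps you picked before using them as afftdnStart/afftdnStop.
function checkNoiseSample(startSec: number, stopSec: number): string[] {
  const warnings: string[] = [];
  if (stopSec <= startSec) {
    warnings.push('afftdnStop must be later than afftdnStart');
  }
  const lengthSec = stopSec - startSec;
  if (lengthSec > 0 && lengthSec < 0.5) {
    warnings.push('samples under 0.5 s may not represent the noise well');
  }
  if (lengthSec > 3) {
    warnings.push('samples over 3 s rarely improve the learned profile');
  }
  return warnings;
}
```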

Configuration

afftdnStart
number | null
default:"0"
Start time in seconds for noise sampling. Must be used with afftdnStop.
afftdnStop
number | null
default:"1.5"
End time in seconds for noise sampling. Must be used with afftdnStart.
afftdn_nf
number | null
default:"-20"
Noise floor in dB. Lower values are more aggressive. Must be used with afftdnStart and afftdnStop.

Examples

const transcript = await transcribe('audio.mp3', {
  preprocessOptions: {
    noiseReduction: {
      afftdnStart: 0,      // First second of audio
      afftdnStop: 1.5,     // Through 1.5 seconds
      afftdn_nf: -20,      // Moderate reduction
    },
  },
});
The noise sample must contain only noise. If speech is present, the denoiser will learn speech as “noise” and remove it from your entire recording.

Choosing the noise floor

The afftdn_nf parameter controls how aggressively noise is removed:
  • Light reduction (-15 to -10 dB): Removes obvious noise, preserves audio character
  • Moderate reduction (-20 to -25 dB): Good balance for most use cases
  • Aggressive reduction (-30 to -40 dB): Maximum noise removal, may affect speech quality
Start with -20 dB and adjust based on results. If speech sounds muffled or “underwater,” increase the value (less negative). If noise remains, decrease the value (more negative).
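
That tuning loop can be written down as a simple rule. The 5 dB step size and the clamp to the -40 to -10 dB range are this guide's heuristics, not a Tafrigh API:

```typescript
// Illustrative tuning rule for afftdn_nf (values in dB).
// Muffled/underwater speech -> too aggressive -> raise (less negative).
// Remaining noise -> not aggressive enough -> lower (more negative).
function tuneNoiseFloor(current: number, symptom: 'muffled' | 'noisy'): number {
  const next = symptom === 'muffled' ? current + 5 : current - 5;
  // Stay inside the useful range described above (-40 to -10 dB).
  return Math.min(-10, Math.max(-40, next));
}
```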

Dialogue enhancement

Dialogue enhancement boosts mid-range frequencies (1000-4000 Hz) where human speech is most prominent.

When to use it

  • Muffled recordings: Low-quality microphones, distance from speaker
  • Background music: When speech competes with music
  • Multiple speakers: Helps individual voices stand out
  • Generally recommended: Almost always improves transcription accuracy

How it works

The enhancement applies a frequency curve that:
  • Boosts 1000-2000 Hz (vowel clarity)
  • Boosts 2000-4000 Hz (consonant definition)
  • Leaves other frequencies unchanged
This makes speech more intelligible without affecting overall tonal balance.
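
The curve above can be sketched as a gain function over frequency. The boost amounts here (3 dB and 4 dB) are illustrative placeholders; Tafrigh's actual curve and gains are not specified in this guide:

```typescript
// Conceptual sketch of the dialogue-enhancement curve described above.
// Gain values are illustrative, not Tafrigh's actual settings.
function dialogueBoostDb(freqHz: number): number {
  if (freqHz >= 1000 && freqHz < 2000) return 3; // vowel clarity
  if (freqHz >= 2000 && freqHz <= 4000) return 4; // consonant definition
  return 0; // other frequencies unchanged
}
```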

Configuration

dialogueEnhance
boolean
default:"true"
Enable dialogue enhancement. Set to false to disable.

Examples

const transcript = await transcribe('audio.mp3', {
  preprocessOptions: {
    noiseReduction: {
      dialogueEnhance: true,  // Default value
    },
  },
});
Dialogue enhancement rarely has downsides. Leave it enabled unless you have a specific reason to disable it.

Complete examples

Clean studio recording

Minimal processing for high-quality audio:
const transcript = await transcribe('studio.mp3', {
  preprocessOptions: {
    noiseReduction: {
      highpass: 100,           // Gentle rumble removal
      lowpass: 4000,           // Preserve more frequencies
      afftdnStart: null,       // No FFT denoising needed
      afftdnStop: null,
      afftdn_nf: null,
      dialogueEnhance: true,   // Still helpful
    },
  },
});

Podcast with moderate noise

Balanced settings for typical podcast audio:
const transcript = await transcribe('podcast.mp3', {
  preprocessOptions: {
    noiseReduction: {
      highpass: 200,           // Remove low rumble
      lowpass: 3500,           // Remove hiss
      afftdnStart: 0,          // Sample first second
      afftdnStop: 1,
      afftdn_nf: -20,          // Moderate denoising
      dialogueEnhance: true,
    },
  },
});

Noisy field recording

Aggressive processing for challenging audio:
const transcript = await transcribe('field-recording.mp3', {
  preprocessOptions: {
    noiseReduction: {
      highpass: 300,           // Strong low-frequency filtering
      lowpass: 3000,           // Strong high-frequency filtering
      afftdnStart: 2.0,        // Longer noise sample
      afftdnStop: 4.0,
      afftdn_nf: -35,          // Aggressive denoising
      dialogueEnhance: true,
    },
  },
});

Telephone or low-quality audio

Telephone bandwidth simulation:
const transcript = await transcribe('phone-call.mp3', {
  preprocessOptions: {
    noiseReduction: {
      highpass: 300,
      lowpass: 3400,           // Telephone bandwidth
      afftdnStart: 0,
      afftdnStop: 1,
      afftdn_nf: -25,
      dialogueEnhance: true,
    },
  },
});

No preprocessing

Disable all noise reduction:
const transcript = await transcribe('audio.mp3', {
  preprocessOptions: {
    noiseReduction: null,  // Skip all preprocessing
  },
});

Troubleshooting

Speech sounds muffled or underwater

Problem: Over-aggressive noise reduction is removing speech frequencies. Solutions:
  • Increase afftdn_nf (make it less negative): -35 → -20 → -15
  • Widen filter ranges: highpass: 200 instead of 400, lowpass: 3500 instead of 3000
  • Check your noise sample doesn’t contain speech

Noise remains in the transcription

Problem: Noise reduction isn’t strong enough. Solutions:
  • Decrease afftdn_nf (make it more negative): -15 → -20 → -30
  • Narrow filter ranges: highpass: 400 instead of 200, lowpass: 2500 instead of 3500
  • Ensure your noise sample is 1-3 seconds long and representative
  • Check that noise is consistent (FFT denoising only works for consistent noise)

Voices sound robotic or have artifacts

Problem: Too much processing or poor noise sample. Solutions:
  • Choose a better noise sample (no speech, representative background noise)
  • Reduce denoising aggressiveness: afftdn_nf: -15 instead of -30
  • Disable FFT denoising entirely if noise sample is poor
  • Use only filters: set afftdnStart, afftdnStop, afftdn_nf to null
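
Following the last two bullets, a filters-only configuration looks like this (same option names as elsewhere in this guide):

```typescript
const transcript = await transcribe('audio.mp3', {
  preprocessOptions: {
    noiseReduction: {
      highpass: 300,
      lowpass: 3000,
      afftdnStart: null,     // Disable FFT denoising entirely...
      afftdnStop: null,
      afftdn_nf: null,       // ...when the noise sample is unreliable
      dialogueEnhance: true,
    },
  },
});
```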

Low-quality audio after preprocessing

Problem: Filters are too restrictive. Solutions:
  • Widen the frequency range: highpass: 100, lowpass: 4000
  • Try minimal filtering: only dialogueEnhance: true
  • Test without preprocessing: noiseReduction: null

Monitoring preprocessing

Use callbacks to track preprocessing progress:
const transcript = await transcribe('audio.mp3', {
  preprocessOptions: {
    noiseReduction: {
      highpass: 300,
      lowpass: 3000,
      afftdnStart: 0,
      afftdnStop: 1.5,
      afftdn_nf: -20,
      dialogueEnhance: true,
    },
  },
  callbacks: {
    onPreprocessingStarted: async (filePath) => {
      console.log(`Starting noise reduction on: ${filePath}`);
    },
    onPreprocessingProgress: async (percent) => {
      process.stdout.write(`\rPreprocessing: ${percent}% complete`);
    },
    onPreprocessingFinished: async (filePath) => {
      console.log(`\nFinished preprocessing: ${filePath}`);
    },
  },
});

Best practices

  1. Start with defaults: The default settings work well for most audio
  2. Test incrementally: Change one parameter at a time to see its effect
  3. Listen to your audio: Identify specific noise types to target
  4. Choose good noise samples: 1-3 seconds of noise-only audio
  5. Don’t over-process: More filtering isn’t always better
  6. Preserve speech: When in doubt, be conservative
  7. Document your settings: Save working configurations for similar audio
Noise reduction can’t fix everything. Severely damaged or low-quality audio may not transcribe well even with optimal settings.
