Configure audio preprocessing and noise reduction for better transcription accuracy
PreprocessOptions controls how Tafrigh preprocesses your audio before splitting and transcription. Proper preprocessing can dramatically improve transcription accuracy by reducing background noise, isolating voice frequencies, and enhancing speech clarity.
Frequency in Hz for the high-pass filter. Attenuates frequencies below this cutoff, removing low-frequency noise such as rumble, hum, or wind. The fundamental frequency of human speech typically ranges from about 85 Hz (low male voices) to 255 Hz (high female voices), so a cutoff set well above that range starts to thin out the voice itself. Set to null to disable the high-pass filter entirely.
Frequency in Hz for the low-pass filter. Passes frequencies below this cutoff and attenuates those above it, removing high-frequency noise such as hiss or electronic interference. The frequencies that matter most for speech intelligibility typically lie below 4000 Hz. Set to null to disable the low-pass filter entirely.
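Taken together, the two cutoffs act as a band-pass around the speech range. The sketch below shows one way such settings could be turned into an FFmpeg audio filter chain. The field names `highpass` and `lowpass` and the helper `buildBandFilters` are assumptions made for this example. FFmpeg's `highpass` and `lowpass` filters and their `f` option are real, but this page does not confirm that Tafrigh builds its filters this way. The cutoff values are illustrative only.

```typescript
// Hypothetical option shape; the field names are assumed for illustration.
interface BandOptions {
    highpass: number | null; // cutoff in Hz, null disables the high-pass filter
    lowpass: number | null; // cutoff in Hz, null disables the low-pass filter
}

// Builds a comma-separated FFmpeg audio filter chain, skipping disabled filters.
function buildBandFilters({ highpass, lowpass }: BandOptions): string {
    // A high-pass cutoff at or above the low-pass cutoff would leave no pass band.
    if (highpass !== null && lowpass !== null && highpass >= lowpass) {
        throw new RangeError('highpass cutoff must be below lowpass cutoff');
    }
    const filters: string[] = [];
    if (highpass !== null) filters.push(`highpass=f=${highpass}`);
    if (lowpass !== null) filters.push(`lowpass=f=${lowpass}`);
    return filters.join(',');
}

// Keep roughly 80 Hz to 4000 Hz (illustrative values, not documented defaults).
console.log(buildBandFilters({ highpass: 80, lowpass: 4000 }));
// Disable the high-pass filter and keep only the low-pass stage.
console.log(buildBandFilters({ highpass: null, lowpass: 4000 }));
```

The resulting string could be passed to FFmpeg's `-af` option. Passing null for both fields yields an empty chain, which matches the "set to null to disable" behavior described above.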
Time in seconds to begin FFT-based noise reduction. This marks the start of the noise profile sampling period. The denoiser analyzes this portion of audio to learn what “noise” sounds like. Must be used together with afftdnStop. Set to null to disable FFT-based denoising entirely.
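The audio between afftdnStart and afftdnStop should contain only background noise and no speech. If speech falls inside this window, it becomes part of the noise profile and the denoiser may suppress the voice along with the noise.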
Time in seconds to end FFT-based noise reduction sampling. This marks the end of the noise profile sampling period. The denoiser uses the audio between afftdnStart and afftdnStop to build a noise profile. Must be used together with afftdnStart. Set to null to disable FFT-based denoising entirely.
Noise floor parameter in dB for FFT-based denoising. Controls the threshold for what the denoiser considers “noise.” Lower (more negative) values remove noise more aggressively but may degrade speech quality. Set to null to disable FFT-based denoising entirely. Typical values:
Light noise reduction: -15 to -10 dB
Moderate noise reduction: -25 to -20 dB (default: -20 dB)
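The sampling window and the noise floor work as a unit: the window tells the denoiser where to learn the noise profile, and the noise floor sets how aggressively that profile is applied. The sketch below shows how these three settings could map onto FFmpeg's `afftdn` filter, following the pattern from FFmpeg's own documentation of using `asendcmd` to send `sn start` and `sn stop` commands. The field name `noiseFloor` and the helper `buildDenoiseFilters` are assumptions made for this example, and this page does not confirm that Tafrigh generates exactly these filters.

```typescript
// Hypothetical option shape. Only afftdnStart and afftdnStop are names taken
// from this page; noiseFloor is an assumed name for the noise floor setting.
interface DenoiseOptions {
    afftdnStart: number | null; // seconds, start of the noise-only segment
    afftdnStop: number | null; // seconds, end of the noise-only segment
    noiseFloor: number | null; // dB, lower values denoise more aggressively
}

// Returns the FFmpeg filters for FFT-based denoising, or an empty list when disabled.
function buildDenoiseFilters({ afftdnStart, afftdnStop, noiseFloor }: DenoiseOptions): string[] {
    // Setting any of the three values to null disables FFT-based denoising.
    if (afftdnStart === null || afftdnStop === null || noiseFloor === null) {
        return [];
    }
    // The sampling window must be non-negative and have a positive duration.
    if (afftdnStart < 0 || afftdnStop <= afftdnStart) {
        throw new RangeError('afftdnStop must be greater than a non-negative afftdnStart');
    }
    return [
        `asendcmd=${afftdnStart} afftdn sn start`, // begin sampling the noise profile
        `asendcmd=${afftdnStop} afftdn sn stop`, // stop sampling the noise profile
        `afftdn=nf=${noiseFloor}`, // denoise with the chosen noise floor
    ];
}

// Sample the first 1.5 seconds as noise (illustrative window) at the default -20 dB.
console.log(buildDenoiseFilters({ afftdnStart: 0, afftdnStop: 1.5, noiseFloor: -20 }).join(','));
```

FFmpeg's documentation gives -80 to -20 dB as the allowed range for `nf`, so it is worth checking how values above -20 dB behave before relying on them.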
Enable dialogue enhancement to boost speech clarity. Enhances mid-range frequencies (typically 1-4 kHz) where human speech is most prominent. Makes dialogue easier to understand and improves transcription accuracy. Set to false to disable dialogue enhancement.
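To show how the six settings fit together, here is a minimal sketch of two configurations: one for a noisy recording with every stage enabled, and one for a clean recording with every stage turned off. The interface `NoiseReductionSettings`, the field names other than `afftdnStart` and `afftdnStop`, and the helper `activeStages` are hypothetical and exist only for this example. Apart from the -20 dB noise floor default mentioned above, the values are illustrative rather than documented defaults.

```typescript
// Hypothetical shape mirroring the six settings described on this page.
// Only afftdnStart and afftdnStop are names taken from the text; the rest are assumed.
interface NoiseReductionSettings {
    highpass: number | null; // Hz, null disables the high-pass filter
    lowpass: number | null; // Hz, null disables the low-pass filter
    afftdnStart: number | null; // seconds, null disables FFT-based denoising
    afftdnStop: number | null; // seconds, null disables FFT-based denoising
    noiseFloor: number | null; // dB, null disables FFT-based denoising
    dialogueEnhance: boolean; // false disables dialogue enhancement
}

// Noisy field recording: every stage enabled.
const noisyRecording: NoiseReductionSettings = {
    highpass: 80,
    lowpass: 4000,
    afftdnStart: 0,
    afftdnStop: 1.5,
    noiseFloor: -20,
    dialogueEnhance: true,
};

// Clean studio recording: every stage disabled, audio passes through unchanged.
const cleanRecording: NoiseReductionSettings = {
    highpass: null,
    lowpass: null,
    afftdnStart: null,
    afftdnStop: null,
    noiseFloor: null,
    dialogueEnhance: false,
};

// Lists which preprocessing stages a given configuration turns on.
function activeStages(s: NoiseReductionSettings): string[] {
    const stages: string[] = [];
    if (s.highpass !== null) stages.push('high-pass');
    if (s.lowpass !== null) stages.push('low-pass');
    if (s.afftdnStart !== null && s.afftdnStop !== null && s.noiseFloor !== null) {
        stages.push('fft-denoise');
    }
    if (s.dialogueEnhance) stages.push('dialogue-enhance');
    return stages;
}

console.log(activeStages(noisyRecording).join(', '));
console.log(activeStages(cleanRecording).length);
```

Starting from the all-disabled configuration and enabling one stage at a time makes it easier to hear which stage helps or harms a particular recording.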