Preprocessing workflow
The preprocessing stage normalizes your audio file and applies optional noise reduction. This happens in the formatMedia function before any splitting occurs.
Noise reduction filters
By default, Tafrigh applies several audio filters to improve transcription accuracy:
High-pass filter (300 Hz)
Removes low-frequency rumble and background noise below the human speech range. Set highpass: null to disable.
Low-pass filter (3000 Hz)
Removes high-frequency hiss and noise above typical speech frequencies. Set lowpass: null to disable.
FFT-based denoising
Analyzes the first 0-1.5 seconds to learn the noise profile, then removes similar patterns throughout the file. The afftdn_nf parameter controls the noise floor threshold (default: -20 dB).
Dialogue enhancement
Boosts midrange frequencies (where human speech lives) to make voices clearer. Enabled by default; set dialogueEnhance: false to disable.
Customizing preprocessing
You can tune or disable any filter:
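For instance, the filter parameters above can be collected into a single options object. This is a minimal sketch; the exact shape Tafrigh expects (including the assumed noiseReduction grouping) may differ:

```typescript
// Hedged sketch: the option names match the parameters documented above,
// but the surrounding object shape is an assumption.
const preprocessOptions = {
  noiseReduction: {
    highpass: null,         // disable the high-pass filter entirely
    lowpass: 4000,          // allow a little more high end than the 3000 Hz default
    afftdn_nf: -15,         // gentler FFT noise floor than the -20 dB default
    dialogueEnhance: false, // skip the midrange speech boost
  },
};
```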
Audio splitting
After preprocessing, Tafrigh splits the audio into chunks at natural silence points. This step is critical because:
- Wit.ai has duration limits for API requests
- Smaller chunks enable parallel processing across multiple API keys
- Splitting at silence prevents cutting words mid-pronunciation
Silence detection algorithm
The splitFileOnSilences function (from ffmpeg-simplified) detects pauses using two parameters:
- silenceThreshold (default: -25 dB): Volume level considered “silent”. Lower values (e.g., -30 dB) detect softer pauses; higher values (e.g., -20 dB) only split on clear silence.
- silenceDuration (default: 0.1s): Minimum pause length to trigger a split. Increase for speakers with longer natural pauses.
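These two knobs correspond to ffmpeg's silencedetect filter, where n is the noise tolerance and d is the minimum silence duration. A sketch of the mapping, using a hypothetical helper (buildSilenceFilter is not part of the library):

```typescript
// Illustrative only: shows how the two tuning parameters map onto an
// ffmpeg silencedetect filter string. The actual wiring inside
// ffmpeg-simplified may differ.
const buildSilenceFilter = (silenceThreshold = -25, silenceDuration = 0.1): string =>
  `silencedetect=n=${silenceThreshold}dB:d=${silenceDuration}`;

buildSilenceFilter();          // "silencedetect=n=-25dB:d=0.1"
buildSilenceFilter(-30, 0.25); // "silencedetect=n=-30dB:d=0.25"
```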
Chunk boundaries
Each chunk respects these constraints:
- Maximum duration: chunkDuration (default: 60 seconds)
- Minimum duration: chunkMinThreshold (default: 0.9 seconds)
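The boundary rule can be sketched as a standalone function. This is an illustrative reimplementation of the idea, not the actual ffmpeg-simplified code: given detected silence midpoints, cut each chunk at the last silence before chunkDuration, and fall back to a hard cut when no silence qualifies.

```typescript
// Hypothetical sketch of the chunk-boundary selection described above.
function pickSplitPoints(
  silences: number[],       // sorted silence midpoints, in seconds
  totalDuration: number,    // total file length, in seconds
  chunkDuration = 60,
  chunkMinThreshold = 0.9,
): number[] {
  const cuts: number[] = [];
  let start = 0;
  while (totalDuration - start > chunkDuration) {
    const limit = start + chunkDuration;
    // Prefer the last silence that keeps the chunk between
    // chunkMinThreshold and chunkDuration seconds long.
    const candidates = silences.filter((s) => s > start + chunkMinThreshold && s <= limit);
    const cut = candidates.length > 0 ? candidates[candidates.length - 1] : limit;
    cuts.push(cut);
    start = cut;
  }
  return cuts; // the final chunk runs from the last cut to totalDuration
}

pickSplitPoints([10, 30, 55, 70, 110], 130); // → [55, 110]
```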
Chunks may be shorter than chunkDuration if splitting at the exact boundary would cut a word. Tafrigh always prefers the last valid silence point.
Example: Adjusting for podcasts
Podcasts often have background music and shorter pauses. Here’s a tuned configuration:
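One possible shape for that configuration is shown below; the nested option names are assumptions based on the parameters documented on this page, not a verified API surface:

```typescript
// Hypothetical tuned options for podcast audio.
const podcastOptions = {
  preprocessOptions: {
    noiseReduction: {
      highpass: 300,
      lowpass: 3000,
      afftdn_nf: -25,        // more aggressive denoising to fight background music
      dialogueEnhance: true, // keep the speech-band boost
    },
  },
  splitOptions: {
    chunkDuration: 45,
    silenceThreshold: -20,   // only split on clear silence; music raises the noise floor
    silenceDuration: 0.25,   // hosts pause briefly between sentences
  },
};
```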
Tracking progress
You can monitor both stages with callbacks:
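A sketch of what such callback wiring might look like; the callback names and signatures here are illustrative assumptions, not the exact Tafrigh API:

```typescript
// Hypothetical progress callbacks for the preprocessing and splitting stages.
const callbacks = {
  onPreprocessingProgress: (percent: number): string =>
    `preprocessing: ${percent}%`,
  onSplittingProgress: (chunkIndex: number, totalChunks: number): string =>
    `split chunk ${chunkIndex + 1}/${totalChunks}`,
};

callbacks.onPreprocessingProgress(42); // "preprocessing: 42%"
callbacks.onSplittingProgress(0, 5);   // "split chunk 1/5"
```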
Chunk metadata
Each chunk includes timing information for accurate segment alignment: use each chunk’s range to offset timestamps back to the original file (see mapWitResponseToSegment in src/utils/mapping.ts:16-34).
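Conceptually, that mapping step adds the chunk’s start offset to every chunk-relative timestamp. A hedged sketch of the idea (field names assumed for illustration, not copied from mapping.ts):

```typescript
// Shift a chunk-relative segment back into the original file's timeline.
type Segment = { start: number; end: number; text: string };

const offsetSegment = (seg: Segment, chunkStart: number): Segment => ({
  ...seg,
  start: seg.start + chunkStart,
  end: seg.end + chunkStart,
});

offsetSegment({ start: 1.5, end: 3.25, text: 'hello' }, 60);
// → { start: 61.5, end: 63.25, text: 'hello' }
```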
All temporary files (preprocessed audio and chunks) are stored in a system temp directory and cleaned up automatically unless you set preventCleanup: true.