The transcribe function orchestrates a multi-stage pipeline that transforms raw audio into structured transcript segments. Here’s how it works under the hood.
Pipeline overview
The complete workflow spans five stages:
Stage 1: Input validation
Before processing begins, Tafrigh validates your options (see validateTranscribeFileOptions in src/utils/validation.ts):
export const transcribe = async (
    content: ReadStream | string,
    options?: TranscribeOptions
) => {
    logger.info?.(`transcribe ${content} (${typeof content}) with options: ${JSON.stringify(options)}`);
    validateTranscribeFileOptions(options);
    // ...
};
You can pass three types of input:
Local file path: './audio.mp3'
Remote URL: 'https://example.com/audio.mp3'
Readable stream: createReadStream('audio.mp3') or ytdl(videoUrl)
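Tafrigh distinguishes these internally; as an illustration only (classifyInput is a hypothetical helper, not part of Tafrigh's API), the three kinds could be told apart like this:

```typescript
// Hypothetical helper (not Tafrigh's API): distinguish the three input kinds.
type InputKind = 'path' | 'url' | 'stream';

const classifyInput = (content: { pipe?: unknown } | string): InputKind => {
    if (typeof content !== 'string') {
        return 'stream'; // any stream-like object, e.g. a ReadStream
    }
    // Strings are either remote URLs or local file paths
    return /^https?:\/\//i.test(content) ? 'url' : 'path';
};
```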
Stage 2: Audio preprocessing
The formatMedia function converts your input to a normalized MP3 with optional noise reduction:
const filePath = await formatMedia(
    content,
    path.format({
        dir: outputDir,
        ext: '.mp3',
        name: Date.now().toString(),
    }),
    options?.preprocessOptions,
    options?.callbacks,
);
This stage:
Downloads remote URLs or reads streams
Applies high-pass/low-pass filters
Performs FFT-based noise reduction
Enhances dialogue frequencies
Saves the result to a temporary directory
The temporary directory is created with fs.mkdtemp('tafrigh') in your OS temp folder (see src/index.ts:64).
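The filtering steps above map naturally onto FFmpeg audio filters. As a rough sketch (the filter names below are standard FFmpeg filters; the exact chain Tafrigh builds may differ):

```typescript
// Illustrative sketch: assembling an FFmpeg audio filter chain for the
// preprocessing steps above. The exact chain Tafrigh emits may differ.
interface NoiseReductionOptions {
    dialogueEnhance?: boolean;
    highpass?: number; // cutoff frequency in Hz
    lowpass?: number; // cutoff frequency in Hz
}

const buildFilterChain = (opts: NoiseReductionOptions): string => {
    const filters: string[] = [];
    if (opts.highpass) filters.push(`highpass=f=${opts.highpass}`);
    if (opts.lowpass) filters.push(`lowpass=f=${opts.lowpass}`);
    filters.push('afftdn'); // FFT-based denoiser
    if (opts.dialogueEnhance) filters.push('dialoguenhance'); // FFmpeg >= 6.0
    return filters.join(',');
};
```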
Stage 3: Chunk creation
The preprocessed audio is split at silence points to create manageable chunks:
const chunkFiles = await splitFileOnSilences(
    filePath,
    outputDir,
    options?.splitOptions,
    options?.callbacks,
);
logger.debug?.(`Generated chunks: ${JSON.stringify(chunkFiles)}`);

if (chunkFiles.length === 0) {
    return [];
}
Each chunk includes timing metadata:
[
    { filename: '/tmp/tafrigh/chunk_0.mp3', range: { start: 0, end: 58.3 } },
    { filename: '/tmp/tafrigh/chunk_1.mp3', range: { start: 58.3, end: 120.7 } },
    // ...
]
If the entire file is silent or below the minimum threshold, chunkFiles will be empty and transcribe returns [] immediately.
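Because downstream timing depends on these ranges, a useful invariant is that chunks are sorted and non-overlapping. A small illustrative check (not part of Tafrigh):

```typescript
// Illustrative invariant check (not part of Tafrigh): chunk ranges should be
// sorted and non-overlapping so transcripts reassemble the original timeline.
interface AudioChunk {
    filename: string;
    range: { start: number; end: number };
}

const chunksAreOrdered = (chunks: AudioChunk[]): boolean =>
    chunks.every(
        (chunk, i) =>
            chunk.range.start < chunk.range.end &&
            (i === 0 || chunks[i - 1].range.end <= chunk.range.start),
    );
```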
Stage 4: Concurrent transcription
Chunks are processed in parallel based on available API keys and the concurrency option:
const { failures, transcripts } = await transcribeAudioChunks(chunkFiles, {
    callbacks: options?.callbacks,
    concurrency: options?.concurrency,
    retries: options?.retries,
});
The transcribeAudioChunks function (see src/transcriber.ts:188-204) determines the optimal parallelism:
const apiKeyCount = getApiKeysCount();
const maxConcurrency = concurrency && concurrency <= apiKeyCount ? concurrency : apiKeyCount;

if (chunkFiles.length === 1 || concurrency === 1) {
    return transcribeAudioChunksInSingleThread(chunkFiles, callbacks, retries);
}

return transcribeAudioChunksWithConcurrency(chunkFiles, maxConcurrency, callbacks, retries);
Concurrency logic:
If you have 3 API keys and set concurrency: 5, Tafrigh uses 3 workers (limited by keys)
If you have 5 API keys and set concurrency: 2, Tafrigh uses 2 workers (respecting your limit)
If concurrency is omitted, Tafrigh uses all available API keys
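The three rules above boil down to a cap: the requested concurrency, limited by the number of configured API keys. As a pure-function sketch:

```typescript
// Sketch of the worker-count rule: the requested concurrency is used only
// when it does not exceed the number of configured API keys; otherwise
// (or when omitted) the key count itself becomes the worker count.
const pickConcurrency = (apiKeyCount: number, concurrency?: number): number =>
    concurrency && concurrency <= apiKeyCount ? concurrency : apiKeyCount;
```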
Stage 5: Segment assembly
Successful transcriptions are sorted by timestamp and returned:
transcripts.sort((a: Segment, b: Segment) => a.start - b.start);

if (failures.length === 0 && callbacks?.onTranscriptionFinished) {
    await callbacks.onTranscriptionFinished(transcripts);
}

return { failures, transcripts };
Each segment contains:
{
    text: 'Hello world',
    start: 0, // Seconds in original audio
    end: 2.5, // Seconds in original audio
    confidence: 0.95, // Optional: Wit.ai confidence score
    tokens: [ // Optional: word-level breakdown
        { text: 'Hello', start: 0, end: 1.2, confidence: 0.98 },
        { text: 'world', start: 1.3, end: 2.5, confidence: 0.92 },
    ],
}
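Since start and end are seconds relative to the original audio, segments map directly onto subtitle formats. For example, a hypothetical helper (not part of Tafrigh) that renders a segment time as an SRT timestamp:

```typescript
// Hypothetical helper: format a segment time (in seconds) as an
// SRT-style timestamp, HH:MM:SS,mmm.
const toSrtTimestamp = (seconds: number): string => {
    const ms = Math.round(seconds * 1000);
    const pad = (n: number, width = 2) => String(n).padStart(width, '0');
    const hours = Math.floor(ms / 3_600_000);
    const minutes = Math.floor(ms / 60_000) % 60;
    const secs = Math.floor(ms / 1000) % 60;
    return `${pad(hours)}:${pad(minutes)}:${pad(secs)},${pad(ms % 1000, 3)}`;
};
```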
Error handling
If any chunks fail after all retries, the pipeline throws a TranscriptionError:
if (failures.length > 0) {
    shouldCleanup = false;
    throw new TranscriptionError(`Failed to transcribe ${failures.length} chunk(s)`, {
        chunkFiles,
        failures,
        outputDir,
        transcripts,
    });
}
When a TranscriptionError is thrown, the temporary directory is not cleaned up. This preserves failed chunks for retry with resumeFailedTranscriptions.
Cleanup
By default, temporary files are deleted after successful transcription:
finally {
    if (shouldCleanup && outputDir) {
        logger.info?.(`Cleaning up ${outputDir}`);
        await fs.rm(outputDir, { force: true, recursive: true });
    }
}
Set preventCleanup: true to preserve files for debugging:
const transcript = await transcribe('audio.mp3', {
    preventCleanup: true,
});
// Temporary files remain in /tmp/tafrigh*
Complete example
Here’s a full pipeline with progress tracking:
import { init, transcribe } from 'tafrigh';

init({ apiKeys: ['key1', 'key2', 'key3'] });

const transcript = await transcribe('https://example.com/podcast.mp3', {
    concurrency: 3,
    retries: 5,
    preprocessOptions: {
        noiseReduction: {
            highpass: 300,
            lowpass: 3000,
            dialogueEnhance: true,
        },
    },
    splitOptions: {
        chunkDuration: 60,
        silenceDetection: {
            silenceThreshold: -25,
            silenceDuration: 0.1,
        },
    },
    callbacks: {
        onPreprocessingStarted: async (path) => console.log(`Preprocessing: ${path}`),
        onSplittingStarted: async (total) => console.log(`Splitting into ${total} chunks`),
        onTranscriptionStarted: async (total) => console.log(`Transcribing ${total} chunks with 3 workers`),
        onTranscriptionProgress: (index) => console.log(`Completed chunk ${index}`),
        onTranscriptionFinished: async (segments) => console.log(`Generated ${segments.length} segments`),
    },
});

console.log(`Transcribed ${transcript.length} segments`);
Shorter chunks (30-45s) enable more parallelism but create more API requests. Longer chunks (90-120s) reduce overhead but limit concurrency. The default 60s balances both.
Each worker needs a dedicated API key due to Wit.ai rate limits. If you have 10 chunks and 3 keys, Tafrigh processes 3 at a time. Adding more keys speeds up large files proportionally.
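The arithmetic behind that tip: with a fixed worker count, chunks complete in ceiling-division rounds.

```typescript
// With `keys` parallel workers, `chunks` chunks complete in
// ceil(chunks / keys) sequential rounds of transcription.
const transcriptionRounds = (chunks: number, keys: number): number =>
    Math.ceil(chunks / keys);
```

With 10 chunks and 3 keys, that is 4 rounds; a fourth key would cut it to 3.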
The default 5 retries with exponential backoff (1s, 2s, 4s, 8s, 16s) handles transient network issues. For unstable connections, increase retries to 7-10.
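The backoff schedule described above can be sketched as a pure function (illustrative; Tafrigh's internal retry code may differ):

```typescript
// Exponential backoff schedule: the delay doubles on each attempt,
// starting at 1 second (1s, 2s, 4s, ...).
const backoffDelays = (retries: number, baseMs = 1000): number[] =>
    Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
```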