Streaming audio

Working with streams

Tafrigh accepts Node.js Readable streams as input, making it easy to transcribe audio from sources like YouTube videos without downloading them first.

Transcribing YouTube videos

Use ytdl-core to stream audio directly from YouTube and pass it to Tafrigh for transcription:

import ytdl from 'ytdl-core';
import { init, transcribe } from 'tafrigh';

init({ apiKeys: ['your-wit-ai-key'] });

const videoUrl = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
const audioStream = ytdl(videoUrl, {
  quality: 'highestaudio',
  filter: 'audioonly',
});

const transcript = await transcribe(audioStream);
console.log(transcript);

Tafrigh processes, chunks, and transcribes the stream automatically. All intermediate processing happens in the OS temporary folder, so no temporary files are created in your project directory.
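To make the temporary-folder behavior concrete, here is a minimal sketch of how a stream can be spooled into a fresh directory under the OS temp folder and cleaned up afterwards. It is not Tafrigh's actual implementation, which this page does not show; `spoolToTempFile`, the `stream-spool-` prefix, and the default file name are hypothetical names chosen for illustration.

```javascript
import { createWriteStream } from 'node:fs';
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper (not part of Tafrigh): copies a Readable into a new
// directory under the OS temp folder and returns the file path together with
// a cleanup function that removes the directory again.
async function spoolToTempFile(stream, fileName = 'input.audio') {
  const dir = await mkdtemp(join(tmpdir(), 'stream-spool-'));
  const filePath = join(dir, fileName);
  try {
    // pipeline() rejects if either side fails and destroys both streams.
    await pipeline(stream, createWriteStream(filePath));
  } catch (error) {
    await rm(dir, { recursive: true, force: true });
    throw error;
  }
  return {
    filePath,
    cleanup: () => rm(dir, { recursive: true, force: true }),
  };
}
```

Because the directory is created with `mkdtemp`, concurrent runs do not collide, and nothing is written next to your source files.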

Streaming from HTTP sources

You can stream audio from any HTTP endpoint using Node.js built-in modules:

import https from 'https';
import { init, transcribe } from 'tafrigh';

init({ apiKeys: ['your-wit-ai-key'] });

const audioUrl = 'https://example.com/podcast-episode.mp3';

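https
  .get(audioUrl, async (response) => {
    try {
      // The response object is a Readable stream, so it can be passed as-is.
      const transcript = await transcribe(response);
      console.log(transcript);
    } catch (error) {
      console.error('Transcription failed:', error);
    }
  })
  // Network-level failures (DNS, refused connection) surface here.
  .on('error', (error) => console.error('Request failed:', error));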
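If you prefer `await` over callbacks, the request can be wrapped in a Promise that only resolves for a successful response. The sketch below is one possible way to do this; `openAudioStream` is a hypothetical helper, not a Tafrigh API. It does not follow redirects, which Node's built-in `http` and `https` clients never do on their own.

```javascript
import http from 'node:http';
import https from 'node:https';

// Hypothetical helper (not part of Tafrigh): resolves with the response
// stream for a 2xx reply and rejects on network errors or other status codes.
function openAudioStream(url) {
  const client = new URL(url).protocol === 'http:' ? http : https;
  return new Promise((resolve, reject) => {
    client
      .get(url, (response) => {
        const { statusCode } = response;
        if (statusCode < 200 || statusCode >= 300) {
          response.resume(); // discard the body so the socket is released
          reject(new Error(`Request failed with status ${statusCode}`));
          return;
        }
        resolve(response);
      })
      .on('error', reject);
  });
}
```

With this in place, the call site becomes `const transcript = await transcribe(await openAudioStream(audioUrl));`, and an error page is never handed to the transcriber as if it were audio.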

Streaming with custom options

Combine streaming with custom transcription options for better control:

import ytdl from 'ytdl-core';
import { init, transcribe } from 'tafrigh';

init({ apiKeys: ['key1', 'key2', 'key3'] });

const videoUrl = 'https://www.youtube.com/watch?v=VIDEO_ID';
const audioStream = ytdl(videoUrl, {
  quality: 'highestaudio',
  filter: 'audioonly',
});

const transcript = await transcribe(audioStream, {
  concurrency: 3,
  splitOptions: {
    chunkDuration: 60,
    silenceDetection: {
      silenceThreshold: -30,
      silenceDuration: 0.5,
    },
  },
  preprocessOptions: {
    noiseReduction: {
      dialogueEnhance: true,
      highpass: 200,
      lowpass: 3000,
    },
  },
});

console.log(transcript);

Using multiple API keys together with a higher concurrency value lets more chunks be transcribed in parallel, which speeds up transcription of longer videos.
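To see why concurrency and several keys help, consider this minimal sketch of a bounded worker pool. It is an illustration of the general technique only: how Tafrigh actually schedules chunks and assigns keys is not described here, and `mapWithKeys` as well as the round-robin key choice are assumptions.

```javascript
// Illustrative only (not Tafrigh's scheduler): runs `task` once per chunk with
// at most `concurrency` tasks in flight, handing out API keys round-robin.
// Results are returned in the same order as the input chunks.
async function mapWithKeys(chunks, apiKeys, concurrency, task) {
  const results = new Array(chunks.length);
  let next = 0;

  async function worker() {
    // Each worker claims the next unprocessed index until none are left.
    while (next < chunks.length) {
      const index = next++;
      const apiKey = apiKeys[index % apiKeys.length];
      results[index] = await task(chunks[index], apiKey);
    }
  }

  const workerCount = Math.min(concurrency, chunks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `concurrency: 3`, three chunks are in flight at any time instead of one, so the total wall-clock time for a long recording is roughly the sequential time divided by three, as long as the service accepts the parallel requests.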

Progress tracking with streams

Monitor the transcription progress when working with streams:

import ytdl from 'ytdl-core';
import { init, transcribe } from 'tafrigh';

init({ apiKeys: ['your-wit-ai-key'] });

const videoUrl = 'https://www.youtube.com/watch?v=VIDEO_ID';
const audioStream = ytdl(videoUrl, {
  quality: 'highestaudio',
  filter: 'audioonly',
});

const transcript = await transcribe(audioStream, {
  callbacks: {
    onPreprocessingStarted: async (filePath) => {
      console.log('Starting preprocessing...');
    },
    onPreprocessingProgress: async (percent) => {
      console.log(`Preprocessing: ${percent}%`);
    },
    onSplittingStarted: async (totalChunks) => {
      console.log(`Splitting into ${totalChunks} chunks...`);
    },
    onTranscriptionStarted: async (totalChunks) => {
      console.log(`Transcribing ${totalChunks} chunks...`);
    },
    onTranscriptionProgress: async (chunkIndex) => {
      console.log(`Transcribed chunk ${chunkIndex}`);
    },
    onTranscriptionFinished: async (transcripts) => {
      console.log(`Complete! ${transcripts.length} segments transcribed`);
    },
  },
});

console.log(transcript);

Expected output

Starting preprocessing...
Preprocessing: 25%
Preprocessing: 50%
Preprocessing: 75%
Preprocessing: 100%
Splitting into 12 chunks...
Transcribing 12 chunks...
Transcribed chunk 0
Transcribed chunk 1
Transcribed chunk 2
...
Complete! 12 segments transcribed
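The per-chunk messages can be condensed into a percentage. The sketch below builds a `callbacks` object from the two callback names used in the example above; `createProgressCallbacks` is a hypothetical helper. It counts completed chunks instead of relying on the chunk index, on the assumption that chunks may finish out of order when concurrency is greater than one.

```javascript
// Hypothetical helper: returns a `callbacks` object that reports transcription
// progress as a percentage. Only the callback names come from the example
// above; the counting strategy is an assumption.
function createProgressCallbacks(log = console.log) {
  let total = 0;
  let done = 0;
  return {
    onTranscriptionStarted: async (totalChunks) => {
      total = totalChunks;
      done = 0;
    },
    onTranscriptionProgress: async () => {
      done += 1;
      const percent = total > 0 ? Math.round((done / total) * 100) : 0;
      log(`Transcription: ${percent}% (${done}/${total})`);
    },
  };
}
```

It would be passed as `transcribe(audioStream, { callbacks: createProgressCallbacks() })`. Injecting `log` makes it easy to route progress to a progress bar or a logger instead of the console.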

Streaming from file system

You can also create streams from local files when you need more control over the read process:

import { createReadStream } from 'fs';
import { init, transcribe } from 'tafrigh';

init({ apiKeys: ['your-wit-ai-key'] });

const fileStream = createReadStream('large-audio-file.mp3', {
  highWaterMark: 64 * 1024, // 64KB chunks
});

const transcript = await transcribe(fileStream);
console.log(transcript);

For local files, passing the file path directly to transcribe() is more efficient than creating a stream manually. Use streams when you need to pipe data from remote sources or want fine-grained control over the reading process.
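One example of such fine-grained control is observing how much data has been read. The following sketch is a pass-through stream that reports the running byte count; `createByteCounter` is a hypothetical helper built only on Node's `Transform` class and is not part of Tafrigh.

```javascript
import { Transform } from 'node:stream';

// Hypothetical helper: a pass-through stream that calls `onBytes` with the
// total number of bytes seen so far and forwards every chunk unchanged.
function createByteCounter(onBytes) {
  let total = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      total += chunk.length;
      onBytes(total);
      callback(null, chunk);
    },
  });
}
```

Placing the counter between the source and the consumer, for instance `fileStream.pipe(createByteCounter(report))`, yields a stream that can be handed to `transcribe()` like any other Readable. Note that `pipe()` does not forward errors from the source, so the source stream still needs its own error handler.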

Error handling with streams

Handle stream errors gracefully to prevent crashes:

import ytdl from 'ytdl-core';
import { init, transcribe, TranscriptionError } from 'tafrigh';

init({ apiKeys: ['your-wit-ai-key'] });

const videoUrl = 'https://www.youtube.com/watch?v=VIDEO_ID';

try {
  const audioStream = ytdl(videoUrl, {
    quality: 'highestaudio',
    filter: 'audioonly',
  });

  audioStream.on('error', (error) => {
    console.error('Stream error:', error);
  });

  const transcript = await transcribe(audioStream);
  console.log('Transcription successful:', transcript);
} catch (error) {
  if (error instanceof TranscriptionError) {
    console.error('Transcription failed:', error.message);
    console.log('Partial results:', error.transcripts);
  } else {
    console.error('Unexpected error:', error);
  }
}

Always add error handlers to your streams to catch network issues, invalid URLs, or access restrictions before they reach the transcription layer.
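A handler that only logs leaves the caller waiting if the stream dies midway. One way to turn a stream failure into a rejected promise is sketched below; `runWithStreamErrors` is a hypothetical helper, and how `transcribe()` itself reacts to a failing input stream is not specified on this page.

```javascript
// Hypothetical helper: runs `work(stream)` and rejects as soon as the stream
// emits 'error', so a surrounding try/catch sees stream failures too.
function runWithStreamErrors(stream, work) {
  return new Promise((resolve, reject) => {
    // The listener is attached before any work starts, so no error is missed.
    stream.once('error', reject);
    Promise.resolve()
      .then(() => work(stream))
      .then(resolve, reject);
  });
}
```

Inside the `try` block it would be used as `await runWithStreamErrors(audioStream, (stream) => transcribe(stream))`. Whichever settles first wins: a successful result, a transcription error, or the stream's own error.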
