This guide demonstrates basic audio file transcription using whisper.rn. You’ll learn how to initialize a context, transcribe audio files, and handle progress callbacks.
Quick Start
Initialize Whisper Context
First, initialize a Whisper context with a model file. You can use a bundled asset or download a model:

```typescript
import { initWhisper } from 'whisper.rn';

const context = await initWhisper({
  filePath: require('../assets/ggml-base.bin'),
});
console.log('Loaded model, ID:', context.id);
```
For production apps, download models at runtime to keep your app bundle size small. The base model is ~140MB.
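A runtime download might look like the following. The Hugging Face URL pattern and the `react-native-fs` calls are assumptions to verify against your own setup; only the URL helper below is plain TypeScript:

```typescript
// Hypothetical helper: URL of a whisper.cpp ggml model on Hugging Face.
// The hosting path is an assumption -- verify it before shipping.
function ggmlModelUrl(model: 'tiny' | 'base' | 'small' | 'medium'): string {
  return `https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-${model}.bin`;
}

// Sketch of the download step (react-native-fs assumed):
// const toFile = `${RNFS.DocumentDirectoryPath}/ggml-base.bin`;
// await RNFS.downloadFile({ fromUrl: ggmlModelUrl('base'), toFile }).promise;
// const context = await initWhisper({ filePath: toFile });

console.log(ggmlModelUrl('base'));
```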
Transcribe Audio File
Transcribe an audio file using the transcribe() method:

```typescript
const sampleFile = require('../assets/jfk.wav');

const { stop, promise } = context.transcribe(sampleFile, {
  language: 'en',
  maxLen: 1,
  tokenTimestamps: true,
  onProgress: (progress) => {
    console.log(`Transcribing: ${progress}%`);
  },
});

const { result, segments } = await promise;
console.log('Result:', result);
console.log('Segments:', segments);
```
The transcribe() method returns:
stop: Function to cancel transcription
promise: Promise that resolves with transcription results
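To see how this pair is meant to be used together, here is a runnable mock of the same contract. It is an illustrative stand-in, not whisper.rn code; the real promise resolves with the full transcription result:

```typescript
// Mock of the { stop, promise } contract returned by transcribe().
// Illustrative only -- real transcription runs natively in whisper.rn.
function mockTranscribe() {
  let aborted = false;
  const promise = new Promise<{ result: string; isAborted: boolean }>((resolve) => {
    // Simulate work finishing shortly; calling stop() first marks it aborted.
    setTimeout(() => resolve({ result: aborted ? '' : 'hello world', isAborted: aborted }), 10);
  });
  return { stop: () => { aborted = true; }, promise };
}

const { stop, promise } = mockTranscribe();
stop(); // cancel before the work completes
promise.then(({ isAborted }) => console.log('aborted:', isAborted)); // logs "aborted: true"
```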
Process Results
The transcription result includes the full text and segmented output with timestamps:

```typescript
// Helper function to format timestamps (t0/t1 are in units of 10 ms)
function toTimestamp(t: number) {
  let msec = t * 10;
  const hr = Math.floor(msec / (1000 * 60 * 60));
  msec -= hr * (1000 * 60 * 60);
  const min = Math.floor(msec / (1000 * 60));
  msec -= min * (1000 * 60);
  const sec = Math.floor(msec / 1000);
  msec -= sec * 1000;
  return `${String(hr).padStart(2, '0')}:${String(min).padStart(2, '0')}:${String(sec).padStart(2, '0')}.${String(msec).padStart(3, '0')}`;
}

// Display formatted segments
const formattedSegments = segments.map((segment) =>
  `[${toTimestamp(segment.t0)} --> ${toTimestamp(segment.t1)}] ${segment.text}`
).join('\n');
console.log('Formatted transcription:\n', formattedSegments);
```
Clean Up
Always release the context when you’re done to free up memory:

```typescript
await context.release();
console.log('Context released');
```
Use React’s useEffect cleanup to automatically release contexts when components unmount.
Complete Example
Here’s a complete React Native component demonstrating basic transcription:
```typescript
import React, { useCallback, useEffect, useRef, useState } from 'react';
import { View, Text, Button, ScrollView } from 'react-native';
import { initWhisper } from 'whisper.rn';
import type { WhisperContext } from 'whisper.rn';

const sampleFile = require('../assets/jfk.wav');

export default function BasicTranscription() {
  const contextRef = useRef<WhisperContext | null>(null);
  const [logs, setLogs] = useState<string[]>([]);
  const [result, setResult] = useState<string | null>(null);
  const [stopTranscribe, setStopTranscribe] = useState<{ stop: () => void } | null>(null);

  const log = useCallback((...messages: any[]) => {
    setLogs((prev) => [...prev, messages.join(' ')]);
  }, []);

  // Cleanup on unmount
  useEffect(() => {
    return () => {
      contextRef.current?.release();
    };
  }, []);

  const initialize = async () => {
    if (contextRef.current) {
      await contextRef.current.release();
      log('Released previous context');
    }
    log('Initializing context...');
    const startTime = Date.now();
    const ctx = await initWhisper({
      filePath: require('../assets/ggml-base.bin'),
    });
    const endTime = Date.now();
    log(`Loaded model in ${endTime - startTime}ms`);
    contextRef.current = ctx;
  };

  const transcribe = async () => {
    if (!contextRef.current) {
      log('Context not initialized');
      return;
    }
    log('Starting transcription...');
    const startTime = Date.now();
    const { stop, promise } = contextRef.current.transcribe(sampleFile, {
      language: 'en',
      maxLen: 1,
      tokenTimestamps: true,
      onProgress: (progress) => {
        log(`Progress: ${progress}%`);
      },
    });
    setStopTranscribe({ stop });
    const { result, segments } = await promise;
    const endTime = Date.now();
    setStopTranscribe(null);
    setResult(
      `Result: ${result}\n` +
      `Time: ${endTime - startTime}ms\n\n` +
      `Segments:\n${segments.map((s) =>
        `[${s.t0} --> ${s.t1}] ${s.text}`
      ).join('\n')}`
    );
    log('Transcription complete');
  };

  return (
    <ScrollView style={{ padding: 20 }}>
      <Button title="Initialize" onPress={initialize} />
      <Button
        title="Transcribe"
        onPress={transcribe}
        disabled={!contextRef.current || !!stopTranscribe}
      />
      {stopTranscribe && (
        <Button title="Stop" onPress={() => stopTranscribe.stop()} />
      )}
      <View style={{ marginTop: 20 }}>
        <Text>Logs:</Text>
        {logs.map((msg, i) => (
          <Text key={i}>{msg}</Text>
        ))}
      </View>
      {result && (
        <View style={{ marginTop: 20 }}>
          <Text>Result:</Text>
          <Text>{result}</Text>
        </View>
      )}
    </ScrollView>
  );
}
```
Transcription Options
The transcribe() method accepts various options to customize behavior, including language detection, prompts, and segment callbacks. For example, automatic language detection:

```typescript
const { promise } = context.transcribe(audioFile, {
  language: 'auto', // Auto-detect language
  // Or specify: 'en', 'es', 'fr', 'de', 'ja', etc.
});
```
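Prompting and segment callbacks can be sketched the same way. The option names `prompt` and `onNewSegments` below are assumptions based on recent whisper.rn versions; check your installed version’s TypeScript types before relying on the exact field names:

```typescript
// Assumed option names (`prompt`, `onNewSegments`) -- verify against your
// whisper.rn version's types. Built as a plain object for illustration.
const transcribeOptions = {
  language: 'en',
  // Bias decoding toward expected vocabulary and spellings:
  prompt: 'A speech about space exploration by President Kennedy.',
  // Receive segments as they are produced, instead of waiting on the promise:
  onNewSegments: (payload: { nNew: number; segments: { text: string }[] }) => {
    console.log(`Got ${payload.nNew} new segment(s)`);
  },
};
// const { promise } = context.transcribe(sampleFile, transcribeOptions);
```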
Error Handling
Always wrap transcription calls in try-catch blocks:
```typescript
try {
  const context = await initWhisper({
    filePath: require('../assets/ggml-base.bin'),
  });
  const { promise } = context.transcribe(audioFile, {
    language: 'en',
  });
  const { result } = await promise;
  console.log('Success:', result);
} catch (error) {
  console.error('Transcription failed:', error);
}
```
Model Selection: Start with tiny or base models for testing. Use small or medium for production.
GPU Acceleration: GPU (Metal) is enabled by default on iOS and significantly improves performance.
Thread Count: The default thread count (2-4) works well for most devices. Adjust it with the maxThreads option if needed.
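Putting the thread tip into code (the option name `maxThreads` is taken from the tip above; the useful range depends on the device’s cores):

```typescript
// maxThreads caps the CPU threads used for decoding.
const perfOptions = {
  language: 'en',
  maxThreads: 4, // try 2-4; more threads rarely help on mobile CPUs
};
// const { stop, promise } = context.transcribe(sampleFile, perfOptions);
```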
Next Steps
VAD Detection: Learn how to detect speech segments in audio files
Realtime Streaming: Implement live transcription from microphone input
File Handling: Work with different audio formats and data sources
API Reference: Full API documentation for WhisperContext