
Overview

Realtime transcription requires careful memory management to prevent out-of-memory errors during long recording sessions. The SliceManager class implements a circular buffer strategy that automatically manages audio slices, keeping only recent data in memory.

Memory Architecture

Audio Slices

Audio is divided into fixed-duration slices (default: 30 seconds) that are:
  • Processed independently for transcription
  • Stored temporarily in memory
  • Released when no longer needed
  • Limited by maxSlicesInMemory configuration
Recording Timeline:
[Slice 0: 0-30s] → [Slice 1: 30-60s] → [Slice 2: 60-90s] → ...

maxSlicesInMemory = 3:
Memory: [Slice 0] [Slice 1] [Slice 2]
        ↓ New slice created (Slice 3)
Memory: [Slice 1] [Slice 2] [Slice 3]  ← Slice 0 released

Memory Lifecycle

  1. Creation: New slice allocated when audio data arrives
  2. Population: Audio chunks added until slice duration reached
  3. Processing: Slice sent to Whisper for transcription
  4. Retention: Kept in memory for context/prompt chaining
  5. Release: Removed when exceeding maxSlicesInMemory
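The five stages above can be sketched as a toy model. This is a simplified standalone sketch, not whisper.rn's actual SliceManager; the class name, fields, and byte-based capacity tracking are illustrative assumptions:

```typescript
// Toy model of the slice lifecycle - a sketch only, not whisper.rn's
// actual SliceManager. Capacity is tracked in bytes of 16-bit PCM.
interface ToySlice {
  index: number
  bytes: number
  processed: boolean
  released: boolean
}

class ToySliceManager {
  slices: ToySlice[] = []
  constructor(private sliceBytes: number, private maxSlicesInMemory: number) {}

  // Stages 1 (creation), 2 (population), and 5 (release)
  addAudioData(byteCount: number): ToySlice {
    let current = this.slices[this.slices.length - 1]
    if (!current || current.bytes >= this.sliceBytes) {
      // Creation: allocate a new slice when the previous one is full
      current = { index: this.slices.length, bytes: 0, processed: false, released: false }
      this.slices.push(current)
      // Release: drop the oldest live slices once the limit is exceeded
      const live = this.slices.filter((s) => !s.released)
      live
        .slice(0, Math.max(0, live.length - this.maxSlicesInMemory))
        .forEach((s) => { s.released = true })
    }
    // Population: accumulate audio bytes in the current slice
    current.bytes += byteCount
    return current
  }

  // Stages 3 (processing) and 4 (retention): flag as transcribed,
  // but keep the slice in memory for context
  markSliceAsProcessed(index: number): void {
    const s = this.slices[index]
    if (s) s.processed = true
  }
}

// Tiny capacities so the rollover is easy to see:
// 2 bytes per slice, keep at most 2 slices in memory
const mgr = new ToySliceManager(2, 2)
mgr.addAudioData(2) // fills slice 0
mgr.addAudioData(2) // fills slice 1
mgr.addAudioData(2) // creates slice 2, so slice 0 is released
console.log(mgr.slices.map((s) => s.released)) // [ true, false, false ]
```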

SliceManager API

The SliceManager class handles all slice lifecycle operations.

Constructor

const sliceManager = new SliceManager(
  sliceDurationSec,     // default: 30
  maxSlicesInMemory,    // default: 1
  sampleRate           // default: 16000
)
Parameters:
  • sliceDurationSec (number, default: 30): Duration of each audio slice in seconds. Matches Whisper's 30-second processing chunks.
  • maxSlicesInMemory (number, default: 1): Maximum number of slices to keep in memory. Older slices are released automatically.
  • sampleRate (number, default: 16000): Audio sample rate in Hz (Whisper requires 16 kHz).

Adding Audio Data

SliceManager.ts
addAudioData(audioData: Uint8Array): { slice?: AudioSlice }
Appends audio data to the current slice. Automatically creates new slices when the current one is full.
Parameters:
  • audioData (Uint8Array, required): Raw PCM audio data (16-bit, mono, 16 kHz)
Returns:
  • slice (AudioSlice): The current slice being populated (may be incomplete)
Behavior:
  • Accumulates data in the current slice
  • Creates a new slice when the current one reaches its configured duration (rollover triggers at an 80% capacity threshold)
  • Triggers cleanup when maxSlicesInMemory exceeded
  • Returns the current slice object
Example:
const { slice } = sliceManager.addAudioData(audioChunk)
console.log(`Slice ${slice?.index}: ${slice?.sampleCount} bytes`)

Getting Slices for Transcription

getSliceForTranscription(): AudioSlice | null
Retrieves the next unprocessed slice for transcription.
Returns:
  • slice (AudioSlice | null): Next slice to transcribe, or null if none available
Example:
const slice = sliceManager.getSliceForTranscription()
if (slice) {
  const result = await whisperContext.transcribeData(slice.data.buffer)
  sliceManager.markSliceAsProcessed(slice.index)
}

Marking Slices as Processed

markSliceAsProcessed(sliceIndex: number): void
Marks a slice as transcribed, preventing duplicate processing.
Parameters:
  • sliceIndex (number, required): Index of the slice to mark

Moving to Next Slice

moveToNextTranscribeSlice(): void
Advances the internal pointer to the next slice for transcription. Used in sequential processing.

Getting Slice Data

getAudioDataForTranscription(sliceIndex: number): Uint8Array | null
Retrieves audio data for a specific slice.
Parameters:
  • sliceIndex (number, required): Index of the slice to retrieve
Returns:
  • data (Uint8Array | null): Raw PCM audio data, or null if the slice is not found or empty

Getting Slice by Index

getSliceByIndex(sliceIndex: number): AudioSlice | null
Retrieves complete slice metadata.
Returns:
  • slice (AudioSlice | null): Slice object with full metadata, or null if not found

Memory Usage Statistics

SliceManager.ts
getMemoryUsage(): MemoryUsage
Returns current memory usage metrics.
Returns:
  • usage (MemoryUsage): Object with slicesInMemory, totalSamples, and estimatedMB fields
Example:
const usage = sliceManager.getMemoryUsage()
console.log(`Memory: ${usage.estimatedMB} MB (${usage.slicesInMemory} slices)`)

Forcing New Slice

forceNextSlice(): { slice?: AudioSlice }
Finalizes the current slice immediately, regardless of capacity. Used for VAD-based slicing on speech_end events.
Returns:
  • slice (AudioSlice): The finalized slice
Use Case:
// VAD detected end of speech
vadContext.onSpeechEnd(() => {
  const { slice } = sliceManager.forceNextSlice()
  // Transcribe this speech segment immediately
})

Current Slice Info

getCurrentSliceInfo(): object
Returns comprehensive slice tracking information.
Returns:
  • info (object): Includes totalSlices, currentSliceIndex, transcribeSliceIndex, and memoryUsage

Reset

reset(): void
Releases all slices and resets internal state. Call when stopping transcription.
Example:
await transcriber.stop()
sliceManager.reset() // Clean up memory

AudioSlice Type

Each slice contains metadata and audio data:
types.ts
export interface AudioSlice {
  index: number           // Slice sequence number (0, 1, 2, ...)
  data: Uint8Array       // Raw PCM audio data
  sampleCount: number    // Number of bytes in data
  startTime: number      // Creation timestamp (ms)
  endTime: number        // Last update timestamp (ms)
  isProcessed: boolean   // Has been transcribed
  isReleased: boolean    // Memory has been freed
}
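For orientation, here is a hypothetical slice value with the field units spelled out. The concrete numbers are illustrative; note that sampleCount is a byte count, per the comment above:

```typescript
// Hypothetical AudioSlice value - illustrates field units only.
const now = Date.now()
const slice = {
  index: 0,
  data: new Uint8Array(960000), // 30 s of 16-bit mono PCM @ 16 kHz
  sampleCount: 960000,          // byte count, matching data.byteLength
  startTime: now,
  endTime: now + 30_000,        // 30 seconds later, in ms
  isProcessed: false,
  isReleased: false,
}

// Duration implied by the data: bytes / 2 bytes per sample / 16000 Hz
const durationSec = slice.data.byteLength / 2 / 16000
console.log(durationSec) // 30
```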

Memory Usage Patterns

Minimal Memory (Live Transcription)

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 1,  // Keep only current slice
    promptPreviousSlices: false,  // Don't chain prompts
  }
)
Memory: ~1 MB per slice (30s @ 16kHz mono)

Balanced (With Context)

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 3,  // Keep last 3 slices
    promptPreviousSlices: true,  // Chain for context
  }
)
Memory: ~3 MB (allows prompt chaining for better continuity)

Maximum Context (Long Sessions)

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  {
    audioSliceSec: 25,      // Slightly shorter slices
    maxSlicesInMemory: 5,   // Keep more history
    promptPreviousSlices: true,
  }
)
Memory: ~5 MB (better context but higher memory usage)

Context Release Best Practices

Releasing Whisper Context

Whisper contexts hold model data in memory (100-400 MB depending on model size).
// Declare outside try so the finally block can reach them
let whisperContext, vadContext, transcriber
try {
  whisperContext = await initWhisper({ filePath: modelPath })
  vadContext = await initWhisperVad({ filePath: vadModelPath })

  // Use contexts...
  transcriber = new RealtimeTranscriber({ whisperContext, vadContext, audioStream })
  await transcriber.start()

} finally {
  // Always release contexts when done
  await transcriber?.stop()
  await whisperContext?.release()
  await vadContext?.release()
}

Releasing All Contexts

import { releaseAllWhisper, releaseAllWhisperVad } from 'whisper.rn'

// Release all Whisper contexts
await releaseAllWhisper()

// Release all VAD contexts
await releaseAllWhisperVad()

Component Cleanup

import { useEffect, useState } from 'react'
import { View } from 'react-native'
import { initWhisper } from 'whisper.rn'

function TranscriptionComponent() {
  const [context, setContext] = useState(null)

  useEffect(() => {
    let whisperContext
    
    initWhisper({ filePath: modelPath }).then(ctx => {
      whisperContext = ctx
      setContext(ctx)
    })

    return () => {
      // Clean up on unmount
      whisperContext?.release()
    }
  }, [])

  return <View>...</View>
}

Memory Monitoring

Tracking Memory in RealtimeTranscriber

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  { maxSlicesInMemory: 3 },
  {
    onTranscribe: (event) => {
      const { memoryUsage } = event
      console.log(`Memory: ${memoryUsage?.estimatedMB} MB`)
      console.log(`Slices: ${memoryUsage?.slicesInMemory}`)
      console.log(`Samples: ${memoryUsage?.totalSamples}`)
    },
  }
)

Manual Monitoring

const sliceManager = new SliceManager(30, 3)

setInterval(() => {
  const info = sliceManager.getCurrentSliceInfo()
  const usage = info.memoryUsage
  
  console.log(`Slices: ${usage.slicesInMemory}/${info.totalSlices}`)
  console.log(`Current: ${info.currentSliceIndex}, Next: ${info.transcribeSliceIndex}`)
  console.log(`Memory: ${usage.estimatedMB} MB`)
}, 5000)

Troubleshooting

Symptoms: App crashes during long recording sessions
Solutions:
  • Reduce maxSlicesInMemory (try 1-2 for minimal usage)
  • Use smaller model (tiny, base instead of medium/large)
  • Disable promptPreviousSlices to avoid keeping slice results
  • Reduce audioSliceSec (use 20-25 seconds instead of 30)
  • Call release() on contexts when done
Symptoms: Memory usage increases without bound
Likely Causes:
  • Not calling release() on finished contexts
  • Storing transcription results without limit
  • Audio stream not stopping properly
Solutions:
  • Ensure maxSlicesInMemory is set (SliceManager auto-cleans)
  • Release contexts: await context.release()
  • Clear old transcription results periodically
  • Verify audioStream.stop() is called
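One simple way to keep stored results bounded. This is a hypothetical helper, not part of whisper.rn; the name and cap are illustrative:

```typescript
// Hypothetical helper: keep only the most recent N transcription results
// so long sessions don't accumulate text without bound.
const MAX_RESULTS = 100
const results: string[] = []

function addResult(text: string): void {
  results.push(text)
  if (results.length > MAX_RESULTS) {
    // Drop the oldest entries beyond the cap
    results.splice(0, results.length - MAX_RESULTS)
  }
}

for (let i = 0; i < 150; i++) addResult(`segment ${i}`)
console.log(results.length) // 100
console.log(results[0])     // "segment 50"
```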
Symptoms: Memory usage stays constant even with circular buffer
Debug:
const info = sliceManager.getCurrentSliceInfo()
console.log('Slices:', info.totalSlices, 'Max:', maxSlicesInMemory)
console.log('Current:', info.currentSliceIndex, 'Transcribe:', info.transcribeSliceIndex)
Solutions:
  • Verify maxSlicesInMemory is configured
  • Check if slices are marked as processed
  • Ensure cleanup logic is running
Symptoms: Large models crash on iOS
Solution: Enable the Extended Virtual Addressing entitlement.
Add to your app's .entitlements file:
<key>com.apple.developer.kernel.extended-virtual-addressing</key>
<true/>
This allows apps to use more memory for large models.

Memory Calculation Reference

Audio Data Size

16-bit PCM, Mono, 16kHz:
- 1 second = 16,000 samples × 2 bytes = 32,000 bytes = 31.25 KB
- 30 seconds = 960,000 bytes ≈ 937.5 KB ≈ 0.92 MB

maxSlicesInMemory = 3:
- Audio data: 3 × 0.92 MB ≈ 2.76 MB
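The figures above can be checked with a few lines of arithmetic:

```typescript
// 16-bit PCM, mono, 16 kHz
const sampleRate = 16000
const bytesPerSample = 2
const bytesPerSecond = sampleRate * bytesPerSample
console.log(bytesPerSecond) // 32000

const sliceBytes = bytesPerSecond * 30 // one 30-second slice
console.log(sliceBytes)                // 960000
console.log((sliceBytes / 1024 / 1024).toFixed(2)) // "0.92" MB

// Three slices in memory, in line with the ~2.76 MB figure above
// (which rounds per-slice before multiplying)
const threeSlicesMB = (3 * sliceBytes) / 1024 / 1024
console.log(threeSlicesMB.toFixed(2)) // "2.75"
```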

Model Size (Approximate)

Model      | Size   | RAM Usage
tiny.en    | 75 MB  | ~100 MB
base.en    | 142 MB | ~150 MB
small.en   | 466 MB | ~500 MB
medium.en  | 1.5 GB | ~1.8 GB

Total Memory Estimate

Total ≈ Model RAM + (Slices × Slice Size) + Overhead

Example (base.en, 3 slices):
≈ 150 MB + (3 × 1 MB) + 50 MB = ~200 MB
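The arithmetic behind the base.en example, using the approximate figures from the sections above (the overhead allowance is a rough assumption):

```typescript
// Inputs are the approximate figures from the sections above
const modelRamMB = 150 // base.en RAM usage, from the model table
const sliceMB = 1      // one 30 s slice, rounded up from ~0.92 MB
const overheadMB = 50  // rough allowance for runtime and buffers
const totalMB = modelRamMB + 3 * sliceMB + overheadMB
console.log(totalMB)   // 203, i.e. the "~200 MB" estimate above
```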
