
Overview

Realtime transcription requires careful memory management to prevent out-of-memory errors during long recording sessions. The SliceManager class implements a circular buffer strategy that automatically manages audio slices, keeping only recent data in memory.

Memory Architecture

Audio Slices

Audio is divided into fixed-duration slices (default: 30 seconds) that are:
  • Processed independently for transcription
  • Stored temporarily in memory
  • Released when no longer needed
  • Limited by maxSlicesInMemory configuration
Recording Timeline:
[Slice 0: 0-30s] → [Slice 1: 30-60s] → [Slice 2: 60-90s] → ...

maxSlicesInMemory = 3:
Memory: [Slice 0] [Slice 1] [Slice 2]
        ↓ New slice created (Slice 3)
Memory: [Slice 1] [Slice 2] [Slice 3]  ← Slice 0 released

Memory Lifecycle

  1. Creation: New slice allocated when audio data arrives
  2. Population: Audio chunks added until slice duration reached
  3. Processing: Slice sent to Whisper for transcription
  4. Retention: Kept in memory for context/prompt chaining
  5. Release: Removed when exceeding maxSlicesInMemory
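The five stages above can be sketched as a toy model. This is a simplified standalone sketch, not whisper.rn's actual SliceManager; the class name, fields, and byte-based capacity tracking are illustrative assumptions:

```typescript
// Toy model of the slice lifecycle - a sketch only, not whisper.rn's
// actual SliceManager. Capacity is tracked in bytes of 16-bit PCM.
interface ToySlice {
  index: number
  bytes: number
  processed: boolean
  released: boolean
}

class ToySliceManager {
  slices: ToySlice[] = []
  constructor(private sliceBytes: number, private maxSlicesInMemory: number) {}

  // Stages 1 (creation), 2 (population), and 5 (release)
  addAudioData(byteCount: number): ToySlice {
    let current = this.slices[this.slices.length - 1]
    if (!current || current.bytes >= this.sliceBytes) {
      // Creation: allocate a new slice when the previous one is full
      current = { index: this.slices.length, bytes: 0, processed: false, released: false }
      this.slices.push(current)
      // Release: drop the oldest live slices once the limit is exceeded
      const live = this.slices.filter((s) => !s.released)
      live
        .slice(0, Math.max(0, live.length - this.maxSlicesInMemory))
        .forEach((s) => { s.released = true })
    }
    // Population: accumulate audio bytes in the current slice
    current.bytes += byteCount
    return current
  }

  // Stages 3 (processing) and 4 (retention): flag as transcribed,
  // but keep the slice in memory for context
  markSliceAsProcessed(index: number): void {
    const s = this.slices[index]
    if (s) s.processed = true
  }
}

// Tiny capacities so the rollover is easy to see:
// 2 bytes per slice, keep at most 2 slices in memory
const mgr = new ToySliceManager(2, 2)
mgr.addAudioData(2) // fills slice 0
mgr.addAudioData(2) // fills slice 1
mgr.addAudioData(2) // creates slice 2, so slice 0 is released
console.log(mgr.slices.map((s) => s.released)) // [ true, false, false ]
```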

SliceManager API

The SliceManager class handles all slice lifecycle operations.

Constructor

const sliceManager = new SliceManager(
  sliceDurationSec,     // default: 30
  maxSlicesInMemory,    // default: 1
  sampleRate           // default: 16000
)
Parameters:
  • sliceDurationSec (number, default: 30): Duration of each audio slice in seconds. Matches Whisper's 30-second processing chunks.
  • maxSlicesInMemory (number, default: 1): Maximum number of slices to keep in memory. Older slices are released automatically.
  • sampleRate (number, default: 16000): Audio sample rate in Hz (Whisper requires 16 kHz).

Adding Audio Data

SliceManager.ts
addAudioData(audioData: Uint8Array): { slice?: AudioSlice }
Appends audio data to the current slice. Automatically creates new slices when the current one is full.
Parameters:
  • audioData (Uint8Array, required): Raw PCM audio data (16-bit, mono, 16 kHz)
Returns:
  • slice (AudioSlice): The current slice being populated (may be incomplete)
Behavior:
  • Accumulates data in the current slice
  • Creates a new slice when the current one reaches its configured duration (rollover triggers at an 80% capacity threshold)
  • Triggers cleanup when maxSlicesInMemory exceeded
  • Returns the current slice object
Example:
const { slice } = sliceManager.addAudioData(audioChunk)
console.log(`Slice ${slice?.index}: ${slice?.sampleCount} bytes`)

Getting Slices for Transcription

getSliceForTranscription(): AudioSlice | null
Retrieves the next unprocessed slice for transcription.
Returns:
  • slice (AudioSlice | null): Next slice to transcribe, or null if none available
Example:
const slice = sliceManager.getSliceForTranscription()
if (slice) {
  const result = await whisperContext.transcribeData(slice.data.buffer)
  sliceManager.markSliceAsProcessed(slice.index)
}

Marking Slices as Processed

markSliceAsProcessed(sliceIndex: number): void
Marks a slice as transcribed, preventing duplicate processing.
Parameters:
  • sliceIndex (number, required): Index of the slice to mark

Moving to Next Slice

moveToNextTranscribeSlice(): void
Advances the internal pointer to the next slice for transcription. Used in sequential processing.

Getting Slice Data

getAudioDataForTranscription(sliceIndex: number): Uint8Array | null
Retrieves audio data for a specific slice.
Parameters:
  • sliceIndex (number, required): Index of the slice to retrieve
Returns:
  • data (Uint8Array | null): Raw PCM audio data, or null if the slice is not found or empty

Getting Slice by Index

getSliceByIndex(sliceIndex: number): AudioSlice | null
Retrieves complete slice metadata.
Returns:
  • slice (AudioSlice | null): Slice object with full metadata, or null if not found

Memory Usage Statistics

SliceManager.ts
getMemoryUsage(): MemoryUsage
Returns current memory usage metrics.
Returns:
  • usage (MemoryUsage): Object with slicesInMemory, totalSamples, and estimatedMB fields
Example:
const usage = sliceManager.getMemoryUsage()
console.log(`Memory: ${usage.estimatedMB} MB (${usage.slicesInMemory} slices)`)

Forcing New Slice

forceNextSlice(): { slice?: AudioSlice }
Finalizes the current slice immediately, regardless of capacity. Used for VAD-based slicing on speech_end events.
Returns:
  • slice (AudioSlice): The finalized slice
Use Case:
// VAD detected end of speech
vadContext.onSpeechEnd(() => {
  const { slice } = sliceManager.forceNextSlice()
  // Transcribe this speech segment immediately
})

Current Slice Info

getCurrentSliceInfo(): object
Returns comprehensive slice tracking information.
Returns:
  • info (object): Includes totalSlices, currentSliceIndex, transcribeSliceIndex, and memoryUsage

Reset

reset(): void
Releases all slices and resets internal state. Call when stopping transcription.
Example:
await transcriber.stop()
sliceManager.reset() // Clean up memory

AudioSlice Type

Each slice contains metadata and audio data:
types.ts
export interface AudioSlice {
  index: number           // Slice sequence number (0, 1, 2, ...)
  data: Uint8Array       // Raw PCM audio data
  sampleCount: number    // Number of bytes in data
  startTime: number      // Creation timestamp (ms)
  endTime: number        // Last update timestamp (ms)
  isProcessed: boolean   // Has been transcribed
  isReleased: boolean    // Memory has been freed
}
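For orientation, here is a hypothetical slice value with the field units spelled out. The concrete numbers are illustrative; note that sampleCount is a byte count, per the comment above:

```typescript
// Hypothetical AudioSlice value - illustrates field units only.
const now = Date.now()
const slice = {
  index: 0,
  data: new Uint8Array(960000), // 30 s of 16-bit mono PCM @ 16 kHz
  sampleCount: 960000,          // byte count, matching data.byteLength
  startTime: now,
  endTime: now + 30_000,        // 30 seconds later, in ms
  isProcessed: false,
  isReleased: false,
}

// Duration implied by the data: bytes / 2 bytes per sample / 16000 Hz
const durationSec = slice.data.byteLength / 2 / 16000
console.log(durationSec) // 30
```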

Memory Usage Patterns

Minimal Memory (Live Transcription)

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 1,  // Keep only current slice
    promptPreviousSlices: false,  // Don't chain prompts
  }
)
Memory: ~1 MB per slice (30s @ 16kHz mono)

Balanced (With Context)

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 3,  // Keep last 3 slices
    promptPreviousSlices: true,  // Chain for context
  }
)
Memory: ~3 MB (allows prompt chaining for better continuity)

Maximum Context (Long Sessions)

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  {
    audioSliceSec: 25,      // Slightly shorter slices
    maxSlicesInMemory: 5,   // Keep more history
    promptPreviousSlices: true,
  }
)
Memory: ~5 MB (better context but higher memory usage)

Context Release Best Practices

Releasing Whisper Context

Whisper contexts hold model data in memory (100-400 MB depending on model size).
// Declare outside try so the finally block can reach them
let whisperContext, vadContext, transcriber
try {
  whisperContext = await initWhisper({ filePath: modelPath })
  vadContext = await initWhisperVad({ filePath: vadModelPath })

  // Use contexts...
  transcriber = new RealtimeTranscriber({ whisperContext, vadContext, audioStream })
  await transcriber.start()

} finally {
  // Always release contexts when done
  await transcriber?.stop()
  await whisperContext?.release()
  await vadContext?.release()
}

Releasing All Contexts

import { releaseAllWhisper, releaseAllWhisperVad } from 'whisper.rn'

// Release all Whisper contexts
await releaseAllWhisper()

// Release all VAD contexts
await releaseAllWhisperVad()

Component Cleanup

import { useEffect, useState } from 'react'
import { View } from 'react-native'
import { initWhisper } from 'whisper.rn'

function TranscriptionComponent() {
  const [context, setContext] = useState(null)

  useEffect(() => {
    let whisperContext
    
    initWhisper({ filePath: modelPath }).then(ctx => {
      whisperContext = ctx
      setContext(ctx)
    })

    return () => {
      // Clean up on unmount
      whisperContext?.release()
    }
  }, [])

  return <View>...</View>
}

Memory Monitoring

Tracking Memory in RealtimeTranscriber

const transcriber = new RealtimeTranscriber(
  { whisperContext, audioStream },
  { maxSlicesInMemory: 3 },
  {
    onTranscribe: (event) => {
      const { memoryUsage } = event
      console.log(`Memory: ${memoryUsage?.estimatedMB} MB`)
      console.log(`Slices: ${memoryUsage?.slicesInMemory}`)
      console.log(`Samples: ${memoryUsage?.totalSamples}`)
    },
  }
)

Manual Monitoring

const sliceManager = new SliceManager(30, 3)

setInterval(() => {
  const info = sliceManager.getCurrentSliceInfo()
  const usage = info.memoryUsage
  
  console.log(`Slices: ${usage.slicesInMemory}/${info.totalSlices}`)
  console.log(`Current: ${info.currentSliceIndex}, Next: ${info.transcribeSliceIndex}`)
  console.log(`Memory: ${usage.estimatedMB} MB`)
}, 5000)

Troubleshooting

Symptoms: App crashes during long recording sessions
Solutions:
  • Reduce maxSlicesInMemory (try 1-2 for minimal usage)
  • Use smaller model (tiny, base instead of medium/large)
  • Disable promptPreviousSlices to avoid keeping slice results
  • Reduce audioSliceSec (use 20-25 seconds instead of 30)
  • Call release() on contexts when done
Symptoms: Memory usage increases without bound
Likely Causes:
  • Not calling release() on finished contexts
  • Storing transcription results without limit
  • Audio stream not stopping properly
Solutions:
  • Ensure maxSlicesInMemory is set (SliceManager auto-cleans)
  • Release contexts: await context.release()
  • Clear old transcription results periodically
  • Verify audioStream.stop() is called
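One simple way to keep stored results bounded. This is a hypothetical helper, not part of whisper.rn; the name and cap are illustrative:

```typescript
// Hypothetical helper: keep only the most recent N transcription results
// so long sessions don't accumulate text without bound.
const MAX_RESULTS = 100
const results: string[] = []

function addResult(text: string): void {
  results.push(text)
  if (results.length > MAX_RESULTS) {
    // Drop the oldest entries beyond the cap
    results.splice(0, results.length - MAX_RESULTS)
  }
}

for (let i = 0; i < 150; i++) addResult(`segment ${i}`)
console.log(results.length) // 100
console.log(results[0])     // "segment 50"
```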
Symptoms: Memory usage stays constant even with circular buffer
Debug:
const info = sliceManager.getCurrentSliceInfo()
console.log('Slices:', info.totalSlices, 'Max:', maxSlicesInMemory)
console.log('Current:', info.currentSliceIndex, 'Transcribe:', info.transcribeSliceIndex)
Solutions:
  • Verify maxSlicesInMemory is configured
  • Check if slices are marked as processed
  • Ensure cleanup logic is running
Symptoms: Large models crash on iOS
Solution: Enable the Extended Virtual Addressing entitlement.
Add to your app's .entitlements file:
<key>com.apple.developer.kernel.extended-virtual-addressing</key>
<true/>
This allows apps to use more memory for large models.

Memory Calculation Reference

Audio Data Size

16-bit PCM, Mono, 16kHz:
- 1 second = 16,000 samples × 2 bytes = 32,000 bytes = 31.25 KB
- 30 seconds = 960,000 bytes ≈ 937.5 KB ≈ 0.92 MB

maxSlicesInMemory = 3:
- Audio data: 3 × 0.92 MB ≈ 2.76 MB
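The figures above can be checked with a few lines of arithmetic:

```typescript
// 16-bit PCM, mono, 16 kHz
const sampleRate = 16000
const bytesPerSample = 2
const bytesPerSecond = sampleRate * bytesPerSample
console.log(bytesPerSecond) // 32000

const sliceBytes = bytesPerSecond * 30 // one 30-second slice
console.log(sliceBytes)                // 960000
console.log((sliceBytes / 1024 / 1024).toFixed(2)) // "0.92" MB

// Three slices in memory, in line with the ~2.76 MB figure above
// (which rounds per-slice before multiplying)
const threeSlicesMB = (3 * sliceBytes) / 1024 / 1024
console.log(threeSlicesMB.toFixed(2)) // "2.75"
```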

Model Size (Approximate)

Model      | Size   | RAM Usage
tiny.en    | 75 MB  | ~100 MB
base.en    | 142 MB | ~150 MB
small.en   | 466 MB | ~500 MB
medium.en  | 1.5 GB | ~1.8 GB

Total Memory Estimate

Total ≈ Model RAM + (Slices × Slice Size) + Overhead

Example (base.en, 3 slices):
≈ 150 MB + (3 × 1 MB) + 50 MB = ~200 MB
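The arithmetic behind the base.en example, using the approximate figures from the sections above (the overhead allowance is a rough assumption):

```typescript
// Inputs are the approximate figures from the sections above
const modelRamMB = 150 // base.en RAM usage, from the model table
const sliceMB = 1      // one 30 s slice, rounded up from ~0.92 MB
const overheadMB = 50  // rough allowance for runtime and buffers
const totalMB = modelRamMB + 3 * sliceMB + overheadMB
console.log(totalMB)   // 203, i.e. the "~200 MB" estimate above
```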
