Overview
Realtime transcription requires careful memory management to prevent out-of-memory errors during long recording sessions. The SliceManager class implements a circular buffer strategy that automatically manages audio slices, keeping only recent data in memory.
Memory Architecture
Audio Slices
Audio is divided into fixed-duration slices (default: 30 seconds) that are:
- Processed independently for transcription
- Stored temporarily in memory
- Released when no longer needed
- Limited by the maxSlicesInMemory configuration
Memory Lifecycle
- Creation: New slice allocated when audio data arrives
- Population: Audio chunks added until slice duration reached
- Processing: Slice sent to Whisper for transcription
- Retention: Kept in memory for context/prompt chaining
- Release: Removed when the number of slices exceeds maxSlicesInMemory
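The lifecycle can be illustrated with a small circular buffer. This is a minimal sketch and not the SliceManager implementation: the class name `MiniSliceBuffer`, its fields, and the `indicesInMemory` helper are hypothetical, and it assumes 16-bit mono PCM (2 bytes per sample). It covers creation, population, and release only.

```typescript
// Hypothetical, simplified slice buffer for illustration only.
interface SketchSlice {
  index: number
  chunks: Uint8Array[]
  byteLength: number
}

class MiniSliceBuffer {
  private slices: SketchSlice[] = []
  private nextIndex = 0
  private readonly bytesPerSlice: number
  private readonly maxSlicesInMemory: number

  constructor(sliceDurationSec = 30, maxSlicesInMemory = 3, sampleRate = 16000) {
    // Assumes 16-bit mono PCM: 2 bytes per sample
    this.bytesPerSlice = sliceDurationSec * sampleRate * 2
    this.maxSlicesInMemory = maxSlicesInMemory
  }

  // Creation and population: append a chunk, opening a new slice when
  // there is none yet or the current one has reached its duration.
  addAudioData(chunk: Uint8Array): SketchSlice {
    let current: SketchSlice | undefined = this.slices[this.slices.length - 1]
    if (!current || current.byteLength >= this.bytesPerSlice) {
      current = { index: this.nextIndex++, chunks: [], byteLength: 0 }
      this.slices.push(current)
      this.releaseOldest()
    }
    current.chunks.push(chunk)
    current.byteLength += chunk.byteLength
    return current
  }

  // Release: drop the oldest slices once the cap is exceeded.
  private releaseOldest(): void {
    while (this.slices.length > this.maxSlicesInMemory) {
      this.slices.shift()
    }
  }

  // Indices of the slices still held in memory, oldest first.
  indicesInMemory(): number[] {
    return this.slices.map((s) => s.index)
  }
}
```

Because released slices are dropped from the array, the audio held in memory stays bounded by roughly `maxSlicesInMemory` times the slice size, however long the recording runs.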
SliceManager API
The SliceManager class handles all slice lifecycle operations.
Constructor
Duration of each audio slice in seconds. Matches Whisper’s 30-second processing chunks.
Maximum number of slices to keep in memory. Older slices are released automatically.
Audio sample rate in Hz (Whisper requires 16kHz).
Adding Audio Data
SliceManager.ts
Raw PCM audio data (16-bit, mono, 16kHz)
The current slice being populated (may be incomplete)
- Accumulates data in the current slice
- Creates a new slice when the slice duration is reached (80% capacity threshold)
- Triggers cleanup when maxSlicesInMemory is exceeded
- Returns the current slice object
Getting Slices for Transcription
Next slice to transcribe, or null if none available
Marking Slices as Processed
Index of the slice to mark
Moving to Next Slice
Getting Slice Data
Index of the slice to retrieve
Raw PCM audio data, or null if slice not found or empty
Getting Slice by Index
Slice object with full metadata
Memory Usage Statistics
SliceManager.ts
Forcing New Slice
The finalized slice
Current Slice Info
Reset
AudioSlice Type
Each slice contains metadata and audio data:
types.ts
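As a rough sketch of the kind of shape this describes: every field name below, the `AudioSliceSketch` name, and the `sliceDurationSec` helper are assumptions made for illustration, not the library's confirmed definition.

```typescript
// Assumed shape for illustration; field names are hypothetical.
interface AudioSliceSketch {
  index: number        // position of the slice within the recording
  data: Uint8Array     // raw PCM bytes (16-bit, mono)
  sampleCount: number  // number of samples currently stored
  startTime: number    // timestamp when the slice was opened (ms)
  endTime: number      // timestamp of the last write (ms)
  isProcessed: boolean // true once the slice has been transcribed
  isReleased: boolean  // true once the audio data has been freed
}

// Duration of the audio held by a slice, in seconds.
function sliceDurationSec(slice: AudioSliceSketch, sampleRate = 16000): number {
  return slice.sampleCount / sampleRate
}
```

Keeping flags such as `isProcessed` and `isReleased` next to the audio buffer is one way to let a manager decide which slices are safe to free.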
Memory Usage Patterns
Minimal Memory (Live Transcription)
Balanced (With Context)
Maximum Context (Long Sessions)
Context Release Best Practices
Releasing Whisper Context
Whisper contexts hold model data in memory (100-400 MB depending on model size).
Releasing All Contexts
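One way to release several contexts in one pass is a small helper like the one below. It is a sketch under assumptions: the `Releasable` interface and the `releaseAllContexts` function are hypothetical names, and the only thing assumed about a context is that it exposes an async `release()` method.

```typescript
// Hypothetical minimal interface: anything exposing an async release().
interface Releasable {
  release(): Promise<void>
}

// Releases every context in turn, continuing past individual failures.
// Returns the number of contexts whose release() rejected.
async function releaseAllContexts(contexts: Releasable[]): Promise<number> {
  let failures = 0
  for (const context of contexts) {
    try {
      await context.release()
    } catch {
      failures += 1
    }
  }
  // Drop the references so the released objects can be garbage collected.
  contexts.length = 0
  return failures
}
```

Calling a helper like this from a component's unmount or teardown path means contexts do not outlive the screen that created them.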
Component Cleanup
Memory Monitoring
Tracking Memory in RealtimeTranscriber
Manual Monitoring
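A simple manual check is to compare the reported memory figure against a budget. The sketch below is illustrative only: the `MemoryStats` shape, the `classifyMemory` function, and the 80% warning threshold are assumptions, not part of the SliceManager API.

```typescript
// Hypothetical statistics shape; the real statistics object may differ.
interface MemoryStats {
  slicesInMemory: number
  estimatedMB: number
}

type Severity = "ok" | "warn" | "critical"

// Classifies current usage against a budget in MB.
// "warn" starts at 80% of the budget (an arbitrary, illustrative threshold).
function classifyMemory(stats: MemoryStats, budgetMB: number): Severity {
  if (stats.estimatedMB >= budgetMB) return "critical"
  if (stats.estimatedMB >= budgetMB * 0.8) return "warn"
  return "ok"
}
```

A check like this can run on a timer during recording, logging or reducing retention when the result is no longer "ok".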
Troubleshooting
Out of Memory Errors
Symptoms: App crashes during long recording sessions
Solutions:
- Reduce maxSlicesInMemory (try 1-2 for minimal usage)
- Use a smaller model (tiny or base instead of medium/large)
- Disable promptPreviousSlices to avoid keeping slice results
- Reduce audioSliceSec (use 20-25 seconds instead of 30)
- Call release() on contexts when done
Memory Growing Indefinitely
Symptoms: Memory usage increases without bound
Likely Causes:
- Not calling release() on finished contexts
- Storing transcription results without limit
- Audio stream not stopping properly
Solutions:
- Ensure maxSlicesInMemory is set (SliceManager auto-cleans)
- Release contexts: await context.release()
- Clear old transcription results periodically
- Verify audioStream.stop() is called
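For the unbounded-results case, one option is to store transcription results in a container with a fixed capacity. The `BoundedResults` class below is a hypothetical helper written for illustration, not something the library provides.

```typescript
// Keeps only the most recent `capacity` items; older ones are discarded.
class BoundedResults<T> {
  private items: T[] = []
  private readonly capacity: number

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer")
    }
    this.capacity = capacity
  }

  // Adds an item and evicts the oldest entries beyond the capacity.
  push(item: T): void {
    this.items.push(item)
    if (this.items.length > this.capacity) {
      this.items.splice(0, this.items.length - this.capacity)
    }
  }

  // Returns a copy of the retained items, oldest first.
  toArray(): T[] {
    return [...this.items]
  }
}
```

If the full transcript is needed, evicted results can be written to disk or concatenated into a single string before they are dropped.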
Slices Not Being Released
Symptoms: Memory usage stays constant even with circular buffer
Debug:
Solutions:
- Verify maxSlicesInMemory is configured
- Check if slices are marked as processed
- Ensure cleanup logic is running
iOS Extended Virtual Addressing
Symptoms: Large models crash on iOS
Solution: Enable Extended Virtual Addressing entitlement
Add to Info.plist:
This allows apps to use more memory for large models.
Memory Calculation Reference
Audio Data Size
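The size of one slice follows directly from the audio format described earlier (16-bit, mono, 16 kHz): bytes = duration × sample rate × bytes per sample × channels. The helper name `sliceBytes` is hypothetical; the arithmetic is the point.

```typescript
// Size in bytes of a PCM audio slice.
// Defaults match the format described in this document: 16 kHz, 16-bit, mono.
function sliceBytes(
  durationSec: number,
  sampleRate = 16000,
  bytesPerSample = 2,
  channels = 1,
): number {
  return durationSec * sampleRate * bytesPerSample * channels
}

const bytesPer30s = sliceBytes(30)         // 960000 bytes
const mibPer30s = bytesPer30s / (1024 * 1024) // about 0.92 MiB
```

So a default 30-second slice holds a little under 1 MB of audio, and the audio buffer as a whole is roughly that figure multiplied by maxSlicesInMemory.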
Model Size (Approximate)
| Model | Size | RAM Usage |
|---|---|---|
| tiny.en | 75 MB | ~100 MB |
| base.en | 142 MB | ~150 MB |
| small.en | 466 MB | ~500 MB |
| medium.en | 1.5 GB | ~1.8 GB |
Total Memory Estimate
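A rough total can be approximated as model RAM plus the retained audio slices. The sketch below uses the approximate RAM figures from the model table (treating ~1.8 GB as 1800 MB) and assumes 16-bit mono 16 kHz audio; the function name `estimateTotalMB` is hypothetical, and the estimate ignores transcription results and runtime overhead.

```typescript
// Approximate RAM usage per model in MB, taken from the model size table.
const MODEL_RAM_MB: Record<string, number> = {
  "tiny.en": 100,
  "base.en": 150,
  "small.en": 500,
  "medium.en": 1800, // ~1.8 GB
}

// Estimated total in MB: model RAM + audio kept in memory.
// Assumes 16 kHz, 16-bit mono PCM and 1 MB = 1024 * 1024 bytes.
function estimateTotalMB(model: string, maxSlicesInMemory: number, sliceSec = 30): number {
  const modelMB = MODEL_RAM_MB[model]
  if (modelMB === undefined) {
    throw new Error(`unknown model: ${model}`)
  }
  const sliceMB = (sliceSec * 16000 * 2) / (1024 * 1024)
  return modelMB + maxSlicesInMemory * sliceMB
}

const baseWithThreeSlices = estimateTotalMB("base.en", 3) // about 152.7 MB
```

The model dominates the total in every row of the table, which is why switching to a smaller model usually saves far more memory than reducing the slice count.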
Related
- Realtime Transcription - Using SliceManager with RealtimeTranscriber
- Custom Audio Adapters - Audio stream integration
- Optimization - Performance tuning and threading