Overview
Optimizing whisper.rn involves balancing accuracy, speed, and resource usage. This guide covers model selection, threading configuration, GPU acceleration, and platform-specific optimizations.
Model Selection
The most impactful optimization is choosing the right model size.
Model Comparison
| Model | Size | Speed | Accuracy | RAM Usage | Best For |
|---|---|---|---|---|---|
| tiny.en | 75 MB | Fastest | Good | ~100 MB | Live transcription, quick results |
| tiny | 75 MB | Fastest | Good | ~100 MB | Multi-language, real-time |
| base.en | 142 MB | Fast | Better | ~150 MB | Balanced performance |
| base | 142 MB | Fast | Better | ~150 MB | Multi-language, balanced |
| small.en | 466 MB | Medium | Great | ~500 MB | High accuracy needed |
| small | 466 MB | Medium | Great | ~500 MB | Multi-language, accurate |
| medium.en | 1.5 GB | Slow | Excellent | ~1.8 GB | Offline processing |
| large | 3 GB | Slowest | Best | ~3.5 GB | Maximum accuracy (not recommended on mobile) |
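The table above can be turned into a simple selection rule. A hedged sketch: the model names and RAM figures come from the table, but the helper itself is hypothetical, not part of whisper.rn:

```typescript
// Hypothetical helper: pick the largest English-only model that fits a RAM budget.
// RAM figures are the approximate values from the table above (MB).
type ModelChoice = { name: string; ramMb: number }

const MODELS: ModelChoice[] = [
  { name: 'tiny.en', ramMb: 100 },
  { name: 'base.en', ramMb: 150 },
  { name: 'small.en', ramMb: 500 },
  { name: 'medium.en', ramMb: 1800 },
]

function pickModel(ramBudgetMb: number): string {
  // Walk from largest to smallest; fall back to tiny.en.
  const fit = [...MODELS].reverse().find((m) => m.ramMb <= ramBudgetMb)
  return fit ? fit.name : 'tiny.en'
}
```

A budget of ~600 MB selects `small.en`; anything under ~150 MB falls back to `tiny.en`.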
Quantized Models
Quantized models reduce size and improve speed with minimal accuracy loss.
- Mobile realtime: `tiny.en` or `tiny.en-q8`
- Mobile batch: `base.en` or `base.en-q8`
- High accuracy: `small.en` or `small.en-q8`
- Avoid on mobile: `medium`, `large` (too slow, memory intensive)
Threading Configuration
maxThreads
Controls CPU parallelization for transcription (documented in `NativeRNWhisper.ts` as the number of CPU threads to use during computation). The default is detected automatically:
- 2 threads for 4-core devices
- 4 threads for 6+ core devices
Finding Optimal Thread Count
Use the `bench()` method to test different thread counts:
- 1 thread: Slowest, most battery efficient
- 2 threads: Good balance for most devices
- 4 threads: Fast on modern devices (6+ cores)
- 6+ threads: Diminishing returns, more heat/battery drain
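One way to apply this is a small sweep over candidate thread counts. A hedged sketch: the `measure` callback is a caller-supplied stand-in for whatever timing you collect (for example, wrapping whisper.rn's `bench()`; check its exact signature and return type against the typings):

```typescript
// Hypothetical harness: return the thread count with the lowest measured time.
// `measure` is a stand-in for a real timing call (e.g. wrapping bench()).
async function pickThreads(
  candidates: number[],
  measure: (nThreads: number) => Promise<number>, // elapsed ms
): Promise<number> {
  let best = candidates[0]
  let bestMs = Number.POSITIVE_INFINITY
  for (const n of candidates) {
    const ms = await measure(n)
    if (ms < bestMs) {
      bestMs = ms
      best = n
    }
  }
  return best
}
```

With candidates like `[1, 2, 4]`, this mirrors the trade-offs above: stop increasing threads once timings stop improving.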
Best Practices
GPU Acceleration
iOS Metal Acceleration
iOS devices support Metal GPU acceleration for significant speed improvements. Enable it with `useGpu: true` when initializing the context.
Requirements:
- iOS device (not simulator)
- Metal-compatible device (iPhone 5s+, iPad Air+)
- Model supports GPU acceleration
Benefits:
- 2-4x faster encoding on compatible devices
- Reduced CPU usage
- Lower battery consumption for sustained transcription
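A minimal sketch of the context options involved. `useGpu` is the option named in this guide; `filePath` is assumed to be the model-path option (verify both against your installed whisper.rn typings). The object is what you would pass to `initWhisper(...)`:

```typescript
// Hedged sketch: context options for Metal acceleration.
// Passed to initWhisper(...) from 'whisper.rn' (assumption; check the typings).
const metalContextOptions = {
  filePath: 'ggml-tiny.en.bin', // your bundled GGML model
  useGpu: true, // enable Metal on supported iOS devices; no-op elsewhere
}
```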
Disabling Metal (Build-time)
To reduce binary size or avoid the Metal dependency, disable Metal at build time with the `RNWHISPER_DISABLE_METAL=1` build flag in your Podfile.
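A sketch of the build-time switch, assuming the flag is read from the environment during `pod install` (the flag name comes from this guide; the exact mechanism may differ by version, so check the whisper.rn podspec):

```ruby
# Podfile (near the top, before the target block):
# build whisper.rn without Metal support
ENV['RNWHISPER_DISABLE_METAL'] = '1'
```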
iOS Core ML Acceleration
Core ML accelerates the encoder on iOS 15.0+ and is enabled at context initialization.
Requirements:
- iOS 15.0+
- Core ML model files (`.mlmodelc` directory)
- Model files co-located with GGML model
- Core ML: Encoder only, requires model files
- Metal: Full acceleration, no extra files needed
- Recommendation: Use Metal (`useGpu: true`) for simplicity
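As with Metal, Core ML is chosen when the context is created. A hedged sketch: `useCoreMLIos` is my assumption for the option name, so verify it against whisper.rn's typings; the `.mlmodelc` directory must sit next to the GGML model, following whisper.cpp's naming convention:

```typescript
// Hedged sketch: options for Core ML encoder acceleration (iOS 15.0+).
// Option name `useCoreMLIos` is an assumption; check the typings.
const coreMlContextOptions = {
  filePath: 'ggml-base.en.bin', // GGML model
  // ggml-base.en-encoder.mlmodelc (whisper.cpp naming) must be
  // co-located with the model file above
  useCoreMLIos: true,
}
```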
Flash Attention (Experimental)
Enable Flash Attention optimization.
Recommended: Only when GPU is available
Benefits: Faster attention computation, lower memory usage
Trade-offs: Slight accuracy impact, experimental
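A sketch of pairing Flash Attention with the GPU, per the recommendation above. The option name `useFlashAttn` is an assumption on my part; confirm it against whisper.rn's typings before relying on it:

```typescript
// Hedged sketch: Flash Attention alongside GPU.
// `useFlashAttn` is an assumed option name; verify against the typings.
const flashAttnOptions = {
  filePath: 'ggml-base.en.bin',
  useGpu: true,       // recommended: only enable Flash Attention with GPU
  useFlashAttn: true, // experimental; slight accuracy trade-off
}
```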
Transcription Options Tuning
Language Detection
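Setting `language` explicitly skips the auto-detection pass, which this guide recommends for speed. A sketch of the transcribe-time option:

```typescript
// Hedged sketch: explicit language skips slower auto-detection.
const explicitLanguage = { language: 'en' }

// Only for genuinely unknown-language audio:
const autoDetect = { language: 'auto' }
```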
Beam Search
Beam size for beam search decoding
- 1: Greedy decoding (fastest)
- 3-5: Moderate beam search (balanced)
- 10+: Large beam (slow, diminishing returns)
Temperature Sampling
Sampling temperature for decoding
- 0.0: Deterministic, fastest
- 0.0-0.5: Low randomness
- 0.5-1.0: Higher randomness (slower)
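The decoding knobs above combine naturally into presets. A sketch using the `beamSize` and `temperature` option names from this guide:

```typescript
// Hedged sketch: fastest decoding settings per the lists above.
const fastDecoding = {
  beamSize: 1,    // greedy decoding
  temperature: 0, // deterministic
}

// Higher-quality (slower) alternative:
const accurateDecoding = {
  beamSize: 5,      // moderate beam search
  temperature: 0.2, // low randomness
}
```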
Context and Length
Realtime Transcription Optimization
Slice Configuration
Slice duration in seconds
- 20-25s: More responsive, more overhead
- 30s: Optimal (matches Whisper chunk size)
- 40+s: Slower updates, less overhead
Interval between realtime processing updates
- 100ms: Very responsive, higher CPU
- 200ms: Balanced (default)
- 500ms: Lower CPU, less responsive
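A realtime configuration reflecting the trade-offs above. `realtimeProcessingPauseMs` and `maxSlicesInMemory` are named elsewhere in this guide; `audioSliceSec` is a placeholder for the slice-duration option (check the typings for the real name):

```typescript
// Hedged sketch: realtime settings per the trade-offs above.
const realtimeOptions = {
  audioSliceSec: 30,              // placeholder name; 30s matches Whisper's chunk size
  realtimeProcessingPauseMs: 200, // balanced default
  maxSlicesInMemory: 1,           // keep memory bounded
}
```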
VAD Configuration
VAD (Voice Activity Detection) reduces unnecessary transcription by skipping silent audio; the available options are defined in `types.ts`.
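An illustration only: every option name below is a hypothetical stand-in for whatever `types.ts` actually defines. The point is the shape of a VAD config that skips silence:

```typescript
// Hypothetical VAD config sketch: all names are placeholders; see types.ts
// in whisper.rn for the real option names.
const vadOptions = {
  useVad: true,  // skip transcription during silence
  vadThold: 0.6, // speech threshold: higher = stricter detection
  vadMs: 2000,   // analysis window in ms
}
```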
Platform-Specific Optimizations
iOS
1. Use Metal acceleration (`useGpu: true`; Podfile build flags control whether Metal is compiled in)
2. Info.plist configuration
Android
1. NDK configuration (`android/build.gradle`)
2. ProGuard rules (`proguard-rules.pro`): automatically handled by the whisper.cpp build system
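For the NDK step, a common size and build-time tweak is limiting ABIs to the ones you ship. A sketch for `android/build.gradle` (app module); adjust the filter list to your target devices:

```gradle
// android/build.gradle (app module): build only the ABI(s) you ship
android {
    defaultConfig {
        ndk {
            abiFilters 'arm64-v8a' // add 'x86_64' if you need emulator builds
        }
    }
}
```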
Testing Performance
Release Mode
Always test performance in Release mode (Debug builds are 5-10x slower).
Benchmarking
Real-world Testing
Performance Checklist
Model Selection
- Use smallest model that meets accuracy needs
- Consider quantized models (q8, q5)
- Avoid medium/large on mobile
- Use `.en` models for English-only
Configuration
- Set `language` explicitly (avoid 'auto')
- Use `beamSize: 1` for speed
- Tune `maxThreads` with `bench()`
- Enable GPU on iOS (`useGpu: true`)
- Limit `maxSlicesInMemory` for realtime
Testing
- Always test in Release mode
- Benchmark on target devices
- Monitor memory usage
- Test with realistic audio (length, quality)
Production
- Release contexts after use
- Handle out-of-memory errors
- Provide fallback for low-end devices
- Monitor crash analytics
Troubleshooting Performance
Slow Transcription
Symptoms: Transcription takes longer than the audio duration
Solutions:
- Test in Release mode (not Debug)
- Use smaller model (base/tiny instead of small/medium)
- Increase `maxThreads` (test with `bench()`)
- Enable GPU on iOS (`useGpu: true`)
- Set language explicitly (skip auto-detection)
- Use `beamSize: 1` (greedy decoding)
High Memory Usage
Solutions:
- Use smaller model
- Reduce `maxSlicesInMemory` (1-2 for realtime)
- Release contexts: `await context.release()`
- Disable `promptPreviousSlices`
- Enable Extended Virtual Addressing (iOS, large models)
Battery Drain
Solutions:
- Reduce `maxThreads` (try 1-2)
- Use a smaller model
- Increase `realtimeProcessingPauseMs`
- Enable GPU (iOS): more efficient than CPU
- Use VAD to skip silence
GPU Not Active (iOS)
Check whether Metal is actually in use, then review the possible reasons:
- Running on simulator (Metal not available)
- `RNWHISPER_DISABLE_METAL=1` build flag set
- Device doesn't support Metal
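For the check itself, a hedged sketch: `gpu` and `reasonNoGPU` are my assumptions for the fields the initialized context exposes, so verify them against the whisper.rn typings before use:

```typescript
// Hedged: summarize GPU state from a context-like object.
// Field names `gpu` and `reasonNoGPU` are assumptions; check the typings.
function describeGpuState(ctx: { gpu: boolean; reasonNoGPU?: string[] }): string {
  if (ctx.gpu) return 'Metal active'
  const reasons = (ctx.reasonNoGPU ?? []).join(', ')
  return `GPU inactive${reasons ? ': ' + reasons : ''}`
}
```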
Related
- Memory Management - SliceManager and context cleanup
- Realtime Transcription - RealtimeTranscriber API
- VAD - Voice activity detection
- whisper.cpp Performance - Upstream benchmarks