Overview
AI models, especially Large Language Models (LLMs), can consume significant amounts of RAM. Understanding memory usage patterns and implementing proper management techniques ensures stable application performance.
Memory Requirements
Large Language Models
Based on real-world measurements from React Native ExecuTorch:
iPhone 17 Pro (iOS)
| Model | Memory Usage (GB) |
|---|---|
| LLAMA3_2_1B | 3.1 |
| LLAMA3_2_1B_SPINQUANT | 2.4 |
| LLAMA3_2_1B_QLORA | 2.8 |
| LLAMA3_2_3B | 7.3 |
| LLAMA3_2_3B_SPINQUANT | 3.8 |
| LLAMA3_2_3B_QLORA | 4.0 |
OnePlus 12 (Android)
| Model | Memory Usage (GB) |
|---|---|
| LLAMA3_2_1B | 3.3 |
| LLAMA3_2_1B_SPINQUANT | 1.9 |
| LLAMA3_2_1B_QLORA | 2.7 |
| LLAMA3_2_3B | 7.1 |
| LLAMA3_2_3B_SPINQUANT | 3.7 |
| LLAMA3_2_3B_QLORA | 3.9 |
Computer Vision Models
iOS (iPhone 17 Pro)
| Model Type | Model | Memory (MB) |
|---|---|---|
| Classification | EFFICIENTNET_V2_S | 87 |
| Object Detection | SSDLITE_320_MOBILENET_V3_LARGE | 132 |
| Style Transfer | STYLE_TRANSFER_CANDY | 380 |
| OCR | CRAFT + CRNN | 1320 |
| Text-to-Image | BK_SDM_TINY_VPRED | 6050 |
Android (OnePlus 12)
| Model Type | Model | Memory (MB) |
|---|---|---|
| Classification | EFFICIENTNET_V2_S | 230 |
| Object Detection | SSDLITE_320_MOBILENET_V3_LARGE | 164 |
| Style Transfer | STYLE_TRANSFER_CANDY | 1200 |
| OCR | CRAFT + CRNN | 1400 |
| Text-to-Image | BK_SDM_TINY_VPRED | 6210 |
Speech Models
| Model | Platform | Memory (MB) |
|---|---|---|
| WHISPER_TINY | iOS | 375 |
| WHISPER_TINY | Android | 410 |
| KOKORO_SMALL | iOS | 820 |
| KOKORO_SMALL | Android | 820 |
| KOKORO_MEDIUM | iOS | 1100 |
| KOKORO_MEDIUM | Android | 1140 |
Memory Management Strategies
1. Choose Quantized Models
Quantization significantly reduces memory footprint:
- SpinQuant: ~40-45% reduction
- QLoRA: ~20-25% reduction
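As a back-of-the-envelope check, the expected size of a quantized variant can be estimated from the base model's footprint and the reduction ranges above. A minimal sketch (the factors are midpoints of those ranges, not library constants; the tables show real savings vary per model and platform):

```typescript
// Approximate memory reductions quoted above (midpoints; illustrative only).
const REDUCTION = {
  spinquant: 0.425, // ~40-45% smaller
  qlora: 0.225,     // ~20-25% smaller
} as const;

// Estimate the footprint (in GB) of a quantized variant from the base size.
function estimateQuantizedGb(baseGb: number, scheme: keyof typeof REDUCTION): number {
  return Number((baseGb * (1 - REDUCTION[scheme])).toFixed(1));
}
```

For LLAMA3_2_3B (7.3 GB on iOS) this predicts roughly 4.2 GB with SpinQuant, in the same ballpark as the measured 3.8 GB; treat it as a rough planning figure, not a guarantee.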
2. Unload Models When Not Needed
Free memory by deleting models you no longer need.
3. Load Models on Demand
Deferring the load until first use keeps startup memory low.
4. Manage Context Window Size
Limit conversation history to reduce memory usage:
- SlidingWindowContextStrategy: Limits total token count
- MessageCountContextStrategy: Limits number of messages
- NoopContextStrategy: No limits (use with caution)
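These strategies are configured on the library side; as a pure-TypeScript sketch of the trimming logic each one applies (Message and countTokens are illustrative stand-ins, not library types):

```typescript
interface Message { role: "user" | "assistant"; content: string }

// MessageCountContextStrategy: keep only the most recent N messages.
function limitByMessageCount(history: Message[], maxMessages: number): Message[] {
  return history.slice(-maxMessages);
}

// SlidingWindowContextStrategy: keep the most recent messages whose combined
// token count fits the window. countTokens stands in for a real tokenizer.
function limitByTokens(
  history: Message[],
  maxTokens: number,
  countTokens: (text: string) => number,
): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].content);
    if (used + cost > maxTokens) break; // window full: drop older messages
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```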
5. Configure Generation Parameters
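Peak memory during a response grows with every generated token (each one extends the KV cache), so bounding the number of new tokens bounds memory. A schematic decode loop showing the effect of a maxNewTokens cap (nextToken stands in for one decode step; this is not the library's API):

```typescript
// A decode loop bounded by maxNewTokens: capping generation length caps
// peak memory during a response.
function generateBounded(
  nextToken: (context: number[]) => number, // stand-in for one decode step
  prompt: number[],
  maxNewTokens: number,
  eosToken: number,
): number[] {
  const context = [...prompt];
  for (let i = 0; i < maxNewTokens; i++) {
    const tok = nextToken(context);
    if (tok === eosToken) break; // model finished early
    context.push(tok);
  }
  return context.slice(prompt.length); // only the newly generated tokens
}
```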
Limiting generation length caps memory growth during a response.
6. Clean Up Downloads
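One simple policy is to delete cached model files that have not been touched recently. A Node-flavored sketch using the standard fs module (on device you would point the same logic at the app's cache directory through your filesystem module of choice):

```typescript
import * as fs from "fs";
import * as path from "path";

// Delete cached files in dir that were last modified more than maxAgeMs ago.
// Returns the names of the removed files.
function pruneModelCache(dir: string, maxAgeMs: number, now = Date.now()): string[] {
  const removed: string[] = [];
  for (const name of fs.readdirSync(dir)) {
    const file = path.join(dir, name);
    const stat = fs.statSync(file);
    if (stat.isFile() && now - stat.mtimeMs > maxAgeMs) {
      fs.unlinkSync(file); // reclaim disk used by the stale model file
      removed.push(name);
    }
  }
  return removed;
}
```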
Remove cached model files when they are no longer needed.
React Component Lifecycle
Proper Cleanup with Hooks
The useLLM hook automatically manages cleanup when the component unmounts.
Manual Management with TypeScript API
Handling Memory Warnings
iOS Memory Warnings
Android Low Memory
Best Practices for LLMs
1. Start with Quantized Models
2. Monitor Memory Usage
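React Native exposes no built-in equivalent of Node's process.memoryUsage, so a monitor needs an injectable reader (for example, a small native module that reports resident size). A sketch that tracks peak usage across samples:

```typescript
// Samples memory usage and reports the peak observed value. The byte
// reader is injected: a native module on device, process.memoryUsage().rss
// under Node.
class MemoryMonitor {
  private peak = 0;
  private timer: ReturnType<typeof setInterval> | null = null;

  constructor(private readonly readBytes: () => number) {}

  start(intervalMs: number): void {
    this.timer = setInterval(() => this.sample(), intervalMs);
  }

  sample(): number {
    const bytes = this.readBytes();
    if (bytes > this.peak) this.peak = bytes;
    return bytes;
  }

  stop(): number {
    if (this.timer) clearInterval(this.timer);
    this.timer = null;
    return this.peak; // highest observed usage
  }
}
```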
3. Implement Lazy Loading
4. Use Message History Management
Device-Specific Recommendations
iOS Devices
Android Devices
Testing Memory Usage
Android Emulator Configuration
Increase emulator RAM for testing LLMs:
- Open Android Studio
- Go to AVD Manager
- Edit your virtual device
- Increase RAM to 4GB or more
- Apply changes
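The same setting can also be edited directly in the AVD's config.ini (under ~/.android/avd/<device>.avd/), where RAM is given in MB:

```ini
hw.ramSize=4096
```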
iOS Simulator
iOS Simulator uses the host machine's memory rather than a device-realistic budget, and performance characteristics differ from real devices. Always test on physical devices.
Troubleshooting Memory Issues
App Crashes During Model Load
Out of Memory During Generation
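A common recovery for both symptoms is to retry with a smaller footprint: fall back from the full model to a quantized variant when a load fails, or rerun generation with a tighter token budget. A sketch of the fallback chain (the model names come from the tables above; the loaders themselves are placeholders, not library calls):

```typescript
// Try model variants from most to least capable; fall back when a load
// fails (e.g. with an allocation error on lower-end devices).
async function loadWithFallback<T>(
  loaders: Array<{ name: string; load: () => Promise<T> }>,
): Promise<{ name: string; model: T }> {
  let lastError: unknown;
  for (const { name, load } of loaders) {
    try {
      return { name, model: await load() };
    } catch (err) {
      lastError = err; // likely out-of-memory; try a smaller variant
    }
  }
  throw lastError; // every variant failed
}
```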
Best Practices Summary
- Use Quantized Models: SpinQuant or QLoRA for LLMs
- Manage Lifecycle: Clean up models when components unmount
- Limit Context: Use context strategies to bound memory usage
- Monitor Status: Track isReady and error states
- Test on Real Devices: Emulators don't reflect real memory constraints
- Handle Memory Warnings: Implement platform-specific handlers
- Clean Downloads: Remove unused cached models
- Choose Appropriate Models: Match model size to target device capabilities
Next Steps
- Learn about Performance Optimization
- Explore Debugging memory-related issues
- Read the Troubleshooting Guide