GPU & OpenCL Crashes
Issue: App crashes during GPU layer offloading
Symptoms:
- App crashes immediately after loading model with GPU layers > 0
- Native crash (SIGSEGV, SIGABRT) with llama.cpp stack trace
- OpenCL initialization failure on Android
Start with 0 GPU layers
Always begin with 0 GPU layers and incrementally increase while monitoring stability. Then test:
- Load model with 0 layers (CPU-only) — verify stability
- Increase to 10 layers — test generation
- Increase to 20 layers — test generation
- Continue until crash or diminishing returns
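The escalation above can be sketched as a small helper. Here `tryLoad` is a hypothetical callback: in the real app it would load the model with the given layer count and run a short generation test, returning false on a crash or instability.

```typescript
// Sketch of the incremental GPU-offload test. `tryLoad` is hypothetical
// and stands in for "load model with N GPU layers, then test generation".
async function findStableGpuLayers(
  tryLoad: (nGpuLayers: number) => Promise<boolean>,
  maxLayers = 99,
  step = 10,
): Promise<number> {
  let best = 0; // 0 layers (CPU-only) is the baseline to verify first
  for (let layers = 0; layers <= maxLayers; layers += step) {
    if (!(await tryLoad(layers))) break; // stop at the first failure
    best = layers; // highest layer count that loaded and generated cleanly
  }
  return best;
}
```

In practice you would stop raising the layer count once tokens/sec stops improving, even if loading still succeeds.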
Disable GPU entirely
If crashes persist, disable GPU acceleration completely. CPU inference uses ARM NEON, i8mm, and dotprod SIMD instructions and remains performant on modern devices.
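A minimal sketch of a CPU-only configuration, assuming llama.cpp-style option names (`n_gpu_layers`, `n_ctx`); adapt the names to the app's actual init call:

```typescript
// Hypothetical init params; option names follow llama.cpp conventions.
const cpuOnlyParams = {
  model: '/path/to/model.gguf',
  n_gpu_layers: 0, // 0 disables GPU offload entirely (pure CPU/NEON path)
  n_ctx: 4096,
};
```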
Metal Crashes on Low-RAM Devices
Issue: App crashes on iPhone XS, iPhone 8, or other ≤4GB RAM devices
Symptoms:
- App killed instantly during model load (no JavaScript error)
- Metal buffer allocation crash
- CLIP warmup crash during vision model initialization
On these devices, a failed Metal buffer allocation calls abort(), killing the app before JavaScript can catch the error. This is a POSIX signal that bypasses try/catch.
Solution:
Automatic safeguards (already implemented)
Off Grid automatically applies RAM-based caps before any native call:
| Device RAM | GPU Layers | Context Cap | CLIP GPU |
|---|---|---|---|
| ≤4GB | 0 (CPU-only) | 2048 | Off |
| 4-6GB | Requested | 2048 | On |
| 6-8GB | Requested | 4096 | On |
| >8GB | Requested | 8192 | On |

Helpers in src/services/llmHelpers.ts:
- getMaxContextForDevice(totalMemoryBytes) — caps the auto-scaled context length
- getGpuLayersForDevice(totalMemoryBytes, requestedLayers) — disables Metal on ≤4GB devices
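For reference, a hedged sketch of what those caps look like in code, matching the table above. The authoritative implementations live in src/services/llmHelpers.ts and may differ in detail:

```typescript
const GB = 1024 ** 3;

// Context cap by total device RAM (per the table: ≤6GB → 2048,
// 6-8GB → 4096, >8GB → 8192).
function getMaxContextForDevice(totalMemoryBytes: number): number {
  if (totalMemoryBytes <= 6 * GB) return 2048;
  if (totalMemoryBytes <= 8 * GB) return 4096;
  return 8192;
}

// ≤4GB devices are forced to CPU-only so the Metal allocation that
// triggers abort() is never attempted.
function getGpuLayersForDevice(
  totalMemoryBytes: number,
  requestedLayers: number,
): number {
  return totalMemoryBytes <= 4 * GB ? 0 : requestedLayers;
}
```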
These caps are applied in initContextWithFallback(), so the dangerous Metal allocation is never attempted. If you're developing on a low-RAM device and still experiencing crashes, verify that getGpuLayersForDevice() is being called before model initialization.

Model Loading Failures
Issue: “Insufficient RAM” or “Memory budget exceeded”
Symptoms:
- Red error message when trying to load a model
- Warning banner showing memory usage
- Model load blocked before attempting
Unload current model
Unload the currently active model before loading a new one. From the UI:
- Go to Home screen
- Tap “Unload” on active model card
- Return to Models screen and load new model
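The same flow can be done programmatically. This is a sketch only: `unloadModel` and `loadModel` are hypothetical wrappers around the app's native bindings, passed in here so the ordering logic is runnable standalone.

```typescript
// Free the old model's RAM before loading the new one into that budget.
// unloadModel/loadModel are hypothetical wrappers around the native layer.
async function swapModel(
  unloadModel: () => Promise<void>,
  loadModel: (path: string) => Promise<void>,
  newModelPath: string,
): Promise<void> {
  await unloadModel(); // must complete first, or both models coexist in RAM
  await loadModel(newModelPath);
}
```

The key point is sequencing: awaiting the unload before starting the load prevents both models from being resident at once.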
Choose a smaller model
Select a model with lower quantization or smaller parameter count:
- Instead of Qwen3-7B-Q5_K_M (~5.5GB), try Qwen3-7B-Q4_K_M (~4.0GB)
- Instead of 7B models, try 3B models (Llama 3.2 3B, Qwen 3 3B)
- For very constrained devices, use 0.6B-1.5B models (SmolLM3, Qwen3-0.6B)
Reduce context length
Lower the context window to free up RAM. Each halving of context length saves ~1-2GB RAM depending on model size.
Memory Budget Calculation
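The dominant context-dependent cost is the KV cache, which scales linearly with context length. The sketch below shows the standard estimate; the architecture numbers are illustrative assumptions, not values read from a real GGUF file, and the app's actual budget logic may account for more (model weights, CLIP, activations).

```typescript
// KV cache size: K and V tensors for every layer, at every position.
function kvCacheBytes(
  nCtx: number,
  nLayers: number,
  nKvHeads: number,
  headDim: number,
  bytesPerElem = 2, // f16 cache entries
): number {
  return 2 * nLayers * nCtx * nKvHeads * headDim * bytesPerElem; // 2 = K + V
}

const GB = 1024 ** 3;
// Illustrative 7B-class model: 32 layers, 32 KV heads (MHA), head dim 128.
const kv4k = kvCacheBytes(4096, 32, 32, 128); // 2 GB at 4096 context
const kv2k = kvCacheBytes(2048, 32, 32, 128); // halving context halves this
```

This is why dropping from 4096 to 2048 context can recover on the order of a gigabyte or more on larger models (less on models using grouped-query attention, which have fewer KV heads).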
Download Issues
Issue: Model download fails or stalls
Symptoms:
- Download progress stuck at 0% or mid-download
- “Download failed” error
- Partial file in storage
Check network connection
- Ensure stable Wi-Fi or cellular connection
- Large models (>4GB) may take 10-30 minutes on slower connections
- Hugging Face CDN can be slow during peak hours
Check storage space
The pre-download storage check should prevent this, but verify manually: ensure free space of at least 2x the model size for safe extraction.
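The headroom rule is simple to express as a pure function. On device, the free-byte count would come from a filesystem API (for example react-native-fs's getFSInfo(), though whether this codebase uses that library is an assumption):

```typescript
// The 2x rule from the text: require at least twice the model size free.
function hasSafeHeadroom(freeBytes: number, modelBytes: number): boolean {
  return freeBytes >= 2 * modelBytes;
}
```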
Retry download
Android DownloadManager supports automatic retry; to retry manually:
- Go to Download Manager screen
- Tap failed download → Remove
- Return to Models screen and re-download
Clean up orphaned files
Interrupted downloads may leave orphaned GGUF files:
- Go to Settings → Storage Settings
- Tap “Scan for orphaned files”
- Review list and tap “Delete all orphaned files”
- Retry download
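Conceptually, the orphan scan compares files on disk against the app's model registry. A hedged sketch (the real scan's criteria and storage layout may differ):

```typescript
// GGUF files on disk that no registered model claims are orphans
// left behind by interrupted downloads.
function findOrphanedGguf(files: string[], registered: Set<string>): string[] {
  return files.filter((f) => f.endsWith('.gguf') && !registered.has(f));
}
```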
Vision Model Performance
Issue: Vision inference extremely slow after prompt enhancement
Symptoms:
- First vision inference fast (~7s)
- Second vision inference slow (~30-60s)
- Happens after using AI prompt enhancement for image generation
Current implementation (no action needed)
Off Grid now only calls stopGeneration() after prompt enhancement, not clearKvCache(). This preserves the KV cache and keeps vision inference fast. If you’re modifying the codebase, ensure you never clear the KV cache unless explicitly requested by the user.

Image Generation Issues
Issue: Image generation fails or produces black images
Symptoms:
- Generation completes but image is blank/black
- Native crash during generation
- “Model not loaded” error
Android: Verify backend compatibility
Ensure you’re using the correct model for your chipset:

- MNN (CPU) models — work on all ARM64 devices: Anything V5, Absolute Reality, QteaMix, ChilloutMix, CuteYukiMix
- QNN (NPU) models — all MNN models plus DreamShaper, Realistic Vision, etc., on supported Snapdragon chipsets. Variants: min, 8gen1, 8gen2
iOS: Check Core ML model format
Ensure you downloaded a Core ML model (not MNN/QNN):
- SD 1.5 Palettized (~1GB)
- SD 2.1 Palettized (~1GB)
- SDXL iOS (~2GB)
- SD 1.5 Full (~4GB)
- SD 2.1 Base Full (~4GB)
Reduce steps or guidance scale
Lower the generation parameters. Higher values = longer generation and more memory usage.
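A hedged example of conservative settings; the parameter names here are assumptions, not the app's actual API:

```typescript
// Illustrative diffusion settings for a memory-constrained device.
const genParams = {
  steps: 20,          // fewer steps = faster, at some quality cost
  guidanceScale: 7.0, // typical default range is ~5-9
  width: 512,         // SD 1.5's native resolution; larger costs much more RAM
  height: 512,
};
```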
Performance Tuning
Issue: Text generation is slow
Symptoms:
- Less than 5 tok/s on flagship device
- Long first token time (over 3s)
Increase CPU threads
More threads = faster inference (to a point). Recommendations:
- Mid-range: 4-6 threads
- Flagship: 6-8 threads
- Diminishing returns beyond 8
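One way to encode those recommendations is a heuristic that leaves a couple of cores for the UI and OS and caps where returns diminish. This is a sketch of the guidance above, not the app's actual logic:

```typescript
// Hedged heuristic: reserve ~2 cores for the system, cap at 8 threads.
function recommendedThreads(totalCores: number): number {
  return Math.min(8, Math.max(2, totalCores - 2));
}
```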
Adjust batch size
- Smaller (32-128): Faster first token
- Larger (256-512): Better throughput
Reduce context length
Use lower quantization
Switch from Q5_K_M/Q6_K to Q4_K_M:
- Q4_K_M: Faster, smaller, slightly lower quality
- Q5_K_M: Balanced
- Q6_K: Slower, larger, highest quality
Development Issues
Issue: Pre-commit hooks fail
Symptoms:
- git commit fails with linting/type errors
- Tests fail during commit
Fix the errors
Pre-commit hooks run quality gates scoped to staged files:

| Staged file type | Checks |
|---|---|
| .ts/.tsx/.js/.jsx | ESLint, tsc --noEmit, npm test |
| .swift | SwiftLint, npm run test:ios |
| .kt/.kts | compileDebugKotlin, lintDebug, npm run test:android |

Fix the errors and recommit. Never skip with --no-verify.

Issue: SwiftLint not installed
Solution: Install SwiftLint (e.g. via Homebrew) so the Swift quality gates can run.

Additional Resources
Architecture Reference
Full technical documentation
GitHub Issues
Report bugs and request features
If you’re experiencing an issue not covered here, check the GitHub Issues or ask in the Slack community.