
GPU & OpenCL Crashes

Issue: App crashes during GPU layer offloading

Symptoms:
  • App crashes immediately after loading model with GPU layers > 0
  • Native crash (SIGSEGV, SIGABRT) with llama.cpp stack trace
  • OpenCL initialization failure on Android
Cause: The OpenCL backend can crash on some Qualcomm devices during layer-offload initialization. The crash typically happens before JavaScript can catch the error.

Solution:
Always begin with 0 GPU layers and incrementally increase while monitoring stability.
// In model settings
gpuLayers: 0  // Start here
Then test:
  1. Load model with 0 layers (CPU-only) — verify stability
  2. Increase to 10 layers — test generation
  3. Increase to 20 layers — test generation
  4. Continue until crash or diminishing returns
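The stepping procedure above can be sketched as a small helper. This is illustrative only (not part of the app's codebase): it produces the list of GPU layer counts to try in order, starting at 0 and increasing by 10 up to the model's layer count.

```typescript
// Illustrative helper: build the incremental GPU-layer test schedule
// described above (0, 10, 20, ... up to maxLayers).
function gpuLayerSchedule(maxLayers: number, step = 10): number[] {
  const schedule: number[] = [];
  for (let layers = 0; layers <= maxLayers; layers += step) {
    schedule.push(layers);
  }
  return schedule;
}
```

Test each value with a real generation before moving to the next; a native crash at a given step means the previous value is your device's ceiling.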
The app automatically falls back to CPU if GPU initialization fails, but this only works when the failure surfaces at a stage the app can intercept; a hard native crash during offload kills the process before the fallback can run.
If crashes persist, disable GPU acceleration completely:
// In ModelSettingsScreen or appStore
enableGpu: false,
gpuLayers: 0,
CPU inference uses ARM NEON, i8mm, and dotprod SIMD instructions — still performant on modern devices.
Flash attention is automatically disabled when GPU layers > 0 on Android due to llama.cpp compatibility issues.

Metal Crashes on Low-RAM Devices

Issue: App crashes on iPhone XS, iPhone 8, or other ≤4GB RAM devices

Symptoms:
  • App killed instantly during model load (no JavaScript error)
  • Metal buffer allocation crash
  • CLIP warmup crash during vision model initialization
Cause: On devices with ≤4GB RAM, Metal buffer allocation for LLM inference and CLIP can call abort(), killing the app before JavaScript can catch the error. This is a POSIX signal that bypasses try/catch.

Solution:
Off Grid automatically applies RAM-based caps before any native call:
Device RAM | GPU Layers   | Context Cap | CLIP GPU
≤4GB       | 0 (CPU-only) | 2048        | Off
4-6GB      | Requested    | 2048        | On
6-8GB      | Requested    | 4096        | On
>8GB       | Requested    | 8192        | On
Helpers in src/services/llmHelpers.ts:
  • getMaxContextForDevice(totalMemoryBytes) — caps auto-scaled context length
  • getGpuLayersForDevice(totalMemoryBytes, requestedLayers) — disables Metal on ≤4GB devices
The GPU check runs before initContextWithFallback() so the dangerous Metal allocation is never attempted.
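A sketch of what these helpers do, derived from the cap table above. The function names come from `src/services/llmHelpers.ts`, but the bodies here are assumptions; the real implementations may differ.

```typescript
const GB = 1024 ** 3;

// Sketch: cap the auto-scaled context length by device RAM,
// per the cap table above (assumed thresholds, inclusive upper bounds).
function getMaxContextForDevice(totalMemoryBytes: number): number {
  if (totalMemoryBytes <= 6 * GB) return 2048;
  if (totalMemoryBytes <= 8 * GB) return 4096;
  return 8192;
}

// Sketch: force CPU-only inference (0 GPU layers) on ≤4GB devices,
// otherwise honor the requested layer count.
function getGpuLayersForDevice(
  totalMemoryBytes: number,
  requestedLayers: number
): number {
  return totalMemoryBytes <= 4 * GB ? 0 : requestedLayers;
}
```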
If you’re developing on a low-RAM device and still experiencing crashes, verify that getGpuLayersForDevice() is being called before model initialization.

Model Loading Failures

Issue: “Insufficient RAM” or “Memory budget exceeded”

Symptoms:
  • Red error message when trying to load a model
  • Warning banner showing memory usage
  • Model load blocked before attempting
Cause: Pre-load memory checks prevent loading models that would exceed the device’s safe RAM budget (60% of total RAM).

Solution:
Unload the currently active model before loading a new one:
// In HomeScreen or ModelSelector
await activeModelService.unloadTextModel();
await activeModelService.loadTextModel(newModelId);
Or use the UI:
  1. Go to Home screen
  2. Tap “Unload” on active model card
  3. Return to Models screen and load new model
Select a model with lower quantization or smaller parameter count:
  • Instead of Qwen3-7B-Q5_K_M (~5.5GB), try Qwen3-7B-Q4_K_M (~4.0GB)
  • Instead of 7B models, try 3B models (Llama 3.2 3B, Qwen 3 3B)
  • For very constrained devices, use 0.6B-1.5B models (SmolLM3, Qwen3-0.6B)
See Quantization Guide for size/quality tradeoffs.
Lower the context window to free up RAM:
// Settings → Model Settings → Context Length
contextLength: 2048  // Down from 4096 or 8192
Each halving of context length saves ~1-2GB RAM depending on model size.

Memory Budget Calculation

// Text models
requiredRAM = fileSize * 1.5  // KV cache, activations

// Vision models
requiredRAM = (modelFileSize + mmProjSize) * 1.5

// Image models
requiredRAM = fileSize * 1.8  // MNN/QNN runtime overhead

// Safe limit
budget = deviceTotalRAM * 0.60
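The budget calculation above, written out as a runnable sketch (the function names are illustrative, not the app's actual API):

```typescript
type ModelKind = "text" | "vision" | "image";

// Estimated RAM needed to run a model, per the multipliers above.
function requiredRamBytes(
  kind: ModelKind,
  fileSizeBytes: number,
  mmProjSizeBytes = 0
): number {
  switch (kind) {
    case "text":
      return fileSizeBytes * 1.5; // KV cache, activations
    case "vision":
      return (fileSizeBytes + mmProjSizeBytes) * 1.5;
    case "image":
      return fileSizeBytes * 1.8; // MNN/QNN runtime overhead
  }
}

// A load is allowed only if the estimate fits within 60% of device RAM.
function fitsMemoryBudget(
  kind: ModelKind,
  fileSizeBytes: number,
  deviceTotalRamBytes: number,
  mmProjSizeBytes = 0
): boolean {
  return (
    requiredRamBytes(kind, fileSizeBytes, mmProjSizeBytes) <=
    deviceTotalRamBytes * 0.6
  );
}
```

For example, a 4GB text model on an 8GB device needs ~6GB but the budget is only ~4.8GB, so the load is blocked.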

Download Issues

Issue: Model download fails or stalls

Symptoms:
  • Download progress stuck at 0% or mid-download
  • “Download failed” error
  • Partial file in storage
Solutions:
  • Ensure stable Wi-Fi or cellular connection
  • Large models (>4GB) may take 10-30 minutes on slower connections
  • Hugging Face CDN can be slow during peak hours
Pre-download storage check should prevent this, but verify:
# Android
adb shell df /data

# iOS
# Settings → General → iPhone Storage
Ensure at least 2x model size free space for safe extraction.
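The 2x rule is simple enough to express directly; this is an illustrative check, not the app's actual implementation:

```typescript
// Pre-download storage check: require at least 2x the model size free,
// so the download plus any extraction scratch space both fit.
function hasRoomForDownload(
  modelSizeBytes: number,
  freeStorageBytes: number
): boolean {
  return freeStorageBytes >= 2 * modelSizeBytes;
}
```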
Android’s DownloadManager retries automatically, but you can also retry manually:
  1. Go to Download Manager screen
  2. Tap failed download → Remove
  3. Return to Models screen and re-download
Interrupted downloads may leave orphaned GGUF files:
  1. Go to Settings → Storage Settings
  2. Tap “Scan for orphaned files”
  3. Review list and tap “Delete all orphaned files”
  4. Retry download
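The orphaned-file scan can be thought of as a set difference: any `.gguf` file on disk that the model registry doesn't know about is orphaned. A minimal Node-style sketch (the function name and registry shape are assumptions, not the app's actual code):

```typescript
import * as fs from "fs";
import * as path from "path";

// Illustrative orphan scan: list .gguf files in a directory that are
// not present in the set of registered model filenames.
function findOrphanedGguf(
  modelDir: string,
  registeredFiles: Set<string>
): string[] {
  return fs
    .readdirSync(modelDir)
    .filter((name) => name.endsWith(".gguf") && !registeredFiles.has(name))
    .map((name) => path.join(modelDir, name));
}
```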

Vision Model Performance

Issue: Vision inference extremely slow after prompt enhancement

Symptoms:
  • First vision inference fast (~7s)
  • Second vision inference slow (~30-60s)
  • Happens after using AI prompt enhancement for image generation
Cause: The generation service used to clear the KV cache after prompt enhancement, which forced the next vision inference to rebuild the entire cache from scratch.

Solution:
Off Grid now only calls stopGeneration() after prompt enhancement, not clearKvCache(). This preserves the KV cache and keeps vision inference fast.
// After enhancement completes
await llmService.stopGeneration();  // Clear generating flag
// Note: KV cache NOT cleared to preserve vision inference speed
If you’re modifying the codebase, ensure you never clear the KV cache unless explicitly requested by the user.
Clearing the KV cache can make vision inference 30-60s slower. Only clear when:
  • User explicitly taps “Clear KV Cache” in settings
  • Model is being unloaded
  • Switching to a different model
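The three allowed cases above can be captured in a single guard. This is an illustrative sketch (the function and field names are hypothetical), useful if you're adding code paths that touch the cache:

```typescript
// Conditions under which clearing the KV cache is acceptable,
// per the rules above. Everywhere else, preserve the cache.
interface KvCacheClearContext {
  userRequested: boolean;   // user tapped "Clear KV Cache" in settings
  unloadingModel: boolean;  // model is being unloaded
  switchingModel: boolean;  // switching to a different model
}

function shouldClearKvCache(ctx: KvCacheClearContext): boolean {
  return ctx.userRequested || ctx.unloadingModel || ctx.switchingModel;
}
```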

Image Generation Issues

Issue: Image generation fails or produces black images

Symptoms:
  • Generation completes but image is blank/black
  • Native crash during generation
  • “Model not loaded” error
Solutions:
Ensure you’re using the correct model for your chipset:
MNN (CPU) models — Work on all ARM64 devices
  • Anything V5, Absolute Reality, QteaMix, ChilloutMix, CuteYukiMix
QNN (NPU) models — Require Snapdragon 8 Gen 1+
  • All MNN models plus DreamShaper, Realistic Vision, etc.
  • Variants: min, 8gen1, 8gen2
The app auto-detects NPU support, but you can verify:
// In DeviceInfoScreen or debug logs
const hasQNN = await localDreamGeneratorService.supportsQNN();
On iOS, ensure you downloaded a Core ML model (not MNN/QNN):
  • SD 1.5 Palettized (~1GB)
  • SD 2.1 Palettized (~1GB)
  • SDXL iOS (~2GB)
  • SD 1.5 Full (~4GB)
  • SD 2.1 Base Full (~4GB)
Models from Apple’s official HuggingFace repos only.
Lower the generation parameters:
steps: 15         // Down from 20-25
guidanceScale: 7  // Down from 10+
Higher values = longer generation, more memory usage.

Performance Tuning

Issue: Text generation is slow

Symptoms:
  • Less than 5 tok/s on flagship device
  • Long first token time (over 3s)
Solutions:
More threads = faster inference (to a point):
// Settings → Model Settings → CPU Threads
cpuThreads: 6  // Optimal for most devices (up to 8 on flagships)
Recommendations:
  • Mid-range: 4-6 threads
  • Flagship: 6-8 threads
  • Diminishing returns beyond 8
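One way to turn the ranges above into a default is a heuristic like the following. This is illustrative only, not the app's actual logic: leave a couple of cores for the OS and UI, stay within the 4-8 sweet spot, and never exceed the physical core count.

```typescript
// Illustrative thread-count heuristic based on the ranges above.
function recommendedThreads(coreCount: number): number {
  const target = Math.max(4, coreCount - 2); // reserve ~2 cores for OS/UI
  return Math.min(8, coreCount, target);     // diminishing returns beyond 8
}
```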
// Settings → Model Settings → Batch Size
batchSize: 256  // Balanced default
Tradeoffs:
  • Smaller (32-128): Faster first token
  • Larger (256-512): Better throughput
// Settings → Model Settings → Context Length
contextLength: 2048  // Down from 4096+
Shorter context = faster inference + less memory.
Switch from Q5_K_M/Q6_K to Q4_K_M:
  • Q4_K_M: Faster, smaller, slightly lower quality
  • Q5_K_M: Balanced
  • Q6_K: Slower, larger, highest quality
See Quantization Guide.

Development Issues

Issue: Pre-commit hooks fail

Symptoms:
  • git commit fails with linting/type errors
  • Tests fail during commit
Solution:
Pre-commit hooks run quality gates scoped to staged files:
Staged file type  | Checks
.ts/.tsx/.js/.jsx | ESLint, tsc --noEmit, npm test
.swift            | SwiftLint, npm run test:ios
.kt/.kts          | compileDebugKotlin, lintDebug, npm run test:android
Fix the errors and recommit. Never skip with --no-verify.

Issue: SwiftLint not installed

Solution:
brew install swiftlint
The hook skips SwiftLint with a warning if not installed, but it’s required for iOS development.

Additional Resources

Architecture Reference — Full technical documentation

GitHub Issues — Report bugs and request features
If you’re experiencing an issue not covered here, check the GitHub Issues or ask in the Slack community.
