Overview
Off Grid’s iOS implementation uses Swift native modules bridged to React Native via Objective-C. The platform leverages Apple’s Neural Engine (ANE) for image generation, Metal GPU for LLM inference, and native iOS frameworks (PDFKit, URLSession) for document processing and downloads.

Native Modules
CoreMLDiffusionModule
Manages Stable Diffusion inference on iOS using Apple’s ml-stable-diffusion pipeline with Neural Engine acceleration. File: `ios/CoreMLDiffusionModule.swift`
Architecture:
- Uses Apple’s `StableDiffusionPipeline` (SD 1.5/2.1) or `StableDiffusionXLPipeline` (SDXL)
- Both pipelines conform to `StableDiffusionPipelineProtocol`
- Automatic SDXL detection via `TextEncoder2.mlmodelc` presence
- Serial dispatch queue (`ai.offgridmobile.coreml.diffusion`) for thread safety
- Mirrors Android `LocalDreamModule` interface for cross-platform abstraction
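The SDXL auto-detection described above amounts to a file-existence check on the model directory, serialized alongside all other pipeline work. A minimal sketch using Foundation only (the function name is illustrative, not the module's actual API):

```swift
import Foundation
import Dispatch

// Serial queue mirroring the module's thread-safety strategy:
// all pipeline work is funneled through one queue.
let diffusionQueue = DispatchQueue(label: "ai.offgridmobile.coreml.diffusion")

/// Returns true when a compiled second text encoder is present,
/// which is what distinguishes an SDXL bundle from SD 1.5/2.1.
func isSDXLModel(at modelDir: URL) -> Bool {
    let encoder2 = modelDir.appendingPathComponent("TextEncoder2.mlmodelc")
    return FileManager.default.fileExists(atPath: encoder2.path)
}
```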
SDXL bundles include `TextEncoder2.mlmodelc` (a second text encoder), while SD 1.5/2.1 bundles have only `TextEncoder.mlmodelc`.
Model Loading:
- SD 1.5/2.1 models load via `StableDiffusionPipeline`
- SDXL models load via `StableDiffusionXLPipeline`

The text encoders expose a `last_hidden_state` output used for prompt conditioning. Returning `false` from the generation progress handler cancels generation.
Image Persistence:
Generated images are saved to `Documents/generated_images/` as PNG.
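Persisting a frame then reduces to writing the encoded PNG bytes under the app's Documents directory. A minimal Foundation-only sketch (the helper name and filename scheme are illustrative; the real module encodes the PNG via UIKit first):

```swift
import Foundation

/// Writes already-encoded PNG bytes under Documents/generated_images/,
/// creating the directory on first use, and returns the file URL.
func persistGeneratedImage(pngData: Data, documentsDir: URL) throws -> URL {
    let dir = documentsDir.appendingPathComponent("generated_images", isDirectory: true)
    try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
    let name = "img_\(Int(Date().timeIntervalSince1970)).png"
    let fileURL = dir.appendingPathComponent(name)
    try pngData.write(to: fileURL, options: .atomic)
    return fileURL
}
```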
Thread Safety:
All pipeline operations are serialized on the `ai.offgridmobile.coreml.diffusion` dispatch queue noted above.
PDFExtractorModule
Extracts text from PDF files using Apple’s PDFKit framework. File: `ios/OffgridMobile/PDFExtractor/PDFExtractorModule.swift`
Implementation:
- Native iOS framework (no third-party dependencies)
- Handles encrypted/password-protected PDFs
- Preserves text layout and reading order
- Automatic page boundary detection
Extracted text is truncated to `maxChars` (default 50,000) to prevent overwhelming the LLM context window.
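The truncation itself is a simple prefix cut. A sketch (the 50,000 default comes from the text above; the function name is illustrative):

```swift
import Foundation

/// Caps extracted PDF text at maxChars characters so a single
/// document cannot flood the LLM context window.
func truncateExtractedText(_ text: String, maxChars: Int = 50_000) -> String {
    guard text.count > maxChars else { return text }
    return String(text.prefix(maxChars))
}
```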
Core ML Stable Diffusion Pipeline
Pipeline Architecture
Scheduler: DPM-Solver++
Apple’s pipeline uses the DPM-Solver++ multistep scheduler (default) for faster convergence:
- 20 steps produce high-quality results (vs. 50+ steps for Euler)
- Better detail preservation at low step counts
- Supports guidance scale (CFG) for prompt adherence
Safety Checker: Disabled
Apple’s default pipeline includes an NSFW safety checker. Off Grid disables it for:
- Reduced latency — No secondary Core ML model invocation
- User control — On-device generation is private; users decide content policy
- Model size — Safety checker adds ~200MB to model bundle
The pipeline is also initialized with `reduceMemory: true`, which loads model stages on demand and unloads them after use, reducing peak memory usage.
Model Variants
Two variants are distributed:
- Palettized (6-bit)
- Full Precision (fp16)

Palettized (6-bit):
- Size: ~1GB
- Precision: 6-bit quantized weights
- Performance: ~15-25s on A17 Pro/M-series (2x slower than fp16 due to dequantization)
- Use case: Memory-constrained devices (iPhone 12, iPad Air 5)

Models:
- SD 1.5 Palettized (`apple/coreml-stable-diffusion-v1-5`)
- SD 2.1 Palettized (`apple/coreml-stable-diffusion-2-1-base`)
- SDXL iOS (~2GB, 4-bit mixed-bit palettization, `apple/coreml-stable-diffusion-xl-base`)

All models ship as `.mlmodelc` bundles (Core ML compiled model format).
Hardware Acceleration
Neural Engine (ANE)
Apple’s dedicated AI accelerator, available on A11+ and M-series chips.

Capabilities:
- ~35 TOPS (A17 Pro, A18 Pro), 38 TOPS (M4)
- Optimized for convolutions, matrix multiplies, activations
- Low power consumption (vs. GPU)
- Automatic op dispatch via Core ML
| Compute Unit | Ops Executed | Power | Performance |
|---|---|---|---|
| `.all` | ANE → GPU → CPU fallback | High | Fastest (if GPU-compatible) |
| `.cpuAndNeuralEngine` | ANE → CPU fallback | Low | Recommended for mobile |
| `.cpuAndGPU` | GPU → CPU fallback | Medium | Avoid (Metal allocations can crash on ≤4GB devices) |
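The table's recommendation can be modelled as a small selection helper. This is a pure-logic sketch: `ComputeUnitChoice` is an illustrative stand-in for Core ML's `MLComputeUnits`, and the function name is hypothetical:

```swift
import Foundation

/// Illustrative stand-in for Core ML's MLComputeUnits options.
enum ComputeUnitChoice: String {
    case all, cpuAndNeuralEngine, cpuAndGPU, cpuOnly
}

/// Picks compute units per the table above: prefer the low-power
/// ANE path on memory-constrained devices, and never pick
/// .cpuAndGPU, whose Metal allocations can crash <=4 GB devices.
func selectComputeUnits(physicalMemoryBytes: UInt64) -> ComputeUnitChoice {
    let fourGB: UInt64 = 4 * 1024 * 1024 * 1024
    return physicalMemoryBytes <= fourGB ? .cpuAndNeuralEngine : .all
}
```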
Metal GPU (LLM Inference)
Llama.cpp uses Metal Performance Shaders (MPS) for GPU-accelerated LLM inference via `llama.rn`.
Configuration:
- User sets GPU layers (0-99) in model settings
- Metal backend offloads transformer layers to GPU
- Automatic fallback to CPU if Metal initialization fails

Performance:
- A17 Pro / M-series: 25-50 tok/s with Metal
- CPU-only: 10-20 tok/s (ARM NEON)
Metal buffer allocation failures can `abort()` on low-RAM devices (≤4GB), killing the app before JavaScript can catch the error, so GPU layers are forced to 0 on these devices.
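Guarding against that crash can be expressed as a clamp on the requested layer count. A sketch under the assumptions stated in the text (the 0-99 range comes from the settings description above; the function name is illustrative):

```swift
import Foundation

/// Forces GPU offload to 0 layers on devices with <=4 GB RAM, where
/// Metal buffer allocation failures can abort() the whole process.
/// Otherwise clamps the user-requested value to the 0-99 range.
func effectiveGPULayers(requested: Int, physicalMemoryBytes: UInt64) -> Int {
    let fourGB: UInt64 = 4 * 1024 * 1024 * 1024
    guard physicalMemoryBytes > fourGB else { return 0 }
    return min(max(requested, 0), 99)
}
```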
CLIP GPU Acceleration:
Build Configuration
Xcode Project
Minimum Deployment Target: iOS 14.0

Required Frameworks:
- `CoreML.framework` — Core ML inference engine
- `PDFKit.framework` — PDF text extraction
- `Foundation.framework` — File I/O, networking
- `UIKit.framework` — Image encoding
Apple’s ml-stable-diffusion Swift package provides `StableDiffusionPipeline` and `StableDiffusionXLPipeline`.
Bridging Headers
Objective-C Bridge Files:
- `ios/CoreMLDiffusionModule.m`
- `ios/OffgridMobile/PDFExtractor/PDFExtractorModule.m`
Code Signing
Capabilities Required:
- App Sandbox (optional for Mac Catalyst)
- File Access (Documents folder)
Performance Tuning
Image Generation
Step Count:

| Steps | Quality | Time (A17 Pro) | Use Case |
|---|---|---|---|
| 10 | Low | ~5s | Quick previews |
| 20 | Good | ~8-12s | Recommended default |
| 30 | Excellent | ~15-20s | High quality |
| 50 | Diminishing returns | ~30s+ | Professional |
Guidance Scale (CFG):
- 1.0 — Ignores prompt, random images
- 7.5 — Default, balanced prompt adherence
- 15.0 — High adherence, may oversaturate
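A defensive clamp over the user-supplied value keeps requests inside the useful range described above. A sketch; the bounds mirror the list and the function name is illustrative:

```swift
import Foundation

/// Clamps the CFG guidance scale into the practically useful range:
/// 1.0 effectively ignores the prompt, 15.0 risks oversaturation.
func clampGuidanceScale(_ value: Float, low: Float = 1.0, high: Float = 15.0) -> Float {
    return min(max(value, low), high)
}
```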
Memory Management
Pipeline Unloading:
- Pipelined UNet execution (processes latents in chunks)
- Safety checker disabled (saves ~200MB)
- Aggressive Core ML cache eviction
Downloads (URLSession)
iOS uses `react-native-fs` (wraps URLSession) for background downloads instead of Android’s native DownloadManager.
Background Transfer:
Debugging
Core ML Logs
ANE Performance Profiling
Xcode Instruments:
- Profile → Metal System Trace
- Enable “Core ML” track
- Run image generation
- Inspect ANE vs. CPU vs. GPU op distribution
Expected distribution:
- UNet: 95%+ on ANE (convolutions)
- VAE Decoder: 80%+ on ANE
- Text Encoder: 50-70% on ANE (some ops fall back to CPU)
References
- Apple ml-stable-diffusion: https://github.com/apple/ml-stable-diffusion
- Core ML Documentation: https://developer.apple.com/documentation/coreml
- PDFKit Guide: https://developer.apple.com/documentation/pdfkit
- Metal Performance Shaders: https://developer.apple.com/documentation/metalperformanceshaders
- llama.cpp Metal: https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/Metal.md