System Architecture
Off Grid is a React Native mobile app with native modules for on-device AI inference. The architecture follows a layered design separating UI, services, native bridges, and hardware acceleration.
Layer Breakdown
UI Layer (React Native)
Framework: React Native 0.83 with TypeScript 5.x
Design System: Brutalist terminal-inspired interface with a monochromatic palette and emerald accents. Full light/dark theme support via the useTheme() hook.
Navigation: React Navigation 7.x with bottom tabs and modal stacks.
Animations: react-native-reanimated for spring-based physics and staggered entrance effects.
Key Components:
- ChatInput — Message composition with attachment badges (src/components/ChatInput.tsx)
- ModelCard — Model display with actions (src/components/ModelCard.tsx)
- AnimatedPressable — Spring scale feedback + haptics (src/components/AnimatedPressable.tsx)
- AppSheet — Custom swipe-dismissible bottom sheets (src/components/AppSheet.tsx)
Services Layer (TypeScript)
All core services are singleton instances to prevent duplicate model loading, concurrent inference conflicts, and memory leaks.
Core Services
llmService (src/services/llm.ts:26)
Wraps llama.rn for GGUF model lifecycle and text/vision inference:
- Model loading with automatic context scaling based on device RAM
- Streaming token generation with 50ms batched UI updates
- Vision model support via mmproj (multimodal projector) files
- Tool calling detection from jinja chat templates
- KV cache management with quantization (f16/q8_0/q4_0)
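The RAM-based context scaling above can be sketched as a small helper. This is an illustrative sketch only: the function name and the RAM thresholds are assumptions, not the app's actual values.

```typescript
// Hypothetical sketch of automatic context scaling: cap the requested
// context length based on device RAM, since larger contexts mean larger
// KV caches. Thresholds are illustrative, not the app's real budget.
function scaleContext(deviceRamMb: number, requestedCtx: number): number {
  const cap =
    deviceRamMb >= 12288 ? 8192 :
    deviceRamMb >= 8192  ? 4096 :
    deviceRamMb >= 6144  ? 2048 :
    1024;
  return Math.min(requestedCtx, cap);
}
```

A 16 GB device would keep the requested 8192-token context, while a 4 GB device would be clamped to the smallest tier.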
Wraps whisper.rn for speech-to-text transcription with multiple model sizes (Tiny, Base, Small).
hardwareService (src/services/hardware.ts)
Device info retrieval: RAM, CPU cores, SoC model, storage.
Orchestration Services
generationService (src/services/generationService.ts:29)
Background-safe text generation orchestration:
- Maintains generation state independently of React component lifecycle
- Token batching: collects tokens and flushes to UI every 50ms
- Message queue: non-blocking input during active generation
- Tool loop integration (max 3 iterations, 5 tool calls)
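The token-batching behavior described above can be sketched as a small buffer that flushes on a timer. The class and method names here are illustrative, not the service's actual API.

```typescript
// Sketch of the 50 ms token-batching pattern: streamed tokens are buffered
// and flushed to the UI on an interval instead of triggering a re-render
// per token. Names are hypothetical stand-ins.
class TokenBatcher {
  private buffer: string[] = [];
  private timer: ReturnType<typeof setInterval> | null = null;

  constructor(
    private onFlush: (chunk: string) => void,
    private intervalMs = 50,
  ) {}

  start(): void {
    this.timer = setInterval(() => this.flush(), this.intervalMs);
  }

  push(token: string): void {
    this.buffer.push(token);
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const chunk = this.buffer.join('');
    this.buffer = [];
    this.onFlush(chunk);
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
    this.timer = null;
    this.flush(); // drain any remaining tokens on shutdown
  }
}
```

Because the batcher lives in the service (not a component), tokens keep accumulating even if the chat screen unmounts mid-generation.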
Background-safe image generation with progressive preview:
- Continues generation when screens unmount
- Real-time step progress (1-50 steps)
- Preview images every N steps
- Optional LLM-based prompt enhancement
Management Services
activeModelService (src/services/activeModelService/index.ts:29)
Singleton for safe model loading/unloading:
- Guards against concurrent loads with promise deduplication
- Pre-load memory checks (60% RAM budget enforcement)
- Automatic model unload when switching models
- Synchronization with native state on app resume
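The concurrent-load guard and pre-load memory check can be sketched together. The class shape, callback signatures, and the way the 60% budget is applied are assumptions for illustration; only the two behaviors (promise deduplication and the RAM budget) come from the list above.

```typescript
// Hypothetical sketch: dedupe concurrent load calls by returning the
// in-flight promise, and refuse models that exceed a 60% RAM budget
// before touching native code.
class ModelLoader {
  private inflight: Promise<string> | null = null;

  constructor(
    private loadNative: (id: string) => Promise<string>,
    private modelSizeMb: (id: string) => number,
    private deviceRamMb: number,
  ) {}

  load(id: string): Promise<string> {
    // Guard against concurrent loads: every caller gets the same promise.
    if (this.inflight) return this.inflight;
    // Pre-load memory check: enforce the 60% RAM budget.
    if (this.modelSizeMb(id) > this.deviceRamMb * 0.6) {
      return Promise.reject(new Error('model exceeds RAM budget'));
    }
    this.inflight = this.loadNative(id).finally(() => {
      this.inflight = null;
    });
    return this.inflight;
  }
}
```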
Native Bridge Layer
JNI (Android) and Objective-C (iOS) bindings connect TypeScript to native C++ and Swift modules.
Native Module Layer
Cross-Platform
llama.rn — llama.cpp compiled for ARM64 with GPU acceleration:
- Android: OpenCL (Adreno GPUs), NEON/i8mm SIMD
- iOS: Metal GPU, Neural Engine for vision models
- Multimodal support via mmproj (CLIP vision encoder)
Android-Only
LocalDreamModule (android/app/src/main/java/ai/offgridmobile/localdream/LocalDreamModule.kt:32)
Stable Diffusion via the local-dream C++ library:
- MNN backend: CPU inference (all ARM64 devices)
- QNN backend: Qualcomm NPU (Snapdragon 8 Gen 1+)
- Subprocess architecture: spawns HTTP server on localhost:18081
- Automatic backend selection with CPU fallback
Native Android DownloadManager wrapper:
- Background downloads with system notifications
- Progress polling (500ms intervals)
- Persistent download tracking via SharedPreferences
- Cleanup of completed/stale downloads
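The cleanup pass over tracked downloads can be sketched as a pure filter. The record shape, field names, and the staleness window are illustrative assumptions; the source only states that completed and stale entries are cleaned up.

```typescript
// Hypothetical sketch of pruning persisted download records: keep only
// downloads that are still running and were updated recently. The 24-hour
// staleness window and the record fields are illustrative.
interface TrackedDownload {
  id: number;
  status: 'running' | 'successful' | 'failed';
  updatedAt: number; // epoch ms of last progress update
}

function pruneDownloads(
  tracked: TrackedDownload[],
  now: number,
  staleAfterMs = 24 * 60 * 60 * 1000,
): TrackedDownload[] {
  return tracked.filter(
    (d) => d.status === 'running' && now - d.updatedAt < staleAfterMs,
  );
}
```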
Text extraction using the PdfiumAndroid library with page-by-page processing.
iOS-Only
CoreMLDiffusionModule (ios/CoreMLDiffusionModule.swift:11)
Stable Diffusion via Apple’s ml-stable-diffusion pipeline:
- Neural Engine (ANE) + CPU compute units
- DPM-Solver scheduler for faster convergence
- Supports both SD 1.5/2.1 and SDXL models
- Palettized (6-bit) and full-precision (fp16) models
Text extraction using Apple’s PDFKit framework.
Technology Stack
| Layer | Technologies |
|---|---|
| UI Framework | React Native 0.83, TypeScript 5.x |
| State Management | Zustand 5.x with AsyncStorage persistence |
| Navigation | React Navigation 7.x |
| Animations | React Native Reanimated 4.x, haptic feedback |
| Text Inference | llama.cpp via llama.rn (GGUF format) |
| Vision Inference | llama.cpp multimodal (mmproj) |
| Voice Transcription | whisper.cpp via whisper.rn |
| Image Generation (Android) | local-dream (MNN/QNN backends) |
| Image Generation (iOS) | ml-stable-diffusion (Core ML) |
| PDF Extraction (Android) | PdfiumAndroid |
| PDF Extraction (iOS) | PDFKit |
| File Picker | @react-native-documents/picker |
| Document Viewer | @react-native-documents/viewer |
Data Flow
Text Generation Flow
- User Input → ChatScreen collects message + attachments
- Service Call → generationService.generateResponse() or .generateWithTools()
- Context Management → llmService passes all messages to llama.rn (no JS truncation)
- Native Inference → llama.cpp streams tokens via callback
- Token Batching → generationService buffers tokens, flushes every 50ms
- UI Update → chatStore updates streaming message
- Completion → Finalize message with metadata (tok/s, TTFT, generation time)
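The completion metadata in the last step can be sketched as a small calculation. The function and field names are illustrative; only the three metrics (tok/s, TTFT, generation time) come from the flow above.

```typescript
// Hypothetical sketch of computing completion metadata from timestamps.
// All times are in milliseconds; tok/s is measured over the decode phase
// (first token to last token).
interface GenerationMetrics {
  tokensPerSecond: number;
  ttftMs: number;       // time to first token
  generationMs: number; // total wall-clock generation time
}

function computeMetrics(
  startMs: number,
  firstTokenMs: number,
  endMs: number,
  tokenCount: number,
): GenerationMetrics {
  return {
    tokensPerSecond: tokenCount / ((endMs - firstTokenMs) / 1000),
    ttftMs: firstTokenMs - startMs,
    generationMs: endMs - startMs,
  };
}
```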
Vision Inference Flow
- Image Attachment → User attaches photo from camera/library
- mmproj Check → llmService verifies multimodal initialized
- OAI Message Format → Convert to OpenAI-compatible message with image URIs
- CLIP Encoding → Native CLIP processes image to embeddings
- LLM Processing → llama.cpp merges text + vision embeddings
- Response → Stream tokens as normal text generation
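The message conversion in step 3 can be sketched using the common OpenAI content-parts format. The exact shape llama.rn expects may differ; the types and function name here are illustrative.

```typescript
// Hypothetical sketch of converting a user prompt plus attached image URIs
// into an OpenAI-compatible multimodal message (step 3 above).
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

function toMultimodalMessage(text: string, imageUris: string[]) {
  const content: ContentPart[] = [
    { type: 'text', text },
    ...imageUris.map((uri) => ({
      type: 'image_url' as const,
      image_url: { url: uri },
    })),
  ];
  return { role: 'user' as const, content };
}
```

The image parts carry local file URIs rather than remote URLs, since the CLIP encoder runs on-device.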
Image Generation Flow (Android)
- Prompt Input → User sends text or enables image mode toggle
- Intent Detection → Pattern matching or LLM-based classification
- Prompt Enhancement → Optional: LLM expands prompt (“a dog” → detailed 75-word description)
- Model Load → LocalDreamModule starts subprocess server (MNN or QNN)
- HTTP Request → TypeScript sends POST to localhost:18081/generate
- SSE Stream → Server sends progress events (step/totalSteps) + preview images
- RGB → PNG → Native code decodes base64 RGB, converts to PNG
- Gallery Save → Image stored in app files, added to gallery
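The SSE progress stream in step 6 implies a small client-side parser. The payload shape below is an illustrative assumption; only the step/totalSteps fields and periodic previews are stated above.

```typescript
// Hypothetical sketch of parsing one SSE data line from the local-dream
// subprocess. SSE frames carry lines like:
//   data: {"step":10,"totalSteps":50}
// Non-data lines (comments, keep-alives) and malformed JSON yield null.
interface ProgressEvent {
  step: number;
  totalSteps: number;
  preview?: string; // base64 preview image, present every N steps
}

function parseSseData(line: string): ProgressEvent | null {
  if (!line.startsWith('data:')) return null;
  try {
    return JSON.parse(line.slice(5).trim()) as ProgressEvent;
  } catch {
    return null;
  }
}
```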
Image Generation Flow (iOS)
- Prompt Input → Same as Android
- Pipeline Load → CoreMLDiffusionModule loads StableDiffusionPipeline
- Generation → Native Swift calls pipeline.generateImages() with a progress callback
- ANE Acceleration → Neural Engine processes UNet denoising steps
- PNG Save → CGImage converted to PNG, stored in documents
- Gallery Save → Same as Android
Design Patterns
See System Design for detailed patterns:
- Singleton services
- Background-safe orchestration
- Memory-first loading strategy
- Combined asset tracking (vision models + mmproj)
- State cleanup patterns