General
What models are supported?
Text models (llama.cpp, GGUF):
- Any GGUF-format model compatible with llama.cpp
- Download from HuggingFace or import `.gguf` files from device storage
- Recommended: Qwen 3, Llama 3.2, Gemma 3, Phi-4, SmolLM3, Mistral, Command-R, DeepSeek

Vision models:
- SmolVLM (500M, 2.2B) — fast, compact, 7-10s inference
- Qwen3-VL (2B, 8B) — multilingual vision-language
- Gemma 3n E4B — vision + audio
- LLaVA, MiniCPM-V

Image generation models:
- Android (MNN): Anything V5, Absolute Reality, QteaMix, ChilloutMix, CuteYukiMix
- Android (QNN): 20 models including DreamShaper and Realistic Vision (Snapdragon 8 Gen 1+ only)
- iOS (Core ML): SD 1.5, SD 2.1, SDXL (palettized and full precision)

Speech-to-text models:
- Whisper Tiny, Base, Small (speed vs. accuracy tradeoff)
Does it work completely offline?
- Download models once over Wi-Fi
- Enable airplane mode and use indefinitely
- Zero network activity during inference
- All conversations, images, and transcriptions happen entirely on-device
What data is collected?
- No telemetry, analytics, or crash reporting
- No servers, no authentication, no accounts
- Conversations stored in the app’s private storage (OS-level encryption)
- Models downloaded directly from HuggingFace (open-source repos)
- No third-party SDKs that phone home

The only network activity:
- Model downloads from the HuggingFace CDN (one-time)
- Model metadata from the HuggingFace API (browsing only)
- Web search (only if enabled in tool calling)
Which devices are supported?
Android:
- Android 8.0 (API 26) or higher
- ARM64 processor (ARMv8-A)
- Minimum 4GB RAM (6GB+ recommended)
- 8GB+ free storage for models

iOS:
- iOS 14.0 or higher
- 64-bit ARM (A12 Bionic or newer)
- Minimum 4GB RAM (6GB+ recommended)
- 8GB+ free storage for models

macOS:
- Apple Silicon Macs via Mac Catalyst / iPad compatibility
- Download from the iOS App Store; runs natively on M1/M2/M3/M4
How much storage do I need?
Text models (GGUF):
- 0.6B Q4_K_M: ~400MB
- 3B Q4_K_M: ~2GB
- 7B Q4_K_M: ~4GB
- 7B Q5_K_M: ~5GB
- 14B Q4_K_M: ~8GB

Vision models:
- SmolVLM 500M: ~600MB total
- SmolVLM 2.2B: ~2.5GB total
- Qwen3-VL 2B: ~3GB total

Image generation models:
- Android (MNN/QNN): ~1-1.2GB per model
- iOS (Core ML palettized): ~1GB per model
- iOS (Core ML full precision): ~4GB per model

Whisper models:
- Tiny: ~75MB
- Base: ~140MB
- Small: ~460MB
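The GGUF figures above follow a simple rule of thumb: on-disk size is roughly parameter count × bits per weight ÷ 8, plus some metadata overhead. A minimal sketch, where the bits-per-weight averages and the 10% overhead are my own approximations (Q4_K_M mixes 4- and 6-bit blocks, hence ~4.8):

```typescript
// Rough GGUF size estimate: params × bits-per-weight / 8, plus ~10% overhead
// for tokenizer and metadata. Bit widths are approximate averages for the
// named quantization schemes (assumptions, not exact llama.cpp figures).
const BITS_PER_WEIGHT: Record<string, number> = {
  Q4_K_M: 4.8,
  Q5_K_M: 5.7,
  Q8_0: 8.5,
};

function estimateModelSizeGB(paramsBillions: number, quant: string): number {
  const bits = BITS_PER_WEIGHT[quant];
  if (bits === undefined) throw new Error(`Unknown quantization: ${quant}`);
  const bytes = paramsBillions * 1e9 * (bits / 8);
  return (bytes * 1.1) / 1e9; // +10% overhead, decimal GB
}

// A 0.6B model at Q4_K_M lands near the ~400MB figure quoted above:
console.log(estimateModelSizeGB(0.6, "Q4_K_M").toFixed(1)); // → "0.4"
```

The 7B Q4_K_M estimate comes out around 4.6GB, in the same ballpark as the ~4GB quoted above; real files vary with vocabulary size and layer mix.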
Can I use my own models?
Text models (GGUF):
- Convert your model to GGUF format using llama.cpp
- Transfer the `.gguf` file to your device
- Go to the Models screen → “Import Local Model”
- Select the file from device storage
- Off Grid copies it to app storage and registers it

Image generation models:
- Android: must be pre-converted to MNN or QNN format
- iOS: must be compiled to Core ML format
- Currently requires manual placement in the app’s models directory (advanced users)

Vision models:
- Import both the GGUF and mmproj files
- Off Grid links them automatically if they are named correctly
Performance
What is GPU acceleration support?
Android:
- OpenCL — GPU offloading for text LLMs on Qualcomm Adreno GPUs (experimental)
- QNN — NPU acceleration for image generation (Snapdragon 8 Gen 1+ only)
- ARM NEON, i8mm, dotprod — SIMD CPU optimizations

iOS:
- Metal — GPU acceleration for text LLMs (automatically disabled on devices with ≤4GB RAM)
- ANE (Neural Engine) — hardware acceleration for image generation via Core ML
- ARM NEON — SIMD CPU optimizations

Settings:
- Settings → Model Settings → Enable GPU (on/off)
- Settings → Model Settings → GPU Layers (0-99)
How fast is text generation?
Flagship devices:
- CPU: 15-30 tok/s (4-8 threads)
- GPU (OpenCL/Metal): 20-40 tok/s (stability varies)
- TTFT (time to first token): 0.5-2s depending on context length

Mid-range devices:
- CPU: 5-15 tok/s
- TTFT: 1-3s

Factors affecting speed:
- Model size (larger = slower)
- Quantization (lower bits = faster)
- Context length (more tokens = slower)
- Thread count (4-8 threads optimal)
- GPU layers (more = faster if stable)
How fast is image generation?
- CPU (MNN): ~15s for 512×512 @ 20 steps (Snapdragon 8 Gen 3), ~30s (Snapdragon 7 series)
- NPU (QNN): ~5-10s for 512×512 @ 20 steps (chipset-dependent, 8 Gen 1+)
- ANE (Core ML): ~8-15s for 512×512 @ 20 steps (A17 Pro/M-series)
- Palettized models: ~2x slower due to dequantization overhead

Factors affecting speed:
- Steps (fewer = faster, lower quality)
- Resolution (512×512 standard)
- Guidance scale (lower = faster)
- Model precision (palettized vs full)
How fast is vision inference?
SmolVLM 500M:
- Flagship: ~7s per image
- Mid-range: ~15s per image

SmolVLM 2.2B:
- Flagship: ~10-15s per image
- Mid-range: ~25-35s per image

Qwen3-VL 2B:
- Flagship: ~12-18s per image

Factors affecting speed:
- Model size (larger = slower)
- Device RAM (more RAM = can use larger models)
- KV cache state (cleared cache = 30-60s slower on next inference)
Features
What are projects?
- Define AI personality, expertise, or role
- Apply to all conversations in that project
- Example: “Code Review” project with system prompt about being a senior engineer
- Switch projects to change AI behavior without reloading model
- Create project with system prompt
- Select project before starting conversation
- All messages in that conversation use project’s system prompt
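The mechanism above amounts to prepending the project's system prompt to the conversation before inference. A minimal sketch with hypothetical type shapes (the app's real types live in `src/stores/projectStore.ts`):

```typescript
// Hypothetical shapes; the real store is in src/stores/projectStore.ts.
interface Project { name: string; systemPrompt: string }
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

// Prepend the active project's system prompt to the conversation history,
// so switching projects changes behavior without reloading model weights.
function buildMessages(project: Project | null, history: ChatMessage[]): ChatMessage[] {
  const system: ChatMessage[] = project
    ? [{ role: "system", content: project.systemPrompt }]
    : [];
  return [...system, ...history];
}

const codeReview: Project = {
  name: "Code Review",
  systemPrompt: "You are a senior engineer reviewing code for correctness and style.",
};
const msgs = buildMessages(codeReview, [{ role: "user", content: "Review this diff" }]);
// msgs[0] carries the project's system prompt
```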
See src/stores/projectStore.ts for implementation.
What is tool calling?
- Web Search — scrapes Brave Search for the top 5 results (requires network)
- Calculator — safe recursive descent parser (no `eval()`)
- Date/Time — formatted date/time with timezone support
- Device Info — battery, storage, memory stats

Limits:
- Max 3 iterations per generation (prevents runaway loops)
- Max 5 total tool calls across all iterations
- Flow: LLM generates → tools executed → results injected → LLM continues

Settings:
- Settings → Model Settings → Enabled Tools
- The Tools button is disabled when the model doesn’t support function calling
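The generate → execute → inject loop with its two caps can be sketched as follows. The `llm` and `runTool` callbacks are placeholders for the real native bindings, not the app's actual API:

```typescript
// Sketch of the tool-calling loop with the documented caps:
// max 3 iterations per generation, max 5 tool calls in total.
interface ToolCall { name: string; args: string }

async function generateWithTools(
  prompt: string,
  llm: (ctx: string) => Promise<{ text: string; toolCalls: ToolCall[] }>,
  runTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const MAX_ITERATIONS = 3;
  const MAX_TOOL_CALLS = 5;
  let context = prompt;
  let totalCalls = 0;

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const { text, toolCalls } = await llm(context);
    if (toolCalls.length === 0) return text; // model finished without tools
    for (const call of toolCalls) {
      if (++totalCalls > MAX_TOOL_CALLS) return text; // hard cap reached
      const result = await runTool(call);
      context += `\n[${call.name} result]: ${result}`; // inject tool result
    }
  }
  // Out of iterations: one final pass with all tool results injected
  return (await llm(context)).text;
}
```

The iteration cap bounds total latency even if the model keeps requesting tools, while the call cap bounds network and battery cost within a single turn.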
What is AI prompt enhancement?
1. User enters a simple image prompt
2. System sends it to the LLM with an enhancement-specific system prompt
3. LLM expands it to a ~75-word detailed prompt
4. Enhanced prompt is sent to the Stable Diffusion model
5. System resets the LLM state (stops generation, preserves KV cache)

- Settings → Image Settings → Enhance Prompts (on/off)
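The enhancement step can be sketched as below. The `generate` callback and the wording of the system prompt are assumptions for illustration; the real pipeline lives in `src/services/imageGenerationService.ts`:

```typescript
// Hypothetical system prompt text; the real one lives in
// src/services/imageGenerationService.ts.
const ENHANCE_SYSTEM_PROMPT =
  "Expand the user's image prompt into a detailed ~75-word Stable Diffusion " +
  "prompt. Reply with the prompt only.";

async function enhancePrompt(
  userPrompt: string,
  generate: (system: string, user: string) => Promise<string>,
): Promise<string> {
  const enhanced = (await generate(ENHANCE_SYSTEM_PROMPT, userPrompt)).trim();
  // Fall back to the original prompt if the model returns nothing useful
  return enhanced.length > 0 ? enhanced : userPrompt;
}
```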
See src/services/imageGenerationService.ts for implementation.
What is the message queue?
- Non-blocking input — send button stays active during generation
- Queue indicator — shows count and preview in the toolbar
- Clear queue — tap “x” to discard queued messages
- Aggregated processing — multiple queued messages are combined into a single prompt
- Image bypass — image generation skips the queue and processes immediately
- The queue lives in `generationService.ts` (not persisted)
- Multiple queued messages are aggregated: texts joined with `\n\n`, attachments combined
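The aggregation step is simple enough to sketch directly; the `QueuedMessage` shape here is an assumption, not the app's actual type:

```typescript
// Hypothetical shape; the real queue lives in generationService.ts.
interface QueuedMessage { text: string; attachments: string[] }

// Combine all queued messages into a single prompt: texts joined with a
// blank line, attachments concatenated in queue order.
function aggregateQueue(queue: QueuedMessage[]): QueuedMessage {
  return {
    text: queue.map((m) => m.text).join("\n\n"),
    attachments: queue.flatMap((m) => m.attachments),
  };
}
```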
What document formats are supported?
- Text: `.txt`, `.md`, `.log`
- Code: `.py`, `.js`, `.ts`, `.jsx`, `.tsx`, `.java`, `.c`, `.cpp`, `.h`, `.swift`, `.kt`, `.go`, `.rs`, `.rb`, `.php`, `.sql`, `.sh`
- Data: `.csv`, `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.cfg`, `.conf`, `.html`
- PDF: native text extraction via platform-specific modules

Features:
- Native file picker integration
- PDF text extraction (Android: `PdfRenderer`, iOS: `PDFKit`)
- Persistent storage (survives temp file cleanup)
- Tappable badges (opens in QuickLook / Intent viewer)
- 5MB size limit, 50K character truncation
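The two limits in the last bullet can be sketched as a small validation helper; the function name and error text are illustrative, not the app's real API:

```typescript
// Attachment limits from the FAQ: reject files over 5MB, truncate
// extracted text to 50,000 characters. Names here are hypothetical.
const MAX_FILE_BYTES = 5 * 1024 * 1024;
const MAX_CHARS = 50_000;

function validateAttachment(sizeBytes: number, extractedText: string): string {
  if (sizeBytes > MAX_FILE_BYTES) {
    throw new Error("File exceeds the 5MB attachment limit");
  }
  return extractedText.length > MAX_CHARS
    ? extractedText.slice(0, MAX_CHARS)
    : extractedText;
}
```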
Development
How do I build from source?
Prerequisites:
- Node.js 20+
- JDK 17 / Android SDK 36 (Android)
- Xcode 15+ (iOS)
How do I run tests?
- React Native — Jest + React Native Testing Library (stores, services, components, screens)
- Android — JUnit (LocalDream, DownloadManager, BroadcastReceiver)
- iOS — XCTest (PDFExtractor, CoreMLDiffusion, DownloadManager)
- E2E — Maestro (launch, chat, models, downloads)
What are the quality gates?
Checks run automatically on every `git commit`:

| Staged file type | Checks |
|---|---|
| `.ts`/`.tsx`/`.js`/`.jsx` | ESLint (staged only), `tsc --noEmit`, `npm test` |
| `.swift` | SwiftLint (staged only), `npm run test:ios` |
| `.kt`/`.kts` | `compileDebugKotlin`, `lintDebug`, `npm run test:android` |

Notes:
- SwiftLint: `brew install swiftlint` (skipped with a warning if not installed)
- Android checks require the Gradle wrapper in `android/`
- Do not bypass failing checks with `--no-verify`; fix the errors and recommit.
What native modules are included?
llama.cpp (text LLMs):
- Compiles llama.cpp for ARM64
- Android: JNI bindings, OpenCL GPU offloading
- iOS: Metal GPU acceleration
- Multimodal (vision) via mmproj

whisper.cpp (speech-to-text):
- Compiles whisper.cpp for ARM64
- Real-time audio recording and transcription

Stable Diffusion (Android):
- C++ Stable Diffusion implementation
- MNN backend (CPU), QNN backend (NPU)

Stable Diffusion (iOS):
- Swift bridge to Apple’s `ml-stable-diffusion`
- ANE acceleration, DPM-Solver scheduler

PDF extraction:
- Android: Kotlin + `PdfRenderer`
- iOS: Swift + `PDFKit`

Download manager:
- Native Android DownloadManager wrapper
- Background download support
How does state management work?
- `appStore` — downloaded models, settings, hardware info, gallery
- `chatStore` — conversations, messages, streaming state
- `projectStore` — projects (custom system prompts)
- `authStore` — passphrase lock state
- `whisperStore` — Whisper model state
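The stores share a subscribe/notify pattern that can be sketched library-agnostically. This is a generic illustration, not the app's actual store implementation (which lives under `src/stores/`):

```typescript
// Minimal subscribe/notify store sketch (library-agnostic illustration).
type Listener<S> = (state: S) => void;

function createStore<S>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener<S>>();
  return {
    getState: () => state,
    // Shallow-merge a partial update and notify all subscribers
    setState: (partial: Partial<S>) => {
      state = { ...state, ...partial };
      listeners.forEach((l) => l(state));
    },
    // Returns an unsubscribe function
    subscribe: (l: Listener<S>) => {
      listeners.add(l);
      return () => listeners.delete(l);
    },
  };
}

// e.g. a chat store tracking the streaming flag
const chatStore = createStore({ streaming: false, messages: [] as string[] });
```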
Privacy & Security
Where is data stored?
Android:
- Conversations: AsyncStorage (SQLite, encrypted at the OS level)
- Models: `/data/data/ai.offgridmobile/files/models/`
- Images: `/data/data/ai.offgridmobile/files/images/`
- Settings: AsyncStorage

iOS:
- Conversations: AsyncStorage (encrypted at the OS level)
- Models: `Library/Application Support/models/`
- Images: `Library/Application Support/images/`
- Settings: AsyncStorage
What is passphrase lock?
- Protect conversations with passphrase
- Locks on app backgrounding (configurable timeout)
- Passphrase stored securely in Android Keystore
- Conversations encrypted with AES-256
- Biometric unlock (planned)
- Settings → Security → Passphrase Lock
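An AES-256 round trip of the kind described above can be illustrated with Node's `crypto` module. This is a portable sketch, not the app's code: on-device the key material stays in the Android Keystore (and would never sit in plain JS like this), and AES-GCM is my assumed mode since the FAQ only says "AES-256":

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Illustrative AES-256-GCM round trip. On-device, the key lives in the
// platform keystore; here it is an in-memory Buffer for demonstration.
function encrypt(key: Buffer, plaintext: string) {
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt(key: Buffer, box: { iv: Buffer; ciphertext: Buffer; tag: Buffer }): string {
  const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag); // authenticates ciphertext before output
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString("utf8");
}
```

GCM adds an authentication tag, so tampered ciphertext fails to decrypt rather than silently producing garbage.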
See src/services/authService.ts for implementation.
Can I export my data?
Current:
- Generated images can be exported to the device gallery (share button)
- Conversations are stored in the app’s private storage (no export yet)

Planned:
- Conversation export/import with encryption
- Model export (copy to device storage for sharing)