Quick Start
Get up and running with Off Grid in 5 minutes. This guide walks you through downloading your first model and trying out the core features.
Before you begin, make sure you’ve installed Off Grid on your device.
Download Your First Model
Off Grid doesn’t ship with any AI models pre-installed. You’ll need to download at least one model to get started.
Open the Models screen
From the home screen, tap the Models tab at the bottom (or tap Get Models from the home screen).
You’ll see a curated list of recommended models, automatically filtered based on your device’s available RAM.
Choose a model
Tap on a model to see details (size, parameters, description). For your first model, we recommend:
- Qwen3 0.6B (Q4_K_M) — Fast, lightweight, great for testing (~500MB)
- SmolLM3 1.7B (Q4_K_M) — Excellent quality/speed balance (~1GB)
- Llama 3.2 3B (Q4_K_M) — Higher quality, requires 6GB+ RAM (~2GB)
What is Q4_K_M?
Q4_K_M is a quantization method that compresses models to ~4 bits per weight. It provides the best balance of quality, speed, and file size for mobile devices.
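The core idea can be sketched in a few lines of Python: store one scale factor per block of weights plus a small integer per weight. This is an illustrative round-trip only; the real Q4_K_M format layers super-block structure and per-sub-block minimums on top of this idea.

```python
def quantize_block_4bit(weights):
    # One float scale per block, one 4-bit integer (-8..7) per weight.
    # Illustrative only -- the actual Q4_K_M format is more elaborate.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    # Recover approximate weights at inference time
    return [v * scale for v in q]

block = [0.12, -0.43, 0.87, -0.05, 0.31, -0.99, 0.44, 0.20]
scale, q = quantize_block_4bit(block)
restored = dequantize_block(scale, q)

# 8 fp32 weights take 32 bytes; 8 4-bit codes plus one fp32 scale take 8
print(max(abs(a - b) for a, b in zip(block, restored)))
```

The reconstruction error is bounded by half the scale, which is why quality stays usable while file size drops to roughly a quarter of fp16.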
Download the model
Tap Download to start downloading the model.
- Downloads continue in the background (you can leave the app)
- Native notifications show progress
- Download progress visible in the Models tab
Send Your First Message
Create a new chat
From the home screen, tap New Chat (or the + button in the Chats tab).
Your recently downloaded model will be automatically loaded.
Type a message
Type your first message in the chat input at the bottom of the screen.
Try something like:
- “Explain quantum computing in simple terms”
- “Write a haiku about nature”
- “What are the key differences between Python and JavaScript?”
Send and watch the response stream
Tap the Send button (paper plane icon).
You’ll see:
- The model’s response stream in real-time (word by word)
- Token generation speed (tok/s) at the bottom
- A Stop button to interrupt generation if needed
Performance Tip
If generation is slow (< 5 tok/s), try:
- Using a smaller model
- Reducing context length in Settings → Model Settings
- Increasing CPU threads (Settings → Model Settings → Advanced)
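One reason lowering the context length helps: the KV cache grows linearly with it. A rough back-of-the-envelope estimator, using hypothetical layer and head dimensions in the range of a small 1-3B model (real numbers vary per model):

```python
def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elt=2):
    # Two cached tensors (K and V) per layer, each of shape
    # context_len x n_kv_heads x head_dim, stored here as fp16.
    return 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_elt

# Hypothetical dimensions, roughly in the range of a small 1-3B model
for ctx in (512, 2048, 8192):
    mb = kv_cache_bytes(ctx, n_layers=28, n_kv_heads=8, head_dim=128) / 2**20
    print(f"context {ctx}: ~{mb:.0f} MB of KV cache")
```

With these assumed dimensions, dropping the context from 8192 to 2048 frees hundreds of megabytes of RAM, which is often the difference between smooth generation and swapping on a phone.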
Try Vision AI (Attach an Image)
Vision AI lets you attach images and ask questions about them.
Download a vision model
Go to Models → Browse Models and filter by Vision models.
Recommended for first-time users:
- SmolVLM 500M (Q4_K_M) — Fast, compact, ~7-10s inference (~600MB with mmproj)
- Qwen3-VL 2B (Q4_K_M) — Better quality, multilingual (~2GB with mmproj)
Vision models automatically download a companion mmproj file (multimodal projector) needed for image understanding.
Start a new chat and switch to the vision model
Create a new chat and tap the model name at the top to open the model selector.
Select your vision model from the list.
Attach an image
Tap the paperclip icon (📎) in the chat input.
Choose:
- Take Photo — Capture with your camera
- Choose from Library — Select from your photo library
Generate Your First Image
Off Grid can generate images entirely on-device using Stable Diffusion.
Download an image generation model
Go to Models → Image Models tab.
Android:
- CPU models (MNN): Anything V5, Absolute Reality (~1.2GB each)
- NPU models (QNN): DreamShaper, Realistic Vision (~1GB each, requires Snapdragon 8 Gen 1+)
iOS:
- SD 1.5 Palettized (~1GB) — Good balance
- SD 1.5 Full (~4GB) — Fastest on Apple Silicon
Start a new chat with a text model loaded
Image generation works best with a text model loaded (for optional prompt enhancement).
Create a new chat with any text model active.
Type an image generation prompt
Off Grid can automatically detect image generation requests. Try:
- “Draw a serene mountain landscape at sunset”
- “Generate an image of a futuristic city”
- “Create a portrait of a golden retriever”
AI Prompt Enhancement
Enable this in Settings → Model Settings → Image Settings to automatically expand simple prompts into detailed Stable Diffusion prompts using your loaded text model.
Use Voice Input
Speak instead of typing with on-device Whisper transcription.
Download a Whisper model
Go to Settings → Voice Settings → Whisper Model.
Choose:
- Tiny (~40MB) — Fastest, slightly less accurate
- Base (~75MB) — Good balance (recommended)
- Small (~150MB) — Most accurate, slower
Hold to record
Press and hold the microphone button, then speak your message.
- Release to transcribe
- Slide to cancel if you change your mind
Advanced Features to Explore
Once you’re comfortable with the basics, try these advanced features:
Tool Calling
If your model supports function calling (e.g., Qwen3, Llama 3.2), you can enable tools:
- Go to Settings → Model Settings → Text Settings
- Scroll to Enabled Tools
- Enable tools like Calculator, Date/Time, Device Info
- In chat, the model can now call tools automatically (“What’s 15% of 240?”, “What’s the current date?”)
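Under the hood, tool calling generally works by having the model emit a structured tool call that the app executes, feeding the result back into the conversation. A hypothetical sketch of that loop; the `calculator` tool and the JSON shape here are illustrative, not Off Grid's actual wire format:

```python
import json

# Hypothetical registry standing in for the built-in Calculator tool
def calculator(args):
    a, b, op = args["a"], args["b"], args["op"]
    return {"add": a + b, "mul": a * b}[op]

TOOLS = {"calculator": calculator}

def handle_model_output(raw):
    # If the model emits a JSON tool call, run the tool and return its
    # result (to be fed back to the model); otherwise treat it as text.
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    result = TOOLS[msg["tool"]](msg["arguments"])
    return f"[tool {msg['tool']} → {result}]"

# For "What's 15% of 240?" the model might emit:
call = '{"tool": "calculator", "arguments": {"a": 240, "b": 0.15, "op": "mul"}}'
print(handle_model_output(call))  # [tool calculator → 36.0]
```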
Document Attachments
Attach PDFs, code files, CSVs, and more:
- In any chat, tap the paperclip icon (📎)
- Choose Attach Document
- Select a file (PDF, .txt, .py, .json, etc.)
- Ask questions about the document’s content
Projects (Custom System Prompts)
Create reusable system prompts for different use cases:
- Go to Settings → Projects
- Tap Create Project
- Name your project (e.g., “Creative Writing Assistant”)
- Write a system prompt (e.g., “You are a creative writing coach…”)
- In any chat, select the project from the dropdown to apply the prompt
Import Your Own Models
Bring your own .gguf files:
- Go to Models → Text Models
- Tap Import Model
- Select a .gguf file from your device storage
- The model will be copied to Off Grid’s storage and appear in your model list
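If an import fails, it can help to confirm the file really is GGUF before blaming the app. Per the GGUF specification, files begin with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A quick validator sketch (the file path in the commented usage is hypothetical):

```python
import struct

def gguf_version(path):
    # Read the 8-byte header: 4-byte magic b"GGUF", then a
    # little-endian uint32 version. Returns None if not GGUF.
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version

# Usage (hypothetical path):
# v = gguf_version("/sdcard/Download/model.gguf")
# print("GGUF version", v) if v else print("not a GGUF file")
```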
Performance Tuning
If you experience slow generation or crashes, adjust these settings:
Settings → Model Settings → Text Settings:
- Context Length: Lower values (512-2048) use less RAM and are faster
- CPU Threads: Increase to 6-8 for faster generation (if you have a flagship device)
- GPU Layers: Start with 0, incrementally increase if stable (experimental on Android)
- Flash Attention: Enable for faster inference (auto-disabled if GPU layers > 0)
Settings → Model Settings → Image Settings:
- Steps: Lower values (10-15) generate faster but at lower quality
- Threads: Increase to 4-8 for faster CPU generation
- Guidance Scale: Lower values (5-7) can speed up generation
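Why fewer steps are faster: diffusion samplers run one denoising pass per step, so wall-clock time scales roughly linearly with the step count. A toy estimate with a hypothetical per-step cost (real per-step times depend heavily on your device and model):

```python
def estimated_gen_seconds(steps, seconds_per_step):
    # Each diffusion step is one full denoising pass over the image,
    # so total time grows linearly with the step count.
    return steps * seconds_per_step

# Hypothetical per-step cost on a midrange phone CPU
for steps in (10, 20, 30):
    print(f"{steps} steps ≈ {estimated_gen_seconds(steps, 1.5):.0f} s")
```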
Next Steps
Browse All Models
Explore the full model library with advanced filters (organization, size, quantization)
Model Settings
Fine-tune temperature, top-p, repeat penalty, and more
Storage Management
Monitor storage usage and delete unused models
Join the Community
Ask questions and share feedback on Slack
You’re all set! Off Grid is now running entirely on your device. Enable airplane mode and keep chatting — everything works offline.