Quick Start
Get up and running with Off Grid in 5 minutes. This guide walks you through downloading your first model and trying out the core features.
Before you begin, make sure you’ve installed Off Grid on your device.
Download Your First Model
Off Grid doesn’t ship with any AI models pre-installed. You’ll need to download at least one model to get started.
Open the Models screen
From the home screen, tap the Models tab at the bottom (or tap Get Models from the home screen).
You’ll see a curated list of recommended models, automatically filtered based on your device’s available RAM.
Choose a model
Tap on a model to see details (size, parameters, description). For your first model, we recommend:
- Qwen3 0.6B (Q4_K_M) — Fast, lightweight, great for testing (~500MB)
- SmolLM3 1.7B (Q4_K_M) — Excellent quality/speed balance (~1GB)
- Llama 3.2 3B (Q4_K_M) — Higher quality, requires 6GB+ RAM (~2GB)
What is Q4_K_M?
Q4_K_M is a quantization method that compresses models to ~4 bits per weight. It provides the best balance of quality, speed, and file size for mobile devices.
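The core idea can be sketched in a few lines of Python: store one scale factor per block of weights plus a small integer per weight. This is an illustrative round-trip only; the real Q4_K_M format layers super-block structure and per-sub-block minimums on top of this idea.

```python
def quantize_block_4bit(weights):
    # One float scale per block, one 4-bit integer (-8..7) per weight.
    # Illustrative only -- the actual Q4_K_M format is more elaborate.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    # Recover approximate weights at inference time
    return [v * scale for v in q]

block = [0.12, -0.43, 0.87, -0.05, 0.31, -0.99, 0.44, 0.20]
scale, q = quantize_block_4bit(block)
restored = dequantize_block(scale, q)

# 8 fp32 weights take 32 bytes; 8 4-bit codes plus one fp32 scale take 8
print(max(abs(a - b) for a, b in zip(block, restored)))
```

The reconstruction error is bounded by half the scale, which is why quality stays usable while file size drops to roughly a quarter of fp16.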
Download the model
Tap Download to start downloading the model.
- Downloads continue in the background (you can leave the app)
- Native notifications show progress
- Download progress visible in the Models tab
Send Your First Message
Create a new chat
From the home screen, tap New Chat (or the + button in the Chats tab).
Your recently downloaded model will be automatically loaded.
Type a message
Type your first message in the chat input at the bottom of the screen.
Try something like:
- “Explain quantum computing in simple terms”
- “Write a haiku about nature”
- “What are the key differences between Python and JavaScript?”
Send and watch the response stream
Tap the Send button (paper plane icon).
You’ll see:
- The model’s response stream in real-time (word by word)
- Token generation speed (tok/s) at the bottom
- A Stop button to interrupt generation if needed
Performance Tip
If generation is slow (< 5 tok/s), try:
- Using a smaller model
- Reducing context length in Settings → Model Settings
- Increasing CPU threads (Settings → Model Settings → Advanced)
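One reason lowering the context length helps: the KV cache grows linearly with it. A rough back-of-the-envelope estimator, using hypothetical layer and head dimensions in the range of a small 1-3B model (real numbers vary per model):

```python
def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elt=2):
    # Two cached tensors (K and V) per layer, each of shape
    # context_len x n_kv_heads x head_dim, stored here as fp16.
    return 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_elt

# Hypothetical dimensions, roughly in the range of a small 1-3B model
for ctx in (512, 2048, 8192):
    mb = kv_cache_bytes(ctx, n_layers=28, n_kv_heads=8, head_dim=128) / 2**20
    print(f"context {ctx}: ~{mb:.0f} MB of KV cache")
```

With these assumed dimensions, dropping the context from 8192 to 2048 frees hundreds of megabytes of RAM, which is often the difference between smooth generation and swapping on a phone.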
Try Vision AI (Attach an Image)
Vision AI lets you attach images and ask questions about them.
Download a vision model
Go to Models → Browse Models and filter by Vision models.
Recommended for first-time users:
- SmolVLM 500M (Q4_K_M) — Fast, compact, ~7-10s inference (~600MB with mmproj)
- Qwen3-VL 2B (Q4_K_M) — Better quality, multilingual (~2GB with mmproj)
Vision models automatically download a companion mmproj file (multimodal projector) needed for image understanding.
Start a new chat and switch to the vision model
Create a new chat and tap the model name at the top to open the model selector.
Select your vision model from the list.
Attach an image
Tap the paperclip icon (📎) in the chat input.
Choose:
- Take Photo — Capture with your camera
- Choose from Library — Select from your photo library
Generate Your First Image
Off Grid can generate images entirely on-device using Stable Diffusion.
Download an image generation model
Go to Models → Image Models tab.
Android:
- CPU models (MNN): Anything V5, Absolute Reality (~1.2GB each)
- NPU models (QNN): DreamShaper, Realistic Vision (~1GB each, requires Snapdragon 8 Gen 1+)
iOS:
- SD 1.5 Palettized (~1GB) — Good balance
- SD 1.5 Full (~4GB) — Fastest on Apple Silicon
Start a new chat with a text model loaded
Image generation works best with a text model loaded (for optional prompt enhancement).
Create a new chat with any text model active.
Type an image generation prompt
Off Grid can automatically detect image generation requests. Try:
- “Draw a serene mountain landscape at sunset”
- “Generate an image of a futuristic city”
- “Create a portrait of a golden retriever”
AI Prompt Enhancement
Enable this in Settings → Model Settings → Image Settings to automatically expand simple prompts into detailed Stable Diffusion prompts using your loaded text model.
Use Voice Input
Speak instead of typing with on-device Whisper transcription.
Download a Whisper model
Go to Settings → Voice Settings → Whisper Model.
Choose:
- Tiny (~40MB) — Fastest, slightly less accurate
- Base (~75MB) — Good balance (recommended)
- Small (~150MB) — Most accurate, slower
Hold to record
Press and hold the microphone button, then speak your message.
- Release to transcribe
- Slide to cancel if you change your mind
Advanced Features to Explore
Once you’re comfortable with the basics, try these advanced features:
Tool Calling
If your model supports function calling (e.g., Qwen3, Llama 3.2), you can enable tools:
- Go to Settings → Model Settings → Text Settings
- Scroll to Enabled Tools
- Enable tools like Calculator, Date/Time, Device Info
- In chat, the model can now call tools automatically (“What’s 15% of 240?”, “What’s the current date?”)
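Under the hood, tool calling generally works by having the model emit a structured tool call that the app executes, feeding the result back into the conversation. A hypothetical sketch of that loop; the `calculator` tool and the JSON shape here are illustrative, not Off Grid's actual wire format:

```python
import json

# Hypothetical registry standing in for the built-in Calculator tool
def calculator(args):
    a, b, op = args["a"], args["b"], args["op"]
    return {"add": a + b, "mul": a * b}[op]

TOOLS = {"calculator": calculator}

def handle_model_output(raw):
    # If the model emits a JSON tool call, run the tool and return its
    # result (to be fed back to the model); otherwise treat it as text.
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    result = TOOLS[msg["tool"]](msg["arguments"])
    return f"[tool {msg['tool']} → {result}]"

# For "What's 15% of 240?" the model might emit:
call = '{"tool": "calculator", "arguments": {"a": 240, "b": 0.15, "op": "mul"}}'
print(handle_model_output(call))  # [tool calculator → 36.0]
```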
Document Attachments
Attach PDFs, code files, CSVs, and more:
- In any chat, tap the paperclip icon (📎)
- Choose Attach Document
- Select a file (PDF, .txt, .py, .json, etc.)
- Ask questions about the document’s content
Projects (Custom System Prompts)
Create reusable system prompts for different use cases:
- Go to Settings → Projects
- Tap Create Project
- Name your project (e.g., “Creative Writing Assistant”)
- Write a system prompt (e.g., “You are a creative writing coach…”)
- In any chat, select the project from the dropdown to apply the prompt
Import Your Own Models
Bring your own .gguf files:
- Go to Models → Text Models
- Tap Import Model
- Select a .gguf file from your device storage
- The model will be copied to Off Grid’s storage and appear in your model list
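If an import fails, it can help to confirm the file really is GGUF before blaming the app. Per the GGUF specification, files begin with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A quick validator sketch (the file path in the commented usage is hypothetical):

```python
import struct

def gguf_version(path):
    # Read the 8-byte header: 4-byte magic b"GGUF", then a
    # little-endian uint32 version. Returns None if not GGUF.
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version

# Usage (hypothetical path):
# v = gguf_version("/sdcard/Download/model.gguf")
# print("GGUF version", v) if v else print("not a GGUF file")
```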
Performance Tuning
If you experience slow generation or crashes, adjust these settings:
Settings → Model Settings → Text Settings:
- Context Length: Lower values (512-2048) use less RAM and are faster
- CPU Threads: Increase to 6-8 for faster generation (if you have a flagship device)
- GPU Layers: Start with 0, incrementally increase if stable (experimental on Android)
- Flash Attention: Enable for faster inference (auto-disabled if GPU layers > 0)
Settings → Model Settings → Image Settings:
- Steps: Lower values (10-15) generate faster but at lower quality
- Threads: Increase to 4-8 for faster CPU generation
- Guidance Scale: Lower values (5-7) can speed up generation
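Why fewer steps are faster: diffusion samplers run one denoising pass per step, so wall-clock time scales roughly linearly with the step count. A toy estimate with a hypothetical per-step cost (real per-step times depend heavily on your device and model):

```python
def estimated_gen_seconds(steps, seconds_per_step):
    # Each diffusion step is one full denoising pass over the image,
    # so total time grows linearly with the step count.
    return steps * seconds_per_step

# Hypothetical per-step cost on a midrange phone CPU
for steps in (10, 20, 30):
    print(f"{steps} steps ≈ {estimated_gen_seconds(steps, 1.5):.0f} s")
```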
Next Steps
Browse All Models
Explore the full model library with advanced filters (organization, size, quantization)
Model Settings
Fine-tune temperature, top-p, repeat penalty, and more
Storage Management
Monitor storage usage and delete unused models
Join the Community
Ask questions and share feedback on Slack
You’re all set! Off Grid is now running entirely on your device. Enable airplane mode and keep chatting — everything works offline.