
Overview

Off Grid includes on-device speech recognition powered by Whisper via whisper.rn native bindings. Speak your messages instead of typing them — all transcription happens locally on your device with no network required.

How It Works

Whisper is OpenAI’s speech recognition model, compiled for mobile via whisper.cpp. To send a voice message:
  1. Hold to record — Press and hold the microphone button in the chat input
  2. Speak your message — Whisper transcribes in real-time (you’ll see partial results)
  3. Release to finish — Transcription completes and inserts into the input field
  4. Review and send — Edit if needed, then send
All audio processing happens on-device — your voice never leaves your phone.
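The hold-to-record flow above can be sketched as a small state machine. The state and action names below are illustrative, not Off Grid’s actual code:

```typescript
// Minimal sketch of the press-and-hold recording flow.
type RecordingState = "idle" | "recording" | "done" | "cancelled";
type Action = "press" | "release" | "cancel";

function nextState(state: RecordingState, action: Action): RecordingState {
  switch (state) {
    case "idle":
      return action === "press" ? "recording" : state;
    case "recording":
      if (action === "release") return "done";     // transcription inserted into input
      if (action === "cancel") return "cancelled"; // slide-to-cancel: nothing inserted
      return state;
    default:
      return state; // "done" and "cancelled" are terminal
  }
}
```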

Available Models

Off Grid supports multiple Whisper model sizes, balancing speed vs. accuracy:

Whisper Tiny

  • Size: 75MB
  • Speed: Fastest, real-time transcription
  • Accuracy: Good for clear speech
  • Best for: Quick messages, casual conversations
Available in:
  • English-only (tiny.en) — Optimized for English
  • Multilingual (tiny) — Supports multiple languages

Whisper Base

  • Size: 142MB
  • Speed: Near real-time on mid-range and newer devices
  • Accuracy: Better than Tiny
  • Best for: General use (recommended default)
Available in:
  • English-only (base.en) — Optimized for English
  • Multilingual (base) — Supports multiple languages

Whisper Small

  • Size: 466MB
  • Speed: Slight delay; best on flagship devices
  • Accuracy: Highest of the three
  • Best for: High-accuracy transcription, noisy environments
Available in:
  • English-only (small.en) — Optimized for English
  • Multilingual (small) — Supports multiple languages

Multilingual models support 99 languages including Spanish, French, German, Chinese, Japanese, Arabic, and more. If you only speak English, use the .en variants for slightly better performance.

How to Use

1. Download a Whisper Model

  1. Go to Settings → Voice Settings
  2. Select a model (Base Multilingual recommended for first-time users)
  3. Tap Download and wait for it to complete
  4. The model is automatically set as active
Whisper models download on first use if not already installed. You’ll see a download progress indicator in the voice settings screen.
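As a sketch of where the download comes from: models are fetched from Hugging Face as ggml-*.bin files (see Storage Location below). The exact repo path here is an assumption, not necessarily Off Grid’s actual source:

```typescript
// Hypothetical helper: build the download URL for a ggml Whisper model.
// The Hugging Face repo path is an assumption for illustration.
const HF_REPO = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main";

type ModelName = "tiny" | "tiny.en" | "base" | "base.en" | "small" | "small.en";

function modelDownloadUrl(model: ModelName): string {
  return `${HF_REPO}/ggml-${model}.bin`;
}
```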

2. Grant Microphone Permission

The first time you use voice input:
  • Android: You’ll see a permission dialog requesting microphone access
  • iOS: Audio session is configured automatically and triggers the permission prompt
Grant permission to enable voice transcription.

3. Record Your Message

  1. Open any conversation
  2. Tap and hold the microphone button in the chat input
  3. Speak your message clearly
  4. Release when done
  5. Review the transcription in the input field
  6. Edit if needed, then send

Slide to Cancel

Changed your mind while recording?
  • Slide your finger left while holding the mic button
  • Release to cancel the recording
  • No transcription is performed
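The gesture decision can be sketched as a simple threshold check. The 80px threshold is an assumed value for illustration, not Off Grid’s actual setting:

```typescript
// Illustrative slide-to-cancel logic: cancel if the finger has traveled
// far enough to the left when the mic button is released.
const CANCEL_THRESHOLD_PX = 80;

// dx: horizontal finger travel since the press began (negative = left)
function shouldCancelOnRelease(dx: number): boolean {
  return dx <= -CANCEL_THRESHOLD_PX;
}
```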

Language Support

Whisper multilingual models support 99 languages, including:
  • European: English, Spanish, French, German, Italian, Portuguese, Russian, Polish, Dutch, Swedish, Norwegian, Danish, Finnish, Greek, Turkish
  • Asian: Chinese (Mandarin & Cantonese), Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian, Malay, Tagalog
  • Middle Eastern: Arabic, Hebrew, Persian, Urdu
  • And many more…

Setting Language

By default, Whisper auto-detects the spoken language. You can manually specify a language in Settings → Voice Settings → Language if auto-detection isn’t accurate.
Language setting uses ISO 639-1 language codes (e.g., en for English, es for Spanish, zh for Chinese). Check the Whisper documentation for the full list of supported languages.
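A sketch of validating a language setting before passing it to Whisper. The set below is a small illustrative subset of Whisper’s 99 supported ISO 639-1 codes, and the fallback-to-auto behavior is an assumption:

```typescript
// Hypothetical validation: unknown or missing codes fall back to auto-detection.
const SUPPORTED_LANGUAGES = new Set(["en", "es", "fr", "de", "zh", "ja", "ar"]);

function resolveLanguage(code?: string): string {
  if (!code) return "auto"; // let Whisper auto-detect
  const normalized = code.toLowerCase();
  return SUPPORTED_LANGUAGES.has(normalized) ? normalized : "auto";
}
```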

Performance

Whisper transcription is real-time on most devices:
| Model | Device Class | Speed          | Use Case       |
| ----- | ------------ | -------------- | -------------- |
| Tiny  | All devices  | Real-time      | Quick messages |
| Base  | Mid-range+   | Near real-time | General use    |
| Small | Flagship     | Slight delay   | High accuracy  |
Real-time means transcription keeps up with your speech — you see results as you talk.

Factors Affecting Speed

  • Model size — Larger models are slower but more accurate
  • Device CPU — Faster processors = faster transcription
  • Audio length — Longer recordings take more time to process
  • Background noise — More noise = more processing time
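These factors combine into what is often called the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean transcription keeps up with speech. The field names here are illustrative; a processing time could come from whisper.rn’s streaming events:

```typescript
// RTF < 1.0: faster than real-time; RTF > 1.0: transcription lags behind speech.
function realTimeFactor(processTimeMs: number, audioDurationMs: number): number {
  if (audioDurationMs <= 0) throw new Error("audio duration must be positive");
  return processTimeMs / audioDurationMs;
}
```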

Technical Details

How Whisper Works

  1. Audio capture — whisper.rn records audio via native audio APIs
  2. Preprocessing — Audio is converted to the format Whisper expects (16kHz mono)
  3. Inference — whisper.cpp processes the audio and generates transcription
  4. Streaming results — Partial transcriptions are sent to React Native via callbacks
  5. Final output — Complete transcription is inserted into the chat input
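As an illustration of step 2 (preprocessing): Whisper expects 16kHz mono PCM, so stereo input must be downmixed. whisper.rn handles this natively; this sketch only shows the idea:

```typescript
// Downmix two-channel audio to mono by averaging the channels.
function downmixToMono(left: Float32Array, right: Float32Array): Float32Array {
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    mono[i] = (left[i] + right[i]) / 2; // average the two channels
  }
  return mono;
}
```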

Real-time Transcription API

Off Grid uses whisper.rn’s transcribeRealtime API:
```typescript
const { stop, subscribe } = await context.transcribeRealtime({
  language: 'en',
  maxLen: 0, // no limit
  realtimeAudioSec: 30, // process in 30-second chunks
  realtimeAudioSliceSec: 3, // slice every 3 seconds for faster intermediate results
});
```
  • 30-second chunks — Audio is processed in 30-second segments
  • 3-second slices — Intermediate results every 3 seconds for responsive UI
  • Streaming events — subscribe() receives events with isCapturing, text, processTime, etc.
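A sketch of consuming those streaming events, assuming each event carries the latest full transcription so far (the event shape here follows this page’s field names; whisper.rn’s actual payload may differ):

```typescript
// Illustrative event shape and reducer for streaming transcription results.
interface RealtimeEvent {
  isCapturing: boolean;
  text?: string;
  processTime?: number;
}

interface TranscriptState {
  text: string;
  finished: boolean;
}

function applyEvent(state: TranscriptState, evt: RealtimeEvent): TranscriptState {
  return {
    // each event carries the latest full transcription, so replace, not append
    text: evt.text ?? state.text,
    finished: !evt.isCapturing,
  };
}
```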

Storage Location

Whisper models are stored in:
```
{DocumentDirectory}/whisper-models/
  ├── ggml-tiny.en.bin
  ├── ggml-base.en.bin
  ├── ggml-small.en.bin
  └── ...
```
Models are downloaded from Hugging Face and persist across app updates.

Audio Session (iOS)

On iOS, whisper.rn configures the audio session:
  • Category: PlayAndRecord (allows recording + playback)
  • Options: AllowBluetooth, MixWithOthers (Bluetooth headset support, mix with other audio)
  • Mode: Default
  • Restore on stop — Audio session is restored to previous state after recording

Permissions

Android:
  • Requires RECORD_AUDIO permission
  • Requested on first use via PermissionsAndroid.request()
iOS:
  • Microphone permission triggered when audio session is activated
  • Configured automatically by whisper.rn
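The Android request described above could look roughly like this, using React Native’s PermissionsAndroid API. Treat this as an illustrative sketch, not Off Grid’s actual code:

```typescript
import { PermissionsAndroid, Platform } from "react-native";

// Returns true if the microphone can be used for voice transcription.
async function ensureMicPermission(): Promise<boolean> {
  // iOS prompts automatically when the audio session is activated
  if (Platform.OS !== "android") return true;

  const result = await PermissionsAndroid.request(
    PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
    {
      title: "Microphone access",
      message: "Off Grid needs the microphone for voice transcription.",
      buttonPositive: "OK",
    }
  );
  return result === PermissionsAndroid.RESULTS.GRANTED;
}
```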

Tips

Getting the Best Transcription Quality

  1. Speak clearly — Enunciate words, avoid mumbling
  2. Minimize background noise — Find a quiet environment
  3. Use a good microphone — Built-in mic works, but Bluetooth headsets are better
  4. Short sentences — Pause between thoughts for better accuracy
  5. Use the right model — Base for general use, Small for noisy environments

Choosing the Right Model

  • Speed priority: Tiny (English or Multilingual)
  • Balanced: Base Multilingual (recommended)
  • Accuracy priority: Small (English or Multilingual)
  • English-only users: Use .en variants for slightly better performance
  • Multilingual users: Use multilingual variants and let Whisper auto-detect language
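The guidance above reduces to a simple lookup. The function itself is illustrative; the model names mirror this page:

```typescript
type Priority = "speed" | "balanced" | "accuracy";

// Pick a Whisper model from a speed/accuracy priority and language preference.
function pickModel(priority: Priority, englishOnly: boolean): string {
  const base = { speed: "tiny", balanced: "base", accuracy: "small" }[priority];
  return englishOnly ? `${base}.en` : base;
}
```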

Troubleshooting

Transcription is slow:
  • Try a smaller model (Base instead of Small)
  • Ensure no other apps are using the microphone
  • Check CPU usage in Settings → Device Info
Microphone permission denied:
  • Go to device Settings → Apps → Off Grid → Permissions → Microphone → Allow
  • Restart the app after granting permission
Transcription is inaccurate:
  • Try a larger model (Small instead of Tiny)
  • Speak more clearly and reduce background noise
  • Manually set the language in Voice Settings if auto-detection is wrong
Recording stuck / won’t stop:
  • Release the mic button fully
  • If stuck, force-stop the app and restart
  • Check logs for errors
No transcription appears:
  • Check if a Whisper model is downloaded (Settings → Voice Settings)
  • Ensure microphone permission is granted
  • Try recording again (sometimes first attempt fails)

Privacy

All voice transcription happens 100% on-device:
  • Your voice never leaves your device
  • No cloud API calls
  • No audio uploaded to servers
  • Works completely offline (after model download)
You can enable airplane mode and use voice transcription indefinitely.

File Transcription

Whisper can also transcribe pre-recorded audio files:
```typescript
const transcription = await whisperService.transcribeFile(
  '/path/to/audio.wav',
  { language: 'en' }
);
```
This feature is available via the API but not yet exposed in the UI. Useful for developers building custom workflows.
