Quick Start Guide
Get started with offline speech-to-text and text-to-speech in under 5 minutes.
Prerequisites
Before you begin, make sure you have:
- Completed the Installation steps
- A model downloaded (see Model Setup or use the quick download below)
- An audio file to test (or use the examples below)
Download a Model
For this guide, we’ll use a small Whisper model for English transcription.
Choose a model
Download the Whisper Tiny English model (~40MB, fast, good accuracy), or use the Model Download Manager in your app.
See Model Setup for detailed instructions on bundling models, using Play Asset Delivery, or loading from the filesystem.
Speech-to-Text (STT)
Transcribe audio files with offline speech recognition.
Initialize the STT engine
Create an STT instance with your model:
Model path options
You can load models from different locations:
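As a rough illustration of the idea, the sketch below models two of the locations mentioned in this guide (a model bundled with the app and a model on the device filesystem) as a small TypeScript union. The `ModelLocation` type, its `type` and `path` fields, and the `pickModelLocation` helper are hypothetical names chosen for this example; the library's actual option names may differ, so check the Model Setup page before relying on them.

```typescript
// Hypothetical shape for describing where a model lives.
// These field names are assumptions for illustration, not a confirmed API.
type ModelLocation =
  | { type: 'asset'; path: string } // bundled with the app, e.g. under assets/models/
  | { type: 'file'; path: string }; // absolute path on the device filesystem

// Prefer a downloaded copy on disk when one exists; otherwise fall back
// to the model bundled under the app's models/ asset folder.
function pickModelLocation(
  downloadedDir: string | null,
  bundledName: string
): ModelLocation {
  if (downloadedDir !== null && downloadedDir.length > 0) {
    return { type: 'file', path: downloadedDir };
  }
  return { type: 'asset', path: `models/${bundledName}` };
}
```

Keeping the choice in one helper means the rest of the app does not need to know whether the model was bundled or downloaded.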
Transcribe Audio Samples
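Raw audio from recorders and WAV files is commonly stored as signed 16-bit integers, while speech engines typically expect floating-point samples in the range [-1, 1]. The sketch below shows that conversion under the assumption that the STT engine accepts a `Float32Array` of mono samples (this guide does not confirm the exact input type). `int16ToFloat32` is a hypothetical helper name, not a library function.

```typescript
// Convert signed 16-bit PCM samples to 32-bit floats in [-1, 1).
// Assumption: the engine wants normalized mono float samples.
function int16ToFloat32(pcm: Int16Array): Float32Array {
  // 32768 = 2^15, so the most negative sample (-32768) maps to exactly -1.
  return Float32Array.from(pcm, (sample) => sample / 32768);
}

// Example: four samples covering zero, half scale, and both extremes.
const floats = int16ToFloat32(new Int16Array([0, 16384, -32768, 32767]));
```

The sample rate is not changed by this conversion, so pass the original recording rate along with the samples if the engine asks for one.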
You can also transcribe raw PCM audio samples:
Complete STT Example
STTExample.tsx
Text-to-Speech (TTS)
Generate natural speech from text offline.
Download a TTS model
Download a VITS Piper model (~10-50MB depending on voice). Place it in android/app/src/main/assets/models/ or add it to Xcode resources.
TTS with Options
Customize speech generation with options:
Complete TTS Example
TTSExample.tsx
Real-Time Streaming Recognition
Transcribe live microphone input with partial results.
Initialize streaming engine
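Because streaming is limited to a fixed set of model types, it can help to validate the configured type before creating a streaming engine. The sketch below encodes the model types listed in this section and checks a value against them. `STREAMING_MODEL_TYPES` and `supportsStreaming` are hypothetical names for this example and are not part of the library API.

```typescript
// Model types this guide lists as supporting streaming recognition.
const STREAMING_MODEL_TYPES = [
  'transducer',
  'paraformer',
  'zipformer2_ctc',
  'nemo_ctc',
  'tone_ctc',
] as const;

type StreamingModelType = (typeof STREAMING_MODEL_TYPES)[number];

// Hypothetical helper: a type guard that narrows a plain string to a
// streaming-capable model type. The match is exact and case-sensitive.
function supportsStreaming(modelType: string): modelType is StreamingModelType {
  return (STREAMING_MODEL_TYPES as readonly string[]).indexOf(modelType) !== -1;
}
```

Any type not in the list returns `false`, so the app can fall back to file-based transcription or show an error instead of failing during engine initialization.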
Only certain model types support streaming: transducer, paraformer, zipformer2_ctc, nemo_ctc, tone_ctc.
Real-Time Microphone Transcription
MicrophoneSTT.tsx
Next Steps
Now that you’ve built your first speech app, explore more features:
- Model Setup: Learn about model types, quantization, and Play Asset Delivery
- STT API Reference: Complete STT API documentation
- TTS API Reference: Complete TTS API documentation
- Streaming TTS: Low-latency incremental speech generation
- Execution Providers: Hardware acceleration with NNAPI, Core ML, QNN
- Example App: Browse the full-featured example application
Need help? Check out the example app source code for complete working examples of STT, TTS, and streaming.