React Native Sherpa-ONNX
Offline speech-to-text, text-to-speech, and streaming speech processing for React Native applications. No internet required.
What is Sherpa-ONNX?
Sherpa-ONNX is a powerful offline speech processing SDK that brings advanced speech recognition and synthesis capabilities to your React Native apps. Built on the sherpa-onnx library, it provides:
- Complete offline operation - No API calls, no internet dependency, no usage limits
- Real-time streaming - Live transcription from microphone with partial results
- High accuracy - Support for state-of-the-art models like Whisper, Paraformer, and more
- Cross-platform - Full support for iOS and Android with optimized native performance
- Production ready - Battle-tested with CI/CD automation and comprehensive TypeScript types
Speech-to-Text
Transcribe audio files or live microphone input with multiple model types
Text-to-Speech
Generate natural speech from text with multi-language and multi-speaker support
Streaming Recognition
Real-time speech recognition with partial results and endpoint detection
Streaming TTS
Incremental speech generation for low latency playback
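To make the streaming recognition flow above more concrete, here is a minimal TypeScript sketch of how an app might combine partial results with endpoint detection. The `StreamEvent` type and `TranscriptAccumulator` class are hypothetical names invented for this illustration; they are not the SDK's API, and the real event shapes may differ. The sketch assumes each partial result carries the full hypothesis for the current utterance, and that an endpoint marks the utterance as finished.

```typescript
// Hypothetical event shape for illustration only; NOT the SDK's actual API.
type StreamEvent =
  | { kind: "partial"; text: string } // hypothesis for the current utterance so far
  | { kind: "endpoint" };             // the recognizer detected the end of an utterance

class TranscriptAccumulator {
  private finalized: string[] = [];
  private current = "";

  // Feed one event and get back the text a UI would display right now.
  push(event: StreamEvent): string {
    if (event.kind === "partial") {
      // A partial result replaces the previous hypothesis instead of appending to it.
      this.current = event.text;
    } else if (this.current.trim().length > 0) {
      // An endpoint commits the current hypothesis as a finished utterance.
      this.finalized.push(this.current.trim());
      this.current = "";
    }
    return this.display();
  }

  // Finished utterances followed by the in-progress hypothesis, if any.
  display(): string {
    return [...this.finalized, this.current]
      .filter((s) => s.length > 0)
      .join(" ");
  }

  utterances(): readonly string[] {
    return this.finalized;
  }
}

const acc = new TranscriptAccumulator();
acc.push({ kind: "partial", text: "hello" });
acc.push({ kind: "partial", text: "hello world" });
acc.push({ kind: "endpoint" });
acc.push({ kind: "partial", text: "how are" });
console.log(acc.display()); // prints "hello world how are"
```

Keeping finalized utterances separate from the in-progress hypothesis lets a UI render committed text differently from text that may still change.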
Key Features
No Internet Required
All processing happens on-device. Perfect for privacy-sensitive apps and offline scenarios.
Hardware Acceleration
Supports NNAPI, XNNPACK, Core ML, and QNN for optimal performance on mobile devices.
Multiple Model Types
Whisper, Paraformer, Zipformer, VITS, Matcha, Kokoro, and more - choose the best model for your use case.
TypeScript Native
Full type definitions and modern API design with comprehensive examples.
Flexible Model Loading
Load models from app assets, filesystem, or Play Asset Delivery (PAD) on Android.
Production Ready
Automated CI/CD, thorough testing, and used in production applications.
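The three model loading options listed under Flexible Model Loading can be pictured as a discriminated union. The following is a sketch under stated assumptions: `ModelSource`, `Platform`, and `describeModelSource` are hypothetical names made up for this example, the model and pack names are placeholders, and none of this reflects the SDK's real configuration types. The only facts taken from this page are the three sources and that Play Asset Delivery applies to Android.

```typescript
// Hypothetical types for illustration only; not taken from the SDK.
type ModelSource =
  | { kind: "asset"; name: string }              // model bundled with the app
  | { kind: "file"; path: string }               // model stored on the device filesystem
  | { kind: "pad"; pack: string; name: string }; // Play Asset Delivery pack (Android only)

type Platform = "android" | "ios";

// Validates a source for the given platform and returns a readable label.
function describeModelSource(source: ModelSource, platform: Platform): string {
  switch (source.kind) {
    case "asset":
      return `asset:${source.name}`;
    case "file":
      // Assumption of this sketch: filesystem models are given as absolute paths.
      if (!source.path.startsWith("/")) {
        throw new Error(`Expected an absolute path, got "${source.path}"`);
      }
      return `file:${source.path}`;
    case "pad":
      if (platform !== "android") {
        throw new Error("Play Asset Delivery is only available on Android");
      }
      return `pad:${source.pack}/${source.name}`;
  }
}

// Placeholder names; substitute whatever model you actually ship.
console.log(describeModelSource({ kind: "asset", name: "my-stt-model" }, "ios"));
console.log(describeModelSource({ kind: "pad", pack: "models", name: "my-tts-model" }, "android"));
```

Modeling the source as a union lets the compiler flag any call site that forgets one of the three cases.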
Platform Support
| Platform | Status | Minimum Version |
|---|---|---|
| Android | ✅ Production Ready | API 24 (Android 7.0) |
| iOS | ✅ Production Ready | iOS 13.0 |
Supported Models
Speech-to-Text (STT)
Choose from multiple model architectures optimized for different use cases:
- Whisper - OpenAI’s multilingual model (99 languages)
- Paraformer - Fast, accurate Chinese and English recognition
- Zipformer/Transducer - Streaming-capable, low latency
- NeMo CTC - NVIDIA’s high-accuracy models
- SenseVoice - Multi-language with emotion detection
- WeNet CTC - Production-grade CTC models
- FunASR Nano - Compact models for resource-constrained devices
- Tone CTC - Specialized for tonal languages
Text-to-Speech (TTS)
Generate natural, expressive speech:
- VITS - Fast, high-quality synthesis (includes Piper, Coqui, MeloTTS variants)
- Matcha - High-quality acoustic modeling with vocoder
- Kokoro - Multi-speaker, multi-language TTS
- KittenTTS - Lightweight, efficient multi-speaker synthesis
- Zipvoice - Voice cloning capable
- Pocket - Flow-matching TTS for natural prosody
Use Cases
Voice Assistants
Build offline voice assistants with real-time speech recognition and natural response generation. No API costs, complete privacy.
Accessibility Apps
Create text-to-speech readers, voice input for text fields, or audio transcription tools that work anywhere.
Medical & Healthcare
Medical transcription and voice note-taking with complete on-device processing, so patient audio does not need to leave the device. This can support HIPAA-compliant workflows.
Education & Language Learning
Speech recognition for pronunciation practice, audio lessons, and multilingual content.
Media & Content Creation
Automated subtitling, voiceovers, podcast transcription, and video editing tools.
IoT & Embedded
Voice control for smart devices in environments with limited or no internet connectivity.
Requirements
- React Native >= 0.70
- Android API 24+ (Android 7.0+)
- iOS 13.0+
- @dr.pogodin/react-native-fs peer dependency
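As a rough illustration of the version minimums above, the following TypeScript sketch checks an environment against them. The `Env` type and the `compareVersions` and `unmetRequirements` helpers are hypothetical, written only for this example, and are not part of the SDK. The version comparison handles plain dotted numbers and makes no attempt to parse pre-release tags.

```typescript
// Hypothetical helpers for illustration only; the minimums come from the Requirements list.
type Env =
  | { platform: "android"; apiLevel: number; reactNative: string }
  | { platform: "ios"; osVersion: string; reactNative: string };

// Compares dotted numeric versions: -1 if a < b, 0 if equal, 1 if a > b.
// Missing components count as 0, so "13" equals "13.0".
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Returns the requirements the environment fails to meet (empty if all are met).
function unmetRequirements(env: Env): string[] {
  const problems: string[] = [];
  if (compareVersions(env.reactNative, "0.70") < 0) {
    problems.push("React Native >= 0.70");
  }
  if (env.platform === "android" && env.apiLevel < 24) {
    problems.push("Android API 24+");
  }
  if (env.platform === "ios" && compareVersions(env.osVersion, "13.0") < 0) {
    problems.push("iOS 13.0+");
  }
  return problems;
}

console.log(unmetRequirements({ platform: "android", apiLevel: 23, reactNative: "0.69.12" }));
// prints [ 'React Native >= 0.70', 'Android API 24+' ]
console.log(unmetRequirements({ platform: "ios", osVersion: "16.4", reactNative: "0.74.1" }));
// prints []
```

In an app, the platform values would typically come from React Native's `Platform.OS` and `Platform.Version`; they are passed in as plain data here so the sketch runs outside a device.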
Quick Links
Get Started
Install and configure the SDK
Quick Start Guide
Build your first speech app
Example App
Explore the full example application
New to speech processing? Start with the Quick Start Guide to transcribe your first audio file in under 5 minutes.
Community & Support
- GitHub Issues: Report bugs and request features
- Example Apps: Audio-to-text demo and Video-to-text comparison
- Sherpa-ONNX Docs: Upstream documentation