React Native Sherpa-ONNX

Offline speech-to-text, text-to-speech, and streaming speech processing for React Native applications. No internet required.

What is Sherpa-ONNX?

React Native Sherpa-ONNX is an offline speech processing SDK that brings speech recognition and synthesis to your React Native apps. Built on the sherpa-onnx library, it provides:
  • Complete offline operation - No API calls, no internet dependency, no usage limits
  • Real-time streaming - Live transcription from microphone with partial results
  • High accuracy - Support for state-of-the-art models like Whisper, Paraformer, and more
  • Cross-platform - Full support for iOS and Android with optimized native performance
  • Production ready - Battle-tested with CI/CD automation and comprehensive TypeScript types

Speech-to-Text

Transcribe audio files or live microphone input with multiple model types

Text-to-Speech

Generate natural speech from text with multi-language and multi-speaker support

Streaming Recognition

Real-time speech recognition with partial results and endpoint detection

Streaming TTS

Incremental speech generation for low-latency playback
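Streaming recognition works by feeding small audio frames to the recognizer as they arrive and reading back partial results. The sketch below shows the shape of that loop; the recognizer API names (`createStreamingRecognizer`, `acceptWaveform`, `partialResult`) are illustrative assumptions, not the package's real interface. The chunking helper itself is plain TypeScript.

```typescript
// Split a PCM buffer into fixed-size frames for incremental feeding.
// Plain TypeScript; runnable anywhere.
function chunkSamples(samples: Float32Array, frameSize: number): Float32Array[] {
  const frames: Float32Array[] = [];
  for (let i = 0; i < samples.length; i += frameSize) {
    frames.push(samples.subarray(i, Math.min(i + frameSize, samples.length)));
  }
  return frames;
}

// Hypothetical usage against an assumed streaming API (names are illustrative):
// const rec = await createStreamingRecognizer({ modelType: 'zipformer' });
// for (const frame of chunkSamples(micBuffer, 1600)) {   // 100 ms at 16 kHz
//   rec.acceptWaveform(frame);
//   console.log(rec.partialResult());                    // partial transcript
// }
```

A frame size of 1600 samples corresponds to 100 ms of 16 kHz mono audio, a common granularity for low-latency partial results.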

Key Features

No Internet Required

All processing happens on-device. Perfect for privacy-sensitive apps and offline scenarios.

Hardware Acceleration

Supports NNAPI, XNNPACK, Core ML, and QNN for optimal performance on mobile devices.
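In practice the acceleration backend is chosen per platform: Core ML on iOS, NNAPI on Android, with XNNPACK as a portable CPU fallback. A minimal sketch of that decision, assuming the provider is passed as a config string (the string values and config shape here are assumptions, not the package's documented API):

```typescript
// Pick a hardware-acceleration provider per OS. The provider identifiers and
// the idea of passing one into a recognizer config are illustrative assumptions.
type Provider = 'coreml' | 'nnapi' | 'xnnpack';

function pickProvider(os: 'ios' | 'android'): Provider {
  // Core ML on iOS, NNAPI on Android; XNNPACK remains a portable CPU fallback.
  return os === 'ios' ? 'coreml' : 'nnapi';
}
```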

Multiple Model Types

Whisper, Paraformer, Zipformer, VITS, Matcha, Kokoro, and more - choose the best model for your use case.

TypeScript Native

Full type definitions and modern API design with comprehensive examples.

Flexible Model Loading

Load models from app assets, filesystem, or Play Asset Delivery (PAD) on Android.
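The three loading sources can be modeled as a small discriminated union and resolved to a path string before handing it to the engine. This is a sketch: the URI schemes and the helper's shape are assumptions for illustration, not the package's actual model-loading API.

```typescript
// Resolve where a model lives depending on how it was shipped.
// The 'asset://' and 'pad://' schemes here are hypothetical stand-ins.
type ModelSource =
  | { kind: 'asset'; name: string }               // bundled in app assets
  | { kind: 'filesystem'; path: string }          // downloaded at runtime
  | { kind: 'pad'; pack: string; name: string };  // Play Asset Delivery (Android)

function resolveModelPath(src: ModelSource): string {
  switch (src.kind) {
    case 'asset':
      return `asset://${src.name}`;
    case 'filesystem':
      return src.path;
    case 'pad':
      return `pad://${src.pack}/${src.name}`;
  }
}
```

Downloading large models at runtime (the `filesystem` case) keeps the initial app binary small, which is why the `@dr.pogodin/react-native-fs` peer dependency exists.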

Production Ready

Automated CI/CD, thorough testing, and used in production applications.

Platform Support

Platform | Status              | Minimum Version
Android  | ✅ Production Ready | API 24 (Android 7.0)
iOS      | ✅ Production Ready | iOS 13.0+

Supported Models

Speech-to-Text (STT)

Choose from multiple model architectures optimized for different use cases:
  • Whisper - OpenAI’s multilingual model (99 languages)
  • Paraformer - Fast, accurate Chinese and English recognition
  • Zipformer/Transducer - Streaming-capable, low latency
  • NeMo CTC - NVIDIA’s high-accuracy models
  • SenseVoice - Multi-language with emotion detection
  • WeNet CTC - Production-grade CTC models
  • FunASR Nano - Compact models for resource-constrained devices
  • Tone CTC - Specialized for tonal languages
Browse all STT models →
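Whichever architecture you pick, a one-shot transcription boils down to building a config that names the model type and directory, then passing it to a recognizer. The sketch below is hypothetical throughout: `buildSttConfig`, `createRecognizer`, and `transcribeFile` are illustrative names, not the package's real API. The 16 kHz default reflects what sherpa-onnx models typically expect.

```typescript
// Minimal config builder for a one-shot transcription; every name here is a
// hypothetical stand-in, not the package's actual API surface.
type SttModelType = 'whisper' | 'paraformer' | 'zipformer';

interface SttConfig {
  modelType: SttModelType;
  modelDir: string;
  sampleRate: number;  // sherpa-onnx models typically expect 16 kHz input
  numThreads: number;
}

function buildSttConfig(modelType: SttModelType, modelDir: string): SttConfig {
  return { modelType, modelDir, sampleRate: 16000, numThreads: 2 };
}

// Hypothetical usage:
// const recognizer = await createRecognizer(buildSttConfig('whisper', modelsDir));
// const { text } = await recognizer.transcribeFile('/path/to/audio.wav');
```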

Text-to-Speech (TTS)

Generate natural, expressive speech:
  • VITS - Fast, high-quality synthesis (includes Piper, Coqui, MeloTTS variants)
  • Matcha - High-quality acoustic modeling with vocoder
  • Kokoro - Multi-speaker, multi-language TTS
  • KittenTTS - Lightweight, efficient multi-speaker synthesis
  • Zipvoice - Voice cloning capable
  • Pocket - Flow-matching TTS for natural prosody
Browse all TTS models →
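For long passages, incremental TTS usually means synthesizing sentence by sentence so playback can start on the first sentence while the rest are still being generated. The sentence splitter below is plain TypeScript; the `tts.generate`/`player.enqueue` calls in the comment are assumed names, not the package's documented API.

```typescript
// Split text into sentences so a streaming TTS engine can begin playback on
// the first sentence while later ones synthesize.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Hypothetical usage:
// for (const sentence of splitSentences(article)) {
//   const audio = await tts.generate({ text: sentence, speakerId: 0 });
//   await player.enqueue(audio);  // playback starts after the first sentence
// }
```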

Use Cases

  • Offline voice assistants with real-time speech recognition and natural response generation - no API costs, complete privacy.
  • Text-to-speech readers, voice input for text fields, and audio transcription tools that work anywhere.
  • HIPAA-compliant medical transcription and voice note-taking with complete on-device processing.
  • Speech recognition for pronunciation practice, audio lessons, and multilingual content.
  • Automated subtitling, voiceovers, podcast transcription, and video editing tools.
  • Voice control for smart devices in environments with limited or no internet connectivity.

Requirements

  • React Native >= 0.70
  • Android API 24+ (Android 7.0+)
  • iOS 13.0+
  • @dr.pogodin/react-native-fs peer dependency
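Installation typically looks like the following. The peer dependency name comes from the list above; the main package name `react-native-sherpa-onnx` is an assumption here - check the project's npm page for the exact name.

```shell
# Package name is an assumption - verify it on the project's npm page.
npm install react-native-sherpa-onnx @dr.pogodin/react-native-fs

# iOS only: link the native pods
cd ios && pod install
```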

Get Started

Install and configure the SDK

Quick Start Guide

Build your first speech app

Example App

Explore the full example application
New to speech processing? Start with the Quick Start Guide to transcribe your first audio file in under 5 minutes.

Community & Support

License

MIT License - free for commercial and personal use.
