React Native Sherpa-ONNX

Offline speech-to-text, text-to-speech, and streaming speech processing for React Native applications. No internet required.

What is Sherpa-ONNX?

React Native Sherpa-ONNX is an offline speech processing SDK that brings speech recognition and synthesis to your React Native apps. Built on the sherpa-onnx library, it provides:
  • Complete offline operation - No API calls, no internet dependency, no usage limits
  • Real-time streaming - Live transcription from microphone with partial results
  • High accuracy - Support for state-of-the-art models like Whisper, Paraformer, and more
  • Cross-platform - Full support for iOS and Android with optimized native performance
  • Production ready - Battle-tested with CI/CD automation and comprehensive TypeScript types

Speech-to-Text

Transcribe audio files or live microphone input with multiple model types

Text-to-Speech

Generate natural speech from text with multi-language and multi-speaker support

Streaming Recognition

Real-time speech recognition with partial results and endpoint detection

Streaming TTS

Incremental speech generation for low-latency playback
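Streaming recognition works by feeding small audio frames to the recognizer as they arrive and reading back partial results. The sketch below shows the shape of that loop; the recognizer API names (`createStreamingRecognizer`, `acceptWaveform`, `partialResult`) are illustrative assumptions, not the package's real interface. The chunking helper itself is plain TypeScript.

```typescript
// Split a PCM buffer into fixed-size frames for incremental feeding.
// Plain TypeScript; runnable anywhere.
function chunkSamples(samples: Float32Array, frameSize: number): Float32Array[] {
  const frames: Float32Array[] = [];
  for (let i = 0; i < samples.length; i += frameSize) {
    frames.push(samples.subarray(i, Math.min(i + frameSize, samples.length)));
  }
  return frames;
}

// Hypothetical usage against an assumed streaming API (names are illustrative):
// const rec = await createStreamingRecognizer({ modelType: 'zipformer' });
// for (const frame of chunkSamples(micBuffer, 1600)) {   // 100 ms at 16 kHz
//   rec.acceptWaveform(frame);
//   console.log(rec.partialResult());                    // partial transcript
// }
```

A frame size of 1600 samples corresponds to 100 ms of 16 kHz mono audio, a common granularity for low-latency partial results.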

Key Features

No Internet Required

All processing happens on-device. Perfect for privacy-sensitive apps and offline scenarios.

Hardware Acceleration

Supports NNAPI, XNNPACK, Core ML, and QNN for optimal performance on mobile devices.
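In practice the acceleration backend is chosen per platform: Core ML on iOS, NNAPI on Android, with XNNPACK as a portable CPU fallback. A minimal sketch of that decision, assuming the provider is passed as a config string (the string values and config shape here are assumptions, not the package's documented API):

```typescript
// Pick a hardware-acceleration provider per OS. The provider identifiers and
// the idea of passing one into a recognizer config are illustrative assumptions.
type Provider = 'coreml' | 'nnapi' | 'xnnpack';

function pickProvider(os: 'ios' | 'android'): Provider {
  // Core ML on iOS, NNAPI on Android; XNNPACK remains a portable CPU fallback.
  return os === 'ios' ? 'coreml' : 'nnapi';
}
```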

Multiple Model Types

Whisper, Paraformer, Zipformer, VITS, Matcha, Kokoro, and more - choose the best model for your use case.

TypeScript Native

Full type definitions and modern API design with comprehensive examples.

Flexible Model Loading

Load models from app assets, filesystem, or Play Asset Delivery (PAD) on Android.
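The three loading sources can be modeled as a small discriminated union and resolved to a path string before handing it to the engine. This is a sketch: the URI schemes and the helper's shape are assumptions for illustration, not the package's actual model-loading API.

```typescript
// Resolve where a model lives depending on how it was shipped.
// The 'asset://' and 'pad://' schemes here are hypothetical stand-ins.
type ModelSource =
  | { kind: 'asset'; name: string }               // bundled in app assets
  | { kind: 'filesystem'; path: string }          // downloaded at runtime
  | { kind: 'pad'; pack: string; name: string };  // Play Asset Delivery (Android)

function resolveModelPath(src: ModelSource): string {
  switch (src.kind) {
    case 'asset':
      return `asset://${src.name}`;
    case 'filesystem':
      return src.path;
    case 'pad':
      return `pad://${src.pack}/${src.name}`;
  }
}
```

Downloading large models at runtime (the `filesystem` case) keeps the initial app binary small, which is why the `@dr.pogodin/react-native-fs` peer dependency exists.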

Production Ready

Automated CI/CD, thorough testing, and used in production applications.

Platform Support

Platform | Status              | Minimum Version
Android  | ✅ Production Ready | API 24 (Android 7.0)
iOS      | ✅ Production Ready | iOS 13.0+

Supported Models

Speech-to-Text (STT)

Choose from multiple model architectures optimized for different use cases:
  • Whisper - OpenAI’s multilingual model (99 languages)
  • Paraformer - Fast, accurate Chinese and English recognition
  • Zipformer/Transducer - Streaming-capable, low latency
  • NeMo CTC - NVIDIA’s high-accuracy models
  • SenseVoice - Multi-language with emotion detection
  • WeNet CTC - Production-grade CTC models
  • FunASR Nano - Compact models for resource-constrained devices
  • Tone CTC - Specialized for tonal languages
Browse all STT models →
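Whichever architecture you pick, a one-shot transcription boils down to building a config that names the model type and directory, then passing it to a recognizer. The sketch below is hypothetical throughout: `buildSttConfig`, `createRecognizer`, and `transcribeFile` are illustrative names, not the package's real API. The 16 kHz default reflects what sherpa-onnx models typically expect.

```typescript
// Minimal config builder for a one-shot transcription; every name here is a
// hypothetical stand-in, not the package's actual API surface.
type SttModelType = 'whisper' | 'paraformer' | 'zipformer';

interface SttConfig {
  modelType: SttModelType;
  modelDir: string;
  sampleRate: number;  // sherpa-onnx models typically expect 16 kHz input
  numThreads: number;
}

function buildSttConfig(modelType: SttModelType, modelDir: string): SttConfig {
  return { modelType, modelDir, sampleRate: 16000, numThreads: 2 };
}

// Hypothetical usage:
// const recognizer = await createRecognizer(buildSttConfig('whisper', modelsDir));
// const { text } = await recognizer.transcribeFile('/path/to/audio.wav');
```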

Text-to-Speech (TTS)

Generate natural, expressive speech:
  • VITS - Fast, high-quality synthesis (includes Piper, Coqui, MeloTTS variants)
  • Matcha - High-quality acoustic modeling with vocoder
  • Kokoro - Multi-speaker, multi-language TTS
  • KittenTTS - Lightweight, efficient multi-speaker synthesis
  • Zipvoice - Voice cloning capable
  • Pocket - Flow-matching TTS for natural prosody
Browse all TTS models →
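For long passages, incremental TTS usually means synthesizing sentence by sentence so playback can start on the first sentence while the rest are still being generated. The sentence splitter below is plain TypeScript; the `tts.generate`/`player.enqueue` calls in the comment are assumed names, not the package's documented API.

```typescript
// Split text into sentences so a streaming TTS engine can begin playback on
// the first sentence while later ones synthesize.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Hypothetical usage:
// for (const sentence of splitSentences(article)) {
//   const audio = await tts.generate({ text: sentence, speakerId: 0 });
//   await player.enqueue(audio);  // playback starts after the first sentence
// }
```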

Use Cases

  • Offline voice assistants with real-time speech recognition and natural response generation - no API costs, complete privacy.
  • Text-to-speech readers, voice input for text fields, and audio transcription tools that work anywhere.
  • HIPAA-compliant medical transcription and voice note-taking with complete on-device processing.
  • Speech recognition for pronunciation practice, audio lessons, and multilingual content.
  • Automated subtitling, voiceovers, podcast transcription, and video editing tools.
  • Voice control for smart devices in environments with limited or no internet connectivity.

Requirements

  • React Native >= 0.70
  • Android API 24+ (Android 7.0+)
  • iOS 13.0+
  • @dr.pogodin/react-native-fs peer dependency
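Installation typically looks like the following. The peer dependency name comes from the list above; the main package name `react-native-sherpa-onnx` is an assumption here - check the project's npm page for the exact name.

```shell
# Package name is an assumption - verify it on the project's npm page.
npm install react-native-sherpa-onnx @dr.pogodin/react-native-fs

# iOS only: link the native pods
cd ios && pod install
```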

Get Started

Install and configure the SDK

Quick Start Guide

Build your first speech app

Example App

Explore the full example application
New to speech processing? Start with the Quick Start Guide to transcribe your first audio file in under 5 minutes.

Community & Support

License

MIT License - free for commercial and personal use.
