Speech-to-Text Models Overview

react-native-sherpa-onnx supports a wide range of speech-to-text (STT) model architectures, from fast streaming transducers to multilingual models like Whisper. This guide helps you choose the right model for your use case.

Model Comparison

Zipformer/Transducer

Fast streaming recognition with excellent accuracy. Best for real-time use cases.

Paraformer

Non-autoregressive ASR. Fast batch processing with high accuracy.

Whisper

Multilingual, robust zero-shot recognition. Strong for diverse audio conditions.

NeMo CTC

Excellent for English and streaming. Good balance of speed and accuracy.

Other Models

WeNet, SenseVoice, FunASR, Moonshine, and specialized models.

Quick Comparison Table

Model Type	Streaming	Multilingual	Speed	Use Case
Zipformer/Transducer	✅ Yes	Depends	Fast	Real-time recognition, voice assistants
LSTM Transducer	✅ Yes	Depends	Fast	Streaming ASR, mobile apps
Paraformer	✅ Yes	Limited	Very Fast	Fast batch transcription
Whisper	❌ No	✅ Yes (90+ langs)	Medium	Multilingual transcription, diverse audio
NeMo CTC	✅ Yes	Limited	Fast	English streaming, live captions
WeNet CTC	❌ No	Limited	Fast	Compact deployment
SenseVoice	❌ No	✅ Yes	Medium	Emotion detection, punctuation
FunASR Nano	❌ No	Limited	Medium	LLM-based ASR with prompts
Moonshine	✅ Yes (v1 & v2)	Limited	Fast	Streaming-capable lightweight ASR
Fire Red ASR	❌ No	Limited	Medium	Encoder-decoder ASR
Dolphin	❌ No	Limited	Fast	Single-model CTC
Canary	❌ No	✅ Yes	Medium	Multilingual NeMo model
Omnilingual	❌ No	✅ Yes	Medium	Wide language coverage
Tone CTC	✅ Yes	Limited	Very Fast	Lightweight streaming CTC
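If an app needs to filter models programmatically, for example in a settings screen, the table can be kept as data. The sketch below is a hypothetical helper and is not part of react-native-sherpa-onnx. The SttModelRow type, the STT_MODELS array and findModels are illustrative names. The values are copied from the table above, and both "Depends" and "Limited" are treated as not broadly multilingual.

```typescript
// Hypothetical helper: the comparison table encoded as data.
// Not part of the react-native-sherpa-onnx API.

type Speed = 'Very Fast' | 'Fast' | 'Medium';

interface SttModelRow {
  name: string;
  streaming: boolean;
  // true only where the table says "Yes"; "Depends" and "Limited" map to false
  multilingual: boolean;
  speed: Speed;
}

const STT_MODELS: SttModelRow[] = [
  { name: 'Zipformer/Transducer', streaming: true, multilingual: false, speed: 'Fast' },
  { name: 'LSTM Transducer', streaming: true, multilingual: false, speed: 'Fast' },
  { name: 'Paraformer', streaming: true, multilingual: false, speed: 'Very Fast' },
  { name: 'Whisper', streaming: false, multilingual: true, speed: 'Medium' },
  { name: 'NeMo CTC', streaming: true, multilingual: false, speed: 'Fast' },
  { name: 'WeNet CTC', streaming: false, multilingual: false, speed: 'Fast' },
  { name: 'SenseVoice', streaming: false, multilingual: true, speed: 'Medium' },
  { name: 'FunASR Nano', streaming: false, multilingual: false, speed: 'Medium' },
  { name: 'Moonshine', streaming: true, multilingual: false, speed: 'Fast' },
  { name: 'Fire Red ASR', streaming: false, multilingual: false, speed: 'Medium' },
  { name: 'Dolphin', streaming: false, multilingual: false, speed: 'Fast' },
  { name: 'Canary', streaming: false, multilingual: true, speed: 'Medium' },
  { name: 'Omnilingual', streaming: false, multilingual: true, speed: 'Medium' },
  { name: 'Tone CTC', streaming: true, multilingual: false, speed: 'Very Fast' },
];

// Lower rank = faster, so an ascending sort puts the fastest models first.
const SPEED_RANK: Record<Speed, number> = { 'Very Fast': 0, Fast: 1, Medium: 2 };

// Returns model names matching the given filters (undefined = no filter),
// fastest first; rows with equal speed keep their table order (stable sort).
function findModels(opts: { streaming?: boolean; multilingual?: boolean }): string[] {
  return STT_MODELS
    .filter((m) => opts.streaming === undefined || m.streaming === opts.streaming)
    .filter((m) => opts.multilingual === undefined || m.multilingual === opts.multilingual)
    .sort((a, b) => SPEED_RANK[a.speed] - SPEED_RANK[b.speed])
    .map((m) => m.name);
}
```

With the table's values, findModels({ streaming: true }) lists Paraformer and Tone CTC first. findModels({ streaming: true, multilingual: true }) returns an empty list, because no row is both streaming and marked multilingual.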

Choosing a Model

For Real-Time Recognition (Streaming)

If you need live recognition from a microphone, choose one of these streaming-capable models:

Zipformer/Transducer – Best overall for streaming, excellent accuracy
NeMo CTC – Great for English streaming applications
Tone CTC – Lightweight option for resource-constrained devices
LSTM Transducer – LSTM-based streaming alternative
Paraformer – Fast streaming with non-autoregressive approach
Moonshine – Modern streaming-capable architecture

For Batch/Offline Transcription

If you’re transcribing pre-recorded audio files:

Whisper – Best for multilingual content, robust to noise
Paraformer – Fastest for single-language batch processing
SenseVoice – When you need emotion labels and punctuation
Canary – Multilingual with good accuracy

By Language Support

English Only:

NeMo CTC (streaming)
Tone CTC (streaming)
Many Zipformer variants

Multilingual:

Whisper (offline, 90+ languages)
Canary (offline)
Omnilingual (offline)
SenseVoice (5 languages + emotion)

Chinese:

Paraformer (excellent for Mandarin)
FunASR Nano (LLM-based with prompts)
SenseVoice (Chinese + emotion)

By Device Constraints

Low-end devices / limited RAM:

Tone CTC (lightweight streaming)
Dolphin (compact single-model)
WeNet CTC (compact deployment)
Use int8 quantized variants when available

High-end devices:

Whisper (large models)
Canary (multilingual)
Full Zipformer models
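The guidance above can be turned into a small decision helper. This is a hypothetical sketch and is not part of the SDK. The lists are copied from the sections above ("Many Zipformer variants" is written as 'Zipformer/Transducer'), and recommendModels, narrow and the Requirements fields are illustrative names. One design choice to note: each criterion narrows the candidate list only if at least one model survives. A combination the guide does not cover, such as live multilingual recognition, therefore falls back to the full streaming list, so check the Multilingual column before you pick a model.

```typescript
// Hypothetical helper, not part of react-native-sherpa-onnx.
// Each list is copied from the "Choosing a Model" guidance.

const STREAMING = ['Zipformer/Transducer', 'NeMo CTC', 'Tone CTC', 'LSTM Transducer', 'Paraformer', 'Moonshine'];
const BATCH = ['Whisper', 'Paraformer', 'SenseVoice', 'Canary'];
const BY_LANGUAGE: Record<'english' | 'multilingual' | 'chinese', string[]> = {
  english: ['NeMo CTC', 'Tone CTC', 'Zipformer/Transducer'],
  multilingual: ['Whisper', 'Canary', 'Omnilingual', 'SenseVoice'],
  chinese: ['Paraformer', 'FunASR Nano', 'SenseVoice'],
};
const LOW_END = ['Tone CTC', 'Dolphin', 'WeNet CTC'];

interface Requirements {
  live: boolean; // microphone input that needs streaming recognition
  language?: keyof typeof BY_LANGUAGE;
  lowEndDevice?: boolean; // limited RAM or CPU
}

interface Recommendation {
  candidates: string[];
  preferInt8: boolean; // "Use int8 quantized variants when available"
}

// Keep only candidates that also appear in `allowed`, unless none would remain.
function narrow(candidates: string[], allowed: string[]): string[] {
  const kept = candidates.filter((name) => allowed.includes(name));
  return kept.length > 0 ? kept : candidates;
}

function recommendModels(req: Requirements): Recommendation {
  // Copy so callers cannot mutate the shared lists.
  let candidates = [...(req.live ? STREAMING : BATCH)];
  if (req.language) candidates = narrow(candidates, BY_LANGUAGE[req.language]);
  if (req.lowEndDevice) candidates = narrow(candidates, LOW_END);
  return { candidates, preferInt8: req.lowEndDevice === true };
}
```

For example, a live English app on a low-end phone gets ['Tone CTC'] with preferInt8 set to true.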

Model Detection

The SDK automatically detects the model type from folder-name patterns and the file layout. You can also force a specific type with the modelType option:

```typescript
import { createSTT, detectSttModel } from 'react-native-sherpa-onnx/stt';

// Auto-detect model type
const detectedInfo = await detectSttModel({
  type: 'asset',
  path: 'models/sherpa-onnx-whisper-tiny-en'
});
console.log(detectedInfo.modelType); // 'whisper'

// Create STT with auto-detection
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/sherpa-onnx-whisper-tiny-en' },
  modelType: 'auto', // Auto-detect; pass a concrete type such as 'whisper' to force it
  preferInt8: true,
});
```
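The sketch below illustrates what "folder name patterns" can mean in practice. It is a hypothetical illustration and not the SDK's detection code. The real detectSttModel also inspects the file layout, and its rules and returned modelType strings may differ. Only 'whisper' is confirmed by the example above, so the other labels here are illustrative.

```typescript
// Hypothetical illustration of folder-name-based model detection.
// Not the SDK's implementation; labels other than 'whisper' are illustrative.

const NAME_PATTERNS: Array<[RegExp, string]> = [
  [/whisper/i, 'whisper'],
  [/paraformer/i, 'paraformer'],
  [/sense-?voice/i, 'sensevoice'],
  [/moonshine/i, 'moonshine'],
  [/canary/i, 'canary'],
  [/zipformer/i, 'zipformer'],
];

function guessTypeFromFolderName(path: string): string {
  // Use only the last non-empty path segment, e.g. "sherpa-onnx-whisper-tiny-en".
  const folder = path.split('/').filter(Boolean).pop() ?? '';
  for (const [pattern, label] of NAME_PATTERNS) {
    if (pattern.test(folder)) return label;
  }
  return 'unknown'; // a real detector would fall back to inspecting files
}
```

Name matching is cheap but brittle: a renamed folder defeats it. That is why the SDK also looks at the file layout, and why an explicit modelType is useful.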

Performance Tips

Use Quantized Models

Set preferInt8: true to automatically use int8 quantized models when available:

```typescript
import { createSTT } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/whisper-tiny' },
  preferInt8: true, // Faster inference, smaller memory footprint
});
```
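Here is a sketch of the "use int8 when available" rule. It assumes the naming convention used by sherpa-onnx model archives, where a quantized file sits next to the float one with an ".int8.onnx" suffix (for example tiny-encoder.onnx and tiny-encoder.int8.onnx). pickOnnxFile is a hypothetical helper. It is not how the SDK resolves files internally, and the fallback when preferInt8 is false is also an assumption.

```typescript
// Hypothetical sketch; not the SDK's file-resolution code.
// Assumes quantized files use the "<stem>.int8.onnx" naming convention.

function pickOnnxFile(files: string[], stem: string, preferInt8: boolean): string | undefined {
  const full = `${stem}.onnx`;
  const int8 = `${stem}.int8.onnx`;
  // Try the preferred variant first, then fall back to the other one,
  // so models that ship only one variant still load.
  const order = preferInt8 ? [int8, full] : [full, int8];
  return order.find((name) => files.includes(name));
}
```

Falling back to the float file makes preferInt8: true safe to set globally, even for models that ship no quantized variant.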

Adjust Thread Count

On multi-core devices, raise numThreads so inference can use several cores:

```typescript
import { createSTT } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/zipformer' },
  numThreads: 4, // Use multiple cores
});
```
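A fixed numThreads value can oversubscribe small devices. The hypothetical helper below derives a value from the device's core count. It reserves one core for the app's UI and JavaScript work and caps the result at 4, matching the example above. Both rules are heuristic assumptions, not SDK recommendations, and obtaining coreCount is left to the app.

```typescript
// Hypothetical helper, not part of the SDK. coreCount must be supplied by the app.
function chooseNumThreads(coreCount: number, maxThreads = 4): number {
  // Leave one core free, never go below 1 thread, and cap at maxThreads.
  const spare = Math.floor(coreCount) - 1;
  return Math.max(1, Math.min(maxThreads, spare));
}
```

Measure on target devices before settling on a value. Past a certain point, extra threads add scheduling overhead without speeding up decoding.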

Use Execution Providers

Set provider to run inference on a hardware-accelerated backend where the device supports one:

```typescript
import { createSTT } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/paraformer' },
  provider: 'nnapi', // Android NNAPI
  // provider: 'xnnpack', // For XNNPACK
  // provider: 'qnn',     // Qualcomm QNN
});
```
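Not every device supports every provider, so it can help to try them in order. The helper below is a hypothetical sketch and not part of react-native-sherpa-onnx. It assumes that creating a recognizer with an unsupported provider rejects the promise. The commented usage also assumes 'cpu' is an accepted fallback provider string, which is the sherpa-onnx default. The helper itself is generic, so it takes the creation function as a parameter.

```typescript
// Hypothetical helper, not part of react-native-sherpa-onnx.
// Tries each provider in order and returns the first instance that is created.
async function createWithProviderFallback<T>(
  providers: string[],
  create: (provider: string) => Promise<T>,
): Promise<{ instance: T; provider: string }> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return { instance: await create(provider), provider };
    } catch (err) {
      lastError = err; // remember why this provider failed, then try the next one
    }
  }
  throw lastError ?? new Error('No providers given');
}

// Usage with the SDK (assumes createSTT rejects on an unsupported provider):
// const { instance: stt, provider } = await createWithProviderFallback(
//   ['nnapi', 'cpu'],
//   (p) => createSTT({ modelPath: { type: 'asset', path: 'models/paraformer' }, provider: p }),
// );
```

Returning the chosen provider alongside the instance lets the app log or display which backend is in use.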

See the Execution Providers guide for more details.

Download Links

All model downloads are available from the sherpa-onnx pretrained models repository:

Next Steps

Model Setup Guide

Learn how to download and bundle models with your app

STT API Reference

Detailed API documentation for speech recognition

Streaming STT

Real-time recognition from microphone

Hotwords

Contextual biasing for improved accuracy
