Paraformer Models

Paraformer is a non-autoregressive speech recognition model that offers excellent speed and accuracy for both offline and streaming use cases.

Model Architecture

Paraformer uses a single-model architecture:

Model (model.onnx or model.int8.onnx) – Single neural network
Tokens (tokens.txt) – Token vocabulary

Unlike transducer models, Paraformer doesn’t require separate encoder, decoder, and joiner components, making it simpler to deploy.

When to Use

Fast Batch Processing

Excellent for transcribing multiple audio files quickly

Chinese Speech

Outstanding accuracy for Mandarin Chinese

Streaming Recognition

Supports streaming mode for real-time transcription

Resource-Constrained Devices

Single-model architecture uses less memory

Supported Languages

Paraformer models are primarily available for:

Chinese (Mandarin) – Excellent accuracy, widely used
English – Some bilingual variants available
Chinese + English – Bilingual models

Paraformer models excel at Chinese speech recognition and are widely adopted in Chinese applications.

Performance Characteristics

Aspect	Rating	Notes
Streaming	✅ Supported	Streaming-capable with good latency
Accuracy	⭐⭐⭐⭐⭐	Very high accuracy, especially for Chinese
Speed	⭐⭐⭐⭐⭐	Fast non-autoregressive inference
Memory	⭐⭐⭐⭐⭐	Low memory usage (single model)
Model Size	Small-Medium	Typically 50-200 MB depending on variant

Download Links

Paraformer Models

Browse and download pretrained Paraformer models

Configuration Example

Offline Transcription

import { createSTT } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-paraformer-zh-2023-03-28'
  },
  modelType: 'paraformer', // or 'auto'
  preferInt8: true,
  numThreads: 2,
});

const result = await stt.transcribeFile('/path/to/audio.wav');
console.log('Transcription:', result.text);
console.log('Tokens:', result.tokens);

await stt.destroy();

Transcribe from Samples

const samples = getPcmSamples(); // float[] in [-1, 1]
const result = await stt.transcribeSamples(samples, 16000);
console.log('Result:', result.text);

Streaming Recognition

import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';

const engine = await createStreamingSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-streaming-paraformer-zh'
  },
  modelType: 'paraformer',
  enableEndpoint: true,
});

const stream = await engine.createStream();

// Process audio chunks
const { result, isEndpoint } = await stream.processAudioChunk(samples, 16000);
console.log('Partial:', result.text);

await stream.release();
await engine.destroy();

Model Detection

Paraformer models are detected automatically by the presence of model.onnx (or model.int8.onnx) and tokens.txt. No folder name pattern is required. Expected files:

model.onnx (or model.int8.onnx)
tokens.txt

Performance Tips

Use Quantized Models

Int8 quantized Paraformer models offer excellent speed:

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/paraformer-zh' },
  preferInt8: true, // Automatically use model.int8.onnx if available
});

Optimize for Batch Processing

For transcribing multiple files:

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/paraformer-zh' },
  numThreads: 4, // Use more threads for faster batch processing
});

const files = ['audio1.wav', 'audio2.wav', 'audio3.wav'];
const results = await Promise.all(
  files.map(file => stt.transcribeFile(file))
);

Hardware Acceleration

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/paraformer-zh' },
  provider: 'nnapi', // Android NNAPI
  // provider: 'xnnpack', // XNNPACK for broader compatibility
});

Streaming Support

Streaming: ✅ YesParaformer supports streaming recognition. Use createStreamingSTT() with modelType: 'paraformer' for real-time transcription.

Advantages

Fast Inference: Non-autoregressive decoding is faster than autoregressive models
Simple Deployment: Single model file, no separate encoder/decoder/joiner
Excellent for Chinese: State-of-the-art accuracy for Mandarin
Low Memory: Single-model architecture uses less RAM
Streaming Capable: Supports real-time recognition

Limitations

Language Coverage: Primarily Chinese-focused, fewer English-only variants
No Hotwords: Does not support contextual biasing (use transducer models for hotwords)
Domain-Specific: Best suited for general Chinese speech (not specialized domains without fine-tuning)

Use Cases

Chinese Transcription

Transcribing Chinese audio files, podcasts, or videos

Real-Time Subtitles

Live Chinese captions for streaming or conferencing

Voice Input

Chinese voice input for apps and forms

Batch Processing

Transcribing large collections of Chinese audio

Common Issues

Model not loading

Verify model.onnx and tokens.txt are present
Check that the model path is correct
Ensure sufficient device memory

Poor accuracy on non-Chinese audio

Paraformer models are optimized for Chinese
Use Whisper or transducer models for other languages
Check if you’re using a bilingual (Chinese+English) variant

Slow performance

Enable preferInt8: true for quantized models
Increase numThreads on multi-core devices
Use hardware acceleration (provider: 'nnapi')

Comparison with Other Models

Feature	Paraformer	Transducer	Whisper
Speed	Very Fast	Fast	Medium
Chinese Accuracy	Excellent	Good	Good
Streaming	Yes	Yes	No
Hotwords	No	Yes	No
Multilingual	Limited	Varies	Excellent
Model Size	Small	Medium	Large

Next Steps

STT API

Detailed API documentation

Streaming STT

Real-time recognition guide

Model Setup

How to download and bundle models

Execution Providers

Hardware acceleration options

Speech-to-Text Models

Text-to-Speech Models

Paraformer Models

Paraformer Models

Model Architecture

When to Use

Fast Batch Processing

Chinese Speech

Streaming Recognition

Resource-Constrained Devices

Supported Languages

Performance Characteristics

Download Links

Paraformer Models

Configuration Example

Offline Transcription

Transcribe from Samples

Streaming Recognition

Model Detection

Performance Tips

Use Quantized Models

Optimize for Batch Processing

Hardware Acceleration

Streaming Support

Advantages

Limitations

Use Cases

Chinese Transcription

Real-Time Subtitles

Voice Input

Batch Processing

Common Issues

Comparison with Other Models

Next Steps

STT API

Streaming STT

Model Setup

Execution Providers

Build docs developers (and LLMs) love

Speech-to-Text Models

Text-to-Speech Models

​Paraformer Models

​Model Architecture

​When to Use

Fast Batch Processing

Chinese Speech

Streaming Recognition

Resource-Constrained Devices

​Supported Languages

​Performance Characteristics

​Download Links

Paraformer Models

​Configuration Example

​Offline Transcription

​Transcribe from Samples

​Streaming Recognition

​Model Detection

​Performance Tips

​Use Quantized Models

​Optimize for Batch Processing

​Hardware Acceleration

​Streaming Support

​Advantages

​Limitations

​Use Cases

Chinese Transcription

Real-Time Subtitles

Voice Input

Batch Processing

​Common Issues

​Comparison with Other Models

​Next Steps

STT API

Streaming STT

Model Setup

Execution Providers

Build docs developers (and LLMs) love

Paraformer Models

Model Architecture

When to Use

Supported Languages

Performance Characteristics

Download Links

Configuration Example

Offline Transcription

Transcribe from Samples

Streaming Recognition

Model Detection

Performance Tips

Use Quantized Models

Optimize for Batch Processing

Hardware Acceleration

Streaming Support

Advantages

Limitations

Use Cases

Common Issues

Comparison with Other Models

Next Steps