NeMo CTC Models

NeMo CTC models are developed by NVIDIA and provide excellent performance for English speech recognition. They use Connectionist Temporal Classification (CTC) for fast, streaming-capable recognition.

Model Architecture

NeMo CTC models use a simple, efficient architecture:
  • Model (model.onnx or model.int8.onnx) – Single neural network
  • Tokens (tokens.txt) – Token vocabulary
CTC models are faster than encoder-decoder models because they don’t require autoregressive decoding.

When to Use

English Streaming

Real-time English transcription with low latency

Live Captions

English subtitles for videos or meetings

Fast Recognition

Quick batch transcription of English audio

Voice Assistants

English voice interfaces and commands

Supported Languages

NeMo CTC models are primarily designed for:
  • English (US, UK, and other variants)
  • Some multilingual variants available (check download page)
For other languages, consider Whisper, Paraformer (Chinese), or multilingual transducer models.

Performance Characteristics

| Aspect | Rating | Notes |
| --- | --- | --- |
| Streaming | ✅ Excellent | Native streaming support with low latency |
| Accuracy | ⭐⭐⭐⭐⭐ | Very high accuracy for English |
| Speed | ⭐⭐⭐⭐⭐ | Fast CTC decoding |
| Memory | ⭐⭐⭐⭐⭐ | Low memory footprint |
| Model Size | Small-Medium | Typically 50-150 MB |

NeMo CTC Models

Browse and download pretrained NeMo CTC models

Configuration Example

Offline Transcription

import { createSTT } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-nemo-ctc-en-citrinet-512'
  },
  modelType: 'nemo_ctc', // or 'auto'
  preferInt8: true,
  numThreads: 2,
});

const result = await stt.transcribeFile('/path/to/audio.wav');
console.log('Transcription:', result.text);

await stt.destroy();
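If `transcribeFile` throws, the engine above is never destroyed. One safer pattern is to wrap the call in `try`/`finally`; the sketch below assumes only the `transcribeFile`/`destroy` shape shown above, and the `transcribeOnce` helper and `createEngine` factory are hypothetical, not part of the library:

```typescript
// Hypothetical helper: guarantees destroy() runs even when transcription fails.
interface OfflineEngine {
  transcribeFile(path: string): Promise<{ text: string }>;
  destroy(): Promise<void>;
}

async function transcribeOnce(
  createEngine: () => Promise<OfflineEngine>,
  audioPath: string,
): Promise<string> {
  const stt = await createEngine();
  try {
    const result = await stt.transcribeFile(audioPath);
    return result.text;
  } finally {
    await stt.destroy(); // release native resources regardless of outcome
  }
}
```

You would pass a factory such as `() => createSTT({ ... })` as `createEngine`, so the helper works unchanged for any model configuration.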

Streaming Recognition

import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';

const engine = await createStreamingSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-streaming-nemo-ctc-en'
  },
  modelType: 'nemo_ctc',
  enableEndpoint: true,
  numThreads: 2,
});

const stream = await engine.createStream();

// Feed audio chunks
const samples = getPcmSamplesFromMic(); // float[] in [-1, 1]
const { result, isEndpoint } = await stream.processAudioChunk(samples, 16000);

console.log('Partial result:', result.text);
if (isEndpoint) {
  console.log('Utterance ended');
}

await stream.release();
await engine.destroy();
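The snippet above processes a single chunk. A continuous recognition loop can be sketched with the chunk source injected, so the logic stays independent of the microphone API. The `RecognizerStream` interface below mirrors only the `processAudioChunk` call shown above; the `reset` method is hypothetical, so check the actual API before relying on it:

```typescript
// Minimal sketch of a continuous recognition loop (helper names are hypothetical).
type ChunkResult = { result: { text: string }; isEndpoint: boolean };

interface RecognizerStream {
  processAudioChunk(samples: Float32Array, sampleRate: number): Promise<ChunkResult>;
  reset?(): Promise<void>; // hypothetical: start a fresh utterance if supported
}

async function runRecognitionLoop(
  stream: RecognizerStream,
  chunks: AsyncIterable<Float32Array>,
  onUtterance: (text: string) => void,
  sampleRate = 16000,
): Promise<void> {
  for await (const samples of chunks) {
    const { result, isEndpoint } = await stream.processAudioChunk(samples, sampleRate);
    if (isEndpoint && result.text.length > 0) {
      onUtterance(result.text); // endpoint reached: treat text as finalized
      await stream.reset?.();
    }
  }
}
```

Feeding it an async iterable of microphone chunks keeps partial results flowing while `onUtterance` fires once per detected endpoint.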

With Hardware Acceleration

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/nemo-ctc-en' },
  modelType: 'nemo_ctc',
  provider: 'nnapi', // Android NNAPI
  numThreads: 4,
});

Model Detection

NeMo CTC models are detected by:
  • Folder name containing nemo or parakeet
  • Presence of model.onnx (or model.int8.onnx) and tokens.txt
Expected files:
  • model.onnx (or model.int8.onnx)
  • tokens.txt
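The detection rules above can be sketched as a small predicate. This is a hypothetical helper for illustration, not part of the library API:

```typescript
// Hypothetical sketch of the NeMo CTC detection rules described above:
// folder name contains "nemo" or "parakeet", and both a model file
// (model.onnx or model.int8.onnx) and tokens.txt are present.
function looksLikeNemoCtc(folderName: string, files: string[]): boolean {
  const name = folderName.toLowerCase();
  const nameMatches = name.includes('nemo') || name.includes('parakeet');
  const hasModel = files.includes('model.onnx') || files.includes('model.int8.onnx');
  const hasTokens = files.includes('tokens.txt');
  return nameMatches && hasModel && hasTokens;
}
```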

Performance Tips

Use Quantized Models

Int8 quantization speeds up inference and shrinks the model with minimal accuracy loss:
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/nemo-ctc-en' },
  preferInt8: true, // Use model.int8.onnx if available
});

Optimize for Real-Time

For streaming applications:
const engine = await createStreamingSTT({
  modelPath: { type: 'asset', path: 'models/nemo-ctc-en' },
  modelType: 'nemo_ctc',
  numThreads: 4,          // More threads for lower latency
  enableEndpoint: true,   // Detect utterance boundaries
  endpointConfig: {
    rule2: {
      mustContainNonSilence: true,
      minTrailingSilence: 0.8, // 800ms of silence = end
      minUtteranceLength: 0,
    }
  },
});

Hardware Acceleration

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/nemo-ctc-en' },
  provider: 'nnapi', // Android Neural Networks API
  // provider: 'qnn',    // Qualcomm QNN for Snapdragon devices
});
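NNAPI and QNN are not available on every device, so a defensive pattern is to try providers in order and fall back to CPU. The sketch below injects the engine factory; `createWithFallback` is a hypothetical helper, not part of the library:

```typescript
// Hypothetical sketch: try hardware providers in order, falling back to CPU.
// `create` stands in for a call like createSTT({ ..., provider }).
async function createWithFallback<T>(
  create: (provider: string) => Promise<T>,
  providers: string[] = ['nnapi', 'cpu'],
): Promise<T> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await create(provider);
    } catch (err) {
      lastError = err; // this provider is unavailable on this device
    }
  }
  throw lastError;
}
```

Passing `(provider) => createSTT({ modelPath, modelType: 'nemo_ctc', provider })` as the factory keeps the fallback logic separate from the model configuration.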

Streaming Support

Streaming: ✅ Yes

NeMo CTC models have excellent streaming support. Use createStreamingSTT() for real-time recognition with low latency.

Advantages

  1. Fast: CTC decoding is non-autoregressive, so inference is a single forward pass
  2. Low Latency: Excellent for real-time applications
  3. Streaming: Native streaming support
  4. High Accuracy: NVIDIA-trained models with excellent English accuracy
  5. Low Memory: Efficient single-model architecture
  6. Mobile-Friendly: Small models suitable for mobile deployment

Limitations

  1. English-Focused: Primarily designed for English (limited multilingual support)
  2. No Hotwords: Does not support contextual biasing (use transducer models for hotwords)
  3. Domain-Specific: Best for general English (specialized domains may need fine-tuning)

Parakeet Models

NeMo Parakeet is a family of streaming ASR models:
  • Detected with parakeet in folder name
  • Same nemo_ctc model type
  • Optimized for low latency
const engine = await createStreamingSTT({
  modelPath: { type: 'asset', path: 'models/parakeet-ctc-en' },
  modelType: 'nemo_ctc', // Parakeet CTC variants use the same model type
});

Use Cases

Voice Commands

English voice control for apps and IoT devices

Live Captions

Real-time English subtitles for videos

Call Transcription

Transcribing English phone calls and meetings

Voice Assistants

English voice interfaces with fast response

Common Issues

Model not detected:
  • Verify folder name contains nemo or parakeet
  • Check that model.onnx and tokens.txt are present
  • Ensure sufficient device memory
Poor accuracy on non-English audio:
  • NeMo CTC models are optimized for English
  • Use Whisper or Paraformer for other languages
  • Check if a multilingual variant is available
Slow recognition:
  • Increase numThreads on multi-core devices
  • Use preferInt8: true for quantized models
  • Enable hardware acceleration with provider
  • Adjust endpoint config for faster utterance detection

Comparison with Other Models

| Feature | NeMo CTC | Transducer | Whisper |
| --- | --- | --- | --- |
| Speed | Very Fast | Fast | Medium |
| English Accuracy | Excellent | Excellent | Very Good |
| Streaming | Yes | Yes | No |
| Hotwords | No | Yes | No |
| Multilingual | Limited | Varies | Excellent |
| Model Size | Small | Medium | Large |
| Latency | Very Low | Low | N/A (offline) |

Next Steps

Streaming STT

Learn about real-time recognition

STT API

Detailed API documentation

Model Setup

How to download and bundle models

Execution Providers

Hardware acceleration options
