NeMo CTC Models
NeMo CTC models are developed by NVIDIA and provide excellent performance for English speech recognition. They use Connectionist Temporal Classification (CTC) for fast, streaming-capable recognition.
Model Architecture
NeMo CTC models use a simple, efficient architecture:
- Model (`model.onnx` or `model.int8.onnx`) – single neural network
- Tokens (`tokens.txt`) – token vocabulary
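The network produces a score for every vocabulary token at every audio frame, and CTC turns those scores into text in a single pass. The sketch below shows greedy (best-path) CTC decoding: pick the highest-scoring token per frame, collapse consecutive repeats, and drop the blank token. It assumes the per-frame scores are already available as a plain `number[][]`, that `tokens.txt` holds one `<token> <id>` pair per line, and that words start with a SentencePiece-style `▁` marker. The helper names (`parseTokens`, `ctcGreedyDecode`, `idsToText`) are hypothetical and not part of this library's API.

```typescript
// Hypothetical helpers illustrating greedy CTC decoding; not a library API.

// Parse a tokens.txt-style vocabulary, assuming each non-empty line is "<token> <id>".
function parseTokens(text: string): Map<number, string> {
  const vocab = new Map<number, string>();
  for (const line of text.split("\n")) {
    const parts = line.trim().split(/\s+/);
    if (parts.length < 2) continue; // skip empty or malformed lines
    const id = Number(parts[parts.length - 1]);
    vocab.set(id, parts.slice(0, -1).join(" "));
  }
  return vocab;
}

// Greedy CTC decoding: best token per frame, collapse repeats, remove blanks.
function ctcGreedyDecode(frames: number[][], blankId: number): number[] {
  const ids: number[] = [];
  let prev = -1;
  for (const scores of frames) {
    let best = 0;
    for (let i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) best = i;
    }
    // A repeated token is only emitted again if a different token (e.g. blank) came in between.
    if (best !== prev && best !== blankId) ids.push(best);
    prev = best;
  }
  return ids;
}

// Join token strings, assuming "▁" marks the start of a word (SentencePiece convention).
function idsToText(ids: number[], vocab: Map<number, string>): string {
  return ids
    .map((id) => vocab.get(id) ?? "")
    .join("")
    .replace(/▁/g, " ")
    .trim();
}

// Toy example: the blank is id 0 in this made-up vocabulary.
const vocab = parseTokens("<blk> 0\n▁hel 1\nlo 2\n▁world 3\n");
const frames: number[][] = [
  [0.1, 0.8, 0.05, 0.05], // ▁hel
  [0.1, 0.8, 0.05, 0.05], // ▁hel again, collapsed
  [0.9, 0.05, 0.03, 0.02], // blank
  [0.1, 0.1, 0.7, 0.1], // lo
  [0.1, 0.1, 0.1, 0.7], // ▁world
];
const ids = ctcGreedyDecode(frames, 0); // [1, 2, 3]
console.log(idsToText(ids, vocab)); // hello world
```

Which id is the blank depends on the model, so it is passed in explicitly rather than hard-coded.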
When to Use
- English Streaming: real-time English transcription with low latency
- Live Captions: English subtitles for videos or meetings
- Fast Recognition: quick batch transcription of English audio
- Voice Assistants: English voice interfaces and commands
Supported Languages
NeMo CTC models are primarily designed for:
- English (US, UK, and other variants)
- Some multilingual variants (check the download page)
Performance Characteristics
| Aspect | Rating | Notes |
|---|---|---|
| Streaming | ✅ Excellent | Native streaming support with low latency |
| Accuracy | ⭐⭐⭐⭐⭐ | Very high accuracy for English |
| Speed | ⭐⭐⭐⭐⭐ | Fast CTC decoding |
| Memory | ⭐⭐⭐⭐⭐ | Low memory footprint |
| Model Size | Small-Medium | Typically 50-150 MB |
Download Links
- NeMo CTC Models: browse and download pretrained NeMo CTC models
Configuration Example
Offline Transcription
Streaming Recognition
With Hardware Acceleration
Model Detection
NeMo CTC models are detected by:
- Folder name containing `nemo` or `parakeet`
- Presence of `model.onnx` (or `model.int8.onnx`) and `tokens.txt`
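The two detection rules can be expressed as a small check over a folder name and its file list. This is a hypothetical sketch, not the library's detector: the function name, the case-insensitive name match, and the way `preferInt8` chooses between the two model files when both exist are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the detection rules above; the real detector may differ.
interface NemoCtcFiles {
  model: string;
  tokens: string;
}

function detectNemoCtc(
  folderName: string,
  files: string[],
  preferInt8 = false,
): NemoCtcFiles | null {
  // Rule 1: folder name mentions "nemo" or "parakeet" (case-insensitive by assumption).
  const name = folderName.toLowerCase();
  if (!name.includes("nemo") && !name.includes("parakeet")) return null;

  // Rule 2: tokens.txt plus at least one of the two model files must be present.
  const present = new Set(files);
  if (!present.has("tokens.txt")) return null;
  const hasFloat = present.has("model.onnx");
  const hasInt8 = present.has("model.int8.onnx");
  if (!hasFloat && !hasInt8) return null;

  // Use the int8 file if it is the only one, or if both exist and preferInt8 is set (assumed behavior).
  const model = hasInt8 && (preferInt8 || !hasFloat) ? "model.int8.onnx" : "model.onnx";
  return { model, tokens: "tokens.txt" };
}

const picked = detectNemoCtc("my-parakeet-model", ["model.onnx", "model.int8.onnx", "tokens.txt"], true);
console.log(picked); // { model: 'model.int8.onnx', tokens: 'tokens.txt' }
```

Returning `null` rather than throwing keeps the check cheap to run over many candidate folders.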
Performance Tips
Use Quantized Models
Int8 quantization (`model.int8.onnx`) provides a substantial speedup.
Optimize for Real-Time
For streaming applications, apply the tips listed under High latency in streaming in Common Issues.
Hardware Acceleration
Streaming Support
Streaming: ✅ Yes
NeMo CTC models have excellent streaming support. Use `createStreamingSTT()` for real-time recognition with low latency.
Advantages
- Fast: CTC decoding is very fast
- Low Latency: Excellent for real-time applications
- Streaming: Native streaming support
- High Accuracy: NVIDIA-trained models with excellent English accuracy
- Low Memory: Efficient single-model architecture
- Mobile-Friendly: Small models suitable for mobile deployment
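Native streaming means audio reaches the recognizer in small pieces as it arrives, not as one complete file. The chunk length sets a floor on latency: with 100 ms chunks, at least 100 ms of audio is buffered before the model sees any of it. The sketch below only splits a sample buffer into fixed-duration chunks; the 16 kHz sample rate, the 100 ms chunk size, and the `chunkAudio` name are assumptions, and feeding each chunk to a recognizer is out of scope here.

```typescript
// Split mono PCM samples into fixed-duration chunks, as a streaming client
// would before feeding them to a recognizer. Hypothetical helper.
function chunkAudio(samples: Float32Array, sampleRate: number, chunkMs: number): Float32Array[] {
  const chunkSize = Math.round((sampleRate * chunkMs) / 1000);
  if (chunkSize <= 0) throw new Error("chunkMs is too small for this sample rate");
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    // subarray returns a view, so no audio data is copied; the last chunk may be shorter.
    chunks.push(samples.subarray(start, Math.min(start + chunkSize, samples.length)));
  }
  return chunks;
}

// 0.25 s of silence at an assumed 16 kHz, split into 100 ms chunks.
const audio = new Float32Array(4000);
const sizes = chunkAudio(audio, 16000, 100).map((c) => c.length);
console.log(sizes); // 1600, 1600, 800 samples
```

Smaller chunks lower this buffering delay but mean more calls into the model per second of audio.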
Limitations
- English-Focused: Primarily designed for English (limited multilingual support)
- No Hotwords: Does not support contextual biasing (use transducer models for hotwords)
- Domain-Specific: Best for general English (specialized domains may need fine-tuning)
Parakeet Models
NeMo Parakeet is a family of streaming ASR models:
- Detected by `parakeet` in the folder name
- Same `nemo_ctc` model type
- Optimized for low latency
Use Cases
- Voice Commands: English voice control for apps and IoT devices
- Live Captions: real-time English subtitles for videos
- Call Transcription: transcribing English phone calls and meetings
- Voice Assistants: English voice interfaces with fast response
Common Issues
Model not loading
- Verify that the folder name contains `nemo` or `parakeet`
- Check that `model.onnx` (or `model.int8.onnx`) and `tokens.txt` are present
- Ensure sufficient device memory
Poor accuracy on non-English audio
- NeMo CTC models are optimized for English
- Use Whisper or Paraformer for other languages
- Check if a multilingual variant is available
High latency in streaming
- Increase `numThreads` on multi-core devices
- Use `preferInt8: true` for quantized models
- Enable hardware acceleration with `provider`
- Adjust the endpoint configuration for faster utterance detection
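Endpoint detection decides when an utterance is finished, typically once a run of silence follows detected speech. A shorter silence threshold finalizes results sooner, at the cost of possibly splitting an utterance at a natural pause. The sketch below illustrates that trade-off with a hypothetical per-frame voice-activity array and a hypothetical `minTrailingSilenceMs` parameter; it is not this library's endpoint configuration.

```typescript
// Return the frame index at which an utterance would be finalized because
// trailing silence after speech reached the threshold, or -1 if none.
// Hypothetical illustration of trailing-silence endpointing.
function findEndpoint(voiced: boolean[], frameMs: number, minTrailingSilenceMs: number): number {
  let heardSpeech = false;
  let silenceMs = 0;
  for (let i = 0; i < voiced.length; i++) {
    if (voiced[i]) {
      heardSpeech = true;
      silenceMs = 0; // speech resets the silence counter
    } else if (heardSpeech) {
      silenceMs += frameMs;
      if (silenceMs >= minTrailingSilenceMs) return i;
    }
  }
  return -1;
}

const voiced = [false, true, true, false, false, false, false]; // 100 ms frames
console.log(findEndpoint(voiced, 100, 300)); // 5
console.log(findEndpoint(voiced, 100, 200)); // 4 (shorter threshold ends the utterance one frame earlier)
```

Leading silence is ignored, so an endpoint can only fire after some speech has been heard.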
Comparison with Other Models
| Feature | NeMo CTC | Transducer | Whisper |
|---|---|---|---|
| Speed | Very Fast | Fast | Medium |
| English Accuracy | Excellent | Excellent | Very Good |
| Streaming | Yes | Yes | No |
| Hotwords | No | Yes | No |
| Multilingual | Limited | Varies | Excellent |
| Model Size | Small | Medium | Large |
| Latency | Very Low | Low | N/A (offline) |
Next Steps
- Streaming STT: learn about real-time recognition
- STT API: detailed API documentation
- Model Setup: how to download and bundle models
- Execution Providers: hardware acceleration options