Overview
Hotwords (also called contextual biasing or keyword spotting) allow you to boost recognition accuracy for specific words or phrases. This is particularly useful for domain-specific vocabulary, proper nouns, technical terms, or commands that the base model might not recognize well.

Import from: `react-native-sherpa-onnx/stt`

Model Support
Only specific STT model types support hotwords:

Supported Models
- `transducer`
- `nemo_transducer`
Unsupported Models
- `whisper`
- `paraformer`
- `sensevoice`
- `nemo_ctc`
- All other types
Checking Model Support
Use `sttSupportsHotwords()` to check whether a model type supports hotwords:
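A minimal sketch of the check. The real `sttSupportsHotwords()` is exported by the SDK; the local re-implementation below only mirrors the behavior documented on this page:

```typescript
// Illustrative re-implementation of the documented behavior: only the
// two transducer model types support hotwords. In an app you would call
// the SDK's own sttSupportsHotwords() instead.
const HOTWORD_CAPABLE_MODELS = ['transducer', 'nemo_transducer'] as const;

function sttSupportsHotwords(modelType: string): boolean {
  return (HOTWORD_CAPABLE_MODELS as readonly string[]).includes(modelType);
}

// Gate any hotwords configuration UI on model support:
const showHotwordsUI = sttSupportsHotwords('transducer'); // true
```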
Returns `true` only for `'transducer'` and `'nemo_transducer'`.

Error Codes
The SDK validates hotword configuration and rejects with specific error codes:

| Error Code | When |
|---|---|
| `HOTWORDS_NOT_SUPPORTED` | `initializeSTT` or `setSttConfig` is called with a non-empty `hotwordsFile` and the model type does not support hotwords |
| `INVALID_HOTWORDS_FILE` | The hotwords file is missing, not readable, invalid UTF-8, contains null bytes, has no valid lines, has invalid score syntax, or contains lines with no letter characters |
Hotword File Format
Hotword files must follow this format: one hotword or phrase per line.

Optional score per line
Add a space, a colon, and a numeric score to adjust boost strength. Higher scores mean stronger boosting (default: 1.0 if not specified).
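The per-line format above can be sketched as a small parser. This is illustrative only; the SDK parses the file natively:

```typescript
// Parse one hotwords-file line: the phrase, optionally followed by a
// space, a colon, and a numeric boost score (e.g. "SHERPA ONNX :2.0").
// Returns null for blank lines; score defaults to 1.0 when omitted.
function parseHotwordLine(
  line: string,
): { phrase: string; score: number } | null {
  const trimmed = line.trim();
  if (trimmed === '') return null;
  const match = trimmed.match(/^(.*\S)\s:(\d+(?:\.\d+)?)$/);
  if (match) {
    return { phrase: match[1], score: Number(match[2]) };
  }
  return { phrase: trimmed, score: 1.0 }; // no score suffix: default 1.0
}
```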
Example Hotwords File
hotwords.txt
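The file contents might look like this (hypothetical entries; one phrase per line, optional ` :score` suffix):

```text
OPEN SETTINGS
SHERPA ONNX :2.0
ACME CORPORATION :1.8
HELLO WORLD
```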
Configuration
Initialize with Hotwords
Provide the hotwords file path during STT initialization:

- Hotwords file: Path to the hotwords text file (must be readable and valid)
- Hotwords score: Global score multiplier applied to all hotwords (default: 1.5)
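A hedged sketch of an initialization config. `hotwordsFile` is named on this page; `hotwordsScore` and the overall object shape are assumptions, so check the SDK's TypeScript definitions for the authoritative field names:

```typescript
// Assumed config shape (field names hypothetical except hotwordsFile).
const sttInitConfig = {
  modelType: 'transducer',                 // must be a hotword-capable type
  hotwordsFile: '/data/user/hotwords.txt', // readable UTF-8 hotword list
  hotwordsScore: 1.5,                      // global multiplier (doc default: 1.5)
  modelingUnit: 'bpe',                     // must match model training
};
// Then pass it to the SDK, e.g. await initializeSTT(sttInitConfig);
```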
Update Hotwords at Runtime
You can change hotwords dynamically using `setConfig()`.
Automatic Decoding Method Switch
When you provide a non-empty hotwords file, the SDK automatically switches to the `modified_beam_search` decoding method, because sherpa-onnx only applies hotwords with this method.
You don’t need to manually set `decodingMethod: 'modified_beam_search'`; the SDK handles this automatically. It also ensures `maxActivePaths` is at least 4 for proper beam search operation.
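The documented adjustment can be illustrated as a pure function. This is a sketch of the rule, not the SDK's internal code:

```typescript
interface DecodingOptions {
  decodingMethod: 'greedy_search' | 'modified_beam_search';
  maxActivePaths: number;
}

// Mirror of the documented behavior: when a hotwords file is set, force
// modified_beam_search and raise maxActivePaths to at least 4.
function applyHotwordDefaults(
  opts: DecodingOptions,
  hotwordsFile?: string,
): DecodingOptions {
  if (!hotwordsFile) return opts; // no hotwords: leave config untouched
  return {
    decodingMethod: 'modified_beam_search',           // required for biasing
    maxActivePaths: Math.max(opts.maxActivePaths, 4), // beam needs >= 4 paths
  };
}
```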
Modeling Unit and BPE Vocab
For proper hotword tokenization, you may need to specify the modeling unit and (optionally) the BPE vocabulary file. These parameters are only relevant when using hotwords. They tell the tokenizer how to process hotword text to match the model’s training.
modelingUnit
Tokenization method for hotwords. The modeling unit must match how your model was trained:
| Value | Description | Typical Models |
|---|---|---|
| `'bpe'` | Byte-pair encoding | English transducers (zipformer-en, LibriSpeech models). Model folder often contains `bpe.vocab` or `bpe.model` |
| `'cjkchar'` | Chinese character-based | Chinese transducers (conformer-zh, wenetspeech, multi-dataset zh). No BPE; tokens are characters |
| `'cjkchar+bpe'` | Bilingual (Chinese + English) | Bilingual models (bilingual-zh-en, streaming zipformer bilingual). Model folder often contains `bpe.vocab` |
bpeVocab
Path to the BPE vocabulary file (sentencepiece `bpe.vocab`). Required when:
- `modelingUnit` is `'bpe'` or `'cjkchar+bpe'`
- The model directory doesn’t contain an auto-detectable `bpe.vocab` file

If the model directory contains `bpe.vocab`, it’s detected automatically and used when `bpeVocab` is not provided.

Configuration Examples
English Transducer with BPE
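A hedged sketch, assuming the field names used elsewhere on this page and hypothetical file paths:

```typescript
// English zipformer-style transducer with BPE hotword tokenization
// (all paths hypothetical; field names assumed from this doc).
const englishHotwordsConfig = {
  modelType: 'transducer',
  hotwordsFile: '/data/hotwords-en.txt',
  hotwordsScore: 1.5,
  modelingUnit: 'bpe',
  bpeVocab: '/models/zipformer-en/bpe.vocab', // optional if auto-detected
};
```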
Chinese Transducer (Character-based)
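A hedged sketch for the character-based case (paths hypothetical; field names assumed from this doc):

```typescript
// Chinese character-based transducer: tokens are characters, so no
// bpeVocab is needed (paths hypothetical).
const chineseHotwordsConfig = {
  modelType: 'transducer',
  hotwordsFile: '/data/hotwords-zh.txt',
  hotwordsScore: 1.5,
  modelingUnit: 'cjkchar', // no BPE vocab required
};
```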
Bilingual Transducer (Chinese + English)
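A hedged sketch for the bilingual case (paths hypothetical; field names assumed from this doc):

```typescript
// Bilingual zh-en transducer: characters for Chinese plus BPE for
// English, so a BPE vocab is still needed (paths hypothetical).
const bilingualHotwordsConfig = {
  modelType: 'transducer',
  hotwordsFile: '/data/hotwords-zh-en.txt',
  hotwordsScore: 1.5,
  modelingUnit: 'cjkchar+bpe',
  bpeVocab: '/models/bilingual-zh-en/bpe.vocab',
};
```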
Use Cases
Technical Terms
Boost recognition of domain-specific jargon, API names, or technical vocabulary that the base model might misrecognize.
Proper Nouns
Improve accuracy for company names, product names, or person names.
Voice Commands
Enhance command recognition for voice control interfaces.
Medical Terms
Boost medical vocabulary for healthcare applications.
Best Practices
Check Model Support First
Always use `sttSupportsHotwords()` to verify model support before showing hotwords configuration UI.

Use Appropriate Scores
- Default (1.0): Good starting point for most words
- High (1.5-2.5): Use for critical commands or frequently misrecognized terms
- Very high (2.5+): May cause over-recognition or false positives
Match Model Training
Always set `modelingUnit` to match how your model was trained:
- Check model documentation or README
- Look for `bpe.vocab` or `bpe.model` files in the model directory
- If unsure, `'bpe'` is most common for English models
Keep Files Small
Large hotword files can impact performance. Aim for:
- 50-200 entries: Optimal range for most use cases
- 500+ entries: May slow down recognition
- Consider creating context-specific files instead of one giant file
Validate File Format
Ensure hotwords files:
- Are valid UTF-8
- Have at least one letter per line
- Use correct score syntax (` :1.5`)
- Don’t contain SRT timestamps or numeric-only lines
Test Different Contexts
Create multiple hotword files for different contexts and switch between them.
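One way to organize this, as a hypothetical sketch (file paths and the `setConfig()` call shape are assumptions):

```typescript
// One hotwords file per app context, switched at runtime.
const hotwordFilesByContext: Record<string, string> = {
  medical: '/data/hotwords-medical.txt',
  commands: '/data/hotwords-commands.txt',
  contacts: '/data/hotwords-contacts.txt',
};

function hotwordsFileFor(context: string): string | undefined {
  return hotwordFilesByContext[context];
}

// In the app: await setConfig({ hotwordsFile: hotwordsFileFor('medical') });
```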
Troubleshooting
HOTWORDS_NOT_SUPPORTED error
Cause: You’re using a model type that doesn’t support hotwords (e.g., Whisper, Paraformer).

Solution:
- Use a `transducer` or `nemo_transducer` model instead
- Remove the `hotwordsFile` parameter if you must use an unsupported model
- Check model type with `sttSupportsHotwords(modelType)` before enabling hotwords
INVALID_HOTWORDS_FILE error
Cause: The hotwords file has formatting issues.

Solution:
- Ensure the file exists and is readable
- Check for invalid UTF-8 or null bytes
- Verify each line has at least one letter character
- Check score syntax: must be ` :1.5` (space, colon, number)
- Remove SRT timestamps or numeric-only lines
Hotwords not improving recognition
Possible causes:
- Score too low (increase to 1.5-2.0)
- Wrong `modelingUnit` for your model
- Missing or incorrect `bpeVocab` path
- Hotword phrase doesn’t match audio pronunciation

Solutions:
- Increase hotword scores gradually
- Verify `modelingUnit` matches model training
- Check for `bpe.vocab` in the model directory
- Test with simpler single-word hotwords first
Too many false positives
Cause: Hotword scores are too high.

Solution:
- Reduce scores to the 1.0-1.5 range
- Remove overly generic terms
- Use more specific multi-word phrases instead of single words
Related Functions
Constant array containing all model types that support hotwords