STTInitializeOptions
Configuration options for creating an offline STT engine with createSTT().
Model directory path configuration.
Explicit model type. Use 'auto' for automatic detection (default).
Options: 'transducer', 'nemo_transducer', 'paraformer', 'nemo_ctc', 'wenet_ctc', 'sense_voice', 'zipformer_ctc', 'ctc', 'whisper', 'funasr_nano', 'fire_red_asr', 'moonshine', 'dolphin', 'canary', 'omnilingual', 'medasr', 'telespeech_ctc', 'auto'
Model quantization preference.
true: Prefer int8 quantized models (model.int8.onnx), which are smaller and faster.
false: Prefer regular models (model.onnx), which offer higher accuracy.
undefined: Try int8 first, fall back to regular (default).
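A minimal sketch of how the three quantization settings could map to file lookups in a model directory. The parameter name preferInt8 and both helper functions are hypothetical, and the sketch reads true and false as strict choices with no fallback, which is an assumption; the native resolver may behave differently.

```typescript
// Hypothetical resolver illustrating the three quantization settings.
// `preferInt8` and the strict reading of true/false are assumptions.
const INT8_MODEL = "model.int8.onnx";
const REGULAR_MODEL = "model.onnx";

// Candidate file names, in the order they would be tried.
function candidateModelFiles(preferInt8?: boolean): string[] {
  if (preferInt8 === true) return [INT8_MODEL]; // smaller, faster
  if (preferInt8 === false) return [REGULAR_MODEL]; // higher accuracy
  return [INT8_MODEL, REGULAR_MODEL]; // default: int8 first, then regular
}

// Returns the first candidate present in a directory listing, if any.
function resolveModelFile(files: readonly string[], preferInt8?: boolean): string | undefined {
  for (const name of candidateModelFiles(preferInt8)) {
    if (files.includes(name)) return name;
  }
  return undefined;
}

// No int8 file in the directory, so the default setting falls back to "model.onnx".
const picked = resolveModelFile(["tokens.txt", "model.onnx"]);
```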
Enable debug logging in the native layer and sherpa-onnx. Emits verbose logs for config dumps, file checks, and the init/transcribe flow.
Number of threads for inference.
Execution provider (e.g. "cpu").
Dither value for feature extraction.
Hotwords (Contextual Biasing)
Hotwords are only supported for transducer and nemo_transducer model types. Use sttSupportsHotwords() to check.
Path to hotwords file for keyword boosting.
Hotwords score/weight. Higher values increase bias towards hotwords.
Modeling unit for hotwords tokenization. Required when using hotwords with transducer/nemo_transducer.
'bpe': English models (e.g. zipformer)
'cjkchar': Chinese models (e.g. conformer)
'cjkchar+bpe': Bilingual zh-en models
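The hotwords constraints in this section (supported model types, the required modeling unit, and the BPE vocabulary requirement described next) can be expressed as a small pre-flight check. This is a sketch: the interface, the function hotwordsConfigErrors, and the property names hotwordsFile and bpeVocab are hypothetical; only modelType, modelingUnit, and their listed values come from this reference. sttSupportsHotwords() remains the documented way to check support.

```typescript
// Hypothetical pre-flight check for the hotwords rules in this section.
// hotwordsFile and bpeVocab are assumed property names.
type ModelingUnit = "bpe" | "cjkchar" | "cjkchar+bpe";

interface HotwordsOptionsSketch {
  modelType: string; // the resolved type, not 'auto'
  hotwordsFile?: string;
  modelingUnit?: ModelingUnit;
  bpeVocab?: string;
}

const HOTWORD_MODEL_TYPES: readonly string[] = ["transducer", "nemo_transducer"];

function hotwordsConfigErrors(opts: HotwordsOptionsSketch): string[] {
  const errors: string[] = [];
  if (!opts.hotwordsFile) return errors; // hotwords unused: nothing to check
  if (!HOTWORD_MODEL_TYPES.includes(opts.modelType)) {
    errors.push(`hotwords are not supported for modelType '${opts.modelType}'`);
  }
  if (!opts.modelingUnit) {
    errors.push("modelingUnit is required when using hotwords");
  } else if (opts.modelingUnit !== "cjkchar" && !opts.bpeVocab) {
    // 'bpe' and 'cjkchar+bpe' both need a SentencePiece .vocab file.
    errors.push(`bpeVocab is required when modelingUnit is '${opts.modelingUnit}'`);
  }
  return errors;
}
```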
Path to BPE vocabulary file. Required when modelingUnit is 'bpe' or 'cjkchar+bpe'. Must be a SentencePiece .vocab export, not the hotwords file.
Inverse Text Normalization
Path to rule FSTs for inverse text normalization.
Path to rule FARs for inverse text normalization.
Model-Specific Options
Model-specific configuration. Only options for the loaded model type are applied. See Model-Specific Options below.
Example
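A sketch of an options object for an English zipformer transducer with hotword boosting. Apart from modelType and modelingUnit, every property name below is an assumption, and the model directory is simplified to a plain string even though the real field is a path configuration; check the actual STTInitializeOptions type before passing such an object to createSTT(). All paths are placeholders.

```typescript
// Hypothetical shape mirroring the fields above. Only modelType and modelingUnit
// are names taken from this reference; the rest are assumptions.
type ModelingUnit = "bpe" | "cjkchar" | "cjkchar+bpe";

interface SttInitOptionsSketch {
  modelDir: string; // simplified: the real field is a path configuration
  modelType?: string; // e.g. "transducer" or "auto"
  preferInt8?: boolean;
  debug?: boolean;
  numThreads?: number;
  provider?: string;
  hotwordsFile?: string;
  hotwordsScore?: number;
  modelingUnit?: ModelingUnit;
  bpeVocab?: string;
  ruleFsts?: string;
  ruleFars?: string;
}

// English zipformer transducer with keyword boosting; all paths are placeholders.
const options: SttInitOptionsSketch = {
  modelDir: "models/zipformer-en",
  modelType: "transducer", // hotwords require transducer/nemo_transducer
  preferInt8: true,
  numThreads: 2,
  provider: "cpu",
  hotwordsFile: "models/zipformer-en/hotwords.txt",
  hotwordsScore: 1.5,
  modelingUnit: "bpe", // English model
  bpeVocab: "models/zipformer-en/bpe.vocab", // required for 'bpe'
};
```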
SttRuntimeConfig
Runtime configuration for offline STT. Update via SttEngine.setConfig() without recreating the engine.
Decoding method: "greedy_search" or "modified_beam_search".
Max active paths for beam search.
Path to hotwords file. Can be updated at runtime.
Hotwords score/weight.
Blank penalty for CTC models.
Path to rule FSTs.
Path to rule FARs.
Example
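A sketch of a runtime update that switches decoding to beam search and raises the hotwords weight. The property names and the beamSearchUpdate helper are hypothetical, and whether SttEngine.setConfig() accepts a partial object like this is not stated here, so treat that as an assumption.

```typescript
// Hypothetical shape of SttRuntimeConfig; property names are assumptions.
type DecodingMethod = "greedy_search" | "modified_beam_search";

interface SttRuntimeConfigSketch {
  decodingMethod?: DecodingMethod;
  maxActivePaths?: number;
  hotwordsFile?: string;
  hotwordsScore?: number;
  blankPenalty?: number;
  ruleFsts?: string;
  ruleFars?: string;
}

// maxActivePaths only matters for modified_beam_search, so set both together.
function beamSearchUpdate(maxActivePaths: number): SttRuntimeConfigSketch {
  if (!Number.isInteger(maxActivePaths) || maxActivePaths < 1) {
    throw new RangeError("maxActivePaths must be a positive integer");
  }
  return { decodingMethod: "modified_beam_search", maxActivePaths };
}

// Beam search with 4 active paths plus a stronger hotwords bias.
const update: SttRuntimeConfigSketch = { ...beamSearchUpdate(4), hotwordsScore: 2.0 };
```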
StreamingSttInitOptions
Configuration options for creating a streaming STT engine with createStreamingSTT().
Model directory path configuration.
Online model type, or 'auto' to detect it from the model directory.
Supported types: 'transducer', 'paraformer', 'zipformer2_ctc', 'nemo_ctc', 'tone_ctc'
Enable endpoint (end of utterance) detection.
Endpoint detection rules. See EndpointConfig.
Decoding method.
Max active paths for beam search.
Path to hotwords file (transducer/nemo_transducer only).
Hotwords score.
Number of inference threads.
Execution provider (e.g. "cpu").
Path(s) to rule FSTs for ITN.
Path(s) to rule FARs for ITN.
Blank penalty for CTC models.
Enable debug logging.
Enable adaptive input normalization for audio chunks in processAudioChunk(). When true, input is scaled so the peak is ~0.8 to handle varying device levels (e.g. quiet mics on iOS). Set to false if audio is already in the expected range [-1, 1].
Example
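A minimal sketch of peak normalization toward ~0.8, the behavior described for adaptive input normalization. The function name normalizeChunk and the silence floor are assumptions; the native implementation may differ, for example by smoothing the gain across chunks instead of normalizing each chunk independently.

```typescript
// Scale a chunk so its absolute peak equals targetPeak (~0.8 per the description above).
// normalizeChunk and silenceFloor are hypothetical; per-chunk gain is an assumption.
function normalizeChunk(samples: Float32Array, targetPeak = 0.8, silenceFloor = 1e-4): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  // Leave (near-)silent chunks unchanged rather than amplifying noise.
  if (peak < silenceFloor) return samples.slice();
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}

// A quiet chunk (peak 0.2) is scaled by 4 so its peak becomes ~0.8.
const quiet = new Float32Array([0.1, -0.2, 0.05]);
const scaled = normalizeChunk(quiet);
```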
EndpointConfig
Endpoint detection configuration for streaming STT. Three rules are evaluated in order; the first match determines the end of the utterance.
Rule 1: e.g. 2.4s trailing silence, no speech required.
Rule 2: e.g. 1.4s trailing silence, speech required.
Rule 3: e.g. max utterance length 20s.
EndpointRule
If true, the rule only matches when the segment contains non-silence.
Minimum trailing silence in seconds.
Minimum utterance length in seconds. Combined with a zero trailing-silence requirement (as in Rule 3), it acts as a maximum utterance length cap.
Example
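A sketch of how the three rules could be evaluated against the current trailing silence and utterance length. The property names (rule1, mustContainNonSilence, minTrailingSilence, minUtteranceLength) are assumptions; the matching condition, including treating a segment as containing speech when it is longer than its trailing silence, follows sherpa-onnx's endpoint logic as we understand it. The values are the examples given for Rules 1 to 3.

```typescript
// Hypothetical shapes; property names are assumptions.
interface EndpointRuleSketch {
  mustContainNonSilence: boolean;
  minTrailingSilence: number; // seconds
  minUtteranceLength: number; // seconds
}

interface EndpointConfigSketch {
  rule1: EndpointRuleSketch;
  rule2: EndpointRuleSketch;
  rule3: EndpointRuleSketch;
}

// Example values from the Rule 1 to Rule 3 descriptions.
const endpointConfig: EndpointConfigSketch = {
  rule1: { mustContainNonSilence: false, minTrailingSilence: 2.4, minUtteranceLength: 0 },
  rule2: { mustContainNonSilence: true, minTrailingSilence: 1.4, minUtteranceLength: 0 },
  rule3: { mustContainNonSilence: false, minTrailingSilence: 0, minUtteranceLength: 20 },
};

function ruleMatches(rule: EndpointRuleSketch, trailingSilence: number, utteranceLength: number): boolean {
  // The segment contains speech if it is longer than its trailing silence.
  const containsNonSilence = utteranceLength > trailingSilence;
  return (
    (containsNonSilence || !rule.mustContainNonSilence) &&
    trailingSilence >= rule.minTrailingSilence &&
    utteranceLength >= rule.minUtteranceLength
  );
}

// 1-based index of the first matching rule, or 0 if the utterance has not ended.
function detectEndpoint(cfg: EndpointConfigSketch, trailingSilence: number, utteranceLength: number): number {
  const rules = [cfg.rule1, cfg.rule2, cfg.rule3];
  return rules.findIndex((rule) => ruleMatches(rule, trailingSilence, utteranceLength)) + 1;
}
```

With these values, 1.5 s of silence after 5 s of audio ends the utterance via Rule 2, while 1.5 s of audio that is entirely silence triggers nothing until Rule 1's 2.4 s threshold is reached.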
Model-Specific Options
SttModelOptions
Container for model-specific configuration. Only options for the loaded model type are applied.
Options for Whisper models.
Options for SenseVoice models.
Options for Canary models.
Options for FunASR Nano models.
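A sketch of the "only options for the loaded model type are applied" rule: pick the sub-object that matches the model type and ignore the rest. The container property names (whisper, senseVoice, canary, funasrNano) and the helper applicableModelOptions are hypothetical.

```typescript
// Hypothetical container; property names are assumptions.
interface SttModelOptionsSketch {
  whisper?: Record<string, unknown>;
  senseVoice?: Record<string, unknown>;
  canary?: Record<string, unknown>;
  funasrNano?: Record<string, unknown>;
}

// Which container property belongs to which modelType value.
const OPTIONS_KEY = new Map<string, keyof SttModelOptionsSketch>([
  ["whisper", "whisper"],
  ["sense_voice", "senseVoice"],
  ["canary", "canary"],
  ["funasr_nano", "funasrNano"],
]);

// Only the sub-object for the loaded model type is used; everything else is ignored.
function applicableModelOptions(
  modelType: string,
  opts: SttModelOptionsSketch,
): Record<string, unknown> | undefined {
  const key = OPTIONS_KEY.get(modelType);
  return key === undefined ? undefined : opts[key];
}
```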
SttWhisperModelOptions
Applied only when modelType is 'whisper'.
Language code (e.g. "en", "de", "zh"). Used with multilingual models.
Task mode. With "translate", the result text is always in English.
Padding at the end of the samples. Kotlin default: 1000; C++ default: -1.
Enable token-level timestamps. Android only; ignored on iOS.
Enable segment-level timestamps. Android only; ignored on iOS.
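A small sketch of how the task setting determines the language of the result text. The property names and the outputLanguage helper are hypothetical; the task value "transcribe" and treating an unset language as auto-detection by a multilingual model are assumptions.

```typescript
// Hypothetical shape; property names are assumptions.
interface WhisperOptionsSketch {
  language?: string; // e.g. "en", "de", "zh"
  task?: "transcribe" | "translate";
  tailPaddings?: number;
  enableTokenTimestamps?: boolean; // Android only
  enableSegmentTimestamps?: boolean; // Android only
}

// Language of the result text: always English for "translate", otherwise the
// configured language (undefined is assumed to mean auto-detection).
function outputLanguage(opts: WhisperOptionsSketch): string | undefined {
  return opts.task === "translate" ? "en" : opts.language;
}

// German speech, English text.
const germanToEnglish: WhisperOptionsSketch = { language: "de", task: "translate" };
```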
SttSenseVoiceModelOptions
Applied only when modelType is 'sense_voice'.
Language hint.
Inverse text normalization. Default: true (Kotlin), false (C++).
SttCanaryModelOptions
Applied only when modelType is 'canary'.
Source language code.
Target language code.
Use punctuation.
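A sketch of Canary options for German speech rendered as English text. The property names and the canaryMode helper are hypothetical; reading a differing source and target language as speech translation is an assumption based on how Canary models are typically used.

```typescript
// Hypothetical shape; property names are assumptions.
interface CanaryOptionsSketch {
  srcLang: string; // language spoken in the audio
  tgtLang: string; // language of the output text
  usePnc?: boolean; // punctuation
}

// Same source and target language: transcription; different: translation (assumed).
function canaryMode(opts: CanaryOptionsSketch): "transcribe" | "translate" {
  return opts.srcLang === opts.tgtLang ? "transcribe" : "translate";
}

const germanToEnglish: CanaryOptionsSketch = { srcLang: "de", tgtLang: "en", usePnc: true };
```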
SttFunAsrNanoModelOptions
Applied only when modelType is 'funasr_nano'.
System prompt.
User prompt prefix.
Maximum new tokens.
Sampling temperature.
Top-p sampling.
Random seed.
Language hint.
Inverse text normalization.
Hotwords string.
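A sketch of FunASR Nano options with basic range checks on the decoding parameters. The property names, the example values, and the accepted ranges are assumptions (conventional for LLM-style sampling), not documented defaults.

```typescript
// Hypothetical shape; property names are assumptions.
interface FunAsrNanoOptionsSketch {
  systemPrompt?: string;
  userPrompt?: string;
  maxNewTokens?: number;
  temperature?: number;
  topP?: number;
  seed?: number;
  language?: string;
  itn?: boolean;
  hotwords?: string;
}

// Generic sampling-parameter checks; the ranges are conventional, not documented.
function samplingErrors(o: FunAsrNanoOptionsSketch): string[] {
  const errors: string[] = [];
  if (o.maxNewTokens !== undefined && (!Number.isInteger(o.maxNewTokens) || o.maxNewTokens < 1)) {
    errors.push("maxNewTokens must be a positive integer");
  }
  if (o.temperature !== undefined && o.temperature < 0) {
    errors.push("temperature must be >= 0");
  }
  if (o.topP !== undefined && (o.topP <= 0 || o.topP > 1)) {
    errors.push("topP must be in (0, 1]");
  }
  return errors;
}

// Illustrative values only.
const nanoOptions: FunAsrNanoOptionsSketch = { maxNewTokens: 256, temperature: 0.3, topP: 0.9, seed: 42 };
```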