
STTInitializeOptions

Configuration options for creating an offline STT engine with createSTT().
modelPath
ModelPathConfig
required
Model directory path configuration.
{ type: 'asset', path: 'models/whisper-tiny' }
{ type: 'file', path: '/sdcard/models/whisper' }
{ type: 'auto', path: 'whisper-tiny' }
modelType
STTModelType
Explicit model type. Use 'auto' for automatic detection (default). Options: 'transducer', 'nemo_transducer', 'paraformer', 'nemo_ctc', 'wenet_ctc', 'sense_voice', 'zipformer_ctc', 'ctc', 'whisper', 'funasr_nano', 'fire_red_asr', 'moonshine', 'dolphin', 'canary', 'omnilingual', 'medasr', 'telespeech_ctc', 'auto'
preferInt8
boolean
Model quantization preference.
  • true: Prefer int8 quantized models (model.int8.onnx) - smaller, faster
  • false: Prefer regular models (model.onnx) - higher accuracy
  • undefined: Try int8 first, fall back to regular (default)
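The preference and fallback behavior above can be sketched as a pure selection function (resolveModelFile and the file names are illustrative; actual resolution is internal to the library):

```typescript
// Sketch of how the int8 preference resolves to a model file.
// `available` stands in for the files actually present in the model directory.
function resolveModelFile(
  preferInt8: boolean | undefined,
  available: string[]
): string | undefined {
  const int8 = 'model.int8.onnx';
  const regular = 'model.onnx';
  // preferInt8 === false prefers the regular model; everything else
  // (true or undefined) tries int8 first, per the documented default.
  const order = preferInt8 === false ? [regular, int8] : [int8, regular];
  return order.find(f => available.includes(f));
}
```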
debug
boolean
default:false
Enable debug logging in native layer and sherpa-onnx. Emits verbose logs for config dumps, file checks, and init/transcribe flow.
numThreads
number
default:1
Number of threads for inference.
provider
string
Execution provider (e.g. "cpu").
dither
number
default:0
Dither value for feature extraction.

Hotwords (Contextual Biasing)

Hotwords are only supported for transducer and nemo_transducer model types. Use sttSupportsHotwords() to check.
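Per the note above, the library helper sttSupportsHotwords() performs this check; conceptually it reduces to a membership test over the two supported types (a sketch for illustration, not the library's implementation):

```typescript
// Model types that support hotwords, per the note above.
const HOTWORD_TYPES: readonly string[] = ['transducer', 'nemo_transducer'];

// Sketch of the check performed by sttSupportsHotwords().
function supportsHotwords(modelType: string): boolean {
  return HOTWORD_TYPES.includes(modelType);
}
```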
hotwordsFile
string
Path to hotwords file for keyword boosting.
hotwordsFile: '/path/to/hotwords.txt'
hotwordsScore
number
Hotwords score/weight. Higher values increase bias towards hotwords.
modelingUnit
'cjkchar' | 'bpe' | 'cjkchar+bpe'
Modeling unit for hotwords tokenization. Required when using hotwords with transducer/nemo_transducer.
  • 'bpe': English models (e.g. zipformer)
  • 'cjkchar': Chinese models (e.g. conformer)
  • 'cjkchar+bpe': Bilingual zh-en models
bpeVocab
string
Path to BPE vocabulary file. Required when modelingUnit is 'bpe' or 'cjkchar+bpe'. Must be sentencepiece .vocab export, not the hotwords file.
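For illustration, a hotwords file is a plain-text list of phrases, one per line (format assumed from sherpa-onnx conventions; phrases must be representable in the model's modeling unit, and some sherpa-onnx versions also accept a per-phrase boost appended after a colon):

```text
SPEECH RECOGNITION
OPEN SOURCE
NEURAL NETWORK
```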

Inverse Text Normalization

ruleFsts
string
Path to rule FSTs for inverse text normalization.
ruleFars
string
Path to rule FARs for inverse text normalization.
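As a sketch, ITN rule files plug into createSTT() like any other option (the paths below are hypothetical; FST/FAR rule files ship separately from the model):

```typescript
// Hypothetical paths; rule files are produced by an external FST toolchain.
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/whisper-tiny' },
  ruleFsts: '/path/to/itn.fst',
  ruleFars: '/path/to/itn.far'
});
```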

Model-Specific Options

modelOptions
SttModelOptions
Model-specific configuration. Only options for the loaded model type are applied. See Model-Specific Options below.

Example

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/zipformer-en' },
  modelType: 'transducer',
  preferInt8: true,
  numThreads: 2,
  hotwordsFile: '/path/to/hotwords.txt',
  hotwordsScore: 2.0,
  modelingUnit: 'bpe',
  bpeVocab: '/path/to/bpe.vocab',
  debug: false
});

SttRuntimeConfig

Runtime configuration for offline STT. Update via SttEngine.setConfig() without recreating the engine.
decodingMethod
string
Decoding method: "greedy_search" or "modified_beam_search".
maxActivePaths
number
Max active paths for beam search.
hotwordsFile
string
Path to hotwords file. Can be updated at runtime.
hotwordsScore
number
Hotwords score/weight.
blankPenalty
number
Blank penalty for CTC models.
ruleFsts
string
Path to rule FSTs.
ruleFars
string
Path to rule FARs.

Example

await stt.setConfig({
  decodingMethod: 'modified_beam_search',
  maxActivePaths: 8,
  hotwordsScore: 2.5
});

StreamingSttInitOptions

Configuration options for creating a streaming STT engine with createStreamingSTT().
modelPath
ModelPathConfig
required
Model directory path configuration.
modelType
OnlineSTTModelType | 'auto'
required
Online model type or 'auto' to detect from model directory. Supported types: 'transducer', 'paraformer', 'zipformer2_ctc', 'nemo_ctc', 'tone_ctc'
enableEndpoint
boolean
default:true
Enable endpoint (end of utterance) detection.
endpointConfig
EndpointConfig
Endpoint detection rules. See EndpointConfig.
decodingMethod
'greedy_search' | 'modified_beam_search'
default:"greedy_search"
Decoding method.
maxActivePaths
number
default:4
Max active paths for beam search.
hotwordsFile
string
Path to hotwords file (transducer/nemo_transducer only).
hotwordsScore
number
Hotwords score.
numThreads
number
default:1
Number of inference threads.
provider
string
Execution provider (e.g. "cpu").
ruleFsts
string
Path(s) to rule FSTs for ITN.
ruleFars
string
Path(s) to rule FARs for ITN.
blankPenalty
number
Blank penalty for CTC models.
debug
boolean
default:false
Enable debug logging.
enableInputNormalization
boolean
default:true
Enable adaptive input normalization for audio chunks in processAudioChunk(). When true, input is scaled so the peak is ~0.8 to handle varying device levels (e.g. quiet mics on iOS). Set to false if audio is already in the expected range [-1, 1].
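The normalization described above can be sketched as a peak rescale (illustrative only; the native implementation may differ, e.g. in how it adapts across chunks):

```typescript
// Scale a chunk so its peak amplitude becomes ~0.8, as described above.
// Quiet input is boosted; input near full scale is attenuated slightly.
function normalizeChunk(samples: Float32Array, targetPeak = 0.8): Float32Array {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return samples; // silence: nothing to scale
  const gain = targetPeak / peak;
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = samples[i] * gain;
  return out;
}
```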

Example

const engine = await createStreamingSTT({
  modelPath: { type: 'asset', path: 'models/streaming-zipformer' },
  modelType: 'transducer',
  enableEndpoint: true,
  decodingMethod: 'greedy_search',
  enableInputNormalization: true
});

EndpointConfig

Endpoint detection configuration for streaming STT. Three rules are evaluated in order; the first match determines the end of utterance.
rule1
EndpointRule
Rule 1: e.g. 2.4s trailing silence, no speech required.
rule2
EndpointRule
Rule 2: e.g. 1.4s trailing silence, speech required.
rule3
EndpointRule
Rule 3: e.g. max utterance length 20s.

EndpointRule

mustContainNonSilence
boolean
required
If true, rule only matches when segment contains non-silence.
minTrailingSilence
number
required
Minimum trailing silence in seconds.
minUtteranceLength
number
required
Minimum utterance length in seconds (acts as max length cap).

Example

endpointConfig: {
  rule1: {
    mustContainNonSilence: false,
    minTrailingSilence: 2.4,
    minUtteranceLength: 0.0
  },
  rule2: {
    mustContainNonSilence: true,
    minTrailingSilence: 1.4,
    minUtteranceLength: 0.0
  },
  rule3: {
    mustContainNonSilence: false,
    minTrailingSilence: 0.0,
    minUtteranceLength: 20.0
  }
}
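The "first match wins" evaluation above can be sketched as follows (a simplified model of the semantics described: a rule fires when its silence and length thresholds are met and, if required, speech has been observed):

```typescript
interface EndpointRule {
  mustContainNonSilence: boolean;
  minTrailingSilence: number; // seconds
  minUtteranceLength: number; // seconds
}

// Returns true when any rule matches, checked in order (rule1, rule2, rule3).
function isEndpoint(
  rules: EndpointRule[],
  trailingSilence: number,   // seconds of silence at the end of the stream
  utteranceLength: number,   // seconds since the utterance started
  containsNonSilence: boolean
): boolean {
  return rules.some(r =>
    (!r.mustContainNonSilence || containsNonSilence) &&
    trailingSilence >= r.minTrailingSilence &&
    utteranceLength >= r.minUtteranceLength
  );
}
```

With the example config above, 1.5s of trailing silence after speech triggers rule2, while a 21s utterance triggers rule3 regardless of silence.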

Model-Specific Options

SttModelOptions

Container for model-specific configuration. Only options for the loaded model type are applied.
whisper
SttWhisperModelOptions
Options for Whisper models.
senseVoice
SttSenseVoiceModelOptions
Options for SenseVoice models.
canary
SttCanaryModelOptions
Options for Canary models.
funasrNano
SttFunAsrNanoModelOptions
Options for FunASR Nano models.

SttWhisperModelOptions

Applied only when modelType is 'whisper'.
language
string
default:"en"
Language code (e.g. "en", "de", "zh"). Used with multilingual models.
task
'transcribe' | 'translate'
default:"transcribe"
Task mode. With "translate", result text is always in English.
tailPaddings
number
default:1000
Padding at end of samples. Kotlin default: 1000; C++ default: -1.
enableTokenTimestamps
boolean
Enable token-level timestamps. Android only; ignored on iOS.
enableSegmentTimestamps
boolean
Enable segment-level timestamps. Android only; ignored on iOS.
Example:
modelOptions: {
  whisper: {
    language: 'de',
    task: 'transcribe',
    tailPaddings: 1000
  }
}

SttSenseVoiceModelOptions

Applied only when modelType is 'sense_voice'.
language
string
Language hint.
useItn
boolean
default:true
Inverse text normalization. Default: true (Kotlin), false (C++).
Example:
modelOptions: {
  senseVoice: {
    language: 'zh',
    useItn: true
  }
}

SttCanaryModelOptions

Applied only when modelType is 'canary'.
srcLang
string
default:"en"
Source language code.
tgtLang
string
default:"en"
Target language code.
usePnc
boolean
default:true
Use punctuation.
Example:
modelOptions: {
  canary: {
    srcLang: 'de',
    tgtLang: 'en',
    usePnc: true
  }
}

SttFunAsrNanoModelOptions

Applied only when modelType is 'funasr_nano'.
systemPrompt
string
default:"You are a helpful assistant."
System prompt.
userPrompt
string
default:"语音转写:"
User prompt prefix.
maxNewTokens
number
default:512
Maximum new tokens.
temperature
number
Sampling temperature.
topP
number
Top-p sampling.
seed
number
default:42
Random seed.
language
string
Language hint.
itn
boolean
default:true
Inverse text normalization.
hotwords
string
Hotwords string.
Example:
modelOptions: {
  funasrNano: {
    systemPrompt: 'You are a medical transcription assistant.',
    maxNewTokens: 256,
    temperature: 0.7,
    language: 'zh',
    itn: true
  }
}
