STT Configuration

STTInitializeOptions

Configuration options for creating an offline STT engine with createSTT().

modelPath

ModelPathConfig

required

Model directory path configuration.

{ type: 'asset', path: 'models/whisper-tiny' }
{ type: 'file', path: '/sdcard/models/whisper' }
{ type: 'auto', path: 'whisper-tiny' }

modelType

STTModelType

Explicit model type. Use 'auto' for automatic detection (default).
Options: 'transducer', 'nemo_transducer', 'paraformer', 'nemo_ctc', 'wenet_ctc', 'sense_voice', 'zipformer_ctc', 'ctc', 'whisper', 'funasr_nano', 'fire_red_asr', 'moonshine', 'dolphin', 'canary', 'omnilingual', 'medasr', 'telespeech_ctc', 'auto'

preferInt8

boolean

Model quantization preference.

true: Prefer int8 quantized models (model.int8.onnx) - smaller, faster
false: Prefer regular models (model.onnx) - higher accuracy
undefined: Try int8 first, fall back to regular (default)
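
The lookup order implied by these three values can be sketched as follows. modelFileCandidates is a hypothetical helper, not a library function, and it assumes that "prefer" means "try this file first, then fall back to the other one"; the library's actual resolution logic may differ.

```typescript
// Hypothetical helper, not part of the library API.
// Assumption: "prefer" means "try this file first, then fall back to the other".
function modelFileCandidates(preferInt8?: boolean): string[] {
  const int8 = 'model.int8.onnx';
  const regular = 'model.onnx';
  // Only an explicit `false` puts the regular model first;
  // `true` and `undefined` both start with the int8 file.
  return preferInt8 === false ? [regular, int8] : [int8, regular];
}
```

Under this assumption, true and undefined produce the same order, so setting preferInt8 only changes the outcome when it is false.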

debug

boolean

default:false

Enable debug logging in native layer and sherpa-onnx. Emits verbose logs for config dumps, file checks, and init/transcribe flow.

numThreads

number

default:1

Number of threads for inference.

provider

string

Execution provider (e.g. "cpu").

dither

number

default:0

Dither value for feature extraction.

Hotwords (Contextual Biasing)

Hotwords are only supported for transducer and nemo_transducer model types. Use sttSupportsHotwords() to check.

hotwordsFile

string

Path to hotwords file for keyword boosting.

hotwordsFile: '/path/to/hotwords.txt'

hotwordsScore

number

Hotwords score/weight. Higher values increase bias towards hotwords.

modelingUnit

'cjkchar' | 'bpe' | 'cjkchar+bpe'

Modeling unit for hotwords tokenization. Required when using hotwords with transducer/nemo_transducer.

'bpe': English models (e.g. zipformer)
'cjkchar': Chinese models (e.g. conformer)
'cjkchar+bpe': Bilingual zh-en models

bpeVocab

string

Path to BPE vocabulary file. Required when modelingUnit is 'bpe' or 'cjkchar+bpe'. Must be sentencepiece .vocab export, not the hotwords file.
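
The constraints in this section (supported model types, required modelingUnit, conditional bpeVocab) can be checked before calling createSTT(). The sketch below is one way to do that; HotwordsOptionsSketch and checkHotwordsOptions are hypothetical names introduced for illustration, and the library may validate these options differently or not at all.

```typescript
// Hypothetical local types mirroring the fields documented above.
type ModelingUnitSketch = 'cjkchar' | 'bpe' | 'cjkchar+bpe';

interface HotwordsOptionsSketch {
  modelType: string;
  hotwordsFile?: string;
  hotwordsScore?: number;
  modelingUnit?: ModelingUnitSketch;
  bpeVocab?: string;
}

// Returns a list of problems; an empty list means the combination looks valid.
function checkHotwordsOptions(opts: HotwordsOptionsSketch): string[] {
  const problems: string[] = [];
  if (opts.hotwordsFile === undefined) return problems; // hotwords not in use

  // With 'auto' the model type is only known after detection, so the
  // type check is skipped here (sttSupportsHotwords() covers that case).
  const supported =
    opts.modelType === 'transducer' ||
    opts.modelType === 'nemo_transducer' ||
    opts.modelType === 'auto';
  if (!supported) {
    problems.push(`hotwords are not supported for modelType '${opts.modelType}'`);
  }

  if (opts.modelingUnit === undefined) {
    problems.push('modelingUnit is required when hotwordsFile is set');
  } else if (opts.modelingUnit !== 'cjkchar' && opts.bpeVocab === undefined) {
    // 'bpe' and 'cjkchar+bpe' both need the sentencepiece .vocab file.
    problems.push(`bpeVocab is required when modelingUnit is '${opts.modelingUnit}'`);
  }
  return problems;
}
```

Running this check first turns a misconfiguration into a readable message instead of a failure inside the native layer.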

Inverse Text Normalization

ruleFsts

string

Path to rule FSTs for inverse text normalization.

ruleFars

string

Path to rule FARs for inverse text normalization.

Model-Specific Options

modelOptions

SttModelOptions

Model-specific configuration. Only options for the loaded model type are applied. See Model-Specific Options below.

Example

const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/zipformer-en' },
  modelType: 'transducer',
  preferInt8: true,
  numThreads: 2,
  hotwordsFile: '/path/to/hotwords.txt',
  hotwordsScore: 2.0,
  modelingUnit: 'bpe',
  bpeVocab: '/path/to/bpe.vocab',
  debug: false
});

SttRuntimeConfig

Runtime configuration for offline STT. Update via SttEngine.setConfig() without recreating the engine.

decodingMethod

string

Decoding method: "greedy_search" or "modified_beam_search".

maxActivePaths

number

Max active paths for beam search.

hotwordsFile

string

Path to hotwords file. Can be updated at runtime.

hotwordsScore

number

Hotwords score/weight.

blankPenalty

number

Blank penalty for CTC models.

ruleFsts

string

Path to rule FSTs.

ruleFars

string

Path to rule FARs.

Example

await stt.setConfig({
  decodingMethod: 'modified_beam_search',
  maxActivePaths: 8,
  hotwordsScore: 2.5
});

StreamingSttInitOptions

Configuration options for creating a streaming STT engine with createStreamingSTT().

modelPath

ModelPathConfig

required

Model directory path configuration.

modelType

OnlineSTTModelType | 'auto'

required

Online model type, or 'auto' to detect it from the model directory.
Supported types: 'transducer', 'paraformer', 'zipformer2_ctc', 'nemo_ctc', 'tone_ctc'

enableEndpoint

boolean

default:true

Enable endpoint (end of utterance) detection.

endpointConfig

EndpointConfig

Endpoint detection rules. See EndpointConfig.

decodingMethod

'greedy_search' | 'modified_beam_search'

default:"greedy_search"

Decoding method.

maxActivePaths

number

default:4

Max active paths for beam search.

hotwordsFile

string

Path to hotwords file (transducer/nemo_transducer only).

hotwordsScore

number

Hotwords score.

numThreads

number

default:1

Number of inference threads.

provider

string

Execution provider (e.g. "cpu").

ruleFsts

string

Path(s) to rule FSTs for ITN.

ruleFars

string

Path(s) to rule FARs for ITN.

blankPenalty

number

Blank penalty for CTC models.

debug

boolean

default:false

Enable debug logging.

enableInputNormalization

boolean

default:true

Enable adaptive input normalization for audio chunks passed to processAudioChunk(). When true, input is scaled so that its peak is about 0.8, which compensates for varying device levels (e.g. quiet microphones on iOS). Set to false if the audio is already in the expected range [-1, 1].

Example

const engine = await createStreamingSTT({
  modelPath: { type: 'asset', path: 'models/streaming-zipformer' },
  modelType: 'transducer',
  enableEndpoint: true,
  decodingMethod: 'greedy_search',
  enableInputNormalization: true
});
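
To illustrate what enableInputNormalization does conceptually, the sketch below scales one chunk so that its largest absolute sample is 0.8. scaleToPeak is a hypothetical function written for this page; the library's adaptive normalization runs in its own code and may behave differently, for example by smoothing the gain across chunks.

```typescript
// Illustrative only: not the library's implementation.
// Scales a chunk of float samples so its largest absolute value equals targetPeak.
function scaleToPeak(samples: Float32Array, targetPeak: number = 0.8): Float32Array {
  let peak = 0;
  samples.forEach((s) => {
    const magnitude = Math.abs(s);
    if (magnitude > peak) peak = magnitude;
  });
  // A silent chunk has no peak to scale; return an unchanged copy.
  if (peak === 0) return samples.slice();
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}
```

A quiet chunk such as [0.1, -0.2, 0.05] is amplified by a factor of about 4, while a chunk that already peaks near 0.8 is left almost unchanged.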

EndpointConfig

Endpoint detection configuration for streaming STT. The three rules are evaluated in order; the first rule that matches marks the end of the utterance.

rule1

EndpointRule

Rule 1: e.g. 2.4s trailing silence, no speech required.

rule2

EndpointRule

Rule 2: e.g. 1.4s trailing silence, speech required.

rule3

EndpointRule

Rule 3: e.g. max utterance length 20s.

EndpointRule

mustContainNonSilence

boolean

required

If true, rule only matches when segment contains non-silence.

minTrailingSilence

number

required

Minimum trailing silence in seconds.

minUtteranceLength

number

required

Minimum utterance length in seconds. When the rule sets no silence requirement (as in rule 3), this value acts as a cap on utterance length.

Example

endpointConfig: {
  rule1: {
    mustContainNonSilence: false,
    minTrailingSilence: 2.4,
    minUtteranceLength: 0.0
  },
  rule2: {
    mustContainNonSilence: true,
    minTrailingSilence: 1.4,
    minUtteranceLength: 0.0
  },
  rule3: {
    mustContainNonSilence: false,
    minTrailingSilence: 0.0,
    minUtteranceLength: 20.0
  }
}
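
The way these three rules combine can be sketched in a few lines. The types and the firstMatchingRule function below are hypothetical and exist only to illustrate the description above; in particular, passing containsSpeech as an argument is a simplification, since the native engine determines that itself.

```typescript
// Hypothetical local types mirroring EndpointRule and EndpointConfig.
interface EndpointRuleSketch {
  mustContainNonSilence: boolean;
  minTrailingSilence: number; // seconds
  minUtteranceLength: number; // seconds
}

type RuleName = 'rule1' | 'rule2' | 'rule3';
type EndpointConfigSketch = Record<RuleName, EndpointRuleSketch>;

// Returns the name of the first rule whose thresholds are all met, or null.
function firstMatchingRule(
  config: EndpointConfigSketch,
  trailingSilence: number,
  utteranceLength: number,
  containsSpeech: boolean,
): RuleName | null {
  const order: RuleName[] = ['rule1', 'rule2', 'rule3'];
  for (const name of order) {
    const rule = config[name];
    if (rule.mustContainNonSilence && !containsSpeech) continue;
    if (trailingSilence < rule.minTrailingSilence) continue;
    if (utteranceLength < rule.minUtteranceLength) continue;
    return name;
  }
  return null;
}
```

With the values from the example, 2.4 s of silence ends the utterance even if nothing was said (rule 1), 1.4 s of silence ends it once speech was detected (rule 2), and an utterance of 20 s ends regardless of silence (rule 3).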

Model-Specific Options

SttModelOptions

Container for model-specific configuration. Only options for the loaded model type are applied.

whisper

SttWhisperModelOptions

Options for Whisper models.

senseVoice

SttSenseVoiceModelOptions

Options for SenseVoice models.

canary

SttCanaryModelOptions

Options for Canary models.

funasrNano

SttFunAsrNanoModelOptions

Options for FunASR Nano models.
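
Because only the options for the loaded model type are applied, a config may carry several option groups at once. The sketch below shows that selection; ModelOptionsSketch lists only a few of the documented fields and applicableModelOptions is a hypothetical helper, not a library function.

```typescript
// Hypothetical, trimmed-down mirror of SttModelOptions.
interface ModelOptionsSketch {
  whisper?: { language?: string; task?: 'transcribe' | 'translate' };
  senseVoice?: { language?: string; useItn?: boolean };
  canary?: { srcLang?: string; tgtLang?: string; usePnc?: boolean };
  funasrNano?: { maxNewTokens?: number; language?: string };
}

// Picks the option group that matches the model type; other groups are ignored.
function applicableModelOptions(modelType: string, options: ModelOptionsSketch) {
  switch (modelType) {
    case 'whisper':
      return options.whisper;
    case 'sense_voice':
      return options.senseVoice;
    case 'canary':
      return options.canary;
    case 'funasr_nano':
      return options.funasrNano;
    default:
      return undefined; // e.g. 'transducer' has no model-specific group
  }
}
```

Note that the model type strings use snake_case ('sense_voice', 'funasr_nano') while the option keys use camelCase (senseVoice, funasrNano).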

SttWhisperModelOptions

Applied only when modelType is 'whisper'.

language

string

default:"en"

Language code (e.g. "en", "de", "zh"). Used with multilingual models.

task

'transcribe' | 'translate'

default:"transcribe"

Task mode. With "translate", result text is always in English.

tailPaddings

number

default:1000

Padding at end of samples. Kotlin default: 1000; C++ default: -1.

enableTokenTimestamps

boolean

Enable token-level timestamps. Android only; ignored on iOS.

enableSegmentTimestamps

boolean

Enable segment-level timestamps. Android only; ignored on iOS.

Example:

modelOptions: {
  whisper: {
    language: 'de',
    task: 'transcribe',
    tailPaddings: 1000
  }
}

SttSenseVoiceModelOptions

Applied only when modelType is 'sense_voice'.

language

string

Language hint.

useItn

boolean

default:true

Inverse text normalization. Default: true (Kotlin), false (C++).

Example:

modelOptions: {
  senseVoice: {
    language: 'zh',
    useItn: true
  }
}

SttCanaryModelOptions

Applied only when modelType is 'canary'.

srcLang

string

default:"en"

Source language code.

tgtLang

string

default:"en"

Target language code.

usePnc

boolean

default:true

Use punctuation.

Example:

modelOptions: {
  canary: {
    srcLang: 'de',
    tgtLang: 'en',
    usePnc: true
  }
}

SttFunAsrNanoModelOptions

Applied only when modelType is 'funasr_nano'.

systemPrompt

string

default:"You are a helpful assistant."

System prompt.

userPrompt

string

default:"语音转写："

User prompt prefix.

maxNewTokens

number

default:512

Maximum new tokens.

temperature

number

Sampling temperature.

topP

number

Top-p sampling.

seed

number

default:42

Random seed.

language

string

Language hint.

itn

boolean

default:true

Inverse text normalization.

hotwords

string

Hotwords string.

Example:

modelOptions: {
  funasrNano: {
    systemPrompt: 'You are a medical transcription assistant.',
    maxNewTokens: 256,
    temperature: 0.7,
    language: 'zh',
    itn: true
  }
}
