createSTT()

Create an STT engine instance for batch/offline transcription. This is ideal for transcribing complete audio files or recorded audio samples. For real-time streaming recognition, use createStreamingSTT() instead.

function createSTT(
  options: STTInitializeOptions | ModelPathConfig
): Promise<SttEngine>

Parameters

options

STTInitializeOptions | ModelPathConfig

required

Initialization options or a model path configuration.

STTInitializeOptions properties

modelPath

ModelPathConfig

required

Model directory path configuration. Use assetModelPath(), fileModelPath(), or autoModelPath().

modelType

STTModelType

default:"auto"

Explicit model type. Options: 'transducer', 'paraformer', 'whisper', 'nemo_ctc', 'zipformer_ctc', 'sense_voice', 'funasr_nano', 'fire_red_asr', 'moonshine', 'dolphin', 'canary', 'omnilingual', 'medasr', 'telespeech_ctc', or 'auto' for automatic detection.

preferInt8

boolean

Model quantization preference:

true: Prefer int8 quantized models (smaller, faster)
false: Prefer regular models (higher accuracy)
undefined: Try int8 first, fall back to regular (default)

debug

boolean

default:false

Enable debug logging in the native layer and in sherpa-onnx.

hotwordsFile

string

Path to hotwords file for keyword boosting (transducer/nemo_transducer only).

hotwordsScore

number

Hotwords score/weight for contextual biasing.

modelingUnit

'cjkchar' | 'bpe' | 'cjkchar+bpe'

Modeling unit for hotwords tokenization (transducer/nemo_transducer only). Must match model training:

'bpe': English zipformer models
'cjkchar': Chinese conformer models
'cjkchar+bpe': Bilingual zh-en models

bpeVocab

string

Path to BPE vocabulary file (.vocab export from sentencepiece). Required when modelingUnit is 'bpe' or 'cjkchar+bpe'.

numThreads

number

default:1

Number of threads for inference.

provider

string

Execution provider (e.g., 'cpu', 'coreml', 'xnnpack', 'nnapi', 'qnn'). Check availability before selecting a non-CPU provider, for example with getCoreMlSupport() for Core ML.

ruleFsts

string

Path to rule FSTs for inverse text normalization.

ruleFars

string

Path to rule FARs for inverse text normalization.

dither

number

default:0

Dither value for feature extraction.

modelOptions

SttModelOptions

Model-specific options (whisper, senseVoice, canary, funasrNano). Only options for the loaded model type are applied.

Returns

Promise<SttEngine>

SttEngine

A promise that resolves to an STT engine instance.

SttEngine interface

instanceId

string

Unique instance identifier.

transcribeFile

(filePath: string) => Promise<SttRecognitionResult>

Transcribe an audio file. Supports WAV, FLAC, MP3, etc.

transcribeSamples

(samples: number[], sampleRate: number) => Promise<SttRecognitionResult>

Transcribe PCM audio samples (float array in [-1, 1]).

setConfig

(config: SttRuntimeConfig) => Promise<void>

Update runtime configuration (decoding method, hotwords, etc.).

destroy

() => Promise<void>

Release native resources. Must be called when done.
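Because destroy() must be called even when transcription fails, it can help to wrap engine usage in try/finally. The following is a minimal sketch, not part of the library: withEngine and the Destroyable interface are hypothetical names, and the helper assumes only the documented destroy(): Promise<void> method.

```typescript
// Hypothetical helper, not part of react-native-sherpa-onnx.
// It relies only on the documented destroy(): Promise<void> method.
interface Destroyable {
  destroy(): Promise<void>;
}

async function withEngine<E extends Destroyable, R>(
  create: () => Promise<E>,
  use: (engine: E) => Promise<R>
): Promise<R> {
  const engine = await create();
  try {
    return await use(engine);
  } finally {
    // Runs whether `use` resolved or threw, so native resources are released.
    await engine.destroy();
  }
}
```

With createSTT this could be called as withEngine(() => createSTT({ modelPath }), async (stt) => (await stt.transcribeFile(path)).text), where modelPath and path are your own values.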

Examples

Basic Usage

import { createSTT, assetModelPath } from 'react-native-sherpa-onnx/stt';

// Create STT engine with asset model
const stt = await createSTT({
  modelPath: assetModelPath('models/whisper-tiny-en'),
});

// Transcribe an audio file
const result = await stt.transcribeFile('/path/to/audio.wav');
console.log('Transcription:', result.text);

// Clean up
await stt.destroy();

With Auto-Detection

import { createSTT, autoModelPath } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: autoModelPath('models/sherpa-onnx-whisper-tiny'),
  modelType: 'auto', // Automatically detect model type
});

const result = await stt.transcribeFile('recording.wav');
console.log(result.text);
await stt.destroy();

With Downloaded Model

import { createSTT, fileModelPath } from 'react-native-sherpa-onnx/stt';
import { getLocalModelPathByCategory, ModelCategory } from 'react-native-sherpa-onnx/download';

// Get path to downloaded model
const modelPath = await getLocalModelPathByCategory(
  ModelCategory.Stt,
  'sherpa-onnx-whisper-tiny-en'
);

if (modelPath) {
  const stt = await createSTT({
    modelPath: fileModelPath(modelPath),
  });
  
  const result = await stt.transcribeFile('audio.wav');
  console.log(result.text);
  await stt.destroy();
}

With Hotwords (Keyword Boosting)

import { createSTT, assetModelPath } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: assetModelPath('models/zipformer-transducer-en'),
  modelType: 'transducer',
  hotwordsFile: '/path/to/hotwords.txt', // One keyword per line
  hotwordsScore: 2.0,
  modelingUnit: 'bpe',
  bpeVocab: '/path/to/bpe.vocab',
});

const result = await stt.transcribeFile('audio.wav');
// Decoding is biased toward the keywords listed in the hotwords file
console.log(result.text);
await stt.destroy();
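The hotwords options carry two documented constraints: hotwords apply to transducer/nemo_transducer models only, and bpeVocab is required when modelingUnit is 'bpe' or 'cjkchar+bpe'. The sketch below checks both before an engine is created. checkHotwordsOptions and the HotwordsOptions interface are hypothetical names introduced for illustration; only the field names come from the options above, and how the library itself reacts to invalid combinations is not covered here.

```typescript
// Field names mirror the documented options; the checker itself is hypothetical.
type ModelingUnit = 'cjkchar' | 'bpe' | 'cjkchar+bpe';

interface HotwordsOptions {
  modelType?: string;
  hotwordsFile?: string;
  modelingUnit?: ModelingUnit;
  bpeVocab?: string;
}

function checkHotwordsOptions(opts: HotwordsOptions): string[] {
  const problems: string[] = [];
  if (!opts.hotwordsFile) return problems; // nothing to validate without hotwords

  // Hotwords are documented as transducer/nemo_transducer only.
  // 'auto' or undefined is left alone because the detected type is unknown here.
  const type = opts.modelType;
  if (type && type !== 'auto' && type !== 'transducer' && type !== 'nemo_transducer') {
    problems.push(`hotwordsFile is not supported for modelType '${type}'`);
  }

  const needsVocab = opts.modelingUnit === 'bpe' || opts.modelingUnit === 'cjkchar+bpe';
  if (needsVocab && !opts.bpeVocab) {
    problems.push(`bpeVocab is required when modelingUnit is '${opts.modelingUnit}'`);
  }
  return problems;
}
```

An empty array means neither documented constraint is violated; it does not confirm that the files exist or that the modeling unit matches the model's training.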

With Whisper Model Options

import { createSTT, assetModelPath } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: assetModelPath('models/whisper-base'),
  modelType: 'whisper',
  modelOptions: {
    whisper: {
      language: 'en',
      task: 'transcribe', // or 'translate' to translate the speech into English
      tailPaddings: 1000,
    },
  },
});

const result = await stt.transcribeFile('audio.wav');
console.log(result.text);
await stt.destroy();

Transcribe PCM Samples

import { createSTT, assetModelPath } from 'react-native-sherpa-onnx/stt';

const stt = await createSTT({
  modelPath: assetModelPath('models/whisper-tiny'),
});

// transcribeSamples expects a number[] of PCM samples in [-1, 1];
// convert a Float32Array with Array.from() first.
const samples: number[] = []; // Replace with your audio samples
const sampleRate = 16000;

const result = await stt.transcribeSamples(samples, sampleRate);
console.log('Transcription:', result.text);
console.log('Tokens:', result.tokens);
console.log('Timestamps:', result.timestamps);

await stt.destroy();
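Recorded audio often arrives as 16-bit integer PCM rather than floats. The sketch below shows one common way to scale such data into the [-1, 1] range that transcribeSamples expects. int16ToFloatSamples is a hypothetical helper, not a library function, and it assumes mono, signed 16-bit input.

```typescript
// Hypothetical helper: converts signed 16-bit PCM to floats in [-1, 1].
// Assumes mono audio; interleaved multi-channel data needs downmixing first.
function int16ToFloatSamples(pcm: Int16Array): number[] {
  // Dividing by 32768 maps -32768 to -1 and keeps 32767 just below 1.
  return Array.from(pcm, (sample) => sample / 32768);
}
```

The returned array can be passed to transcribeSamples together with the sample rate the audio was recorded at.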

With Hardware Acceleration

import { createSTT, assetModelPath } from 'react-native-sherpa-onnx/stt';
import { getCoreMlSupport } from 'react-native-sherpa-onnx';

// Check Core ML support on iOS
const coreMLSupport = await getCoreMlSupport();

const stt = await createSTT({
  modelPath: assetModelPath('models/whisper-tiny'),
  provider: coreMLSupport.canInit ? 'coreml' : 'cpu',
  numThreads: 2,
});

const result = await stt.transcribeFile('audio.wav');
console.log(result.text);
await stt.destroy();

Related Functions

detectSttModel()

Detect STT model type without initializing the recognizer.

function detectSttModel(
  modelPath: ModelPathConfig,
  options?: { preferInt8?: boolean; modelType?: STTModelType }
): Promise<{
  success: boolean;
  detectedModels: Array<{ type: string; modelDir: string }>;
  modelType?: string;
}>

Example

import { detectSttModel, assetModelPath } from 'react-native-sherpa-onnx/stt';

const result = await detectSttModel(
  assetModelPath('models/sherpa-onnx-whisper-tiny-en')
);

if (result.success) {
  console.log('Detected type:', result.modelType);
  console.log('Models found:', result.detectedModels);
}
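When detection gates a later createSTT() call, it can be convenient to turn a failed result into a single readable error. The sketch below is one way to do that. The SttDetectionResult interface copies the return shape from the signature above, while requireDetectedType is a hypothetical helper. Treating a missing modelType as a failure is an assumption, not documented behavior.

```typescript
// Shape copied from the detectSttModel() signature above.
interface SttDetectionResult {
  success: boolean;
  detectedModels: Array<{ type: string; modelDir: string }>;
  modelType?: string;
}

// Hypothetical helper: returns the detected type or throws a readable error.
// Assumption: a result without modelType is treated as a failure.
function requireDetectedType(result: SttDetectionResult): string {
  if (!result.success || !result.modelType) {
    const found = result.detectedModels.map((m) => m.type).join(', ') || 'none';
    throw new Error(`STT model detection failed (candidates: ${found})`);
  }
  return result.modelType;
}
```

The returned string could then be logged, or used to decide whether to create an engine at all.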
