createStreamingSTT()

Creates a streaming (online) STT engine for real-time recognition with partial results and endpoint detection. Use this for live transcription from microphones or audio streams.
function createStreamingSTT(
  options: StreamingSttInitOptions
): Promise<StreamingSttEngine>

Parameters

options
StreamingSttInitOptions
required
Streaming STT initialization options. See StreamingSttInitOptions.

Returns

Promise resolving to a StreamingSttEngine instance.

Example

const engine = await createStreamingSTT({
  modelPath: { type: 'asset', path: 'models/streaming-zipformer-en' },
  modelType: 'transducer',
  enableEndpoint: true,
  endpointConfig: {
    rule2: {
      mustContainNonSilence: true,
      minTrailingSilence: 1.4,
      minUtteranceLength: 0.0
    }
  }
});

const stream = await engine.createStream();
await stream.acceptWaveform(samples, 16000);

if (await stream.isReady()) {
  await stream.decode();
  const result = await stream.getResult();
  console.log('Partial:', result.text);
}

await stream.release();
await engine.destroy();
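For continuous input, the example above generalizes to a per-chunk loop: feed a chunk, decode while ready, then check for an utterance boundary. A minimal sketch; the SttStreamLike interface here is a hypothetical local stand-in that mirrors only the stream methods documented below:

```typescript
// Hypothetical SttStreamLike: a local stand-in mirroring only the
// documented stream methods this helper uses.
interface SttStreamLike {
  acceptWaveform(samples: number[], sampleRate: number): Promise<void>;
  isReady(): Promise<boolean>;
  decode(): Promise<void>;
  getResult(): Promise<{ text: string }>;
  isEndpoint(): Promise<boolean>;
  reset(): Promise<void>;
}

// Feed one chunk, decode everything that is ready, and report the current
// text plus whether an endpoint (end of utterance) was reached.
async function feedChunk(
  stream: SttStreamLike,
  chunk: number[],
  sampleRate: number
): Promise<{ text: string; endOfUtterance: boolean }> {
  await stream.acceptWaveform(chunk, sampleRate);
  while (await stream.isReady()) {
    await stream.decode();
  }
  const { text } = await stream.getResult();
  const endOfUtterance = await stream.isEndpoint();
  if (endOfUtterance) {
    await stream.reset(); // start a fresh utterance
  }
  return { text, endOfUtterance };
}
```

Note the while loop: after a large chunk, more than one decode step may be pending, so draining until isReady() returns false keeps partial results up to date.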

StreamingSttEngine

Streaming STT engine interface returned by createStreamingSTT().

Properties

instanceId
string
Unique identifier for this engine instance.

Methods

createStream()

Create a new recognition stream for this engine. Multiple streams can be active simultaneously.
createStream(hotwords?: string): Promise<SttStream>
hotwords
string
Optional hotwords/keywords string for contextual biasing (transducer/nemo_transducer only).
Returns: Promise resolving to an SttStream instance. Example:
const stream = await engine.createStream('OpenAI /ˌoʊpən eɪ aɪ/');

destroy()

Release the native recognizer and all of its streams. The engine cannot be used after calling this.
destroy(): Promise<void>

SttStream

Streaming recognition stream. Created by StreamingSttEngine.createStream().

Properties

streamId
string
Unique identifier for this stream.

Methods

acceptWaveform()

Feed PCM audio samples to the stream.
acceptWaveform(samples: number[], sampleRate: number): Promise<void>
samples
number[]
required
PCM audio samples as float values in range [-1, 1].
sampleRate
number
required
Sample rate in Hz (typically 16000).
Example:
await stream.acceptWaveform(audioChunk, 16000);

isReady()

Check if there’s enough audio to run decoding.
isReady(): Promise<boolean>
Returns: true if decode() can be called.

decode()

Run decoding on accumulated audio. Call when isReady() returns true.
decode(): Promise<void>

getResult()

Get the current partial or final recognition result. Call after decode().
getResult(): Promise<StreamingSttResult>
Returns: Promise resolving to StreamingSttResult. Example:
if (await stream.isReady()) {
  await stream.decode();
  const result = await stream.getResult();
  console.log('Text:', result.text);
  console.log('Tokens:', result.tokens);
}

isEndpoint()

Check if endpoint (end of utterance) was detected based on configured rules.
isEndpoint(): Promise<boolean>
Returns: true if endpoint detected. Example:
if (await stream.isEndpoint()) {
  const final = await stream.getResult();
  console.log('Final utterance:', final.text);
  await stream.reset();
}

reset()

Reset stream state for reuse (clears audio buffer and recognition state).
reset(): Promise<void>

inputFinished()

Signal that no more audio will be fed to the stream.
inputFinished(): Promise<void>
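When the audio source ends, a typical flush sequence is: signal end of input, drain any remaining decode steps, then read the final result. A sketch; FlushableStream is a hypothetical local stand-in for the documented methods:

```typescript
// Hypothetical FlushableStream: only the documented methods this helper uses.
interface FlushableStream {
  inputFinished(): Promise<void>;
  isReady(): Promise<boolean>;
  decode(): Promise<void>;
  getResult(): Promise<{ text: string }>;
}

// Signal that no more audio is coming, decode whatever is still buffered,
// and return the final transcript.
async function flushStream(stream: FlushableStream): Promise<string> {
  await stream.inputFinished();
  while (await stream.isReady()) {
    await stream.decode();
  }
  return (await stream.getResult()).text;
}
```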

release()

Release native stream resources. Do not use the stream after calling this.
release(): Promise<void>
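Because a stream holds native resources, release() should run even when recognition throws, and releasing the stream before destroying the engine is the safe tear-down order. A sketch of a wrapper that guarantees this, using hypothetical local interfaces that mirror the documented API:

```typescript
// Hypothetical minimal shapes of the documented engine and stream.
interface ReleasableStream {
  release(): Promise<void>;
}
interface EngineLike<S extends ReleasableStream> {
  createStream(): Promise<S>;
  destroy(): Promise<void>;
}

// Run fn with a fresh stream; release() and destroy() run in finally,
// so native resources are freed even if recognition throws.
async function withEngineAndStream<S extends ReleasableStream, T>(
  engine: EngineLike<S>,
  fn: (stream: S) => Promise<T>
): Promise<T> {
  const stream = await engine.createStream();
  try {
    return await fn(stream);
  } finally {
    await stream.release();
    await engine.destroy();
  }
}
```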

processAudioChunk()

Convenience method that feeds audio, auto-decodes while ready, and returns result with endpoint status. Reduces bridge round-trips from 5 to 1 per chunk.
processAudioChunk(
  samples: number[],
  sampleRate: number
): Promise<{
  result: StreamingSttResult;
  isEndpoint: boolean;
}>
samples
number[]
required
PCM audio samples. Automatically normalized if enableInputNormalization was set to true (the default).
sampleRate
number
required
Sample rate in Hz.
Returns: Object with result and isEndpoint boolean. Example:
const { result, isEndpoint } = await stream.processAudioChunk(
  audioChunk,
  16000
);

console.log('Current text:', result.text);

if (isEndpoint) {
  console.log('End of utterance detected');
  await stream.reset();
}
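Over a whole recording, the per-chunk call above composes into a loop that collects one transcript per utterance. A sketch, assuming chunks is an array of PCM chunks; ChunkStream is a hypothetical local stand-in for the documented stream:

```typescript
// Hypothetical ChunkStream: the two documented methods this loop needs.
interface ChunkStream {
  processAudioChunk(
    samples: number[],
    sampleRate: number
  ): Promise<{ result: { text: string }; isEndpoint: boolean }>;
  reset(): Promise<void>;
}

// Feed chunks in order; whenever an endpoint fires, record the finished
// utterance and reset the stream for the next one.
async function collectUtterances(
  stream: ChunkStream,
  chunks: number[][],
  sampleRate: number
): Promise<string[]> {
  const utterances: string[] = [];
  for (const chunk of chunks) {
    const { result, isEndpoint } = await stream.processAudioChunk(
      chunk,
      sampleRate
    );
    if (isEndpoint) {
      if (result.text.length > 0) {
        utterances.push(result.text);
      }
      await stream.reset();
    }
  }
  return utterances;
}
```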

Utility Functions

mapDetectedToOnlineType()

Map a detected offline STT model type to its streaming (online) model type. Throws if the model doesn’t support streaming.
function mapDetectedToOnlineType(
  detectedType: string | undefined
): OnlineSTTModelType
Example:
const detected = await detectSttModel(modelPath);
const onlineType = mapDetectedToOnlineType(detected.modelType);
// Returns 'transducer', 'paraformer', 'zipformer2_ctc', etc.

getOnlineTypeOrNull()

Returns streaming model type for a detected model, or null if streaming is not supported.
function getOnlineTypeOrNull(
  detectedType: string | undefined
): OnlineSTTModelType | null
Example:
const detected = await detectSttModel(modelPath);
const onlineType = getOnlineTypeOrNull(detected.modelType);

if (onlineType) {
  const engine = await createStreamingSTT({
    modelPath,
    modelType: onlineType
  });
} else {
  console.log('Model does not support streaming');
}
