Quick Start Guide

Get started with offline speech-to-text and text-to-speech in under 5 minutes.

Prerequisites

Before you begin, make sure you have:
  • Completed the Installation steps
  • A model downloaded (see Model Setup or use the quick download below)
  • An audio file to test (or use the examples below)

Download a Model

For this guide, we’ll use a small Whisper model for English transcription:
1. Choose a model

Download the Whisper Tiny English model (~40MB, fast, good accuracy):
# Using wget or curl
curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
tar -xvf sherpa-onnx-whisper-tiny.en.tar.bz2
Or use the Model Download Manager in your app:
import { downloadModel } from 'react-native-sherpa-onnx/download';

await downloadModel({
  url: 'https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2',
  destinationPath: '/path/to/models',
  onProgress: (progress) => console.log(`${progress}%`),
});
2. Place the model in your app

For Android, place the model folder in android/app/src/main/assets/models/:
android/app/src/main/assets/models/
  └── sherpa-onnx-whisper-tiny.en/
      ├── tiny.en-encoder.onnx
      ├── tiny.en-decoder.onnx
      └── tiny.en-tokens.txt
For iOS, add the model folder to your Xcode project as a resource.
See Model Setup for detailed instructions on bundling models, using Play Asset Delivery, or loading from the filesystem.

Speech-to-Text (STT)

Transcribe audio files with offline speech recognition.
1. Import the STT module

import { createSTT } from 'react-native-sherpa-onnx/stt';
import type { SttEngine } from 'react-native-sherpa-onnx/stt';
2. Initialize the STT engine

Create an STT instance with your model:
const stt: SttEngine = await createSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-whisper-tiny.en',
  },
  modelType: 'whisper', // Optional: auto-detect if omitted
  numThreads: 2, // Adjust based on device
});
You can load models from different locations:
// From app assets (bundled with app)
modelPath: { type: 'asset', path: 'models/whisper-tiny' }

// From filesystem
modelPath: { type: 'file', path: '/absolute/path/to/model' }

// Auto-detect (searches assets, then filesystem)
modelPath: { type: 'auto', path: 'models/whisper-tiny' }
3. Transcribe an audio file

const result = await stt.transcribeFile('/path/to/audio.wav');

console.log('Transcription:', result.text);
// Output: "Hello, how are you today?"

console.log('Tokens:', result.tokens);
// Output: ["Hello", ",", "how", "are", "you", "today", "?"]

console.log('Timestamps:', result.timestamps);
// Output: [0.0, 0.5, 0.6, 1.0, 1.2, 1.5, 2.0]
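The tokens and timestamps arrays are parallel: timestamps[i] is the start time (in seconds) of tokens[i]. If you need per-token timing entries, you can zip the two arrays together. A minimal sketch in plain TypeScript, using no library APIs beyond the result shape shown above (`alignTokens` is an illustrative helper, not part of this library):

```typescript
// Pair each token with its start time in seconds.
// Assumes tokens and timestamps are parallel arrays, as in the result above.
function alignTokens(
  tokens: string[],
  timestamps: number[],
): { token: string; start: number }[] {
  return tokens.map((token, i) => ({ token, start: timestamps[i] }));
}

const aligned = alignTokens(['Hello', ',', 'how'], [0.0, 0.5, 0.6]);
// aligned[2] is { token: 'how', start: 0.6 }
```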
4. Clean up

Always destroy the engine when done to free native resources:
await stt.destroy();

Transcribe Audio Samples

You can also transcribe raw PCM audio samples:
const samples: number[] = [...]; // Float32 PCM samples, range [-1, 1]
const sampleRate = 16000; // Hz

const result = await stt.transcribeSamples(samples, sampleRate);
console.log(result.text);
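Many recorders deliver signed 16-bit integer PCM rather than Float32 in [-1, 1]. If that is your source, convert before calling `transcribeSamples`. A hedged sketch: the `Int16Array` input and the 32768 divisor are standard PCM conventions, not part of this library's API:

```typescript
// Convert signed 16-bit PCM to Float32-style samples in [-1, 1].
function pcm16ToFloat32(pcm: Int16Array): number[] {
  const out = new Array<number>(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768; // 32768 = 2^15, the Int16 magnitude
  }
  return out;
}

const samples = pcm16ToFloat32(new Int16Array([0, 16384, -32768]));
// samples is [0, 0.5, -1]
// const result = await stt.transcribeSamples(samples, 16000);
```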

Complete STT Example

STTExample.tsx
import { useState } from 'react';
import { View, Button, Text } from 'react-native';
import { createSTT } from 'react-native-sherpa-onnx/stt';
import type { SttEngine } from 'react-native-sherpa-onnx/stt';

export default function STTExample() {
  const [transcription, setTranscription] = useState('');
  const [loading, setLoading] = useState(false);

  const transcribeAudio = async () => {
    setLoading(true);
    let stt: SttEngine | null = null;

    try {
      // Initialize STT
      stt = await createSTT({
        modelPath: {
          type: 'asset',
          path: 'models/sherpa-onnx-whisper-tiny.en',
        },
        modelType: 'whisper',
        numThreads: 2,
      });

      // Transcribe audio file
      const result = await stt.transcribeFile('/path/to/audio.wav');
      setTranscription(result.text);
    } catch (error) {
      console.error('Transcription failed:', error);
    } finally {
      // Clean up
      if (stt) await stt.destroy();
      setLoading(false);
    }
  };

  return (
    <View>
      <Button
        title={loading ? 'Transcribing...' : 'Transcribe Audio'}
        onPress={transcribeAudio}
        disabled={loading}
      />
      {transcription !== '' && <Text>Result: {transcription}</Text>}
    </View>
  );
}

Text-to-Speech (TTS)

Generate natural speech from text offline.
1. Download a TTS model

Download a VITS Piper model (~10-50MB depending on voice):
# English (US) female voice
curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-lessac-medium.tar.bz2
tar -xvf vits-piper-en_US-lessac-medium.tar.bz2
Place in android/app/src/main/assets/models/ or add to Xcode resources.
2. Import and initialize TTS

import { createTTS } from 'react-native-sherpa-onnx/tts';
import type { TtsEngine } from 'react-native-sherpa-onnx/tts';

const tts: TtsEngine = await createTTS({
  modelPath: {
    type: 'asset',
    path: 'models/vits-piper-en_US-lessac-medium',
  },
  modelType: 'vits',
  numThreads: 2,
});
3. Generate speech

const audio = await tts.generateSpeech('Hello, world!');

console.log('Sample rate:', audio.sampleRate);
// Output: 22050

console.log('Audio samples:', audio.samples.length);
// Output: 44100 (2 seconds of audio)
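Duration follows directly from the sample count and the sample rate: 44100 samples at 22050 Hz is 2 seconds. As a worked example (`audioDuration` is an illustrative helper, not a library function):

```typescript
// Duration in seconds = number of samples / samples per second.
function audioDuration(samples: { length: number }, sampleRate: number): number {
  return samples.length / sampleRate;
}

audioDuration({ length: 44100 }, 22050); // 2
```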
4. Save or play the audio

import { saveAudioToFile } from 'react-native-sherpa-onnx/tts';
import Sound from 'react-native-sound';

// Save to file
const filePath = await saveAudioToFile(
  audio,
  '/path/to/output.wav'
);
console.log('Saved to:', filePath);

// Play the audio
const sound = new Sound(filePath, '', (error) => {
  if (error) {
    console.error('Failed to load sound', error);
    return;
  }
  sound.play();
});
5. Clean up

await tts.destroy();

TTS with Options

Customize speech generation with options:
const audio = await tts.generateSpeech('Hello, world!', {
  speed: 1.2,           // Speak 20% faster
  sid: 0,               // Speaker ID (for multi-speaker models)
  silenceScale: 0.5,    // Reduce silence duration
});

Complete TTS Example

TTSExample.tsx
import { useState } from 'react';
import { View, TextInput, Button, Text } from 'react-native';
import { createTTS, saveAudioToFile } from 'react-native-sherpa-onnx/tts';
import type { TtsEngine } from 'react-native-sherpa-onnx/tts';
import Sound from 'react-native-sound';

export default function TTSExample() {
  const [text, setText] = useState('Hello, world!');
  const [generating, setGenerating] = useState(false);
  const [audioPath, setAudioPath] = useState<string | null>(null);

  const generateSpeech = async () => {
    setGenerating(true);
    let tts: TtsEngine | null = null;

    try {
      // Initialize TTS
      tts = await createTTS({
        modelPath: {
          type: 'asset',
          path: 'models/vits-piper-en_US-lessac-medium',
        },
        modelType: 'vits',
      });

      // Generate speech
      const audio = await tts.generateSpeech(text, { speed: 1.0 });

      // Save to file
      const outputPath = `/tmp/speech_${Date.now()}.wav`;
      await saveAudioToFile(audio, outputPath);
      setAudioPath(outputPath);

      // Play
      const sound = new Sound(outputPath, '', (error) => {
        if (!error) sound.play();
      });
    } catch (error) {
      console.error('TTS failed:', error);
    } finally {
      if (tts) await tts.destroy();
      setGenerating(false);
    }
  };

  return (
    <View>
      <TextInput
        value={text}
        onChangeText={setText}
        placeholder="Enter text to speak"
      />
      <Button
        title={generating ? 'Generating...' : 'Generate Speech'}
        onPress={generateSpeech}
        disabled={generating}
      />
      {audioPath && <Text>Audio saved to: {audioPath}</Text>}
    </View>
  );
}

Real-Time Streaming Recognition

Transcribe live microphone input with partial results.
1. Import streaming STT

import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';
import type { StreamingSttEngine, SttStream } from 'react-native-sherpa-onnx/stt';
2. Initialize streaming engine

const streamingStt: StreamingSttEngine = await createStreamingSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-streaming-zipformer-en',
  },
  modelType: 'transducer', // Streaming-capable model
  numThreads: 2,
});
Only certain model types support streaming: transducer, paraformer, zipformer2_ctc, nemo_ctc, tone_ctc.
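If the model type comes from configuration or user input, you can guard against non-streaming types before creating the engine. A minimal sketch; the type list mirrors the one above, and `supportsStreaming` is a hypothetical helper, not part of the library:

```typescript
// Model types listed above as streaming-capable.
const STREAMING_MODEL_TYPES = new Set([
  'transducer',
  'paraformer',
  'zipformer2_ctc',
  'nemo_ctc',
  'tone_ctc',
]);

function supportsStreaming(modelType: string): boolean {
  return STREAMING_MODEL_TYPES.has(modelType);
}

supportsStreaming('transducer'); // true
supportsStreaming('whisper');    // false
```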
3. Create a stream and feed audio

const stream: SttStream = await streamingStt.createStream();

// Feed audio samples (Float32, 16kHz recommended)
const samples: number[] = [...];
await stream.acceptWaveform(samples, 16000);

// Get partial result
const partial = await stream.getResult();
console.log('Partial:', partial.text);

// Check if speech endpoint detected
const isEndpoint = await stream.isEndpoint();
if (isEndpoint) {
  // Finalize the segment
  const final = await stream.getResult();
  console.log('Final:', final.text);
  await stream.reset();
}
4. Clean up

await stream.destroy();
await streamingStt.destroy();

Real-Time Microphone Transcription

MicrophoneSTT.tsx
import { useState, useRef } from 'react';
import { Button, Text, View } from 'react-native';
import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';
import type { StreamingSttEngine, SttStream } from 'react-native-sherpa-onnx/stt';
import { AudioRecorder } from 'react-native-audio-api';

export default function MicrophoneSTT() {
  const [isRecording, setIsRecording] = useState(false);
  const [partialText, setPartialText] = useState('');
  const [finalText, setFinalText] = useState('');

  const engineRef = useRef<StreamingSttEngine | null>(null);
  const streamRef = useRef<SttStream | null>(null);
  const recorderRef = useRef<AudioRecorder | null>(null);

  const startRecording = async () => {
    try {
      // Initialize engine
      engineRef.current = await createStreamingSTT({
        modelPath: { type: 'asset', path: 'models/zipformer-en' },
        modelType: 'transducer',
      });

      // Create stream
      streamRef.current = await engineRef.current.createStream();

      // Start microphone recording
      recorderRef.current = new AudioRecorder({
        sampleRate: 16000,
        channelCount: 1,
      });

      recorderRef.current.onDataAvailable(async (samples) => {
        const stream = streamRef.current;
        if (!stream) return;

        // Wait for the waveform to be processed before reading a result
        await stream.acceptWaveform(samples, 16000);
        const result = await stream.getResult();
        setPartialText(result.text);

        // Check for endpoint and finalize the segment
        if (await stream.isEndpoint()) {
          setFinalText((prev) => prev + ' ' + result.text);
          setPartialText('');
          await stream.reset();
        }
      });

      recorderRef.current.start();
      setIsRecording(true);
    } catch (error) {
      console.error('Failed to start recording:', error);
    }
  };

  const stopRecording = async () => {
    recorderRef.current?.stop();
    recorderRef.current = null;

    if (streamRef.current) {
      await streamRef.current.destroy();
      streamRef.current = null;
    }

    if (engineRef.current) {
      await engineRef.current.destroy();
      engineRef.current = null;
    }

    setIsRecording(false);
  };

  return (
    <View>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      <Text>Partial: {partialText}</Text>
      <Text>Final: {finalText}</Text>
    </View>
  );
}

Next Steps

Now that you’ve built your first speech app, explore more features:

  • Model Setup: model types, quantization, and Play Asset Delivery
  • STT API Reference: complete STT API documentation
  • TTS API Reference: complete TTS API documentation
  • Streaming TTS: low-latency incremental speech generation
  • Execution Providers: hardware acceleration with NNAPI, Core ML, QNN
  • Example App: browse the full-featured example application

Need help? Check out the example app source code for complete working examples of STT, TTS, and streaming.
