
Kokoro Models

Kokoro is a multi-speaker, multi-language TTS model designed for flexible speech synthesis across different voices and languages.

Model Architecture

Kokoro uses an end-to-end neural TTS architecture:
  • Model (model.onnx or kokoro-*.onnx) – Neural TTS model
  • Tokens (tokens.txt) – Text token vocabulary
  • Optional configuration files

When to Use

  • Multi-Language Apps – Applications serving users in multiple languages
  • Multiple Voices – Different speakers/voices in one model
  • Fast Streaming – Real-time speech generation with low latency
  • Compact Deployment – Single model for multiple languages and voices

Supported Languages

Kokoro models support multiple languages including:
  • English
  • Spanish
  • French
  • German
  • Others, depending on the specific model variant

Performance Characteristics

| Aspect | Rating | Notes |
| --- | --- | --- |
| Streaming | ✅ Excellent | Native streaming support |
| Quality | ⭐⭐⭐⭐ | High quality, natural speech |
| Speed | ⭐⭐⭐⭐⭐ | Fast inference |
| Memory | ⭐⭐⭐⭐ | Moderate, suitable for mobile |
| Model Size | Small-Medium | Typically 20-60 MB |

Downloads

Download Kokoro TTS models (see the Model Setup guide for download and bundling instructions).

Configuration Example

Basic TTS

import { createTTS } from 'react-native-sherpa-onnx/tts';

const tts = await createTTS({
  modelPath: {
    type: 'asset',
    path: 'models/kokoro-multi-language'
  },
  modelType: 'kokoro', // or 'auto'
  numThreads: 2,
});

const audio = await tts.generateSpeech('Hello, world!');
console.log('Generated:', audio.samples.length, 'samples');

await tts.destroy();
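The sample count logged above can be turned into a playback duration. A minimal sketch, assuming `audio.samples` holds mono samples and `tts.getSampleRate()` returns the rate in Hz (the helper itself is hypothetical, not part of the library API):

```typescript
// Sketch: estimate clip duration from raw PCM sample count.
// Assumes mono samples and a sample rate in Hz.
function durationSeconds(sampleCount: number, sampleRate: number): number {
  if (sampleRate <= 0) throw new Error('sampleRate must be positive');
  return sampleCount / sampleRate;
}

// e.g. 44100 samples at 22050 Hz -> 2 seconds of audio
```

This is handy for progress UIs or for validating that generation produced a plausible amount of audio.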

With Length Scale

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
  modelOptions: {
    kokoro: {
      lengthScale: 1.0,  // Speech speed (< 1.0 = faster, > 1.0 = slower)
    }
  },
});

Streaming TTS

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
});

const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);

await tts.generateSpeechStream('Streaming speech synthesis', { sid: 0, speed: 1.0 }, {
  onChunk: async (chunk) => {
    await tts.writePcmChunk(chunk.samples);
  },
  onEnd: async () => {
    await tts.stopPcmPlayer();
  },
});

await tts.destroy();
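If you also want the complete clip after streaming (for example, to cache it), the chunks received in `onChunk` can be merged. A sketch assuming each chunk's `samples` is a `Float32Array` of mono samples (the helper is hypothetical, not part of the library API):

```typescript
// Sketch: merge streamed PCM chunks into a single buffer,
// e.g. to save the full clip once playback finishes.
function mergeChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    merged.set(c, offset);
    offset += c.length;
  }
  return merged;
}
```

Collect chunks in an array inside `onChunk`, then call `mergeChunks` in `onEnd`.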

Multi-Speaker Usage

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro-multi-speaker' },
  modelType: 'kokoro',
});

const numSpeakers = await tts.getNumSpeakers();
console.log('Available speakers:', numSpeakers);

// Generate with different voices
for (let sid = 0; sid < numSpeakers; sid++) {
  const audio = await tts.generateSpeech(`Hello from speaker ${sid}`, { sid, speed: 1.0 });
  console.log(`Speaker ${sid}:`, audio.samples.length, 'samples');
}

await tts.destroy();

Model Options

Kokoro models support one tuning parameter:
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| lengthScale | number | 1.0 | Speech speed; < 1.0 = faster, > 1.0 = slower |

Tuning Examples

// Fast speech
const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
  modelOptions: {
    kokoro: { lengthScale: 0.8 }  // 20% faster
  },
});

// Slow, clear speech
const tts2 = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
  modelOptions: {
    kokoro: { lengthScale: 1.3 }  // 30% slower
  },
});
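The `lengthScale` values above stretch the output duration roughly linearly: 0.8 yields audio about 20% shorter, 1.3 about 30% longer. A sketch of that relationship (an approximation for planning purposes, not a library API):

```typescript
// Sketch: expected output duration under a given lengthScale.
// Baseline is whatever the model produces at lengthScale = 1.0;
// the linear scaling is an approximation.
function estimatedDuration(baselineSeconds: number, lengthScale: number): number {
  return baselineSeconds * lengthScale;
}
```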

Runtime Updates

const tts = await createTTS({ ... });

await tts.updateParams({
  modelOptions: {
    kokoro: { lengthScale: 1.2 }
  },
});

Model Detection

Kokoro models are detected by:
  • Folder name containing kokoro (and not kitten)
  • Required files: model.onnx (or kokoro-*.onnx) and tokens.txt
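The detection rules above can be sketched as a small predicate (a hypothetical helper illustrating the heuristic, not part of the library API):

```typescript
// Sketch: does a model folder look like a Kokoro model?
// Mirrors the documented heuristic: folder name contains "kokoro"
// (not "kitten"), and the expected model + tokens files are present.
function looksLikeKokoro(folderName: string, files: string[]): boolean {
  const name = folderName.toLowerCase();
  if (!name.includes('kokoro') || name.includes('kitten')) return false;
  const hasModel = files.some(
    (f) => f === 'model.onnx' || (f.startsWith('kokoro-') && f.endsWith('.onnx'))
  );
  return hasModel && files.includes('tokens.txt');
}
```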

Performance Tips

Optimize Thread Count

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  numThreads: 4, // More threads for faster generation
});

Use Streaming for Responsiveness

For interactive apps:
await tts.generateSpeechStream(text, { sid: 0, speed: 1.0 }, {
  onChunk: async (chunk) => {
    await tts.writePcmChunk(chunk.samples); // Start playing immediately
  },
});

Hardware Acceleration

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  provider: 'nnapi', // Android NNAPI
  // provider: 'xnnpack', // XNNPACK
});

Streaming Support

Streaming: ✅ Yes

Kokoro models have excellent streaming support. Use generateSpeechStream() for low-latency, incremental audio generation.

Advantages

  1. Multi-Language: Single model for multiple languages
  2. Multi-Speaker: Multiple voices in one model
  3. Fast Inference: Real-time capable
  4. Streaming: Native incremental generation
  5. Compact: One model instead of multiple language-specific models
  6. Good Quality: Natural-sounding speech

Limitations

  1. No Voice Cloning: Cannot synthesize custom voices from reference audio
  2. Fixed Voices: Limited to model’s trained speakers
  3. Less Tuning: Only lengthScale parameter available (no noise scale)
  4. Language Coverage: Fewer languages than some alternatives

Use Cases

  • Multilingual Apps – Apps serving users in multiple countries
  • Voice Assistants – Interactive voice interfaces with multiple voices
  • E-Learning – Educational content in multiple languages
  • Customer Service – Automated responses in different languages

Common Issues

Model fails to load:
  • Verify the folder name contains kokoro (not kitten)
  • Check that model.onnx and tokens.txt are present
  • Ensure sufficient device memory

Wrong language or pronunciation:
  • Kokoro may auto-detect language from text
  • Ensure input text is in the correct language/script
  • Some models may require language-specific prefixes

Slow generation:
  • Increase numThreads on multi-core devices
  • Use hardware acceleration (provider: 'nnapi')
  • Ensure no other heavy apps are running

Comparison with Other Models

| Feature | Kokoro | VITS | Matcha | KittenTTS |
| --- | --- | --- | --- | --- |
| Speed | Fast | Very Fast | Fast | Very Fast |
| Quality | High | High | Very High | Good |
| Streaming | Yes | Yes | Yes | Yes |
| Multi-Language | Yes | Varies | Limited | Limited |
| Multi-Speaker | Yes | Yes | Yes | Yes |
| Voice Cloning | No | No | No | No |
| Model Size | Small-Medium | Small | Medium | Small |

Next Steps

  • TTS API – Detailed API documentation
  • Streaming TTS – Low-latency streaming guide
  • Model Setup – How to download and bundle models
  • Execution Providers – Hardware acceleration options
