Kokoro Models

Kokoro is a multi-speaker, multi-language TTS model designed for flexible speech synthesis across different voices and languages.

Model Architecture

Kokoro uses an end-to-end neural TTS architecture. A model folder contains the following files:

Model (model.onnx or kokoro-*.onnx) – Neural TTS model
Tokens (tokens.txt) – Text token vocabulary
Optional configuration files

When to Use

Multi-Language Apps

Applications serving users in multiple languages

Multiple Voices

Apps that need several speakers or voices from a single model

Fast Streaming

Real-time speech generation with low latency

Compact Deployment

Single model for multiple languages and voices

Supported Languages

Kokoro models support multiple languages including:

English
Spanish
French
German
Other languages may be available, depending on the model variant

Performance Characteristics

Aspect	Rating	Notes
Streaming	✅ Excellent	Native streaming support
Quality	⭐⭐⭐⭐	High quality, natural speech
Speed	⭐⭐⭐⭐⭐	Fast inference
Memory	⭐⭐⭐⭐	Moderate, suitable for mobile
Model Size	Small-Medium	Typically 20-60 MB

Download Links

Kokoro Models

Download Kokoro TTS models

Configuration Example

Basic TTS

import { createTTS } from 'react-native-sherpa-onnx/tts';

const tts = await createTTS({
  modelPath: {
    type: 'asset',
    path: 'models/kokoro-multi-language'
  },
  modelType: 'kokoro', // or 'auto'
  numThreads: 2,
});

const audio = await tts.generateSpeech('Hello, world!');
console.log('Generated:', audio.samples.length, 'samples');

await tts.destroy();
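
To save or inspect the generated audio outside the app, the samples can be wrapped in a WAV container. The following is a minimal sketch, not part of the library: `encodeWav` is a hypothetical helper, and it assumes `audio.samples` holds mono floating-point samples in the range [-1, 1] and that the sample rate is the value returned by `tts.getSampleRate()`.

```typescript
// Hypothetical helper: wraps mono float samples in a 16-bit PCM WAV container.
// Assumes samples lie in [-1, 1]; out-of-range values are clamped.
function encodeWav(samples: ArrayLike<number>, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale to the signed 16-bit range.
    view.setInt16(44 + i * 2, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(buffer);
}
```

With the Basic TTS example above, this could be called as `encodeWav(audio.samples, sampleRate)` before `tts.destroy()`; writing the bytes to disk is left to whichever file-system module the app already uses.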

With Length Scale

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
  modelOptions: {
    kokoro: {
      lengthScale: 1.0,  // Speech speed (< 1.0 = faster, > 1.0 = slower)
    }
  },
});

Streaming TTS

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
});

const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);

await tts.generateSpeechStream('Streaming speech synthesis', { sid: 0, speed: 1.0 }, {
  onChunk: async (chunk) => {
    await tts.writePcmChunk(chunk.samples);
  },
  onEnd: async () => {
    await tts.stopPcmPlayer();
  },
});

await tts.destroy();
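
If the full utterance is also needed after streaming (for replay or saving), the chunks can be collected as they arrive and joined at the end. A minimal sketch, assuming `chunk.samples` is an array-like sequence of numbers; `concatSamples` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper: joins streamed sample chunks into one contiguous buffer.
function concatSamples(chunks: ArrayLike<number>[]): Float32Array {
  let total = 0;
  for (let i = 0; i < chunks.length; i++) {
    total += chunks[i].length;
  }
  const out = new Float32Array(total);
  let offset = 0;
  for (let i = 0; i < chunks.length; i++) {
    out.set(chunks[i], offset); // copy this chunk at the current write position
    offset += chunks[i].length;
  }
  return out;
}
```

In the streaming example above, `onChunk` could push `chunk.samples` onto an array and `onEnd` could call `concatSamples` on it. Note that this keeps the whole utterance in memory, which gives up some of the memory benefit of streaming.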

Multi-Speaker Usage

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro-multi-speaker' },
  modelType: 'kokoro',
});

const numSpeakers = await tts.getNumSpeakers();
console.log('Available speakers:', numSpeakers);

// Generate with different voices
for (let sid = 0; sid < numSpeakers; sid++) {
  const audio = await tts.generateSpeech(`Hello from speaker ${sid}`, { sid, speed: 1.0 });
  console.log(`Speaker ${sid}:`, audio.samples.length, 'samples');
}

await tts.destroy();
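
When the speaker ID comes from user settings or stored preferences, it may be out of range for the model that is currently loaded. A minimal sketch of a guard, assuming valid IDs run from 0 to `numSpeakers - 1` as in the loop above; `resolveSpeakerId` and its fall-back-to-0 policy are illustrative choices, not library behavior.

```typescript
// Hypothetical helper: returns a speaker ID that is safe to pass as `sid`.
// Falls back to speaker 0 when the requested ID is not a valid index.
function resolveSpeakerId(requested: number, numSpeakers: number): number {
  if (!(numSpeakers >= 1)) {
    throw new RangeError('numSpeakers must be at least 1');
  }
  const isWholeNumber = Math.floor(requested) === requested;
  if (!isWholeNumber || requested < 0 || requested >= numSpeakers) {
    return 0;
  }
  return requested;
}
```

For example, `resolveSpeakerId(savedSid, await tts.getNumSpeakers())` could be used to compute the `sid` passed to `generateSpeech`, where `savedSid` is whatever value the app persisted.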

Model Options

Kokoro models support one tuning parameter:

Option	Type	Default	Description
`lengthScale`	`number`	1.0	Speech speed. < 1.0 = faster, > 1.0 = slower
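
Because `lengthScale` runs in the opposite direction to a typical "speed" slider, a small conversion helper can make UI code easier to read. This is a sketch under an assumption: it treats `lengthScale` as a linear multiplier on duration, so a speed multiplier `s` maps to `1 / s`. The helper name and the clamp bounds (0.5 to 2.0) are hypothetical and are not limits imposed by the library.

```typescript
// Hypothetical helper: converts a speed multiplier (2 = twice as fast)
// into a lengthScale value, assuming lengthScale scales duration linearly.
function lengthScaleForSpeed(speed: number, min: number = 0.5, max: number = 2.0): number {
  if (!(speed > 0)) {
    throw new RangeError('speed must be a positive number');
  }
  const scale = 1 / speed;
  // Keep the result inside an illustrative safe range.
  return Math.max(min, Math.min(max, scale));
}
```

Under that assumption, `lengthScaleForSpeed(1.25)` yields `0.8`, and the result can be placed in `modelOptions.kokoro.lengthScale`.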

Tuning Examples

// Fast speech
const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
  modelOptions: {
    kokoro: { lengthScale: 0.8 }  // Faster: roughly 20% shorter audio
  },
});

// Slow, clear speech
const tts2 = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  modelType: 'kokoro',
  modelOptions: {
    kokoro: { lengthScale: 1.3 }  // Slower: roughly 30% longer audio
  },
});

Runtime Updates

const tts = await createTTS({ ... });

await tts.updateParams({
  modelOptions: {
    kokoro: { lengthScale: 1.2 }
  },
});

Model Detection

A model folder is detected as Kokoro when both of the following hold:

The folder name contains kokoro (and not kitten)
The folder contains model.onnx or kokoro-*.onnx, plus tokens.txt

Expected files:

model.onnx (or variant)
tokens.txt
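
The detection rules above can be mirrored in a pre-flight check that runs before calling `createTTS`, so that a misnamed folder or missing file is reported early. This is a sketch of those rules only, not the library's actual detection code; `looksLikeKokoroModel` is a hypothetical helper, and matching the folder name case-insensitively is an assumption.

```typescript
// Hypothetical pre-flight check that mirrors the documented detection rules.
// `files` is the list of file names found directly in the model folder.
function looksLikeKokoroModel(folderName: string, files: string[]): boolean {
  const name = folderName.toLowerCase(); // assumption: case-insensitive match
  if (name.indexOf('kokoro') === -1 || name.indexOf('kitten') !== -1) {
    return false;
  }
  let hasModel = false;
  let hasTokens = false;
  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    if (file === 'model.onnx' || /^kokoro-.*\.onnx$/.test(file)) {
      hasModel = true;
    }
    if (file === 'tokens.txt') {
      hasTokens = true;
    }
  }
  return hasModel && hasTokens;
}
```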

Performance Tips

Optimize Thread Count

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  numThreads: 4, // More threads for faster generation
});
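
More threads do not always mean faster generation, since the app's UI and audio playback also need CPU time. One possible heuristic, shown here as a sketch, is to use about half the available cores and cap the result. The helper name, the halving rule, and the default cap of 4 are all hypothetical, and the core count has to be obtained by the app itself.

```typescript
// Hypothetical heuristic: use about half the cores, between 1 and maxThreads.
function pickNumThreads(cpuCores: number, maxThreads: number = 4): number {
  if (!(cpuCores > 0)) {
    return 1; // unknown or invalid core count: stay conservative
  }
  const half = Math.floor(cpuCores / 2);
  return Math.max(1, Math.min(maxThreads, half));
}
```

The result can be passed as `numThreads` to `createTTS`. Measuring generation time on target devices remains the most reliable way to pick a value.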

Use Streaming for Responsiveness

For interactive apps:

await tts.generateSpeechStream(text, { sid: 0, speed: 1.0 }, {
  onChunk: async (chunk) => {
    await tts.writePcmChunk(chunk.samples); // Start playing immediately
  },
});

Hardware Acceleration

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/kokoro' },
  provider: 'nnapi', // Android NNAPI
  // provider: 'xnnpack', // XNNPACK
});
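
Since NNAPI exists only on Android, cross-platform apps need to choose the provider per platform. A minimal sketch, assuming the platform string comes from React Native's `Platform.OS` and that omitting `provider` leaves the library's default in place; `accelerationOptions` is a hypothetical helper.

```typescript
// Hypothetical helper: returns provider options to spread into createTTS().
// Requests NNAPI on Android and leaves the provider unset elsewhere.
function accelerationOptions(os: string): { provider?: 'nnapi' } {
  if (os === 'android') {
    return { provider: 'nnapi' };
  }
  return {};
}
```

The returned object can be spread into the configuration, for example `createTTS({ modelPath, ...accelerationOptions(Platform.OS) })`.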

Streaming Support

Streaming: ✅ Yes

Kokoro models have excellent streaming support. Use generateSpeechStream() for low-latency, incremental audio generation.

Advantages

Multi-Language: Single model for multiple languages
Multi-Speaker: Multiple voices in one model
Fast Inference: Real-time capable
Streaming: Native incremental generation
Compact: One model instead of multiple language-specific models
Good Quality: Natural-sounding speech

Limitations

No Voice Cloning: Cannot synthesize custom voices from reference audio
Fixed Voices: Limited to model’s trained speakers
Less Tuning: Only lengthScale parameter available (no noise scale)
Language Coverage: Fewer languages than some alternatives

Use Cases

Multilingual Apps

Apps serving users in multiple countries

Voice Assistants

Interactive voice interfaces with multiple voices

E-Learning

Educational content in multiple languages

Customer Service

Automated responses in different languages

Common Issues

Model not loading

Verify folder name contains kokoro (not kitten)
Check that model.onnx and tokens.txt are present
Ensure sufficient device memory

Incorrect language output

Kokoro may auto-detect language from text
Ensure input text is in the correct language/script
Some models may require language-specific prefixes

Slow generation

Increase numThreads on multi-core devices
Use hardware acceleration (provider: 'nnapi')
Ensure no other heavy apps are running
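
To judge whether generation is actually too slow, it helps to measure the real-time factor (RTF): generation time divided by the duration of the audio produced. Values below 1 mean audio is generated faster than it plays back. A minimal sketch; `realTimeFactor` is a hypothetical helper, and the elapsed time is assumed to be measured by the app around the `generateSpeech` call.

```typescript
// Hypothetical helper: real-time factor = generation time / audio duration.
function realTimeFactor(elapsedMs: number, numSamples: number, sampleRate: number): number {
  if (!(numSamples > 0) || !(sampleRate > 0)) {
    throw new RangeError('numSamples and sampleRate must be positive');
  }
  const audioSeconds = numSamples / sampleRate;
  return elapsedMs / 1000 / audioSeconds;
}
```

Comparing the RTF before and after changing `numThreads` or `provider` shows whether a setting helps on a given device.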

Comparison with Other Models

Feature	Kokoro	VITS	Matcha	KittenTTS
Speed	Fast	Very Fast	Fast	Very Fast
Quality	High	High	Very High	Good
Streaming	Yes	Yes	Yes	Yes
Multi-Language	Yes	Varies	Limited	Limited
Multi-Speaker	Yes	Yes	Yes	Yes
Voice Cloning	No	No	No	No
Model Size	Small-Medium	Small	Medium	Small

Next Steps

TTS API

Detailed API documentation

Streaming TTS

Low-latency streaming guide

Model Setup

How to download and bundle models

Execution Providers

Hardware acceleration options
