
Voice Interfaces for Everyone
Moonshine Voice is an open-source AI toolkit for developers building real-time voice applications. Everything runs on-device with cutting-edge accuracy and ultra-low latency.
Quickstart
Get transcribing in under 2 minutes with Python
Installation
Install for Python, iOS, Android, and more
Python Guide
Complete Python API reference and examples
API Reference
Full API documentation for all platforms
Why Moonshine Voice?
On-Device & Private
Everything runs locally on your device. Fast, private, and no account, credit card, or API keys needed. Your users’ voice data never leaves their device.
Optimized for Live Speech
Built specifically for real-time streaming applications with low-latency responses. The framework does its work while the user is still talking, delivering sub-200ms response times.
Higher Accuracy Than Whisper
Our Medium Streaming model achieves 6.65% WER on the Hugging Face OpenASR Leaderboard, outperforming Whisper Large v3 (7.44% WER) while using only 245M parameters versus 1.5B.
107ms
MacBook Pro latency
5-10x faster
Than Whisper in live speech
26MB
Smallest model size
Cross-Platform Support
The same library runs everywhere with one consistent API:
Python
pip install moonshine-voice
iOS & macOS
Swift Package Manager
Android
Maven package
Windows
Visual Studio support
Linux
Native C++ library
Edge Devices
Raspberry Pi, IoT, wearables
Key Features
Flexible Input Windows
Supply any length of audio (up to ~30 seconds) and the model only spends compute on that input. No wasted computation on zero-padding like Whisper’s fixed 30-second window.
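As a back-of-the-envelope illustration of why flexible windows save compute (the 50 frames/second rate here is an assumed figure for illustration, not the model's actual hop size):

```python
# Compare encoder work for a short clip under a flexible input window
# versus Whisper-style zero-padding to a fixed 30-second window.
# FRAMES_PER_SEC is an assumed rate, chosen only to make the arithmetic concrete.
FRAMES_PER_SEC = 50
FIXED_WINDOW_SEC = 30

def frames_processed(audio_sec: float, flexible: bool) -> int:
    """Encoder frames the model must process for a clip of the given length."""
    window_sec = audio_sec if flexible else FIXED_WINDOW_SEC
    return int(window_sec * FRAMES_PER_SEC)

clip_sec = 5.0  # a 5-second utterance
flexible_frames = frames_processed(clip_sec, flexible=True)   # 250 frames
padded_frames = frames_processed(clip_sec, flexible=False)    # 1500 frames
print(f"savings on a 5s clip: {padded_frames / flexible_frames:.0f}x")
```

For short utterances, the shorter the clip, the larger the fraction of a fixed 30-second window that would have been wasted on padding.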
Streaming with Caching
Models cache input encoding and decoder state for incremental audio addition. This dramatically reduces latency by skipping redundant computation on audio that’s already been processed.
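A minimal sketch of the caching idea in plain Python; the class and method names are illustrative stand-ins, not the library's internals:

```python
def encode_chunk(chunk):
    # Stand-in for an expensive encoder forward pass.
    return sum(chunk) / len(chunk)

class StreamingEncoder:
    """Encode each audio chunk once; later updates reuse cached encodings,
    so per-update work is proportional to the *new* audio only."""

    def __init__(self):
        self.cache = []          # encodings of chunks already processed
        self.encoder_calls = 0   # counts actual (expensive) encode work

    def add_audio(self, chunk):
        self.encoder_calls += 1
        self.cache.append(encode_chunk(chunk))
        return list(self.cache)  # full encoding = cached + newly encoded

enc = StreamingEncoder()
for chunk in ([0.1, 0.2], [0.3, 0.4], [0.5, 0.6]):
    states = enc.add_audio(chunk)
print(enc.encoder_calls)  # 3 encode calls, not 1 + 2 + 3 = 6 without caching
```

Without the cache, each update would re-encode all audio seen so far; with it, cost per update stays flat as the stream grows.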
Multiple Languages
Supports English, Spanish, Mandarin, Japanese, Korean, Vietnamese, Ukrainian, and Arabic. Language-specific models deliver much higher accuracy than multilingual alternatives.
Complete Voice Pipeline
Batteries included with microphone capture, voice activity detection, speech-to-text, speaker identification (diarization), and command recognition - all in one library.
Intent Recognition
Built-in command recognition using semantic matching. Users can say commands naturally: “Let there be light” triggers “Turn on the lights” with 76% confidence.
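A toy sketch of semantic command matching using bag-of-words cosine similarity; the real library presumably uses learned embeddings, and the command list and function names here are hypothetical:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": word counts. A real system would use a semantic model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

COMMANDS = ["turn on the lights", "turn off the lights", "play some music"]

def match(utterance):
    # Return the best-scoring command and its similarity score.
    return max((cosine(embed(utterance), embed(c)), c) for c in COMMANDS)

score, command = match("please turn the lights on")
print(command, round(score, 2))  # turn on the lights 0.89
```

Even this crude similarity picks the right command for a paraphrase; semantic embeddings extend the idea to utterances that share meaning but no words, like the "Let there be light" example above.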
Event-Driven Architecture
High-level APIs with event listeners for line started, text changed, and line completed events. Focus on your application logic, not audio processing details.
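The listener pattern described above can be sketched like this; the event names mirror the description, but the registration API (`on`/`emit`) is illustrative, not Moonshine's actual interface:

```python
class Transcriber:
    """Minimal event dispatcher: register callbacks, fire them on events."""

    def __init__(self):
        self.listeners = {}

    def on(self, event, callback):
        self.listeners.setdefault(event, []).append(callback)

    def emit(self, event, payload):
        for callback in self.listeners.get(event, []):
            callback(payload)

t = Transcriber()
completed = []
t.on("text_changed", lambda text: print("partial:", text))
t.on("line_completed", lambda text: completed.append(text))

# Simulate one streaming transcription cycle.
t.emit("line_started", "")
t.emit("text_changed", "hello")
t.emit("text_changed", "hello world")
t.emit("line_completed", "hello world")
print(completed)  # ['hello world']
```

Application code only supplies the callbacks; capture, buffering, and decoding stay behind the event boundary.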
Performance Comparison
Moonshine dramatically outperforms Whisper for live speech applications:
| Model | WER | Parameters | MacBook Pro | Linux x86 | R. Pi 5 |
|---|---|---|---|---|---|
| Moonshine Medium Streaming | 6.65% | 245M | 107ms | 269ms | 802ms |
| Whisper Large v3 | 7.44% | 1.5B | 11,286ms | 16,919ms | N/A |
| Moonshine Small Streaming | 7.84% | 123M | 73ms | 165ms | 527ms |
| Whisper Small | 8.59% | 244M | 1,940ms | 3,425ms | 10,397ms |
| Moonshine Tiny Streaming | 12.00% | 34M | 34ms | 69ms | 237ms |
| Whisper Tiny | 12.81% | 39M | 277ms | 1,141ms | 5,863ms |
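Working out the speedups implied by the table's MacBook Pro latencies (same numbers as above, just the division made explicit):

```python
# MacBook Pro latencies from the comparison table, in milliseconds.
latency_ms = {
    "moonshine-medium-streaming": 107,
    "whisper-large-v3": 11_286,
    "moonshine-tiny-streaming": 34,
    "whisper-tiny": 277,
}

medium_speedup = latency_ms["whisper-large-v3"] / latency_ms["moonshine-medium-streaming"]
tiny_speedup = latency_ms["whisper-tiny"] / latency_ms["moonshine-tiny-streaming"]
print(f"medium vs. large-v3: {medium_speedup:.0f}x, tiny vs. tiny: {tiny_speedup:.0f}x")
```

Note these are per-utterance latencies, where fixed-window models pay for a full pass; the "5-10x faster" headline figure refers to live streaming throughput.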
Research Foundation
Moonshine Voice is based on cutting-edge research from the Moonshine AI team:
- Moonshine: Speech Recognition for Live Transcription - First-generation architecture with flexible input windows
- Flavors of Moonshine - Language-specific models for better accuracy
- Moonshine v2: Ergodic Streaming Encoder ASR - Streaming approach for latency-critical applications
Get Started
Ready to add voice to your application? Start with our quickstart guide:
Quickstart Guide
Transcribe audio in under 2 minutes
Community & Support
Join Discord
Get live support from the community
GitHub Issues
Report bugs and request features