Overview
Klaus uses Moonshine Voice for local, on-device speech-to-text transcription. No audio is sent to external APIs.
SpeechToText Class
The SpeechToText class provides WAV audio transcription using the Moonshine Voice library.
Constructor
Optional runtime settings. If None, reads from config.get_runtime_settings().
Methods
transcribe(wav_bytes: bytes) -> str
Transcribe WAV audio bytes to text.
WAV-encoded audio data (mono or stereo, 16-bit PCM)
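The call can be sketched as follows. This is a minimal example under stated assumptions, not the project's own code: the import path klaus.stt is inferred from the file name klaus/stt.py, the constructor is assumed to be callable with no arguments, and make_silent_wav and transcribe_silence are hypothetical helpers. The helper uses only the standard-library wave module to build input in the documented format (mono, 16-bit PCM).

```python
import io
import wave


def make_silent_wav(duration_s: float = 0.5, sample_rate: int = 16000) -> bytes:
    """Hypothetical helper: build a mono, 16-bit PCM WAV of silence in memory."""
    n_frames = int(duration_s * sample_rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def transcribe_silence() -> str:
    """Pass the generated bytes to SpeechToText.transcribe.

    Assumption: the class is importable as klaus.stt.SpeechToText and the
    constructor falls back to config.get_runtime_settings() when called
    without arguments.
    """
    from klaus.stt import SpeechToText  # assumed import path

    stt = SpeechToText()
    return stt.transcribe(make_silent_wav())
```

In practice the bytes would come from a recording or a file read with open(path, "rb").read() rather than from generated silence.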
reload_settings(settings: config.RuntimeSettings | None = None) -> None
Reload settings and reinitialize the Moonshine transcriber.
New runtime settings. If None, reads from config.get_runtime_settings().
Configuration
STT settings are configured in ~/.klaus/config.toml.
Supported Models
- tiny - Fastest, lower accuracy
- base - Default, good balance of speed and accuracy
Supported Languages
- en - English (default)
- Additional languages may be supported depending on Moonshine Voice version
Implementation Details
- Local processing: All transcription happens on-device. No network calls.
- First-run download: The model is downloaded and compiled on first use (10-30 seconds).
- Windows DLL preload: On Windows, moonshine.dll is preloaded before PyQt6 to avoid DLL conflicts.
- Audio format: Accepts WAV files; automatically converts stereo to mono and normalizes to float32 in [-1, 1].
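The audio-format step described above can be illustrated with a short standard-library sketch. It is not taken from klaus/stt.py: the function name wav_to_mono_float32 is hypothetical, and both the channel-averaging downmix and the division by 32768 are assumptions about how the conversion might be done.

```python
import array
import io
import sys
import wave


def wav_to_mono_float32(wav_bytes: bytes) -> tuple[array.array, int]:
    """Decode 16-bit PCM WAV bytes into mono float32 samples in [-1, 1].

    Returns (samples, sample_rate), where samples is an array of C floats.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV sample data is little-endian

    # Samples are interleaved per frame (L, R, L, R, ...). Average each
    # frame's channels, then scale by 32768 (assumed) to land in [-1, 1].
    scale = channels * 32768.0
    mono = array.array(
        "f",
        (
            sum(samples[i:i + channels]) / scale
            for i in range(0, len(samples), channels)
        ),
    )
    return mono, sample_rate
```

Mono input passes through the same path with channels equal to 1, so no separate branch is needed.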
Source Reference
See klaus/stt.py for the full implementation.