SlasshyWispr supports multiple local STT (Speech-to-Text) models for offline transcription, powered by the native Parakeet runtime via transcribe-rs.

Available Models

Choose from several high-quality models optimized for different performance and accuracy requirements:

Parakeet v3

nvidia/parakeet-tdt-0.6b-v3
Size: 478 MB
Latest NVIDIA Parakeet model with improved accuracy

Whisper Turbo

openai/whisper-large-v3-turbo
Size: 1.6 GB
Fastest large Whisper model with excellent accuracy

Moonshine Base

UsefulSensors/moonshine-base
Size: 58.0 MB
Smallest model, ideal for low-resource systems

SenseVoice

FunAudioLLM/SenseVoiceSmall
Size: 160 MB
Compact multilingual model with emotion detection

Complete Model List

| Model | Size | Description |
| --- | --- | --- |
| nvidia/parakeet-tdt-0.6b-v3 | 478 MB | Parakeet v3 - Latest generation |
| nvidia/parakeet-tdt_ctc-110m | 473 MB | Parakeet v2 - Previous generation |
| nvidia/parakeet-tdt-0.6b-v2 | 473 MB | Parakeet v2 (alternate) |
| openai/whisper-large-v3-turbo | 1.6 GB | Whisper Turbo - Fastest large model |
| openai/whisper-large-v3 | 1.1 GB | Whisper Large - High accuracy |
| openai/whisper-medium | 492 MB | Whisper Medium - Balanced |
| openai/whisper-small | 487 MB | Whisper Small - Efficient |
| UsefulSensors/moonshine-base | 58.0 MB | Moonshine - Smallest footprint |
| FunAudioLLM/SenseVoiceSmall | 160 MB | SenseVoice - Multilingual |

Model Performance

Performance Tiers

1. High Performance
   Whisper Turbo and Whisper Large models offer the highest accuracy but require more RAM and processing power.
   Recommended for: High-end systems with 16GB+ RAM

2. Balanced Performance
   Parakeet v3, Whisper Medium, and Whisper Small provide excellent accuracy with moderate resource usage.
   Recommended for: Mid-range systems with 8-16GB RAM

3. Lightweight Performance
   Moonshine and SenseVoice are optimized for minimal resource consumption.
   Recommended for: Low-resource systems with 4-8GB RAM
SlasshyWispr will analyze your hardware and recommend the best model for your system. See Hardware Requirements for details.
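The tier boundaries above can be sketched as a simple RAM-based lookup. This is a minimal illustration, not SlasshyWispr's actual hardware advisor, which may also weigh CPU, GPU, and disk; `recommend_model` is a hypothetical helper name.

```rust
/// Hypothetical helper illustrating the performance tiers above.
/// Only RAM is considered here; the real advisor checks more.
fn recommend_model(ram_gb: u32) -> &'static str {
    match ram_gb {
        0..=7 => "UsefulSensors/moonshine-base",  // lightweight tier (4-8 GB)
        8..=15 => "nvidia/parakeet-tdt-0.6b-v3",  // balanced tier (8-16 GB)
        _ => "openai/whisper-large-v3-turbo",     // high-performance tier (16 GB+)
    }
}
```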

Download and Installation

1. Access Local Settings
   Navigate to the Settings > Offline tab in SlasshyWispr.

2. Select a Model
   Choose a model from the STT Model dropdown based on your hardware capabilities.

3. Download the Model
   Click Download Model to begin downloading from Hugging Face. The download progress will be displayed with:
   • Current file being downloaded
   • Download percentage
   • Files completed / total files
   • Downloaded bytes / total bytes

4. Wait for Completion
   Large models may take several minutes to download depending on your internet connection.
Models are downloaded from Hugging Face and stored locally in your SlasshyWispr data directory. Once downloaded, they can be used completely offline.
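The progress fields listed above could be assembled into a single status line like this. The struct and field names are assumptions for illustration, not SlasshyWispr's actual types.

```rust
/// Illustrative progress record; field names are assumptions.
struct DownloadProgress {
    current_file: String,
    files_done: usize,
    files_total: usize,
    bytes_done: u64,
    bytes_total: u64,
}

/// Renders one status line from the fields the UI displays.
fn format_progress(p: &DownloadProgress) -> String {
    let pct = p.bytes_done as f64 / p.bytes_total as f64 * 100.0;
    format!(
        "{} | {:.1}% | {}/{} files | {}/{} bytes",
        p.current_file, pct, p.files_done, p.files_total, p.bytes_done, p.bytes_total
    )
}
```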

Model Warmup Process

Before first use, models need to be “warmed up” to load into memory and optimize performance.

What is Warmup?

Warmup involves:
  1. Loading the model into memory
  2. Initializing the inference engine
  3. Running a test transcription to optimize caching
  4. Preparing the model for real-time use

When Does Warmup Happen?

Warmup occurs automatically when you:
  • Select a model for the first time
  • Switch to a different model
  • Restart SlasshyWispr with a local model enabled

Warmup Duration

Warmup time varies by model size:
  • Small models (< 100 MB): 5-10 seconds
  • Medium models (400-500 MB): 10-20 seconds
  • Large models (> 1 GB): 20-40 seconds
Do not close SlasshyWispr during model warmup. The warmup process must complete for transcription to work properly.
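The duration ranges above can be expressed as a rough lookup by model size. This is illustrative only; actual warmup time depends heavily on disk speed, CPU, and whether a GPU is used, and `warmup_range_secs` is a hypothetical helper.

```rust
/// Rough warmup-time expectation (in seconds) from the ranges above.
fn warmup_range_secs(model_mb: u32) -> (u32, u32) {
    if model_mb < 100 {
        (5, 10)  // small models, e.g. Moonshine (58 MB)
    } else if model_mb <= 1000 {
        (10, 20) // medium models, e.g. Parakeet v3 (478 MB)
    } else {
        (20, 40) // large models, e.g. Whisper Turbo (1.6 GB)
    }
}
```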

Native Parakeet Runtime

SlasshyWispr uses transcribe-rs with native Parakeet support for high-performance local transcription.

Key Features

  • Zero Python Dependencies: All models run natively in Rust via ONNX Runtime
  • Low Latency: Optimized for real-time transcription with minimal delay
  • Cross-Platform: Works on Windows, macOS, and Linux
  • GPU Acceleration: Automatic NVIDIA GPU detection and utilization when available
  • Memory Efficient: Smart memory management for concurrent model loading

Runtime Architecture

The local STT runtime:
  1. Uses the Rust ort crate (ort = "2.0.0-rc.10") as the ONNX Runtime binding for model inference
  2. Leverages transcribe-rs with Parakeet features enabled
  3. Manages model lifecycle (download, warmup, deactivate)
  4. Handles audio preprocessing and post-processing
  5. Provides daemon mode for keeping models hot in memory
The runtime automatically manages multiple model instances and can keep models “hot” in memory for instant transcription when you start dictating.
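The lifecycle the runtime manages (download, warmup, activation, staying hot) can be sketched as a small state machine. The names here are illustrative; transcribe-rs's real types and transitions differ.

```rust
/// Illustrative model lifecycle states; not transcribe-rs's actual API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelState {
    NotDownloaded,
    Downloaded,
    WarmedUp, // loaded in memory, kept "hot" in daemon mode
    Active,   // in use for live transcription
}

/// Moves a model one step forward through its lifecycle.
fn advance(s: ModelState) -> ModelState {
    match s {
        ModelState::NotDownloaded => ModelState::Downloaded, // download completes
        ModelState::Downloaded => ModelState::WarmedUp,      // warmup completes
        ModelState::WarmedUp => ModelState::Active,          // dictation starts
        ModelState::Active => ModelState::Active,            // stays hot until deactivated
    }
}
```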

Model Management

Checking Model Status

You can check if a model is:
  • Downloaded and available locally
  • Currently loaded in memory (warmed up)
  • Active and ready for transcription

Deleting Models

To free up disk space, you can delete downloaded models through Settings > Offline. This removes the model files from your local storage.
After deleting a model, you must re-download it before you can use it again. Make sure you have an internet connection when you need to re-download.

Opening Model Directory

You can open the local model storage directory to inspect or manually manage model files.

Best Practices

1. Start with the Recommended Model
   Use SlasshyWispr's hardware advisor to select the optimal model for your system.

2. Test Multiple Models
   Try different models to find the best balance of speed and accuracy for your use case.

3. Keep Models Updated
   Newer model versions (like Parakeet v3) often have improved accuracy.

4. Monitor Performance
   Check STT latency in the pipeline settings to ensure smooth real-time transcription.
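If you want to spot-check latency yourself, a wall-clock timer around the transcription call is enough. The `transcribe` function below is a placeholder for the real engine call (whose API is not shown here); only the timing wrapper is the point.

```rust
use std::time::{Duration, Instant};

/// Placeholder for the real engine call via transcribe-rs.
fn transcribe(_samples: &[f32]) -> String {
    "hello world".to_string()
}

/// Wraps one transcription call with a wall-clock timer.
fn measure_latency(samples: &[f32]) -> (Duration, String) {
    let start = Instant::now();
    let text = transcribe(samples);
    (start.elapsed(), text)
}
```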

Troubleshooting

Model Won’t Download

  • Check your internet connection
  • Ensure you have sufficient disk space (check model size above)
  • Verify Hugging Face is accessible from your network
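To check disk space against the sizes in the model table, the "478 MB" / "1.6 GB" strings can be converted to bytes. This parsing helper is illustrative, not part of SlasshyWispr.

```rust
/// Parses sizes as listed in the model table ("478 MB", "1.6 GB")
/// into bytes, for comparison against free disk space.
fn size_to_bytes(size: &str) -> Option<u64> {
    let (num, unit) = size.split_once(' ')?;
    let n: f64 = num.parse().ok()?;
    let mult: f64 = match unit {
        "MB" => 1_000_000.0,
        "GB" => 1_000_000_000.0,
        _ => return None,
    };
    Some((n * mult) as u64)
}
```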

Slow Transcription

  • Try a smaller model (Moonshine, SenseVoice, or Whisper Small)
  • Check if your system meets the hardware requirements
  • Close other resource-intensive applications

Model Warmup Fails

  • Ensure sufficient RAM is available
  • Try restarting SlasshyWispr
  • Check the logs for specific error messages
For GPU-accelerated transcription, see Hardware Requirements for NVIDIA GPU setup.
