# Runtime Modes
SlasshyWispr supports three runtime modes for both speech-to-text (STT) and AI processing: Online, Offline (local), and Hybrid. Choose the mode that best fits your privacy, performance, and connectivity needs.

## Runtime Mode Types

Runtime modes control where processing happens. STT and AI runtime modes are configured independently: you can use online STT with offline AI, or vice versa.
## Mode Overview
- Online Mode
- Offline Mode
- Hybrid Mode
Online Mode sends audio and text to cloud-based APIs.

✅ Advantages:

- Highest accuracy with state-of-the-art models
- No local hardware requirements
- Access to the latest models (GPT, Claude, etc.)
- Fast processing on powerful servers
- No storage needed for models

❌ Disadvantages:

- Requires an internet connection
- Data sent to third-party servers
- API costs may apply
- Potential latency from the network
- Privacy considerations
## Online Mode (Cloud APIs)

Online mode connects to cloud-based speech and AI APIs.

### Configuration
#### Setup Steps
1. Open Settings > Models
2. Set STT Runtime Mode to `Online`
3. Set AI Runtime Mode to `Online`
4. Go to Settings > Online
5. Enter your API Base URL (e.g., `https://api.openai.com/v1`)
6. Enter your API Key
7. Specify STT Model Name (e.g., `whisper-1`)
8. Specify AI Model Name (e.g., `gpt-4`)
API keys are stored securely when `rememberApiKey` is enabled. Never share your API keys.

### Supported Providers
SlasshyWispr supports OpenAI-compatible APIs:

- OpenAI - GPT models and Whisper STT
- Anthropic - Claude models (via compatible endpoint)
- Custom APIs - Any OpenAI-compatible endpoint
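The settings above map directly onto a standard OpenAI-compatible request. A minimal sketch of how the Base URL, API Key, and STT Model Name combine into a transcription call (the helper and its values are illustrative, not SlasshyWispr's actual API):

```python
# Sketch: assembling an OpenAI-compatible transcription request from the
# Online settings. /audio/transcriptions is the standard OpenAI endpoint;
# the helper function and placeholder key are hypothetical.

def build_transcription_request(base_url: str, api_key: str, model: str) -> dict:
    """Return the URL, headers, and form fields for a transcription call."""
    return {
        "url": base_url.rstrip("/") + "/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": model},
    }

req = build_transcription_request("https://api.openai.com/v1", "sk-...", "whisper-1")
print(req["url"])  # https://api.openai.com/v1/audio/transcriptions
```

Any endpoint that accepts this request shape (a custom proxy, a self-hosted gateway) can be dropped into the Base URL field.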
### Online Mode Workflow
## Offline Mode (Local Models)

Offline mode runs models locally using on-device processing.

### Local STT Models

SlasshyWispr supports several local STT models from the Parakeet and Whisper families:

#### Recommended Local STT Models
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Parakeet v3 | 478 MB | Fast | High | Balanced performance |
| Moonshine Base | 58 MB | Fastest | Good | Low-end hardware |
| Whisper Small | 487 MB | Medium | Good | Most users |
| Whisper Large v3 | 1.1 GB | Slower | Highest | Accuracy priority |
| SenseVoice | 160 MB | Fast | High | Lightweight option |
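The trade-off in the table above lends itself to a simple size-based rule. A hypothetical advisor sketch that picks the largest model fitting a memory budget (the sizes mirror the table; the function and thresholds are illustrative, not SlasshyWispr's Hardware Advisor logic):

```python
# Illustrative model picker: choose the largest STT model that fits the
# available memory. Sizes (MB) are taken from the table above; the
# function itself is a hypothetical sketch.

STT_MODELS = [  # (name, size_mb), smallest first
    ("Moonshine Base", 58),
    ("SenseVoice", 160),
    ("Parakeet v3", 478),
    ("Whisper Small", 487),
    ("Whisper Large v3", 1100),
]

def recommend_stt_model(free_mb: int) -> str:
    """Return the largest model whose download size fits the budget."""
    fitting = [name for name, size in STT_MODELS if size <= free_mb]
    return fitting[-1] if fitting else "none (free up memory)"

print(recommend_stt_model(500))   # Whisper Small
print(recommend_stt_model(2000))  # Whisper Large v3
```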
### Local AI Models (Ollama)

Offline AI uses Ollama for local language model inference.

#### Setup Ollama
1. Install Ollama from https://ollama.ai
2. Start the Ollama service
3. Pull a model: `ollama pull llama3.2`
4. Configure in Settings > Offline:
   - Set Ollama Base URL to `http://127.0.0.1:11434`
   - Select your pulled model
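Once configured, requests go to Ollama's local HTTP API. A sketch of the request shape a client would send (the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields are Ollama's documented API; the helper function is illustrative):

```python
# Sketch: building a non-streaming generation request for a local Ollama
# instance. The endpoint and JSON fields follow Ollama's REST API; the
# helper itself is a hypothetical example, not SlasshyWispr code.

def build_ollama_request(base_url: str, model: str, prompt: str) -> dict:
    """Return URL and JSON body for an Ollama /api/generate call."""
    return {
        "url": base_url.rstrip("/") + "/api/generate",
        "json": {"model": model, "prompt": prompt, "stream": False},
    }

req = build_ollama_request("http://127.0.0.1:11434", "llama3.2", "Summarize this note.")
print(req["url"])  # http://127.0.0.1:11434/api/generate
```

Because everything stays on `127.0.0.1`, no audio or text leaves the machine in this mode.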
#### Recommended Ollama Models
| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| llama3.2 | 2 GB | Fast | Good | General use |
| phi3 | 2.3 GB | Fast | Good | Efficient responses |
| mistral | 4 GB | Medium | High | Complex queries |
| llama3.1:70b | 40 GB | Slow | Highest | Power users |
### Hardware Advisor

SlasshyWispr analyzes your hardware and recommends optimal models.

### Offline Mode Workflow
## Hybrid Mode (Routing Logic)

Hybrid mode combines online and offline processing with intelligent routing.

### How Hybrid Routing Works
1. Primary Mode - Attempt the preferred mode (online or offline)
2. Fallback Detection - Monitor for errors or connectivity issues
3. Automatic Failover - Switch to the alternative mode on failure
4. Performance Optimization - Route based on request type and urgency
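The failover steps above reduce to a small control-flow pattern: try the preferred backend, and switch to the alternative on any failure. A hedged sketch with stand-in callables (none of these names come from SlasshyWispr):

```python
# Illustrative hybrid failover: run the primary mode, and fall back to
# the alternative if it raises. The two functions below stand in for
# real online/offline pipelines.

from typing import Callable

def run_with_failover(primary: Callable[[str], str],
                      fallback: Callable[[str], str],
                      text: str) -> str:
    """Attempt the primary mode; switch to the fallback on failure."""
    try:
        return primary(text)
    except Exception:
        return fallback(text)

def flaky_online(text: str) -> str:
    raise ConnectionError("network unreachable")

def local_model(text: str) -> str:
    return f"[local] {text}"

print(run_with_failover(flaky_online, local_model, "hello"))  # [local] hello
```

A production router would also record the failure so later requests skip the broken backend until connectivity returns.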
### Hybrid Configuration

Configure both online and offline settings, then choose hybrid routing.

### Hybrid Routing Strategies
- Connectivity-Based
- Performance-Based
- Privacy-Based
Connectivity-Based Routing checks internet availability:
- Online Available → Use cloud APIs
- Offline Detected → Fallback to local models
- Network Restored → Switch back to cloud
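The bullets above amount to a route decision driven by a connectivity flag, with privacy able to override it. An illustrative sketch (the function and flag names are hypothetical; a real probe might open a socket with a short timeout instead of taking a boolean):

```python
# Illustrative connectivity-based router: use cloud APIs when online,
# local models otherwise, with Incognito Mode forcing local processing.
# Names and flags are hypothetical, not SlasshyWispr's configuration keys.

def choose_route(online: bool, incognito: bool = False) -> str:
    """Pick a processing route from connectivity and privacy flags."""
    if incognito:  # privacy override: never leave the machine
        return "offline"
    return "online" if online else "offline"

print(choose_route(online=True))                  # online
print(choose_route(online=False))                 # offline
print(choose_route(online=True, incognito=True))  # offline
```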
## Incognito Mode

Force local processing for maximum privacy.

## When to Use Each Mode
### Use Online Mode When:
- ✅ You have reliable, fast internet
- ✅ Accuracy is the top priority
- ✅ You’re using advanced models (GPT-4, Claude)
- ✅ You don’t have powerful local hardware
- ✅ You’re okay with API costs
### Use Offline Mode When:
- ✅ Privacy and data sovereignty are critical
- ✅ You work in offline or air-gapped environments
- ✅ You have a capable GPU (NVIDIA recommended)
- ✅ You want to avoid API costs
- ✅ You need consistent low-latency responses
### Use Hybrid Mode When:
- ✅ You travel between online/offline environments
- ✅ You want automatic failover
- ✅ You need to balance privacy with performance
- ✅ You handle both sensitive and general content
- ✅ You want the best of both worlds
## Performance Comparison
| Metric | Online | Offline | Hybrid |
|---|---|---|---|
| Accuracy | Highest | High | Varies |
| Latency | Varies (network) | Consistent | Optimized |
| Privacy | Low | Highest | Medium-High |
| Setup | Easy | Complex | Complex |
| Cost | API fees | Free | Mixed |
| Works offline | ❌ | ✅ | ✅ |
## Model Management
### Downloading Local Models
Download STT models via Settings > Offline.

### Warmup and Loading
Local STT models require warmup before first use.

### Pipeline Metrics
Monitor real-time performance in Settings > Pipeline.

## Best Practices
- Start with Online - Get familiar with SlasshyWispr using easy online setup
- Test Offline on Your Hardware - Download a small model (Moonshine) to test performance
- Use Hybrid for Flexibility - Configure both modes for maximum adaptability
- Monitor Latency - Check pipeline metrics to identify bottlenecks
- Match Mode to Task - Use offline for privacy, online for accuracy
## Related Features
- Voice Dictation - How dictation uses runtime modes
- Assistant Mode - AI processing in different runtime modes
- Clipboard Workflow - Output delivery across modes