Overview
Tabby’s voice transcription feature uses Groq’s Whisper Large v3 model to convert your speech into text with high accuracy and speed.
Key Features
Fast Processing
Powered by Groq’s optimized infrastructure for near-instant transcription
High Accuracy
Uses Whisper Large v3, one of the most accurate transcription models available
Multi-language
Supports transcription in multiple languages with automatic detection
System-wide
Works in any application - text editors, browsers, messaging apps, etc.
Activation
- Ctrl+Alt+T: Toggle voice transcription on/off
- Ctrl+Shift+T: Cycle through transcription modes
When activated, speak into your microphone and the transcribed text will appear in the active application.
How It Works
Capture Audio
When you press the transcription hotkey, Tabby starts capturing audio from your microphone in WebM format.
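The capture step can be sketched in the browser as follows. This is a minimal illustration that assumes the standard getUserMedia and MediaRecorder APIs; the helper names (pickMimeType, startRecording) and the list of candidate MIME types are hypothetical and not taken from Tabby's source.

```typescript
// Candidate recording formats in order of preference (an assumption for this sketch).
const CANDIDATE_TYPES = ["audio/webm;codecs=opus", "audio/webm"];

// Return the first candidate the environment reports as supported.
// Kept pure (the support check is passed in) so it can be tested outside a browser.
function pickMimeType(
  isSupported: (type: string) => boolean,
  candidates: string[] = CANDIDATE_TYPES,
): string | undefined {
  return candidates.find((type) => isSupported(type));
}

// Browser-only: start recording from the default microphone.
// Calling stop() ends the recording and resolves with one WebM Blob.
async function startRecording(): Promise<{ stop: () => Promise<Blob> }> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((type) => MediaRecorder.isTypeSupported(type));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);

  const chunks: Blob[] = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  recorder.start();

  const stop = () =>
    new Promise<Blob>((resolve) => {
      recorder.onstop = () => {
        // Release the microphone so the browser's recording indicator turns off.
        stream.getTracks().forEach((track) => track.stop());
        resolve(new Blob(chunks, { type: recorder.mimeType || "audio/webm" }));
      };
      recorder.stop();
    });

  return { stop };
}
```

A hotkey handler could call startRecording() when transcription is toggled on and stop() when it is toggled off, then pass the resulting Blob on for transcription.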
API Endpoint
Transcribe Audio
Request Format
Audio data in one of two formats:
- Data URL: data:audio/webm;base64,<BASE64_DATA>
- Raw base64 string (will be auto-detected)
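As an illustration of the two accepted formats, the sketch below encodes recorded bytes as either a raw base64 string or a data URL. The function names are hypothetical, and audio/webm is assumed as the default MIME type because that is the format Tabby records in.

```typescript
// Encode binary audio as a raw base64 string.
// btoa expects a "binary string" in which each character is one byte.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Wrap the base64 payload in a data URL that carries the MIME type.
function toDataUrl(bytes: Uint8Array, mimeType: string = "audio/webm"): string {
  return `data:${mimeType};base64,${bytesToBase64(bytes)}`;
}
```

In a browser, the bytes of a recorded Blob can be obtained with new Uint8Array(await blob.arrayBuffer()) before encoding.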
Response Fields
- The transcribed text from the audio input
- Detected language code (e.g., “en” for English, “es” for Spanish)
- Duration of the audio clip in seconds
Implementation Details
Audio Format
Tabby captures audio in WebM format using the browser’s MediaRecorder API.
Processing Pipeline
The transcription API processes audio through the following pipeline:
- Parse data URL - Extract MIME type and base64 data
- Decode base64 - Convert to Uint8Array binary format
- Send to Groq - Use AI SDK’s experimental_transcribe function
- Return results - Text, language, and duration
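The pipeline steps above could look roughly like the sketch below. It is an illustration under stated assumptions, not Tabby's backend code: the function and type names are hypothetical, the response keys (text, language, duration) are inferred from the field descriptions, and the model call is injected as a parameter so the example runs without network access.

```typescript
interface ParsedAudio {
  mimeType: string;
  bytes: Uint8Array;
}

// Step 2: decode base64 into a Uint8Array.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Step 1: accept either a data URL or a raw base64 string.
function parseAudioInput(input: string): ParsedAudio {
  const trimmed = input.trim();
  let mimeType = "audio/webm"; // assumed default when only raw base64 is given
  let base64 = trimmed;
  if (trimmed.startsWith("data:")) {
    const comma = trimmed.indexOf(",");
    const header = comma === -1 ? "" : trimmed.slice(5, comma);
    if (!header.endsWith(";base64")) {
      throw new Error("Unsupported data URL: expected a ;base64 payload");
    }
    mimeType = header.split(";")[0] || mimeType; // drop parameters such as codecs=opus
    base64 = trimmed.slice(comma + 1);
  }
  return { mimeType, bytes: base64ToBytes(base64) };
}

interface ModelResult {
  text: string;
  language?: string;
  durationInSeconds?: number;
}

// Step 3 is injected. In a real backend this would presumably wrap the AI SDK's
// experimental_transcribe with a Groq Whisper Large v3 transcription model.
type Transcriber = (audio: Uint8Array) => Promise<ModelResult>;

// Step 4: return text, language, and duration.
async function transcribeAudio(input: string, transcriber: Transcriber) {
  const { bytes } = parseAudioInput(input);
  const result = await transcriber(bytes);
  return {
    text: result.text,
    language: result.language,
    duration: result.durationInSeconds,
  };
}
```

Injecting the transcriber keeps parsing and decoding testable on their own and makes it straightforward to stub the model in tests.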
Transcription Modes
Cycle through different modes using Ctrl+Shift+T:
- Direct Paste: Transcribed text is pasted directly at cursor position (fastest)
- Typewriter
- Buffer
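Cycling through the modes can be modeled as a small wrap-around state machine. The sketch below is illustrative only: the mode identifiers are hypothetical, and it assumes the hotkey advances in the order listed above and wraps from the last mode back to the first.

```typescript
// Mode identifiers are made up for this sketch; the order mirrors the list above.
const MODES = ["direct-paste", "typewriter", "buffer"] as const;
type TranscriptionMode = (typeof MODES)[number];

// Return the mode that follows `current`, wrapping around at the end.
function nextMode(current: TranscriptionMode): TranscriptionMode {
  const index = MODES.indexOf(current);
  return MODES[(index + 1) % MODES.length];
}
```

A Ctrl+Shift+T handler would then only need to store the result of nextMode(currentMode).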
Use Cases
Writing Documents
Dictate long-form content faster than typing
Coding Comments
Quickly add documentation and comments to your code
Messaging
Compose emails and chat messages hands-free
Accessibility
Enable keyboard-free text input for users with mobility limitations
Best Practices
Accuracy Tips
Whisper Large v3 is trained to recognize punctuation commands.
Requirements
Environment Setup
Add your Groq API key to the backend environment.
Error Handling
The transcription API handles various error conditions:
Error message when transcription fails:
- "No audio data provided" - Missing audio parameter
- "Transcription failed" - Groq API error
- Network errors or API rate limits
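One way a handler might map these conditions to responses is sketched below. The two error messages come from the list above; everything else is an assumption made for illustration, including the HTTP status codes, the response shape, and the injected transcriber function.

```typescript
// Hypothetical stand-in for the call that sends audio to Groq.
type Transcriber = (audio: string) => Promise<{ text: string }>;

interface HandlerResponse {
  status: number; // status codes are assumptions, not documented values
  body: { text?: string; error?: string };
}

async function handleTranscription(
  audio: string | undefined,
  transcriber: Transcriber,
): Promise<HandlerResponse> {
  // Missing or empty audio parameter.
  if (!audio) {
    return { status: 400, body: { error: "No audio data provided" } };
  }
  try {
    const { text } = await transcriber(audio);
    return { status: 200, body: { text } };
  } catch {
    // Groq API errors, network failures, and rate limits all surface here.
    return { status: 500, body: { error: "Transcription failed" } };
  }
}
```

Returning a single generic message for upstream failures keeps provider details out of the response; a real backend would likely also log the underlying error.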
Performance
Groq’s infrastructure provides exceptional performance:
- Latency: Typically 200-500ms for transcription
- Accuracy: 95%+ word accuracy for clear speech
- Languages: Supports 50+ languages
- Audio length: Handles clips from 1 second to several minutes
Related Features
Voice Agent
Interactive voice assistant with desktop automation
Voice Commands
Execute quick actions via voice shortcuts