Voice Transcription enables you to dictate text directly into any application using state-of-the-art speech recognition.

Overview

Tabby’s voice transcription feature uses Groq’s Whisper Large v3 model to convert your speech into text with high accuracy and speed.

Key Features

Fast Processing

Powered by Groq’s optimized infrastructure for near-instant transcription

High Accuracy

Uses Whisper Large v3, one of the most accurate transcription models available

Multi-language

Supports transcription in multiple languages with automatic detection

System-wide

Works in any application - text editors, browsers, messaging apps, etc.

Activation

  • Ctrl+Alt+T - Toggle voice transcription on/off
  • Ctrl+Shift+T - Cycle through transcription modes

When activated, speak into your microphone and the transcribed text will appear in the active application.

How It Works

  1. Capture Audio - When you press the transcription hotkey, Tabby starts capturing audio from your microphone in WebM format.
  2. Send to API - The audio buffer is encoded as base64 and sent to the transcription API endpoint.
  3. Transcribe with Groq - The API uses Groq’s Whisper Large v3 model to convert speech to text.
  4. Output Text - The transcribed text is returned and typed into your active application using Tabby’s typewriter mode.

API Endpoint

Transcribe Audio

const response = await fetch('/api/transcribe', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    audio: 'data:audio/webm;base64,AUDIO_DATA_HERE'
  })
});

const result = await response.json();
// Returns: { text, language, duration }

Request Format

audio
string
required
Audio data in one of two formats:
  • Data URL: data:audio/webm;base64,<BASE64_DATA>
  • Raw base64 string (will be auto-detected)
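
The two accepted input forms can be told apart with a small parser. The sketch below is illustrative only: the helper name parseAudioField, the ParsedAudio shape, and the fallback MIME type for raw base64 input are assumptions, not Tabby's actual implementation.

```typescript
// Hypothetical helper: names and the default MIME type are assumptions.
interface ParsedAudio {
  mimeType: string;
  base64: string;
}

// Assumed fallback when the caller sends raw base64 without a data URL prefix.
const DEFAULT_MIME_TYPE = "audio/webm";

function parseAudioField(audio: string): ParsedAudio {
  const trimmed = audio.trim();
  if (trimmed.length === 0) {
    throw new Error("No audio data provided");
  }

  // Data URL form: data:<mime>[;param=value...];base64,<payload>
  const match = /^data:([^;,]+)(?:;[^;,]+)*;base64,([\s\S]*)$/.exec(trimmed);
  if (match) {
    return { mimeType: match[1], base64: match[2] };
  }

  // Anything else is treated as a raw base64 string.
  return { mimeType: DEFAULT_MIME_TYPE, base64: trimmed };
}
```

For example, parseAudioField('data:audio/webm;codecs=opus;base64,QUJD') yields the MIME type audio/webm and the payload QUJD, while a bare QUJD is passed through with the assumed default type.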

Response Fields

text
string
The transcribed text from the audio input
language
string
Detected language code (e.g., “en” for English, “es” for Spanish)
duration
number
Duration of the audio clip in seconds
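
On the client, the response fields above can be captured in a type and checked at runtime before use. This is a minimal sketch; the names TranscriptionResponse and isTranscriptionResponse are hypothetical and only mirror the three documented fields.

```typescript
// Hypothetical type mirroring the documented response fields.
interface TranscriptionResponse {
  text: string;      // transcribed text
  language: string;  // detected language code, e.g. "en"
  duration: number;  // clip duration in seconds
}

// Runtime check so that unexpected payloads (for example an error body)
// are not treated as a successful transcription.
function isTranscriptionResponse(value: unknown): value is TranscriptionResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["text"] === "string" &&
    typeof v["language"] === "string" &&
    typeof v["duration"] === "number"
  );
}
```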

Implementation Details

Audio Format

Tabby captures audio in WebM format using the browser’s MediaRecorder API:
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm'
});

// Fires when recording stops (or once per timeslice, if one is passed to start())
mediaRecorder.ondataavailable = (event) => {
  // Convert blob to base64
  const reader = new FileReader();
  reader.onloadend = () => {
    const base64Audio = reader.result; // data:audio/webm;base64,...
    // Send to transcription API
  };
  reader.readAsDataURL(event.data);
};

// Begin capturing; call mediaRecorder.stop() to finish and flush the data
mediaRecorder.start();
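
Some browsers may not support recording to audio/webm, so it can help to probe for a supported container before constructing the recorder. The sketch below is an assumption about how that could be done, not something the Tabby source confirms; pickSupportedMimeType is a hypothetical helper, and the support check is injected so the function can run outside a browser.

```typescript
// Hypothetical helper: returns the first candidate the environment supports.
function pickSupportedMimeType(
  candidates: readonly string[],
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  return candidates.find((candidate) => isSupported(candidate));
}

// In a browser, pass the real MediaRecorder capability check:
//
//   const mimeType = pickSupportedMimeType(
//     ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"],
//     (t) => MediaRecorder.isTypeSupported(t),
//   );
```

If the function returns undefined, none of the candidates can be recorded and the caller should report that instead of starting a capture.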

Processing Pipeline

The transcription API processes audio through the following pipeline:
  1. Parse data URL - Extract MIME type and base64 data
  2. Decode base64 - Convert to Uint8Array binary format
  3. Send to Groq - Use AI SDK’s experimental_transcribe function
  4. Return results - Text, language, and duration
import { experimental_transcribe as transcribe } from 'ai'
import { createGroq } from '@ai-sdk/groq'

const groq = createGroq({
  apiKey: process.env.GROQ_API_KEY,
})

const result = await transcribe({
  model: groq.transcription('whisper-large-v3'),
  audio: audioData,  // Uint8Array
})

console.log(result.text)
console.log(result.language)
console.log(result.durationInSeconds)
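
Steps 1 and 2 of the pipeline (parsing the data URL and decoding base64 into a Uint8Array) are not shown in the snippet above. One way they might look is sketched here; audioToBytes is a hypothetical name, and the real route handler may decode differently.

```typescript
// Hypothetical helper: accepts a data URL or a raw base64 string and
// returns the decoded bytes, suitable for the `audio` argument above.
function audioToBytes(audio: string): Uint8Array {
  // Drop the "data:<mime>;base64," prefix when present.
  const commaIndex = audio.indexOf(",");
  const base64 =
    audio.startsWith("data:") && commaIndex !== -1
      ? audio.slice(commaIndex + 1)
      : audio;

  // atob yields a binary string with one character per byte;
  // it throws if the input is not valid base64.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```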

Transcription Modes

Cycle through different modes using Ctrl+Shift+T:
Transcribed text is pasted directly at cursor position (fastest)

Use Cases

Writing Documents

Dictate long-form content faster than typing

Coding Comments

Quickly add documentation and comments to your code

Messaging

Compose emails and chat messages hands-free

Accessibility

Enable keyboard-free text input for users with mobility limitations

Best Practices

For optimal transcription accuracy:
  • Speak clearly and at a moderate pace
  • Use a good quality microphone
  • Minimize background noise
  • Pause briefly between sentences
  • Speak punctuation when needed (“comma”, “period”, “question mark”)

Accuracy Tips

Whisper Large v3 can often turn spoken punctuation words into the corresponding marks, though results may vary with language and audio quality:
"Hello world comma how are you today question mark"

Transcribed:
"Hello world, how are you today?"

Requirements

Voice Transcription requires:
  • Groq API key with Whisper API access
  • Microphone permissions in your browser
  • Active internet connection for API calls

Environment Setup

Add your Groq API key to the backend environment:
GROQ_API_KEY="your-groq-api-key-here"
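
A missing key otherwise only surfaces as a failed API call, so the backend could validate it at startup. This is a sketch under that assumption; requireGroqApiKey is a hypothetical helper, and the environment is passed in as a plain record so the function is easy to test.

```typescript
// Hypothetical startup check: fail fast when the key is absent or blank.
function requireGroqApiKey(env: Record<string, string | undefined>): string {
  const key = env["GROQ_API_KEY"]?.trim();
  if (!key) {
    throw new Error("GROQ_API_KEY is not set");
  }
  return key;
}

// In a Node.js backend this would typically be called as:
//
//   const apiKey = requireGroqApiKey(process.env);
```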

Error Handling

The transcription API handles various error conditions:
error
string
Error message when transcription fails:
  • "No audio data provided" - Missing audio parameter
  • "Transcription failed" - Groq API error
  • Network errors or API rate limits
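
A caller can fold these cases into a single result type. The sketch below assumes the endpoint returns a JSON body with an error string on failure and uses conventional HTTP status codes (429 for rate limits, 5xx for upstream failures); neither the status codes nor the names TranscribeOutcome and interpretTranscribeResponse are confirmed by the source.

```typescript
// Hypothetical result type for client-side handling.
type TranscribeOutcome =
  | { ok: true; text: string }
  | { ok: false; message: string; retryable: boolean };

function interpretTranscribeResponse(status: number, body: unknown): TranscribeOutcome {
  const fields =
    typeof body === "object" && body !== null
      ? (body as Record<string, unknown>)
      : {};

  // Assumption: rate limits (429) and server errors (5xx) are worth retrying.
  const retryable = status === 429 || status >= 500;

  const error = fields["error"];
  if (typeof error === "string") {
    return { ok: false, message: error, retryable };
  }

  const text = fields["text"];
  if (status >= 200 && status < 300 && typeof text === "string") {
    return { ok: true, text };
  }

  return { ok: false, message: `Unexpected response (HTTP ${status})`, retryable };
}
```

Network failures never reach this function because fetch rejects before a response exists, so the call site would still need its own try/catch around the request.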

Performance

Groq’s infrastructure provides exceptional performance:
  • Latency: Typically 200-500ms for transcription
  • Accuracy: 95%+ word accuracy for clear speech
  • Languages: Supports 50+ languages
  • Audio length: Handles clips from 1 second to several minutes
