Voice Transcription

Voice Transcription enables you to dictate text directly into any application using state-of-the-art speech recognition.

Overview

Tabby’s voice transcription feature uses Groq’s Whisper Large v3 model to convert your speech into text with high accuracy and speed.

Key Features

Fast Processing

Transcription typically completes in 200-500ms on Groq’s infrastructure

High Accuracy

Uses Whisper Large v3, one of the most accurate transcription models available

Multi-language

Supports transcription in multiple languages with automatic detection

System-wide

Works in any application - text editors, browsers, messaging apps, etc.

Activation

Ctrl+Alt+T - Toggle voice transcription on/off
Ctrl+Shift+T - Cycle through transcription modes

When activated, speak into your microphone and the transcribed text will appear in the active application.

How It Works

Capture Audio

When you press the transcription hotkey, Tabby starts capturing audio from your microphone in WebM format.

Send to API

The audio buffer is encoded as base64 and sent to the transcription API endpoint.

Transcribe with Groq

The API uses Groq’s Whisper Large v3 model to convert speech to text.

Output Text

The transcribed text is returned and typed into your active application using Tabby’s typewriter mode.
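As a rough illustration of the encoding step, the sketch below wraps raw audio bytes in a base64 data URL of the kind the API accepts. The helper name `toAudioDataUrl` is hypothetical, and the code assumes a Node.js runtime where the global `Buffer` is available; it is not taken from Tabby's source.

```javascript
// Hypothetical helper: wrap raw audio bytes in a base64 data URL.
// Assumes Node.js (uses the global Buffer); not Tabby's actual code.
function toAudioDataUrl(bytes, mimeType = 'audio/webm') {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Three placeholder bytes stand in for captured audio.
const exampleUrl = toAudioDataUrl(new Uint8Array([1, 2, 3]));
console.log(exampleUrl); // data:audio/webm;base64,AQID
```

In a browser, `FileReader.readAsDataURL` produces the same kind of string directly from a recorded blob.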

API Endpoint

Transcribe Audio

const response = await fetch('/api/transcribe', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    audio: 'data:audio/webm;base64,AUDIO_DATA_HERE'
  })
});

const result = await response.json();
// Returns: { text, language, duration }
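The call above does not check for failures. A minimal sketch of a wrapper that does is shown below; the function name `transcribeAudio` and the injectable `fetchImpl` parameter are hypothetical, and it assumes failed requests return a JSON body with an `error` string, as described under Error Handling.

```javascript
// Hypothetical wrapper around the /api/transcribe call.
// fetchImpl is injectable so the function can be exercised without a network.
async function transcribeAudio(audioDataUrl, fetchImpl = fetch) {
  const response = await fetchImpl('/api/transcribe', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ audio: audioDataUrl }),
  });
  const payload = await response.json();
  // Assumption: failures carry an `error` string in the JSON body.
  if (!response.ok || payload.error) {
    throw new Error(payload.error || `Request failed with status ${response.status}`);
  }
  return payload; // { text, language, duration }
}
```

Callers can then wrap `await transcribeAudio(dataUrl)` in a `try`/`catch` and surface the error message to the user.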

Request Format

audio (string, required)

Audio data in one of two formats:

Data URL: data:audio/webm;base64,<BASE64_DATA>
Raw base64 string (will be auto-detected)
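The two accepted formats could be told apart along the following lines. This is a sketch under stated assumptions, not the endpoint's actual logic: `parseAudioInput` is a hypothetical name, the detection rule (a `data:` prefix with a `;base64,` marker) is assumed, raw base64 is assumed to be WebM, and the code relies on Node.js `Buffer`.

```javascript
// Hypothetical parser for the `audio` field; the detection rule is an assumption.
function parseAudioInput(audio) {
  if (typeof audio !== 'string' || audio.length === 0) {
    throw new Error('No audio data provided');
  }
  // Data URL form: data:<mime>[;params];base64,<data>
  const match = /^data:([^,]*?);base64,(.*)$/s.exec(audio);
  const mimeType = match ? match[1] : 'audio/webm'; // assumed default for raw base64
  const base64 = match ? match[2] : audio;
  // Buffer is a Uint8Array subclass; copy it into a plain Uint8Array.
  const bytes = new Uint8Array(Buffer.from(base64, 'base64'));
  return { mimeType, bytes };
}

console.log(parseAudioInput('data:audio/webm;base64,AQID').bytes.length); // 3
```

The lazy match on the MIME portion keeps parameters such as `codecs=opus` with the MIME type rather than treating them as data.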

Response Fields

text (string)

The transcribed text from the audio input

language (string)

Detected language code (e.g., “en” for English, “es” for Spanish)

duration (number)

Duration of the audio clip in seconds

Implementation Details

Audio Format

Tabby captures audio in WebM format using the browser’s MediaRecorder API:

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm'
});

// Fires when recording stops (or per timeslice, if one is passed to start())
mediaRecorder.ondataavailable = (event) => {
  // Convert blob to a base64 data URL
  const reader = new FileReader();
  reader.onloadend = () => {
    const base64Audio = reader.result; // data:audio/webm;base64,...
    // Send to transcription API
  };
  reader.readAsDataURL(event.data);
};

mediaRecorder.start(); // begin capturing; call mediaRecorder.stop() to finish
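Not every browser can record `audio/webm`, so it may be worth checking support before constructing the recorder. The sketch below is one way to do that; `pickRecordingMimeType` and its candidate list are hypothetical, while `MediaRecorder.isTypeSupported` is the standard browser check it is meant to be paired with.

```javascript
// Hypothetical helper: return the first candidate MIME type the recorder supports,
// or null if none is supported. The candidate list is an assumption.
// In a browser, pass (type) => MediaRecorder.isTypeSupported(type) as isSupported.
function pickRecordingMimeType(
  isSupported,
  candidates = ['audio/webm;codecs=opus', 'audio/webm']
) {
  return candidates.find((type) => isSupported(type)) ?? null;
}

// Stand-in support check for illustration outside a browser.
console.log(pickRecordingMimeType((type) => type === 'audio/webm')); // audio/webm
```

A `null` result lets the caller show a clear "recording not supported" message instead of failing inside the `MediaRecorder` constructor.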

Processing Pipeline

The transcription API processes audio through the following pipeline:

Parse data URL - Extract MIME type and base64 data
Decode base64 - Convert to Uint8Array binary format
Send to Groq - Use the AI SDK’s experimental_transcribe function
Return results - Text, language, and duration

import { experimental_transcribe as transcribe } from 'ai'
import { createGroq } from '@ai-sdk/groq'

const groq = createGroq({
  apiKey: process.env.GROQ_API_KEY,
})

const result = await transcribe({
  model: groq.transcription('whisper-large-v3'),
  audio: audioData,  // Uint8Array
})

console.log(result.text)
console.log(result.language)
console.log(result.durationInSeconds)
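The SDK result exposes `durationInSeconds`, whereas the API response documents a `duration` field, so some mapping step is presumably involved. A minimal sketch of such a mapping follows; `toApiResponse` is a hypothetical name, and the `null` fallbacks assume that `language` and `durationInSeconds` may be missing from a result.

```javascript
// Hypothetical mapping from the transcription result to the documented response shape.
function toApiResponse(result) {
  return {
    text: result.text,
    language: result.language ?? null, // assumption: may be undefined
    duration: result.durationInSeconds ?? null, // renamed to match the API field
  };
}

console.log(toApiResponse({ text: 'Hello', language: 'en', durationInSeconds: 2.4 }));
```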

Transcription Modes

Cycle through different modes using Ctrl+Shift+T:

Direct Paste - Transcribed text is pasted directly at cursor position (fastest)
Typewriter
Buffer
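Cycling through the modes amounts to stepping through a fixed list and wrapping around at the end. The sketch below illustrates the idea; the mode names come from the list above, but the identifiers, their order, and the function name `nextTranscriptionMode` are assumptions rather than Tabby's internals.

```javascript
// Mode names mirror the documented modes; identifiers and order are assumptions.
const TRANSCRIPTION_MODES = ['direct-paste', 'typewriter', 'buffer'];

function nextTranscriptionMode(current) {
  const index = TRANSCRIPTION_MODES.indexOf(current);
  // Unknown modes (index -1) fall through to the first mode.
  return TRANSCRIPTION_MODES[(index + 1) % TRANSCRIPTION_MODES.length];
}

console.log(nextTranscriptionMode('direct-paste')); // typewriter
console.log(nextTranscriptionMode('buffer')); // direct-paste
```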

Use Cases

Writing Documents

Dictate long-form content faster than typing

Coding Comments

Quickly add documentation and comments to your code

Messaging

Compose emails and chat messages hands-free

Accessibility

Enable keyboard-free text input for users with mobility limitations

Best Practices

For optimal transcription accuracy:

Speak clearly and at a moderate pace
Use a good quality microphone
Minimize background noise
Pause briefly between sentences
Speak punctuation when needed (“comma”, “period”, “question mark”)

Accuracy Tips

Whisper Large v3 can often convert spoken punctuation commands into punctuation marks, although results may vary:

Spoken:
"Hello world comma how are you today question mark"

Transcribed:
"Hello world, how are you today?"

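If spoken punctuation words come back as literal text, a small post-processing pass can substitute them. The sketch below is a naive illustration, not part of Tabby: `applySpokenPunctuation` and its word list are hypothetical, and it will also rewrite legitimate uses of words such as "period".

```javascript
// Hypothetical post-processing step; the word list is an assumption.
const SPOKEN_PUNCTUATION = {
  'question mark': '?',
  'exclamation mark': '!',
  comma: ',',
  period: '.',
};

function applySpokenPunctuation(text) {
  const words = Object.keys(SPOKEN_PUNCTUATION).join('|');
  // Match the spoken word plus any whitespace before it, case-insensitively.
  const pattern = new RegExp(`\\s*\\b(${words})\\b`, 'gi');
  return text.replace(pattern, (_, word) => SPOKEN_PUNCTUATION[word.toLowerCase()]);
}

console.log(applySpokenPunctuation('Hello world comma how are you today question mark'));
// Hello world, how are you today?
```
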
Requirements

Voice Transcription requires:

Groq API key with Whisper API access
Microphone permissions in your browser
Active internet connection for API calls

Environment Setup

Add your Groq API key to the backend environment:

GROQ_API_KEY="your-groq-api-key-here"
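A missing key is easier to diagnose at startup than on the first transcription request. One possible guard is sketched below; `requireEnv` is a hypothetical helper, not part of Tabby or the Groq SDK.

```javascript
// Hypothetical startup guard: fail fast if a required variable is missing or blank.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an explicit environment object instead of process.env.
console.log(requireEnv('GROQ_API_KEY', { GROQ_API_KEY: 'example-key' })); // example-key
```

On the backend, the result of `requireEnv('GROQ_API_KEY')` could be passed as `apiKey` to `createGroq`.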

Error Handling

The transcription API handles various error conditions:

error (string)

Error message when transcription fails:

"No audio data provided" - Missing audio parameter
"Transcription failed" - Groq API error
Network errors or API rate limits
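The error cases above could be produced by a handler shaped roughly like the following. This is a sketch under stated assumptions: `handleTranscribe` and the injected `transcribeFn` are hypothetical, and the HTTP status codes (400 and 500) are assumed; only the two error messages come from this page.

```javascript
// Hypothetical handler core. transcribeFn stands in for the Groq call;
// status codes are assumptions, error messages are the documented ones.
async function handleTranscribe(body, transcribeFn) {
  if (!body || typeof body.audio !== 'string' || body.audio.length === 0) {
    return { status: 400, json: { error: 'No audio data provided' } };
  }
  try {
    const result = await transcribeFn(body.audio);
    return { status: 200, json: result };
  } catch (err) {
    // Log the underlying cause server-side; return only a generic message.
    console.error('Transcription error:', err);
    return { status: 500, json: { error: 'Transcription failed' } };
  }
}
```

Injecting `transcribeFn` keeps the validation and error mapping testable without calling the real transcription service.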

Performance

Typical performance on Groq’s infrastructure (actual results depend on network conditions and audio quality):

Latency: Typically 200-500ms for transcription
Accuracy: 95%+ word accuracy for clear speech
Languages: Supports 50+ languages
Audio length: Handles clips from 1 second to several minutes

Related Features

Voice Agent

Interactive voice assistant with desktop automation

Voice Commands

Execute quick actions via voice shortcuts
