POST /api/transcribe
Transcribe audio to text using Groq’s Whisper Large v3 model. This endpoint handles real-time voice transcription with fast processing (typically 200-500 ms latency).
Request Body
Base64-encoded audio data in WebM format. The audio should be captured from the microphone and encoded before sending.
The language code for the audio. Whisper supports 99+ languages. Examples:
- en: English (default)
- es: Spanish
- fr: French
- de: German
- ja: Japanese
- zh: Chinese
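The request body pairs base64-encoded WebM audio with a language code. The sketch below shows one way to build that body from raw recorded bytes. The JSON key names `audio` and `language` and the helper names are hypothetical, because the original field names are not shown on this page. Confirm them against the route handler before relying on this.

```typescript
// Hypothetical request shape for POST /api/transcribe.
// ASSUMPTION: the key names "audio" and "language" are illustrative only.
interface TranscribeRequestBody {
  audio: string;    // base64-encoded WebM bytes
  language: string; // language code such as "en" or "es"
}

// Encode raw bytes as base64. btoa() expects a "binary string" in which
// each character code is a single byte (0-255), so build that first.
// btoa is available in browsers and in Node.js 16+.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build the body, defaulting to English as documented above.
function buildTranscribeRequestBody(
  webmBytes: Uint8Array,
  language: string = "en",
): TranscribeRequestBody {
  return { audio: bytesToBase64(webmBytes), language };
}
```

In a browser, the bytes would typically come from a MediaRecorder `dataavailable` blob, for example `new Uint8Array(await blob.arrayBuffer())`.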
Response
The transcribed text from the audio.
The detected or specified language of the transcription.
Processing time in milliseconds.
Example Request
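A minimal client sketch, assuming the endpoint accepts a JSON body. The key names (`audio`, `language`), the `postTranscription` helper, and the `FetchLike` type are hypothetical. The fetch function is passed in so the same code works with the browser's `fetch` or with a test stub.

```typescript
// Minimal structural type that the global fetch satisfies.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// POST base64 WebM audio to /api/transcribe and return the parsed JSON.
// ASSUMPTION: request keys "audio" and "language" are illustrative.
async function postTranscription(
  fetchImpl: FetchLike,
  baseUrl: string,
  audioBase64: string,
  language: string = "en",
): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/api/transcribe`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ audio: audioBase64, language }),
  });
  // Surface HTTP failures instead of trying to parse an error page.
  if (!res.ok) {
    throw new Error(`transcribe failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

From a page served by the same Next.js app, `postTranscription(fetch, "", audioBase64)` posts to the same origin.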
Example Response
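The response carries three values: the transcribed text, the detected or specified language, and the processing time in milliseconds. A runtime check like the following can validate that shape before the text is used. The key names `text`, `language`, and `processingTime` and the helper name are hypothetical; match them to the actual route handler output.

```typescript
// Hypothetical response shape; key names are assumptions.
interface TranscribeResponse {
  text: string;           // transcribed text
  language: string;       // detected or specified language
  processingTime: number; // milliseconds
}

// Validate an unknown JSON value and return it typed, or throw.
function parseTranscribeResponse(value: unknown): TranscribeResponse {
  if (typeof value !== "object" || value === null) {
    throw new Error("response is not a JSON object");
  }
  const { text, language, processingTime } = value as Record<string, unknown>;
  if (typeof text !== "string") throw new Error("missing or invalid text");
  if (typeof language !== "string") throw new Error("missing or invalid language");
  if (typeof processingTime !== "number" || !Number.isFinite(processingTime)) {
    throw new Error("missing or invalid processingTime");
  }
  return { text, language, processingTime };
}
```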
Technical Details
Audio Format Requirements
Audio should be sent in WebM format for best compatibility. The endpoint uses Groq’s Whisper Large v3 model, which provides:
- 95%+ accuracy for clear speech
- Support for 99+ languages
- Fast processing (200-500ms latency)
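Before uploading, a client can cheaply confirm that the bytes at least look like WebM. WebM is a profile of Matroska, and both begin with the 4-byte EBML magic number `0x1A 0x45 0xDF 0xA3`. The helper name below is hypothetical, and the check only proves the data is EBML-based. Distinguishing WebM from generic Matroska would require parsing the DocType element. In the browser, `MediaRecorder.isTypeSupported("audio/webm")` reports whether the recorder can produce WebM in the first place.

```typescript
// EBML header magic shared by Matroska and WebM files.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// Hypothetical pre-upload sanity check: true if the buffer starts with
// the EBML magic number. It does not validate the rest of the file.
function looksLikeWebM(bytes: Uint8Array): boolean {
  if (bytes.length < EBML_MAGIC.length) return false;
  return EBML_MAGIC.every((b, i) => bytes[i] === b);
}
```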
Supported Languages
Whisper Large v3 supports multilingual transcription with automatic language detection. Major supported languages include:
- European: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian
- Asian: Chinese (Mandarin), Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian
- Middle Eastern: Arabic, Hebrew, Turkish, Persian
- And 80+ more languages
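When the client wants to pass a language explicitly rather than rely on the default, one option is to derive it from the user's locale. Browser locales such as `navigator.language` are BCP 47 tags (`"en-US"`, `"zh-Hans-CN"`), and Whisper identifies most languages by the ISO 639-1 code that forms the tag's primary subtag. The sketch below is a hypothetical helper. Falling back to `"en"` mirrors the documented default, and accepting three-letter subtags is an assumption to cover the few languages without a two-letter code.

```typescript
// Hypothetical helper: map a locale tag to a Whisper-style language code.
// "pt_BR" style OS locales are handled by splitting on "_" as well as "-".
function localeToLanguageCode(locale: string | undefined, fallback = "en"): string {
  if (!locale) return fallback;
  const primary = locale.trim().split(/[-_]/)[0].toLowerCase();
  // Accept two- or three-letter primary subtags; anything else is malformed.
  return /^[a-z]{2,3}$/.test(primary) ? primary : fallback;
}
```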
Performance Characteristics
| Metric | Value |
|---|---|
| Average Latency | 200-500ms |
| Max Audio Length | 30 seconds per request |
| Accuracy (clear speech) | 95%+ |
| Streaming Support | No (process complete audio) |
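Because each request accepts at most 30 seconds of audio and the endpoint does not stream, a longer recording has to be split on the client. The hypothetical helper below computes `[start, end)` windows in milliseconds. In a browser, one way to turn each window into a valid upload is to stop and restart the MediaRecorder at those boundaries. Chunks emitted via `start(timeslice)` are typically not standalone files, because only the first one carries the WebM header.

```typescript
// Per-request limit from the table above.
const MAX_SEGMENT_MS = 30000;

// Split a recording of totalMs into consecutive windows no longer than maxMs.
function planSegments(totalMs: number, maxMs: number = MAX_SEGMENT_MS): Array<[number, number]> {
  if (!(maxMs > 0)) throw new Error("maxMs must be positive");
  const segments: Array<[number, number]> = [];
  for (let start = 0; start < totalMs; start += maxMs) {
    segments.push([start, Math.min(start + maxMs, totalMs)]);
  }
  return segments;
}
```

For a 75-second recording this yields three requests: 0-30 s, 30-60 s, and 60-75 s.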
Use Cases
Voice Typing
Real-time voice-to-text for hands-free typing
Voice Commands
Transcribe spoken commands for desktop automation
Meeting Notes
Convert speech to text for documentation
Accessibility
Enable voice input for users who prefer speech
Keyboard Shortcuts
- Ctrl+Alt+T: Toggle voice transcription mode
- Ctrl+Shift+T: Cycle through transcription modes (Direct Paste, Typewriter, Buffer)
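The Ctrl+Shift+T shortcut steps through the three transcription modes and wraps around. A minimal sketch of that cycling logic follows. The mode names come from the shortcut list, while the ordering, the constant, and the `nextMode` helper are assumptions for illustration.

```typescript
// Mode names from the shortcut description; order is assumed.
const TRANSCRIPTION_MODES = ["Direct Paste", "Typewriter", "Buffer"] as const;
type TranscriptionMode = (typeof TRANSCRIPTION_MODES)[number];

// Return the mode after `current`, wrapping from the last back to the first.
function nextMode(current: TranscriptionMode): TranscriptionMode {
  const i = TRANSCRIPTION_MODES.indexOf(current);
  return TRANSCRIPTION_MODES[(i + 1) % TRANSCRIPTION_MODES.length];
}
```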
The endpoint is implemented in nextjs-backend/src/app/api/transcribe/route.ts.
Related Endpoints
- Speech - Convert text to speech (TTS)
- Voice Agent - Real-time voice conversation
- Completion - Process transcribed text with AI