Overview
LiteLLM provides unified interfaces for audio transcription (speech-to-text) and text-to-speech synthesis across multiple providers.
Transcription API
transcription()
Transcribe audio to text using OpenAI Whisper or other speech-to-text providers.
Function Signature
Parameters
The audio file to transcribe. Can be:
- File path (string)
- File object
- File-like object with read() method
Model to use for transcription. Examples:
- whisper-1 (OpenAI)
- azure/whisper (Azure OpenAI)
- whisper-large-v3 (Groq)
Language of the input audio in ISO-639-1 format. Examples: "en", "es", "fr", "de". Improves accuracy and latency.
Optional text to guide the model’s style or continue a previous segment. Should match the audio language.
Format of the transcript output. Options:
- "json": JSON with text field
- "text": Plain text
- "srt": SubRip subtitle format
- "verbose_json": JSON with timestamps
- "vtt": Web Video Text Tracks format
Sampling temperature between 0 and 1. Higher values make output more random.
Timestamp granularity for segments. Options: ["word"], ["segment"], or both. Requires response_format="verbose_json".
Response
The transcribed text.
The task performed (“transcribe”).
Detected language of the audio.
Duration of the audio in seconds.
Segments with timestamps (if verbose_json format).
Word-level timestamps (if timestamp_granularities includes “word”).
Examples
Basic Transcription
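A minimal sketch of a basic call, assuming `litellm.transcription` accepts `model` and `file` keyword arguments and that the response exposes the transcript as `.text`. The helper name `transcribe_file` is hypothetical, and the import is deferred so the function can be defined before the package is installed.

```python
def transcribe_file(path, model="whisper-1"):
    """Return the transcript text for one audio file (hypothetical helper)."""
    # Deferred import: needs `pip install litellm` plus provider credentials.
    import litellm

    # Open in binary mode and hand the file object to the provider.
    with open(path, "rb") as audio_file:
        response = litellm.transcription(model=model, file=audio_file)
    # Assumption: the transcript is exposed on the response as `.text`.
    return response.text
```

With provider credentials configured, `print(transcribe_file("meeting.mp3"))` would print the transcript; the file name is only a placeholder.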
With Language Hint
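One way to pass a language hint and an optional prompt, sketched under the assumption that the keyword names are `language` and `prompt`. The two-letter check is only a cheap sanity test for ISO-639-1 shape, not a lookup against the real code list, and `transcribe_with_hint` is a hypothetical helper.

```python
def transcribe_with_hint(path, language, prompt=None, model="whisper-1"):
    """Transcribe with an ISO-639-1 language hint and an optional prompt."""
    # Shape check only: ISO-639-1 codes are two ASCII letters, e.g. "en", "es".
    if len(language) != 2 or not (language.isascii() and language.isalpha()):
        raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")

    import litellm  # deferred import; requires litellm and provider credentials

    kwargs = {"model": model, "language": language.lower()}
    if prompt is not None:
        kwargs["prompt"] = prompt  # should be in the same language as the audio
    with open(path, "rb") as audio_file:
        return litellm.transcription(file=audio_file, **kwargs).text
```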
Verbose Output with Timestamps
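A sketch of requesting timestamps, since granularities only apply with response_format="verbose_json". Both helper names are hypothetical. How segments are exposed on the response can vary by provider and version, so `format_segments` assumes plain dicts with "start", "end", and "text" keys; adapt the access if the installed version returns objects instead.

```python
def transcribe_verbose(path, granularities=("segment",), model="whisper-1"):
    """Request verbose_json output with the given timestamp granularities."""
    unknown = set(granularities) - {"word", "segment"}
    if unknown:
        raise ValueError(f"unsupported granularities: {sorted(unknown)}")

    import litellm  # deferred import

    with open(path, "rb") as audio_file:
        return litellm.transcription(
            model=model,
            file=audio_file,
            response_format="verbose_json",  # required for timestamps
            timestamp_granularities=list(granularities),
        )


def format_segments(segments):
    """Render segments as '[start - end] text' lines.

    Assumption: each segment is a dict with "start", "end" (seconds), "text".
    """
    return [
        f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text'].strip()}"
        for seg in segments
    ]
```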
SRT Subtitle Format
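A hedged sketch of saving SubRip output to disk. Whether the "srt" payload comes back as a plain string or on the response's `.text` attribute is an assumption here, so the hypothetical helper `transcribe_to_srt` accepts either.

```python
def transcribe_to_srt(audio_path, srt_path, model="whisper-1"):
    """Request SubRip output, write it to `srt_path`, and return the SRT text."""
    import litellm  # deferred import

    with open(audio_path, "rb") as audio_file:
        response = litellm.transcription(
            model=model, file=audio_file, response_format="srt"
        )
    # Assumption: the payload arrives either as a plain string or on `.text`.
    srt_text = response if isinstance(response, str) else response.text
    with open(srt_path, "w", encoding="utf-8") as out:
        out.write(srt_text)
    return srt_text
```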
Async Transcription
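For concurrent work, a sketch assuming the async counterpart is `litellm.atranscription` with the same keyword arguments. The helper names are hypothetical; `asyncio.gather` returns results in the order the inputs were given.

```python
import asyncio


async def transcribe_one(path, model="whisper-1"):
    """Transcribe a single file without blocking the event loop on the request."""
    import litellm  # deferred import

    # Keep the file open until the awaited call has finished reading it.
    with open(path, "rb") as audio_file:
        response = await litellm.atranscription(model=model, file=audio_file)
    return response.text


async def transcribe_many(paths, model="whisper-1"):
    """Transcribe several files concurrently; results keep the input order."""
    return await asyncio.gather(*(transcribe_one(p, model) for p in paths))
```

A script would typically drive this with `asyncio.run(transcribe_many(["a.mp3", "b.mp3"]))`, where the file names are placeholders.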
Speech API (Text-to-Speech)
speech()
Generate spoken audio from text using text-to-speech models.
Function Signature
Parameters
TTS model to use. Examples:
- tts-1 (OpenAI, faster)
- tts-1-hd (OpenAI, higher quality)
- azure/tts-1 (Azure OpenAI)
Text to convert to speech. Maximum length 4096 characters.
Voice to use for generation. OpenAI voices:
- alloy
- echo
- fable
- onyx
- nova
- shimmer
Audio format. Options: "mp3", "opus", "aac", "flac", "wav", "pcm"
Playback speed of the audio. Range: 0.25 to 4.0
Response
Returns binary audio data that can be saved to a file or streamed.
Examples
Basic Text-to-Speech
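A minimal sketch, assuming `litellm.speech` takes `model`, `voice`, and `input` keyword arguments and that the binary response offers a `stream_to_file` method in the style of the OpenAI SDK. `synthesize_to_file` is a hypothetical helper name.

```python
def synthesize_to_file(text, out_path, model="tts-1", voice="alloy"):
    """Convert `text` to speech and save the audio at `out_path`."""
    if len(text) > 4096:
        raise ValueError("input exceeds the 4096-character limit")

    import litellm  # deferred import; requires litellm and provider credentials

    response = litellm.speech(model=model, voice=voice, input=text)
    # Assumption: the binary response provides `stream_to_file`.
    response.stream_to_file(out_path)
    return out_path
```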
High Quality Voice
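A sketch that selects the higher-quality model together with a lossless container. The keyword `response_format` for the audio format is an assumption borrowed from the OpenAI-style API, and `synthesize_hd` is a hypothetical helper.

```python
AUDIO_FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")


def synthesize_hd(text, out_path, voice="nova", audio_format="flac"):
    """Generate speech with tts-1-hd in the requested audio format."""
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")

    import litellm  # deferred import

    response = litellm.speech(
        model="tts-1-hd",
        voice=voice,
        input=text,
        response_format=audio_format,  # assumption: keyword name for the format
    )
    response.stream_to_file(out_path)
    return out_path
```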
Adjust Speed
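A sketch that validates the documented 0.25 to 4.0 range locally before making the request, assuming the keyword is named `speed`. `synthesize_at_speed` is a hypothetical helper.

```python
def synthesize_at_speed(text, out_path, speed=1.0, model="tts-1", voice="alloy"):
    """Generate speech at a given playback speed (0.25 to 4.0)."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")

    import litellm  # deferred import

    response = litellm.speech(model=model, voice=voice, input=text, speed=speed)
    response.stream_to_file(out_path)
    return out_path
```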
Different Voices
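To compare voices side by side, one option is to render the same sentence once per voice. The sketch below uses the six OpenAI voices listed earlier; the helper name and the one-file-per-voice layout are illustrative choices.

```python
import os

OPENAI_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def render_voice_samples(text, out_dir, voices=OPENAI_VOICES, model="tts-1"):
    """Write one MP3 per voice into `out_dir` and return {voice: path}."""
    import litellm  # deferred import

    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for voice in voices:
        path = os.path.join(out_dir, f"{voice}.mp3")
        litellm.speech(model=model, voice=voice, input=text).stream_to_file(path)
        paths[voice] = path
    return paths
```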
Async Speech Generation
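A sketch of the async variant, assuming `litellm.aspeech` mirrors `speech` and returns the same kind of binary response. The helper names are hypothetical; `jobs` is a list of (text, output path) pairs.

```python
import asyncio


async def synthesize_async(text, out_path, model="tts-1", voice="alloy"):
    """Generate speech asynchronously and save it to `out_path`."""
    import litellm  # deferred import

    response = await litellm.aspeech(model=model, voice=voice, input=text)
    response.stream_to_file(out_path)  # the file write itself is synchronous
    return out_path


async def synthesize_batch(jobs):
    """Run several (text, out_path) jobs concurrently, preserving order."""
    return await asyncio.gather(*(synthesize_async(t, p) for t, p in jobs))
```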
Azure OpenAI TTS
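A hedged sketch for Azure, where the part after `azure/` is taken to be the deployment name. The environment variable names and the `api_key`, `api_base`, and `api_version` keywords are assumptions to check against the installed version; both helpers are hypothetical.

```python
import os


def azure_speech_kwargs(deployment="tts-1"):
    """Collect Azure settings from the environment (variable names assumed)."""
    return {
        "model": f"azure/{deployment}",
        "api_key": os.environ["AZURE_API_KEY"],
        "api_base": os.environ["AZURE_API_BASE"],
        "api_version": os.environ["AZURE_API_VERSION"],
    }


def synthesize_azure(text, out_path, voice="alloy", deployment="tts-1"):
    """Generate speech through an Azure OpenAI TTS deployment."""
    import litellm  # deferred import

    response = litellm.speech(
        voice=voice, input=text, **azure_speech_kwargs(deployment)
    )
    response.stream_to_file(out_path)
    return out_path
```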
Provider Support
Transcription Providers
- OpenAI: Whisper models
- Azure OpenAI: Whisper models
- Groq: Ultra-fast Whisper models
Text-to-Speech Providers
- OpenAI: tts-1, tts-1-hd
- Azure OpenAI: tts-1, tts-1-hd
- Vertex AI: Text-to-speech models
Error Handling
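Transient failures such as rate limits or dropped connections are commonly handled with retries. The generic wrapper below is a sketch, not LiteLLM's own mechanism: `call_with_retry` is a hypothetical helper, and the exception types to retry on are supplied by the caller (for example `litellm.RateLimitError`, assuming the installed version exposes it).

```python
import time


def call_with_retry(fn, retry_on=(Exception,), attempts=3, base_delay=1.0,
                    sleep=time.sleep):
    """Call `fn()` and retry on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A transcription or speech call can be wrapped in a lambda or `functools.partial` and passed as `fn`. Narrowing `retry_on` keeps permanent errors, such as invalid credentials, from being retried.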
Best Practices
Transcription
- Provide a language hint when known for better accuracy
- Use good-quality audio: higher quality yields better transcription
- Keep files under 25 MB; split larger files if needed
- Use prompt to maintain context across segments
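The last two points can be combined. The sketch below transcribes chunks that were already split, feeding the tail of each transcript into the next call as the prompt. It reads the 25 MB limit as 25 MiB, which is an assumption, and the helper name and the 200-character context window are illustrative.

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB limit read as 25 MiB


def transcribe_chunks(chunk_paths, model="whisper-1", context_chars=200):
    """Transcribe pre-split chunks in order, carrying context via `prompt`."""
    import litellm  # deferred import

    parts, prompt = [], None
    for path in chunk_paths:
        if os.path.getsize(path) > MAX_BYTES:
            raise ValueError(f"{path} exceeds the size limit; split it further")
        kwargs = {"model": model}
        if prompt:
            kwargs["prompt"] = prompt  # tail of the previous transcript
        with open(path, "rb") as audio_file:
            text = litellm.transcription(file=audio_file, **kwargs).text
        parts.append(text)
        prompt = text[-context_chars:]
    return " ".join(parts)
```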
Text-to-Speech
- Choose an appropriate voice for your use case
- Use tts-1 for real-time applications (faster)
- Use tts-1-hd when quality is the priority
- Break long text into smaller chunks for better control
- Test different speeds to find the optimal playback rate
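Breaking long text into chunks can be sketched as a pure function that respects the 4096-character input limit and prefers sentence boundaries. The regex-based sentence split is a simplification, and `chunk_text` is a hypothetical helper rather than part of LiteLLM.

```python
import re


def chunk_text(text, max_chars=4096):
    """Split text into chunks of at most `max_chars`, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate speech request and the resulting audio files played or concatenated in order.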