POST /api/speech
Generate high-quality audio from text using OpenAI’s text-to-speech models. This endpoint returns audio as a base64-encoded data URL ready for playback.
Request Body
The text to convert to speech. Maximum length varies by model but typically supports several paragraphs.
The voice to use for speech generation. OpenAI provides six natural-sounding voices:
- alloy (default) - Neutral and balanced
- echo - Clear and expressive
- fable - Warm and engaging
- onyx - Deep and authoritative
- nova - Energetic and bright
- shimmer - Soft and calm
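The voice list can be captured as a type so that invalid values are caught before a request is sent. This is a minimal sketch: the helper names `isVoice` and `resolveVoice` are hypothetical, and throwing on an unknown voice is an assumption about how a caller might want to behave, not documented endpoint behavior.

```typescript
// The six voices listed above, in the same order.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type Voice = (typeof VOICES)[number];

// Type guard: true only for one of the six documented voice names.
function isVoice(value: unknown): value is Voice {
  return (
    typeof value === "string" &&
    (VOICES as readonly string[]).indexOf(value) !== -1
  );
}

// Hypothetical helper: falls back to the documented default ("alloy") when no
// voice is given, and rejects anything that is not in the list.
function resolveVoice(value?: string): Voice {
  if (value === undefined) return "alloy";
  if (!isVoice(value)) {
    throw new RangeError("Unknown voice: " + value);
  }
  return value;
}
```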
Response
Base64-encoded audio data URL in the format data:audio/[format];base64,[data]. Ready to use with HTML5 <audio> elements or the Web Audio API.
Example Request
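A request to this endpoint could be built as sketched below. The body field names `text` and `voice` are assumptions based on the parameter descriptions above, and `buildSpeechRequest` is a hypothetical helper.

```typescript
// Options object in the shape accepted by fetch(); only the fields used here.
interface SpeechRequestInit {
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

// Hypothetical helper. The JSON field names `text` and `voice` are assumed;
// check them against the actual route before relying on this.
function buildSpeechRequest(
  text: string,
  voice: string = "alloy"
): SpeechRequestInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice }),
  };
}

// Usage where a global fetch is available (browsers, Node 18+):
//   const res = await fetch("/api/speech", buildSpeechRequest("Hello!", "nova"));
//   const data = await res.json();
```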
Example Response
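The returned data URL follows the pattern data:audio/[format];base64,[data], so it can be checked and split before use. The sketch below works on the data URL string itself, since the name of the response field that carries it is not shown here; `parseAudioDataUrl` is a hypothetical helper.

```typescript
interface ParsedAudio {
  mimeType: string; // e.g. "audio/mp3"
  base64: string; // the encoded audio payload
}

// Returns the MIME type and base64 payload, or null when the string does not
// match the data:audio/[format];base64,[data] pattern.
function parseAudioDataUrl(dataUrl: string): ParsedAudio | null {
  const match =
    /^data:(audio\/[a-z0-9.+-]+);base64,([A-Za-z0-9+/]*={0,2})$/i.exec(dataUrl);
  if (match === null) return null;
  return { mimeType: match[1], base64: match[2] };
}
```

A string that passes this check can be assigned directly to the `src` of an `<audio>` element.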
Voice Characteristics
Choose the voice that best fits your use case:
| Voice | Characteristics | Best For |
|---|---|---|
| alloy | Neutral, balanced, versatile | General purpose, professional content |
| echo | Clear, expressive, articulate | Presentations, tutorials, instructions |
| fable | Warm, engaging, friendly | Storytelling, casual content, greetings |
| onyx | Deep, authoritative, confident | Formal announcements, important messages |
| nova | Energetic, bright, upbeat | Notifications, positive messages, alerts |
| shimmer | Soft, calm, soothing | Relaxing content, gentle reminders |
Audio Format
The API returns audio in MP3 format by default, encoded as a base64 data URL. This format:
- Works directly in browser <audio> elements
- Compatible with Web Audio API
- Small file size for quick transmission
- High quality at 24kHz sample rate
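Base64 encodes every 3 bytes as 4 characters, so the data URL is roughly a third larger than the MP3 it carries. The sketch below estimates the decoded audio size from a data URL; it assumes standard padded base64 (length a multiple of 4), and both function names are hypothetical.

```typescript
// Number of raw bytes represented by a padded base64 string.
function decodedByteLength(base64: string): number {
  let padding = 0;
  if (base64.slice(-2) === "==") padding = 2;
  else if (base64.slice(-1) === "=") padding = 1;
  return (base64.length / 4) * 3 - padding;
}

// Size in bytes of the audio carried by a data:audio/...;base64,... URL.
function audioBytesFromDataUrl(dataUrl: string): number {
  const comma = dataUrl.indexOf(",");
  if (comma === -1) {
    throw new Error("Not a data URL: missing comma separator");
  }
  return decodedByteLength(dataUrl.slice(comma + 1));
}
```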
Use Cases
- Voice Responses - Generate voice responses from AI in the Voice Agent
- Notifications - Speak important notifications or alerts
- Accessibility - Read text content aloud for visually impaired users
- Multilingual Support - Generate speech in multiple languages with natural pronunciation
Error Handling
400 Bad Request
500 Server Error
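On the client, the two status codes above can be mapped to different reactions. This is a sketch only: the conditions that trigger each status and the shape of the error body are not specified here, so the messages and the `retryable` flag are assumptions, and `classifySpeechError` is a hypothetical helper.

```typescript
type SpeechErrorKind = "bad_request" | "server_error" | "unexpected";

interface SpeechErrorInfo {
  kind: SpeechErrorKind;
  retryable: boolean; // assumption: only server-side failures are worth retrying
  message: string;
}

// Maps an HTTP status code from /api/speech to a client-side reaction.
function classifySpeechError(statusCode: number): SpeechErrorInfo {
  if (statusCode === 400) {
    return {
      kind: "bad_request",
      retryable: false,
      message: "The request was rejected; check the text and voice values.",
    };
  }
  if (statusCode === 500) {
    return {
      kind: "server_error",
      retryable: true,
      message: "Speech generation failed on the server; try again later.",
    };
  }
  return {
    kind: "unexpected",
    retryable: false,
    message: "Unexpected status " + statusCode,
  };
}
```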
Performance Considerations
- Average response time: 1-3 seconds depending on text length
- Text is processed in chunks for longer inputs
- Audio is streamed and encoded efficiently
- Use shorter text segments for faster response times
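One way to follow the last recommendation is to split long input at sentence boundaries and request each segment separately. The helper below is a hypothetical sketch; the 300-character default is an arbitrary illustration, not a documented limit.

```typescript
// Splits text into segments of at most maxChars characters, breaking only at
// sentence ends (., ! or ?). A single sentence longer than maxChars is kept
// whole rather than cut mid-sentence.
function splitIntoSegments(text: string, maxChars: number = 300): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const segments: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new segment when adding this sentence would exceed the limit.
    if (current !== "" && (current + sentence).trim().length > maxChars) {
      segments.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim() !== "") segments.push(current.trim());
  return segments;
}
```

Each segment can then be sent as its own request, so playback of the first segment can begin while later ones are still being generated.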
Integration Example
Complete example with error handling and playback controls:
nextjs-backend/src/app/api/speech/route.ts
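A minimal sketch of how the core of such a handler could look, written independently of any framework. Everything here is an assumption for illustration: the field names `text`, `voice`, `audio`, and `error`, the rule that missing text or an unknown voice yields 400, and the injected `synthesize` function standing in for the upstream text-to-speech call.

```typescript
type SpeechBody = { text?: unknown; voice?: unknown };
type HandlerResult = {
  status: number;
  body: { audio: string } | { error: string };
};

// Stand-in for the upstream text-to-speech call; resolves to raw MP3 bytes.
type Synthesize = (text: string, voice: string) => Promise<Uint8Array>;

const ALLOWED_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Encodes bytes as base64 using btoa (available in browsers and Node 16+).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

async function handleSpeech(
  body: SpeechBody,
  synthesize: Synthesize
): Promise<HandlerResult> {
  // Assumed validation: text is required, voice defaults to "alloy".
  const text = typeof body.text === "string" ? body.text.trim() : "";
  if (text === "") {
    return { status: 400, body: { error: "text is required" } };
  }
  const voice = typeof body.voice === "string" ? body.voice : "alloy";
  if (ALLOWED_VOICES.indexOf(voice) === -1) {
    return { status: 400, body: { error: "unknown voice: " + voice } };
  }
  try {
    const bytes = await synthesize(text, voice);
    // Follows the documented data:audio/[format];base64,[data] pattern.
    const audio = "data:audio/mp3;base64," + toBase64(bytes);
    return { status: 200, body: { audio } };
  } catch {
    return { status: 500, body: { error: "speech generation failed" } };
  }
}
```

In a Next.js route file, an exported `POST` function could parse the body with `await request.json()`, pass it to `handleSpeech`, and return `Response.json(result.body, { status: result.status })`. Keeping the upstream call behind `synthesize` lets the validation and encoding logic be tested without network access.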
Related Endpoints
- Transcribe - Convert speech to text (STT)
- Voice Agent - Real-time voice conversations
- Chat - Generate text responses that can be spoken