Text-to-Speech

Overview

This standalone text-to-speech endpoint converts text (up to 1500 characters) into MP3 audio using ElevenLabs. It is independent of the conversation flow, so it can generate audio for any text. On success it returns raw audio data rather than JSON, which makes the response suitable for direct playback or download.

Required environment variable:

ELEVENLABS_API_KEY - Your ElevenLabs API key
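A minimal sketch of the startup check, assuming the key is read from the process environment (presumably populated from `.env`, since the error message mentions that file). The helper name `get_elevenlabs_api_key` is hypothetical. The raised message mirrors the documented 500 response body.

```python
import os


def get_elevenlabs_api_key() -> str:
    """Return the ElevenLabs API key or raise if it is missing or empty.

    Hypothetical helper: the real endpoint may perform this check inline.
    """
    key = os.environ.get("ELEVENLABS_API_KEY")
    if not key:
        # Same text the endpoint returns in its 500 JSON error body.
        raise RuntimeError("ELEVENLABS_API_KEY not set in .env")
    return key
```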

Request

Accepts either JSON or form data.

text (string, required)

The text to convert to speech. This field is required and cannot be empty. Maximum 1500 characters; longer input is automatically truncated.
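The validation and truncation rules for `text` can be sketched as a small helper. The name `prepare_text` and the `MAX_TTS_CHARS` constant are hypothetical. The documentation does not say whether whitespace-only input is rejected; this sketch accepts it and only rejects a missing or empty value, which the endpoint reports as `400 {"error": "No text provided"}`.

```python
MAX_TTS_CHARS = 1500  # limit documented for this endpoint


def prepare_text(raw):
    """Validate and truncate the `text` field before synthesis.

    Hypothetical helper. Raises ValueError for a missing or empty value.
    """
    if not raw:
        raise ValueError("No text provided")
    # Longer input is not rejected; only the first 1500 characters are kept.
    return raw[:MAX_TTS_CHARS]
```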

Response

Content-Type: audio/mpeg

Headers:

Content-Disposition: inline; filename=speech.mp3

The response body is raw MP3 audio data (binary). You can:

Play it directly in an audio player
Save it to a file with .mp3 extension
Stream it to users
Embed it in HTML audio elements
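For the HTML case, one option for short clips is to inline the MP3 bytes as a data URI. This is a sketch with a hypothetical helper name; for longer audio, saving the file (for example with `pathlib.Path("speech.mp3").write_bytes(body)`) and pointing `src` at a URL is usually lighter on the page.

```python
import base64


def audio_tag_from_mp3(mp3_bytes: bytes) -> str:
    """Return an HTML <audio> element embedding MP3 bytes as a data URI."""
    # The base64 alphabet (A-Z, a-z, 0-9, +, /, =) is safe inside an HTML attribute.
    encoded = base64.b64encode(mp3_bytes).decode("ascii")
    return f'<audio controls src="data:audio/mpeg;base64,{encoded}"></audio>'
```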

Error Responses

400 Bad Request

No text provided:

{
  "error": "No text provided"
}

500 Internal Server Error

ElevenLabs API key not configured:

{
  "error": "ELEVENLABS_API_KEY not set in .env"
}

ElevenLabs API error:

{
  "error": "<error details from ElevenLabs or Python exception>"
}
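Because success returns raw `audio/mpeg` bytes while errors return a JSON object with an `error` key, a client has to branch on status and content type before using the body. A hedged client-side sketch (the helper name is hypothetical; it does not assume any particular Content-Type on error responses):

```python
import json


def read_tts_response(status: int, content_type: str, body: bytes) -> bytes:
    """Return MP3 bytes on success; raise with the server's error message otherwise."""
    if status == 200 and content_type.startswith("audio/mpeg"):
        return body
    try:
        message = json.loads(body).get("error", "unknown error")
    except (ValueError, AttributeError):
        # Body was not a JSON object (e.g. an HTML proxy error page).
        message = body.decode("utf-8", errors="replace")
    raise RuntimeError(f"TTS request failed ({status}): {message}")
```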

Examples

# JSON request
curl -X POST http://localhost:5000/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, welcome to our pizza restaurant!"}' \
  --output speech.mp3

# Form data request
curl -X POST http://localhost:5000/tts \
  -F "text=Hello, welcome to our pizza restaurant!" \
  --output speech.mp3
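The same two requests can be issued from Python with only the standard library. This sketch reuses the local URL from the curl examples; the function names and the 60-second client timeout are illustrative choices. Note that it sends URL-encoded form data, whereas `curl -F` sends multipart/form-data. `urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses.

```python
import json
import urllib.parse
import urllib.request

TTS_URL = "http://localhost:5000/tts"  # same local URL as the curl examples


def build_json_request(text: str) -> urllib.request.Request:
    """POST with a JSON body, like the first curl example."""
    data = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def build_form_request(text: str) -> urllib.request.Request:
    """POST URL-encoded form data (curl -F would send multipart instead)."""
    data = urllib.parse.urlencode({"text": text}).encode("ascii")
    return urllib.request.Request(
        TTS_URL,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str = "speech.mp3") -> None:
    """Send the request and write the MP3 body to disk, like curl --output."""
    with urllib.request.urlopen(request, timeout=60) as response:
        with open(path, "wb") as f:
            f.write(response.read())
```

Usage: `save_speech(build_json_request("Hello, welcome to our pizza restaurant!"))`.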

Implementation Details

ElevenLabs Configuration

Voice ID: JBFqnCBsd6RMkjVDRZzb
Model: eleven_turbo_v2_5
Format: MP3 audio
Text Length: Maximum 1500 characters (automatically truncated)
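A sketch of a synthesis call with this voice and model, assuming the current `elevenlabs` Python SDK, whose `client.text_to_speech.convert(...)` returns an iterator of byte chunks. The endpoint's actual call may differ; in particular, `output_format="mp3_44100_128"` is an assumed value, not one stated here. The client is passed in so the function can be exercised without network access.

```python
VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"
MODEL_ID = "eleven_turbo_v2_5"


def synthesize_mp3(client, text: str) -> bytes:
    """Convert text to MP3 bytes with the documented voice and model.

    `client` is expected to be an elevenlabs.client.ElevenLabs instance.
    output_format is an assumption, not taken from the endpoint source.
    """
    audio = client.text_to_speech.convert(
        voice_id=VOICE_ID,
        model_id=MODEL_ID,
        text=text[:1500],  # documented 1500-character cap
        output_format="mp3_44100_128",
    )
    # The SDK streams the audio as chunks; join them into one bytes object.
    return b"".join(audio)
```

In use, the client would be created with `ElevenLabs(api_key=...)` from `elevenlabs.client`, passing the `ELEVENLABS_API_KEY` value.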

Text Truncation

If your text exceeds 1500 characters, only the first 1500 characters will be converted to speech. This is done to:

Manage API costs
Ensure reasonable response times
Stay within ElevenLabs rate limits

Example:

text = "Very long text..." * 1000  # 17,000 characters
spoken = text[:1500]             # only these first 1500 characters are converted
print(len(spoken))               # 1500

Audio Stream Handling

The endpoint uses a custom collect_audio_bytes() function to handle different audio stream formats from the ElevenLabs client:

Byte arrays (most common)
Iterable streams (chunks)
String data (rare)

This ensures compatibility across different versions of the ElevenLabs SDK.
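A hypothetical reconstruction of the normalization idea behind `collect_audio_bytes()`. The endpoint's real function may differ. How string data should be converted to bytes is not documented; UTF-8 here is an assumption.

```python
from typing import Iterable, Union


def collect_audio_bytes(audio: Union[bytes, bytearray, memoryview, str, Iterable]) -> bytes:
    """Normalize an SDK return value into a single bytes object (sketch)."""
    # Byte arrays (most common): copy into an immutable bytes object.
    if isinstance(audio, (bytes, bytearray, memoryview)):
        return bytes(audio)
    # String data (rare): checked before the iterable branch because str is iterable.
    # UTF-8 is an assumed encoding.
    if isinstance(audio, str):
        return audio.encode("utf-8")
    # Iterable streams: normalize each chunk the same way and concatenate.
    return b"".join(collect_audio_bytes(chunk) for chunk in audio)
```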

Content-Type Handling

The endpoint accepts both:

JSON: Content-Type: application/json with {"text": "..."}
Form data: Content-Type: application/x-www-form-urlencoded, or multipart/form-data as sent by the curl -F example, with text=...

This makes it flexible for different client types (browsers, API clients, etc.).
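A stdlib-only sketch of pulling `text` out of either body type. The helper `extract_text` is hypothetical and covers JSON and URL-encoded forms only. Multipart parsing (what `curl -F` sends) is deliberately left out to keep it short.

```python
import json
from urllib.parse import parse_qs


def extract_text(content_type: str, body: bytes):
    """Return the `text` field from a JSON or URL-encoded body, or None."""
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        try:
            payload = json.loads(body or b"null")
        except ValueError:
            return None
        return payload.get("text") if isinstance(payload, dict) else None
    if media_type == "application/x-www-form-urlencoded":
        values = parse_qs(body.decode("utf-8")).get("text")
        return values[0] if values else None
    # multipart/form-data is not handled in this sketch.
    return None
```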

Response Headers

The response includes:

Content-Type: audio/mpeg
Content-Disposition: inline; filename=speech.mp3

Content-Type: audio/mpeg - tells the browser the body is MP3 audio
Content-Disposition: inline - asks the browser to play the audio in place rather than download it
filename=speech.mp3 - provides a default filename if the user saves it
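The headers above can be illustrated with a plain WSGI response helper, assuming a WSGI-style server. Frameworks set the same headers through their own response objects. The helper name is hypothetical, and `Content-Length` is an addition that is not listed in this documentation.

```python
def send_mp3(start_response, audio: bytes):
    """Return MP3 bytes from a WSGI app with the documented headers (sketch)."""
    headers = [
        ("Content-Type", "audio/mpeg"),
        ("Content-Disposition", "inline; filename=speech.mp3"),
        ("Content-Length", str(len(audio))),  # not in the documented headers
    ]
    start_response("200 OK", headers)
    return [audio]
```

Switching `inline` to `attachment` would instead prompt browsers to download the file.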

Use Cases

Preview audio generation - Test TTS before integrating into calls
Generate IVR prompts - Create audio files for your phone system
Accessibility features - Convert text content to audio for users
Testing voice quality - Compare different text inputs and voice settings
Standalone audio API - Use independently of the conversation features

Performance Notes

Response time depends on text length and ElevenLabs API performance
Typical response time: 1-3 seconds for short texts
Consider caching frequently used audio to reduce API calls
The endpoint has a 30-second timeout for the ElevenLabs API call
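For the caching suggestion, a minimal in-memory sketch. The `TTSCache` class is hypothetical. `synthesize` stands for whatever function calls ElevenLabs. The key includes voice and model so a configuration change does not serve stale audio. It hashes only the first 1500 characters because that is all that gets converted. This dictionary is unbounded, so a real deployment would likely want eviction or a disk/object store.

```python
import hashlib
from typing import Callable, Dict

VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"
MODEL_ID = "eleven_turbo_v2_5"


class TTSCache:
    """Unbounded in-memory cache of MP3 audio keyed by voice, model and text."""

    def __init__(self, synthesize: Callable[[str], bytes]):
        self._synthesize = synthesize  # e.g. a function that calls ElevenLabs
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(text: str) -> str:
        # Inputs sharing the same first 1500 characters produce the same audio.
        raw = f"{VOICE_ID}|{MODEL_ID}|{text[:1500]}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, text: str) -> bytes:
        k = self.key(text)
        if k not in self._store:
            self._store[k] = self._synthesize(text)  # cache miss: one API call
        return self._store[k]
```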
