Text to Speech

POST /api/speech

Generate high-quality audio from text using OpenAI’s text-to-speech models. This endpoint returns audio as a base64-encoded data URL ready for playback.

Request Body

text

string

required

The text to convert to speech. The maximum length depends on the underlying model; inputs of several paragraphs are typically supported.

voice

string

default:"alloy"

The voice to use for speech generation. OpenAI provides six natural-sounding voices:

alloy (default) - Neutral and balanced
echo - Clear and expressive
fable - Warm and engaging
onyx - Deep and authoritative
nova - Energetic and bright
shimmer - Soft and calm
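As a rough sketch of how a client might guard the request body before sending it, the snippet below encodes the six voices listed above as a TypeScript union type. The names `VOICES`, `isVoice`, and `buildSpeechRequest` are hypothetical helpers, not part of the API, and whether the server itself rejects unknown voices is an assumption that this page does not confirm.

```typescript
// The six voices documented for this endpoint.
const VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] as const;
type Voice = (typeof VOICES)[number];

// Hypothetical type guard: true only for one of the documented voice names.
function isVoice(value: string): value is Voice {
  return (VOICES as readonly string[]).includes(value);
}

// Hypothetical helper: build the JSON body for POST /api/speech,
// falling back to the documented default voice "alloy".
function buildSpeechRequest(
  text: string,
  voice: string = 'alloy'
): { text: string; voice: Voice } {
  if (text.trim().length === 0) {
    throw new Error('No text provided');
  }
  if (!isVoice(voice)) {
    throw new Error(`Unknown voice: ${voice}`);
  }
  return { text, voice };
}
```

Validating on the client avoids a round trip for requests that would fail anyway, for example `buildSpeechRequest('Hello', 'nova')` returns a body that can be passed straight to `JSON.stringify`.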

Response

audio

string

Base64-encoded audio data URL in the format data:audio/[format];base64,[data]. Ready to use with HTML5 <audio> elements or Web Audio API.
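When the raw bytes are needed rather than a URL (for example, to pass them to the Web Audio API), the data URL can be split into its MIME type and payload. The following is a minimal sketch that assumes the documented shape data:audio/[format];base64,[data]; `parseAudioDataUrl` and `ParsedAudio` are hypothetical names introduced for illustration.

```typescript
interface ParsedAudio {
  mimeType: string;  // e.g. "audio/mp3"
  bytes: Uint8Array; // decoded audio bytes
}

// Hypothetical helper: split a base64 audio data URL into MIME type and bytes.
function parseAudioDataUrl(dataUrl: string): ParsedAudio {
  const match = /^data:(audio\/[\w.+-]+);base64,([A-Za-z0-9+\/=]*)$/.exec(dataUrl);
  if (!match) {
    throw new Error('Not a base64 audio data URL');
  }
  const [, mimeType, base64] = match;

  // atob yields a binary string with one character per byte.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mimeType, bytes };
}
```

In a browser, the resulting `bytes.buffer` could then be handed to `AudioContext.decodeAudioData`, or the bytes wrapped in a `Blob` with the parsed MIME type.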

Example Request

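// Send the text and voice to the speech endpoint
const response = await fetch('http://localhost:3001/api/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Hello! This is a test of the text to speech API.',
    voice: 'nova'
  })
});

// Non-2xx responses carry an error message instead of audio
if (!response.ok) {
  throw new Error(`Speech request failed with status ${response.status}`);
}

const result = await response.json();
console.log('Audio data URL:', result.audio);

// Play the audio (browsers may require a prior user interaction)
const audio = new Audio(result.audio);
await audio.play();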

Example Response

{
  "audio": "data:audio/mp3;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjYwLjE2LjEwMAAAAAAAAAAAAAAA..."
}

Voice Characteristics

Choose the voice that best fits your use case:

Voice	Characteristics	Best For
alloy	Neutral, balanced, versatile	General purpose, professional content
echo	Clear, expressive, articulate	Presentations, tutorials, instructions
fable	Warm, engaging, friendly	Storytelling, casual content, greetings
onyx	Deep, authoritative, confident	Formal announcements, important messages
nova	Energetic, bright, upbeat	Notifications, positive messages, alerts
shimmer	Soft, calm, soothing	Relaxing content, gentle reminders

Audio Format

The API returns audio in MP3 format by default, encoded as a base64 data URL. This format:

Works directly in browser <audio> elements
Is compatible with the Web Audio API
Keeps file sizes small for quick transmission
Delivers high quality at a 24 kHz sample rate

Use Cases

Voice Responses

Generate voice responses from AI in the Voice Agent

Notifications

Speak important notifications or alerts

Accessibility

Read text content aloud for visually impaired users

Multilingual Support

Generate speech in multiple languages with natural pronunciation

Error Handling

400 Bad Request

{
  "error": "No text provided"
}

500 Server Error

{
  "error": "Speech generation failed"
}
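One way a client might turn the two documented error shapes into readable messages is sketched below. It assumes only what is shown above: a 400 or 500 status with a JSON body of the form { "error": "..." }. The function names are hypothetical, and the fallback wording is illustrative rather than anything the server returns.

```typescript
// Hypothetical helper: pull the "error" string out of a parsed JSON body, if present.
function extractErrorMessage(body: unknown): string | undefined {
  if (typeof body === 'object' && body !== null && 'error' in body) {
    const value = (body as { error: unknown }).error;
    return typeof value === 'string' ? value : undefined;
  }
  return undefined;
}

// Hypothetical helper: map a status code and parsed body to a user-facing message.
function describeSpeechError(status: number, body: unknown): string {
  const detail = extractErrorMessage(body) ?? 'no details provided';
  if (status === 400) {
    return `Invalid request: ${detail}`;
  }
  if (status >= 500) {
    return `Server error: ${detail}`;
  }
  return `Unexpected status ${status}: ${detail}`;
}
```

A 400 usually means the request itself should be corrected before retrying, whereas a 500 may be worth retrying later; separating the two lets calling code decide which path to take.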

Performance Considerations

Average response time: 1-3 seconds depending on text length
Text is processed in chunks for longer inputs
Audio is streamed and encoded efficiently
Use shorter text segments for faster response times
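The advice to use shorter segments can be applied on the client by splitting long text at sentence boundaries and requesting each piece separately. The sketch below is one possible approach, not something the endpoint does for you; `splitIntoSegments` and its `maxChars` default of 200 are assumptions chosen for illustration.

```typescript
// Hypothetical helper: group sentences into segments of at most maxChars characters.
// A single sentence longer than maxChars is kept whole as its own segment.
function splitIntoSegments(text: string, maxChars: number = 200): string[] {
  // A sentence runs up to ., !, or ? followed by whitespace, or to the end of the text.
  const sentences = text.match(/\S[\s\S]*?(?:[.!?]+(?=\s)|$)/g) ?? [];
  const segments: string[] = [];
  let current = '';

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length === 0) continue;

    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current === '' || candidate.length <= maxChars) {
      current = candidate; // still fits, or this is the first sentence of a segment
    } else {
      segments.push(current); // close the full segment and start a new one
      current = sentence;
    }
  }
  if (current) segments.push(current);
  return segments;
}
```

Each returned segment could then be sent as its own request so that playback of the first segment can begin while later ones are still being generated. Note that sentences are rejoined with a single space, so original line breaks are not preserved.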

Integration Example

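Complete example with error handling that resolves once playback has finished:

async function speakText(text: string, voice: string = 'alloy'): Promise<void> {
  try {
    const response = await fetch('http://localhost:3001/api/speech', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, voice })
    });

    if (!response.ok) {
      // Error responses carry a JSON body of the form { "error": "..." };
      // fall back to an empty object if the body is not valid JSON
      const body = await response.json().catch(() => ({}));
      throw new Error(body.error || 'Speech generation failed');
    }

    const { audio } = await response.json();

    // Create an audio element from the returned data URL
    const audioElement = new Audio(audio);

    // Await playback so that failures are caught and logged below
    await new Promise<void>((resolve, reject) => {
      audioElement.onended = () => resolve();
      audioElement.onerror = () => reject(new Error('Audio playback failed'));
      // play() returns a promise that rejects if the browser blocks playback
      audioElement.play().catch(reject);
    });
  } catch (error) {
    console.error('Speech error:', error);
    throw error;
  }
}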

// Usage
await speakText('Welcome to Tabby AI Keyboard', 'nova');

Source: nextjs-backend/src/app/api/speech/route.ts

Related Endpoints

Transcribe - Convert speech to text (STT)
Voice Agent - Real-time voice conversations
Chat - Generate text responses that can be spoken
