Speech & Audio

Overview

LibreChat supports bidirectional voice interactions through Text-to-Speech (TTS) for reading AI responses aloud and Speech-to-Text (STT) for voice input. This enables hands-free conversations and accessibility features.

Text-to-Speech (TTS)

Have AI responses read aloud with natural-sounding voices.

Supported TTS Providers

LibreChat supports three TTS options: Browser TTS (the default), OpenAI TTS, and any custom OpenAI-compatible provider.

Browser TTS (Default)

Uses the browser’s built-in speech synthesis:

No configuration required
Can work offline with voices installed on the device
Voice quality depends on browser/OS
No API costs

Automatically available in all conversations.

OpenAI TTS

High-quality voices from OpenAI:

# librechat.yaml
speech:
  tts:
    openai:
      url: ''  # Optional custom endpoint
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'  # or 'tts-1-hd'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']

Available voices:

alloy: Neutral, balanced
echo: Male, clear
fable: British accent, expressive
onyx: Deep, authoritative
nova: Energetic, youthful
shimmer: Warm, friendly

Custom TTS Provider

Use any OpenAI-compatible TTS API by pointing url at the provider’s base URL:

speech:
  tts:
    openai:
      url: 'https://your-tts-provider.com/v1'
      apiKey: '${TTS_API_KEY}'
      model: 'your-model'
      voices: ['voice1', 'voice2']
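For reference, a provider counts as OpenAI-compatible when it accepts requests shaped like OpenAI’s speech endpoint: a POST to {url}/audio/speech with a JSON body containing model, input, and voice. The sketch below only builds such a request. The TtsProviderConfig type and buildSpeechRequest function are hypothetical illustrations, not LibreChat’s internal code, and the fallback to https://api.openai.com/v1 when url is empty is an assumption based on url being optional.

```typescript
// Hypothetical mirror of a speech.tts.openai entry in librechat.yaml.
interface TtsProviderConfig {
  url: string; // '' is assumed to mean "use the default OpenAI base URL"
  apiKey: string;
  model: string;
  voices: string[];
}

interface SpeechRequest {
  endpoint: string;
  headers: Record<string, string>;
  body: { model: string; input: string; voice: string };
}

// Assumed default when url is left empty.
const DEFAULT_OPENAI_BASE = 'https://api.openai.com/v1';

// Hypothetical helper: builds (but does not send) a request for an
// OpenAI-style /audio/speech endpoint.
function buildSpeechRequest(cfg: TtsProviderConfig, text: string, voice: string): SpeechRequest {
  // Only allow voices that were listed in the config.
  if (!cfg.voices.includes(voice)) {
    throw new Error(`Voice "${voice}" is not in the configured voices list`);
  }
  // Trim trailing slashes so the path is never joined as "//audio/speech".
  const base = (cfg.url || DEFAULT_OPENAI_BASE).replace(/\/+$/, '');
  return {
    endpoint: `${base}/audio/speech`,
    headers: {
      Authorization: `Bearer ${cfg.apiKey}`,
      'Content-Type': 'application/json',
    },
    body: { model: cfg.model, input: text, voice },
  };
}
```

Sending it is then a single fetch(req.endpoint, { method: 'POST', headers: req.headers, body: JSON.stringify(req.body) }) call. OpenAI’s endpoint returns MP3 audio by default.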

Using TTS

Enable TTS

Look for the speaker icon in the message header or message footer.

Click Read Aloud

Click the speaker icon to have the message read aloud.

Control Playback

Pause/Resume: Click the icon again
Stop: Click the mute icon
Adjust speed: Use the playback rate control in settings

Playback Speed

Control audio playback rate:

// Playback speed is held in Recoil state; 1.0 is normal speed
// Supported range: 0.5 to 2.0
const playbackRate = 1.0;

Adjust in user settings:

0.5x: Slower (better comprehension)
1.0x: Normal speed
1.5x: Faster
2.0x: Maximum speed
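Because a stored rate may be missing or outside the supported range, a client would typically normalize it before assigning it to an audio element’s playbackRate property. A minimal sketch, assuming the 0.5 to 2.0 range above; normalizePlaybackRate is a hypothetical helper name, not part of LibreChat.

```typescript
// Bounds taken from the documented playback-rate range.
const MIN_RATE = 0.5;
const MAX_RATE = 2.0;

// Hypothetical helper: returns a usable playback rate. Missing or
// non-finite input falls back to normal speed (1.0); anything outside
// 0.5 to 2.0 is clamped to the nearest bound.
function normalizePlaybackRate(rate: number | null | undefined): number {
  if (rate == null || !Number.isFinite(rate)) return 1.0;
  return Math.min(MAX_RATE, Math.max(MIN_RATE, rate));
}
```

For example, audio.playbackRate = normalizePlaybackRate(savedRate) keeps an outdated or corrupted setting from producing unusable playback.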

Auto-Play

Automatic TTS for new messages is currently controlled by each user in their settings. When auto-play is enabled, the most recent message is read aloud automatically.

Speech-to-Text (STT)

Use your voice as input instead of typing.

Supported STT Providers

LibreChat supports two STT options: Browser STT and OpenAI Whisper.

Browser STT

Uses the browser’s built-in speech recognition:

No configuration required
Limited browser support: works in Chrome, Edge, and Safari
Requires microphone permission

OpenAI Whisper

High-accuracy transcription via OpenAI Whisper:

# librechat.yaml
speech:
  stt:
    openai:
      url: ''  # Optional custom endpoint
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'

Features:

Multi-language support
High accuracy
Handles accents and noise well
API-based (usage incurs API costs)

Using STT

Enable Microphone

Grant microphone permission when prompted by your browser.

Click Microphone Icon

Find the microphone button in the message input area.

Speak Your Message

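Speak clearly into your microphone. With browser STT, the transcription appears in real time as you speak; with an API provider such as OpenAI Whisper, it appears after the recording has been processed.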

Send or Edit

Click Send to submit the transcription
Edit the text before sending if needed
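With browser STT, the Web Speech API reports transcription as a list of results, each flagged isFinal once the engine stops revising it; displaying the non-final (interim) results is what makes text appear while you speak. The sketch below separates the two. The RecognitionChunk type and splitTranscript function are hypothetical; in a browser the input could be built from a SpeechRecognitionEvent with Array.from(event.results, (r) => ({ transcript: r[0].transcript, isFinal: r.isFinal })). API providers such as Whisper instead return a single transcript per uploaded recording.

```typescript
// Hypothetical simplified view of one Web Speech API recognition result:
// the top alternative's transcript and whether the engine has finalized it.
interface RecognitionChunk {
  transcript: string;
  isFinal: boolean;
}

// Separates finalized text from the interim tail that may still change.
// A UI can render `final` normally and `interim` as provisional text.
function splitTranscript(results: RecognitionChunk[]): { final: string; interim: string } {
  let final = '';
  let interim = '';
  for (const r of results) {
    if (r.isFinal) {
      final += r.transcript;
    } else {
      interim += r.transcript;
    }
  }
  // Engines may prefix later results with a space, so trim the joined text.
  return { final: final.trim(), interim: interim.trim() };
}
```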

For best results with STT:

Use a quality microphone
Minimize background noise
Speak at a normal pace
Enunciate clearly

Configuration

Environment Variables

# .env
STT_API_KEY=your-openai-key-for-stt
TTS_API_KEY=your-openai-key-for-tts

Complete Speech Configuration

# librechat.yaml
speech:
  # Text-to-Speech
  tts:
    openai:
      url: 'https://api.openai.com/v1'  # Optional
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1-hd'  # or 'tts-1' for faster/cheaper
      voices:
        - 'alloy'
        - 'echo'
        - 'fable'
        - 'onyx'
        - 'nova'
        - 'shimmer'
  
  # Speech-to-Text
  stt:
    openai:
      url: 'https://api.openai.com/v1'  # Optional
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
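The '${TTS_API_KEY}' and '${STT_API_KEY}' values are placeholders filled in from the environment variables defined in .env. The sketch below illustrates that kind of substitution. resolveEnvPlaceholders is a hypothetical helper, and LibreChat’s own config loader may treat unset variables differently; this version throws so that a missing key is caught early.

```typescript
// Hypothetical helper: replaces ${VAR_NAME} placeholders in a config value
// with entries from an environment map (e.g. process.env).
function resolveEnvPlaceholders(
  value: string,
  env: Record<string, string | undefined>,
): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    // Fail loudly instead of sending an empty API key to the provider.
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

For example, resolveEnvPlaceholders('${TTS_API_KEY}', process.env) returns the key once .env has been loaded into the process environment.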

Audio Features

Audio Element

TTS uses HTML5 audio elements:

// Audio playback with controls
<audio
  id={`audio-${messageId}`}
  ref={audioRef}
  hidden
  preload="none"
>
  <source src={audioUrl} type="audio/mpeg" />
</audio>
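The pause/resume behavior described under Control Playback can be driven through that element’s standard paused, playbackRate, play(), and pause() members. A minimal sketch, assuming one button toggles playback; the PlayableAudio interface and togglePlayback function are hypothetical, and a real HTMLAudioElement satisfies the interface structurally.

```typescript
// The subset of HTMLAudioElement used here; a real <audio> element fits it.
interface PlayableAudio {
  paused: boolean;
  playbackRate: number;
  play(): Promise<void>;
  pause(): void;
}

// Hypothetical helper: a paused element starts playing at the chosen
// rate; a playing element is paused. Returns the resulting state.
async function togglePlayback(audio: PlayableAudio, rate: number): Promise<'playing' | 'paused'> {
  if (audio.paused) {
    audio.playbackRate = rate;
    await audio.play(); // may reject if the browser blocks playback
    return 'playing';
  }
  audio.pause();
  return 'paused';
}
```

In a React component this could be called as togglePlayback(audioRef.current, rate) after checking that audioRef.current is not null.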

Voice Selection

Choose from available TTS voices:

// Voice selector component
<Voices
  voices={['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']}
  selectedVoice={userPreference}
  onChange={handleVoiceChange}
/>
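Because the voices list in librechat.yaml can change after users have saved a preference, a saved voice may no longer be valid. One way to handle that is sketched below, assuming that falling back to the first configured voice is acceptable; resolveVoice is a hypothetical helper.

```typescript
// Hypothetical helper: uses the saved preference if it is still in the
// configured list, otherwise the first configured voice (undefined if
// no voices are configured at all).
function resolveVoice(configured: readonly string[], preferred?: string | null): string | undefined {
  if (preferred && configured.includes(preferred)) return preferred;
  return configured[0];
}
```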

Rate Limiting

Control speech API usage:

# .env
TTS_VIOLATION_SCORE=0  # No rate limiting by default
STT_VIOLATION_SCORE=0  # Adjust as needed

The cost of TTS and STT API calls can add up quickly. Monitor usage and set appropriate rate limits.

Accessibility

Speech features enhance accessibility:

Screen reader friendly: ARIA labels on all controls
Keyboard navigation: Full keyboard support
Visual feedback: Clear indication of recording/playback state
Captions: Transcriptions appear as text

// Accessibility attributes on the read-aloud button
<button
  aria-label="Read aloud"
  aria-haspopup="false"
  title="Click to read this message"
/>

Browser Compatibility

Browser support for speech features:

Chrome: Full support (built-in + API)
Firefox: Built-in only
Safari: Built-in only
Edge: Full support
Mobile: Limited (iOS Safari, Chrome Android)
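Built-in support can be checked at runtime before offering the browser options. This sketch probes for the standard Web Speech API globals, including the prefixed webkitSpeechRecognition used by Chromium-based browsers and Safari. detectBuiltInSpeech is a hypothetical helper; in a browser you would pass globalThis.

```typescript
// Any object whose properties can be probed; in a browser, pass globalThis.
type GlobalScope = Record<string, unknown>;

// Hypothetical helper: reports which built-in Web Speech APIs are present.
function detectBuiltInSpeech(scope: GlobalScope): { tts: boolean; stt: boolean } {
  return {
    tts: 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope,
    // Recognition may only be exposed under the webkit-prefixed name.
    stt: 'SpeechRecognition' in scope || 'webkitSpeechRecognition' in scope,
  };
}
```

Usage in a browser: detectBuiltInSpeech(globalThis as unknown as Record<string, unknown>).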

Performance Optimization

Audio Caching

TTS audio can be cached to reduce API calls:

// Audio source caching
const audioCache = new Map<string, string>();
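A Map like this can be turned into a bounded cache. The sketch below is a least-recently-used (LRU) cache keyed by voice and text, so the same message read with the same voice only needs to be synthesized once. AudioUrlCache, the 50-entry default, and the key format are assumptions for illustration; the onEvict callback exists so that, in a browser, evicted blob: URLs can be released with URL.revokeObjectURL.

```typescript
// Hypothetical bounded cache of generated audio URLs (e.g. blob: URLs),
// keyed by voice + text and evicting the least recently used entry.
class AudioUrlCache {
  // A Map iterates in insertion order; re-inserting on every access keeps
  // the least recently used key first.
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;
  private readonly onEvict: (url: string) => void;

  constructor(maxEntries = 50, onEvict: (url: string) => void = () => {}) {
    this.maxEntries = maxEntries;
    this.onEvict = onEvict;
  }

  private static keyFor(voice: string, text: string): string {
    // A NUL separator keeps ordinary voice names from colliding with text.
    return `${voice}\u0000${text}`;
  }

  get(voice: string, text: string): string | undefined {
    const key = AudioUrlCache.keyFor(voice, text);
    const url = this.entries.get(key);
    if (url !== undefined) {
      this.entries.delete(key); // move to most-recently-used position
      this.entries.set(key, url);
    }
    return url;
  }

  set(voice: string, text: string, url: string): void {
    const key = AudioUrlCache.keyFor(voice, text);
    const previous = this.entries.get(key);
    if (previous !== undefined && previous !== url) this.onEvict(previous);
    this.entries.delete(key);
    this.entries.set(key, url);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        const oldUrl = this.entries.get(oldest);
        this.entries.delete(oldest);
        if (oldUrl !== undefined) this.onEvict(oldUrl);
      }
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```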

Lazy Loading

Audio elements load only when needed:

<audio preload="none" />

Throttling

Prevent spam by throttling requests:

// Rate limiting for TTS/STT
const speechRateLimit = {
  limit: 40,      // maximum requests per window
  window: 60000,  // window length in milliseconds (1 minute)
};
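A limit of 40 requests per 60000 ms window can be enforced with a fixed-window counter per user. This is a minimal sketch of that technique, not LibreChat’s server-side limiter; FixedWindowLimiter and its injectable clock are hypothetical.

```typescript
// Hypothetical fixed-window rate limiter: at most `limit` requests per
// `windowMs` milliseconds for each key (for example, a user ID).
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // The clock is injectable so the logic can be tested without waiting.
  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and counts the request if it fits in the current window.
  tryAcquire(key: string): boolean {
    const t = this.now();
    let entry = this.windows.get(key);
    // Start a fresh window on first use or once the old one has expired.
    if (entry === undefined || t - entry.start >= this.windowMs) {
      entry = { start: t, count: 0 };
      this.windows.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

A fixed window is the simplest option, but it allows a burst of up to twice the limit across a window boundary; a sliding window smooths that out at the cost of tracking individual request timestamps.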

Use Cases

Accessibility

Screen reader users
Visual impairments
Reading difficulties
Language learning

Hands-Free Operation

Driving
Cooking
Multitasking
Mobile usage

Content Consumption

Long-form content
Educational material
News summaries
Podcast-style listening

Voice Input

Faster than typing
Mobile convenience
Accessibility
Multilingual input

Troubleshooting

TTS not working

Check API key configuration
Verify browser supports audio playback
Check volume/mute settings
Look for errors in browser console

No sound output

Check device volume
Verify audio output device
Test browser audio (e.g., YouTube)
Check for browser audio permission

STT not recognizing speech

Grant microphone permission
Check microphone is working (test in another app)
Reduce background noise
Speak clearly and at moderate speed
Try refreshing the page

Poor voice quality

Use tts-1-hd model for better quality
Check network connection
Try different voice options
For browser TTS, quality depends on OS

High API costs

Use browser TTS instead of API
Limit TTS to important messages
Set rate limits
Monitor usage in OpenAI dashboard

Best Practices

Default to browser TTS: No API costs, and it can work offline
Use API TTS for quality: When professional voice matters
Enable selectively: Don’t auto-play all messages
Optimize voice choice: Match voice to use case
Monitor costs: TTS/STT can be expensive at scale
Provide text fallback: Always show text alongside audio

Configuration Reference

# librechat.yaml
speech:
  tts:
    openai:
      url: '${TTS_BASE_URL}'
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1-hd'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
  
  stt:
    openai:
      url: '${STT_BASE_URL}'
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'

# .env
TTS_API_KEY=your-tts-key
STT_API_KEY=your-stt-key

# Rate limiting
TTS_VIOLATION_SCORE=0
STT_VIOLATION_SCORE=0
