Speech Features

LibreChat supports both speech-to-text (STT) and text-to-speech (TTS) features, enabling voice-based interactions with AI models. You can speak your prompts and have responses read aloud.

Overview

Speech features in LibreChat include:

Speech-to-Text (STT) - Convert spoken words to text input
Text-to-Speech (TTS) - Convert AI responses to spoken audio
Browser-based - Built-in browser APIs for basic functionality
External engines - Connect to advanced speech services
Conversation mode - Automatic hands-free back-and-forth interaction
Voice selection - Choose from available voices
Playback controls - Adjust speed and manage audio playback

Enabling Speech Features

Accessing Speech Settings

Open LibreChat settings (gear icon)
Navigate to the Speech tab
Configure STT and TTS options

Speech Settings Location

Settings > Speech contains all speech-related configuration options.

Speech-to-Text (STT)

Convert your spoken words into text for sending messages to the AI.

Enabling Speech-to-Text

Go to Settings > Speech > STT
Toggle Speech to Text switch to ON
The microphone button appears in the message input area

STT Engine Selection

Choose between available speech recognition engines:

Browser STT (Default)

Uses your browser’s built-in Web Speech API:

Pros: No setup required, works offline (on some browsers), free
Cons: Limited language support, accuracy varies by browser
Best for: Quick testing, simple voice input, privacy-conscious users (when the browser processes audio on-device)
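
Because browser support for the Web Speech API varies, a page can check for it before offering voice input. The following is a minimal sketch, not LibreChat's own code: `ScopeLike`, `findRecognitionCtor`, and `supportsBrowserStt` are hypothetical names, and in a real page you would pass `window` as the scope. The `SpeechRecognition` and `webkitSpeechRecognition` property names are the ones browsers expose.

```typescript
// Minimal shape of the global object we inspect (hypothetical helper type).
// In a browser you would pass `window`.
interface ScopeLike {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Returns the recognition constructor if the browser exposes one, else undefined.
// Chromium-based browsers expose the API under the `webkit` prefix.
function findRecognitionCtor(scope: ScopeLike): unknown {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
}

// True when browser STT can be offered to the user.
function supportsBrowserStt(scope: ScopeLike): boolean {
  return typeof findRecognitionCtor(scope) === "function";
}
```

If the check fails, falling back to an external engine or hiding the microphone button are both reasonable choices.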

External STT

Connects to external speech recognition services:

Pros: Higher accuracy, more languages, better noise handling
Cons: Requires configuration, may have costs, needs internet connection
Best for: Production use, multilingual needs, professional applications

To select engine:

Go to Settings > Speech > STT
Find Engine dropdown
Select Browser or External
Save settings

Using Speech-to-Text

Ensure STT is enabled in settings
Click the microphone icon in the message input area
Allow microphone access if prompted by your browser
Start speaking
Your words appear as text in the input field in real-time
Click the microphone icon again to stop recording
Review and edit the transcribed text if needed
Send your message as usual
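
The real-time text in the steps above comes from the way the Web Speech API reports results: each result carries an `isFinal` flag, and interim results keep changing until the phrase settles. The sketch below assumes a simplified result shape (`ResultLike` and `splitTranscript` are hypothetical names) and only illustrates the idea. It is not LibreChat's implementation.

```typescript
// Simplified view of one recognition result: the top alternative's text
// plus the `isFinal` flag the browser sets once the phrase is settled.
interface ResultLike {
  transcript: string;
  isFinal: boolean;
}

// Splits results into settled text and the still-changing interim tail,
// which is what lets an input field update while you are speaking.
function splitTranscript(results: ResultLike[]): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (const r of results) {
    if (r.isFinal) finalText += r.transcript;
    else interimText += r.transcript;
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

An input field would typically show `finalText` followed by `interimText`, replacing the interim part on every update.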

STT Visual Indicators

Microphone icon - Default state, ready to record
Red “Mic Off” icon - Currently recording
Spinner - Processing speech (external STT)

Language Selection

For multilingual speech recognition:

Go to Settings > Speech > STT
Find Language dropdown
Select your preferred language
Available languages depend on your selected engine

Common languages:

English (US, UK, Australian, etc.)
Spanish
French
German
Chinese
Japanese
And many more (varies by engine)
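
Speech engines usually identify languages with BCP 47 tags such as `en-US` or `fr-FR`. As a rough sketch of how a preferred language might be matched against what an engine offers, the hypothetical helper `pickLanguage` below tries an exact match first, then any variant of the same language, then a fallback. The matching rule is an assumption for illustration only.

```typescript
// Picks the best available tag for a preferred language:
// exact match, then same primary language (e.g. "en"), then the fallback.
function pickLanguage(preferred: string, available: string[], fallback = "en-US"): string {
  const want = preferred.toLowerCase();
  const exact = available.find((tag) => tag.toLowerCase() === want);
  if (exact) return exact;
  const primary = want.split("-")[0];
  const sameLanguage = available.find((tag) => tag.toLowerCase().split("-")[0] === primary);
  return sameLanguage ?? fallback;
}
```

For example, a user who prefers `en-GB` would get `en-US` from an engine that offers only US English, not an unrelated language.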

Advanced STT Settings

Auto-Transcribe Audio

Automatically transcribe when you finish speaking:

Go to Settings > Speech > STT
Toggle Auto-Transcribe Audio ON
Now when you stop speaking, transcription happens automatically

How it works:

Detects when you stop speaking (based on silence)
Automatically finalizes the transcription
You don’t need to click the mic button to stop
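
Silence-based end-of-speech detection can be sketched as follows, assuming the microphone level is sampled in frames and compared with a threshold. The function name, the frame-based model, and the `silentFramesNeeded` parameter are assumptions for illustration. LibreChat's actual detection logic may differ.

```typescript
// Returns the index of the frame at which speech is considered finished:
// the frame that completes `silentFramesNeeded` consecutive frames below
// `thresholdDb` after at least one frame of speech. Returns -1 if the
// speaker never paused long enough. Leading silence is ignored.
function findEndOfSpeech(levelsDb: number[], thresholdDb: number, silentFramesNeeded: number): number {
  let heardSpeech = false;
  let silentRun = 0;
  for (let i = 0; i < levelsDb.length; i++) {
    const level = levelsDb[i] ?? -Infinity;
    if (level >= thresholdDb) {
      heardSpeech = true;
      silentRun = 0; // speech resumed, restart the silence count
    } else if (heardSpeech) {
      silentRun++;
      if (silentRun >= silentFramesNeeded) return i;
    }
  }
  return -1;
}
```

Requiring several silent frames in a row is what keeps a short pause between words from ending the recording.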

Auto-Send Text

Automatically send transcribed messages:

Go to Settings > Speech > STT
Enable Auto-Send Text
Choose when to auto-send:
- After transcription complete - Sends immediately when you stop speaking
- On manual confirmation - Waits for you to confirm

Use cases:

Hands-free conversation mode
Quick voice queries
Accessibility for users who can’t type

Decibel Threshold

Adjust microphone sensitivity:

Go to Settings > Speech > STT
Find Decibel Threshold slider
Lower values = more sensitive (picks up quieter sounds)
Higher values = less sensitive (requires louder speech)

Recommended settings:

Quiet environment: -40 to -50 dB (mid-range)
Noisy environment: -30 to -35 dB (less sensitive)
Soft-spoken: -50 to -60 dB (more sensitive)

Test different levels to find what works best for your setup.
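
To make the threshold values concrete, here is a minimal sketch of how a block of audio samples can be turned into a decibel level and compared with the threshold. It assumes samples normalized to the range -1 to 1 and a level measured in dB relative to full scale. `levelInDb` and `isSpeech` are hypothetical names, and LibreChat may measure the level differently.

```typescript
// Converts normalized audio samples (range -1..1) to a level in dB
// relative to full scale, using the root mean square of the block.
function levelInDb(samples: number[]): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  return 20 * Math.log10(rms);
}

// A block counts as speech when its level reaches the threshold.
// A lower (more negative) threshold therefore accepts quieter sounds.
function isSpeech(samples: number[], thresholdDb: number): boolean {
  return levelInDb(samples) >= thresholdDb;
}
```

A block with an amplitude of 0.01 sits at about -40 dB, so under these assumptions it would register as speech with a threshold of -45 dB but not with a threshold of -35 dB.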

Text-to-Speech (TTS)

Have AI responses read aloud with natural-sounding voices.

Enabling Text-to-Speech

Go to Settings > Speech > TTS
Toggle Text to Speech switch to ON
Volume icons appear next to AI messages

TTS Engine Selection

Browser TTS (Default)

Uses your browser’s built-in speech synthesis:

Pros: No setup, works offline, free, decent quality
Cons: Limited voice selection, quality varies by browser/OS
Best for: Basic TTS needs, testing, offline use

External TTS

Connects to external TTS services (e.g., OpenAI TTS, ElevenLabs, Google Cloud TTS):

Pros: High-quality voices, natural intonation, emotion, more languages
Cons: Requires API keys, may have costs, needs internet
Best for: Professional use, high-quality voice output, specific voice needs

To select engine:

Go to Settings > Speech > TTS
Find Engine dropdown
Select Browser or External
If external, configure API credentials
Save settings
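
For orientation, this is roughly what a request to one external TTS service looks like. The sketch builds a request for OpenAI's speech endpoint. The helper name `buildSpeechRequest` and the `TtsHttpRequest` type are hypothetical, the model and voice are example values, and this is not LibreChat's internal code. API keys belong on the server and should never be exposed in the browser.

```typescript
// Plain description of an HTTP request (hypothetical helper type).
interface TtsHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a text-to-speech request for OpenAI's speech endpoint.
// `voice` and `speed` are example defaults, not recommendations.
function buildSpeechRequest(text: string, apiKey: string, voice = "alloy", speed = 1.0): TtsHttpRequest {
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "tts-1", input: text, voice, speed }),
  };
}
```

The service responds with audio data, which is why external TTS needs an internet connection and may incur costs per request.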

Using Text-to-Speech

Ensure TTS is enabled in settings
Wait for the AI to respond to your message
Click the volume icon next to the response
Audio playback begins
Click the icon again to stop playback

TTS Visual Indicators

Volume icon - Ready to play, not currently playing
Volume Mute icon - Currently playing (click to stop)
Spinner - Loading/generating audio

Voice Selection

Browser Voices

Go to Settings > Speech > TTS
Find Voice dropdown
Select from available system voices
Voices vary by operating system:
- Windows: Microsoft voices
- macOS: Siri voices, others
- Linux: eSpeak, Festival voices
- Mobile: System-dependent
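
In a browser, the installed voices are available through `speechSynthesis.getVoices()`, and each voice reports a `name`, a `lang` tag, and a `localService` flag. The sketch below works on a simplified copy of that shape (`VoiceLike` and `voicesForLanguage` are hypothetical names) and shows one way a voice list might be narrowed to a language, with on-device voices listed first. The ordering rule is an assumption.

```typescript
// Simplified copy of the fields a browser voice exposes.
interface VoiceLike {
  name: string;
  lang: string;          // BCP 47 tag such as "en-US"
  localService: boolean; // true when the voice is synthesized on-device
}

// Keeps voices whose primary language matches, on-device voices first.
function voicesForLanguage(voices: VoiceLike[], lang: string): VoiceLike[] {
  const primary = lang.toLowerCase().split("-")[0];
  return voices
    .filter((v) => v.lang.toLowerCase().split("-")[0] === primary)
    .sort((a, b) => Number(b.localService) - Number(a.localService));
}
```

Note that some browsers load their voice list asynchronously, so the list can be empty right after the page loads.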

External Service Voices

Configure external TTS service
Go to Settings > Speech > TTS > Voice
Select from the service’s voice library
Options often include:
- Multiple languages
- Different genders
- Various speaking styles (news, conversational, etc.)
- Regional accents

Playback Controls

Playback Rate (Speed)

Adjust how fast the AI speaks:

Go to Settings > Speech > TTS
Find Playback Rate slider
Adjust speed:
- 0.5x - Half speed (slower)
- 1.0x - Normal speed
- 2.0x - Double speed (faster)
- Custom - Fine-tune to preference

Use cases:

Slower (0.7x-0.9x): Learning new languages, complex topics
Normal (1.0x): Standard listening
Faster (1.2x-1.5x): Efficiency, familiar content
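
A slider value is usually clamped and rounded before it is applied to playback. The sketch below assumes the 0.5x to 2.0x range described above and steps of 0.1. `normalizeRate` is a hypothetical helper, and the limits are assumptions, not LibreChat's documented bounds. In a browser the result could be assigned to `SpeechSynthesisUtterance.rate` or to an audio element's `playbackRate`.

```typescript
// Clamps a requested playback rate to [min, max] and rounds it to one
// decimal place. Non-finite input falls back to normal speed (1.0).
function normalizeRate(requested: number, min = 0.5, max = 2.0): number {
  if (!Number.isFinite(requested)) return 1.0;
  const clamped = Math.min(max, Math.max(min, requested));
  return Math.round(clamped * 10) / 10;
}
```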

Global Audio Control

Manage playback across all messages:

Pause all - Stop all currently playing audio
Resume - Continue paused playback
Skip - Move to next message in queue (if available)
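
The three controls can be modelled as a small queue with a playing or paused state. The class below is a sketch under that assumption. `PlaybackQueue` and its method names are hypothetical, and it tracks message IDs only, leaving the actual audio playback out.

```typescript
type QueueState = "idle" | "playing" | "paused";

// Tracks which message is being read aloud and what comes next.
class PlaybackQueue {
  private items: string[] = [];
  private state: QueueState = "idle";

  // Adds a message; playback starts if nothing was queued.
  enqueue(messageId: string): void {
    this.items.push(messageId);
    if (this.state === "idle") this.state = "playing";
  }

  current(): string | undefined {
    return this.items[0];
  }

  status(): QueueState {
    return this.state;
  }

  pauseAll(): void {
    if (this.state === "playing") this.state = "paused";
  }

  resume(): void {
    if (this.state === "paused") this.state = "playing";
  }

  // Drops the current message and returns the next one, if any.
  skip(): string | undefined {
    this.items.shift();
    if (this.items.length === 0) this.state = "idle";
    return this.items[0];
  }
}
```

Keeping a single queue is one way to make sure only one response is spoken at a time.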

Conversation Mode

Conversation mode enables continuous, hands-free voice interaction with the AI.

Enabling Conversation Mode

Go to Settings > Speech
Toggle Conversation Mode switch to ON
Both STT and TTS must be enabled for conversation mode to work

How Conversation Mode Works

You speak (STT transcribes)
Message auto-sends when you stop speaking
AI responds
Response is automatically read aloud (TTS)
After playback, mic activates for your next input
Cycle repeats for continuous conversation
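
The cycle above can be read as a small state machine. The sketch below models it with four phases and four events. The phase and event names are hypothetical labels chosen for illustration, not identifiers from LibreChat.

```typescript
type Phase = "listening" | "sending" | "waiting" | "speaking";
type ConvEvent = "speechEnded" | "messageSent" | "responseReceived" | "playbackFinished";

// Allowed transitions: each phase reacts to exactly one event.
const transitions: Record<Phase, Partial<Record<ConvEvent, Phase>>> = {
  listening: { speechEnded: "sending" },
  sending: { messageSent: "waiting" },
  waiting: { responseReceived: "speaking" },
  speaking: { playbackFinished: "listening" },
};

// Returns the next phase, or the current one if the event does not apply.
function nextPhase(phase: Phase, event: ConvEvent): Phase {
  return transitions[phase][event] ?? phase;
}
```

Ignoring events that do not apply to the current phase keeps the microphone from reopening while a response is still being read aloud.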

Conversation Mode Requirements

Speech-to-Text: Enabled
Text-to-Speech: Enabled
Auto-Transcribe: Recommended (auto-detects when you finish speaking)
Auto-Send: Required (sends message automatically)

Starting a Conversation Mode Session

Enable conversation mode in settings
Start a new conversation or open existing one
Click the microphone icon
Begin speaking
The conversation flows automatically from there

Ending Conversation Mode

Any of the following ends a conversation mode session:

Click the microphone icon to stop listening
Turn off conversation mode in settings
Close the conversation

Conversation Mode Best Practices

Speak clearly - Better transcription accuracy
Pause between thoughts - Helps auto-transcribe detect when you’re done
Quiet environment - Reduces background noise interference
Test settings first - Adjust decibel threshold and playback rate before long sessions

Tips and Best Practices

Speech-to-Text

Test your microphone - Ensure it’s working properly before starting
Check browser permissions - Allow microphone access when prompted
Speak naturally - No need to speak robotically
Review transcriptions - Edit any errors before sending
Use punctuation commands - Some engines support saying “period”, “comma”, etc.
Background noise - Minimize for better accuracy

Text-to-Speech

Choose appropriate voice - Match language and preference
Adjust playback speed - Find comfortable listening pace
Use headphones - Better audio quality and privacy
Long responses - Consider playback rate adjustment for lengthy content
Pause and review - Stop playback to read complex code or data

Performance Optimization

Browser vs External - External services often have better quality but require internet
Disable when not needed - Turn off TTS/STT to save resources
Cache voices - Some browsers cache voices for faster loading
Test different engines - Compare quality and performance

Accessibility

Vision impairment - TTS enables accessing AI responses without reading
Motor impairment - Voice input as alternative to typing
Dyslexia - Hearing responses can aid comprehension
Multitasking - Listen to responses while doing other tasks

Troubleshooting

Speech-to-Text Issues

Microphone not working:

Check browser permissions for microphone access
Verify microphone is connected and not muted
Try a different browser
Check system audio settings

Poor transcription accuracy:

Speak more clearly and slowly
Reduce background noise
Adjust decibel threshold
Try external STT engine
Select correct language in settings

No speech detected:

Lower decibel threshold (increase sensitivity)
Check microphone volume levels
Ensure microphone isn’t muted or blocked
Try speaking louder or closer to mic

Speech cuts off too early:

Lower decibel threshold so quieter speech is not treated as silence
Disable auto-transcribe and stop manually
Check browser STT settings

Text-to-Speech Issues

No audio playback:

Check browser audio permissions
Verify system volume isn’t muted
Try a different voice or engine
Check internet connection (for external TTS)
Look for browser console errors

Voice sounds robotic:

Try a different voice in settings
Consider using external TTS service for higher quality
Check whether your browser or operating system has higher-quality voices installed

Playback too fast/slow:

Adjust playback rate in settings
Reset to 1.0x if unsure
Test incremental changes (0.1x adjustments)

Audio stuttering or lag:

Check internet connection (for external TTS)
Close other tabs using audio
Try browser TTS instead of external
Reduce browser resource usage

General Speech Issues

Settings not saving:

Check browser local storage permissions
Try refreshing the page
Verify you’re logged in
Clear browser cache if persistent

Feature not available:

Speech features may be disabled by administrator
Browser may not support Web Speech API
External service may require configuration/API keys

Privacy and Security

Browser Speech APIs

Processing generally happens on your device
Some browsers send audio to a cloud service for recognition
Check your browser’s privacy policy
No API keys or external accounts needed

External Speech Services

Audio sent to third-party service
Subject to service provider’s privacy policy
May be recorded or used for training
Requires API credentials (stored securely)
Consider sensitivity of content being spoken/read

Best Practices

Review privacy policies of speech services you use
Avoid speaking sensitive information if using cloud STT
Use browser APIs for private/confidential conversations
Disable speech features when not needed
Check microphone permissions regularly

Keyboard Accessibility

Speech features are keyboard accessible:

Tab - Navigate to microphone/speaker buttons
Enter/Space - Activate speech recording or playback
Escape - Stop recording or playback
Navigate settings with keyboard
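
The key bindings listed above can be expressed as a small mapping from a key name to an action. This is a sketch only: `actionForKey` and the action names are hypothetical, and the key strings are the values a browser reports in `KeyboardEvent.key` (the space bar is reported as a single space character).

```typescript
type SpeechAction = "toggle" | "stop" | "none";

// Maps a pressed key to an action on the focused microphone or speaker button.
// `isActive` is true while recording or playback is in progress.
function actionForKey(key: string, isActive: boolean): SpeechAction {
  if (key === "Enter" || key === " ") return "toggle";
  if (key === "Escape") return isActive ? "stop" : "none";
  return "none";
}
```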

Advanced Configuration

Custom External Services

Administrators can configure custom STT/TTS endpoints:

API endpoint URLs
Authentication methods
Request/response formats
Voice/model selection

Check with your LibreChat administrator for available options.

Integration with Endpoints

Some LibreChat endpoints have native speech features:

OpenAI’s Whisper (STT) and TTS
Azure Speech Services
Google Cloud Speech

These may offer tighter integration and better quality than generic external services.
