Speech Features
LibreChat supports both speech-to-text (STT) and text-to-speech (TTS) features, enabling voice-based interactions with AI models. You can speak your prompts and have responses read aloud.
Overview
Speech features in LibreChat include:
- Speech-to-Text (STT) - Convert spoken words to text input
- Text-to-Speech (TTS) - Convert AI responses to spoken audio
- Browser-based - Built-in browser APIs for basic functionality
- External engines - Connect to advanced speech services
- Conversation mode - Automatic hands-free back-and-forth interaction
- Voice selection - Choose from available voices
- Playback controls - Adjust speed and manage audio playback
Enabling Speech Features
Accessing Speech Settings
- Open LibreChat settings (gear icon)
- Navigate to the Speech tab
- Configure STT and TTS options
Speech Settings Location
Settings > Speech contains all speech-related configuration options.
Speech-to-Text (STT)
Convert your spoken words into text for sending messages to the AI.
Enabling Speech-to-Text
- Go to Settings > Speech > STT
- Toggle Speech to Text switch to ON
- The microphone button appears in the message input area
STT Engine Selection
Choose between available speech recognition engines:
Browser STT (Default)
Uses your browser’s built-in Web Speech API:
- Pros: No setup required, works offline (on some browsers), free
- Cons: Limited language support, accuracy varies by browser
- Best for: Quick testing, simple voice input, privacy-conscious users
External STT
Connects to external speech recognition services:
- Pros: Higher accuracy, more languages, better noise handling
- Cons: Requires configuration, may have costs, needs internet connection
- Best for: Production use, multilingual needs, professional applications
To select an STT engine:
- Go to Settings > Speech > STT
- Find the Engine dropdown
- Select Browser or External
- Save settings
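The choice between the two engines can be sketched as a simple feature check. This is a minimal illustration, not LibreChat's own logic: `SttEngine`, `hasBrowserStt`, and `chooseSttEngine` are hypothetical names. The only browser fact it relies on is that the Web Speech API exposes a `SpeechRecognition` constructor, which some browsers ship under the prefixed name `webkitSpeechRecognition`.

```typescript
// Hypothetical helper names; not part of LibreChat.
type SttEngine = "browser" | "external";

// Accepts any object standing in for `window`, so the check also runs outside a browser.
function hasBrowserStt(globalLike: object): boolean {
  // Some browsers only expose the prefixed constructor.
  return "SpeechRecognition" in globalLike || "webkitSpeechRecognition" in globalLike;
}

// Prefer the browser engine when it exists; otherwise fall back to an
// external service if one has been configured, or report that neither is usable.
function chooseSttEngine(globalLike: object, externalConfigured: boolean): SttEngine | null {
  if (hasBrowserStt(globalLike)) return "browser";
  return externalConfigured ? "external" : null;
}
```

In a browser you would pass `window` as the first argument. A `null` result corresponds to the case where the microphone button cannot do anything useful until an external service is set up.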
Using Speech-to-Text
- Ensure STT is enabled in settings
- Click the microphone icon in the message input area
- Allow microphone access if prompted by your browser
- Start speaking
- Your words appear as text in the input field in real-time
- Click the microphone icon again to stop recording
- Review and edit the transcribed text if needed
- Send your message as usual
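For the browser engine, the real-time text in the input field comes from interim results. The sketch below shows one way to separate confirmed text from text that is still being recognized. It assumes a Web Speech API `SpeechRecognition` instance with `interimResults` set to `true`, whose `result` event carries a list of results; the `ResultLike` and `AlternativeLike` shapes are simplified stand-ins for those objects, and `assembleTranscript` is a hypothetical helper.

```typescript
// Simplified stand-ins for the Web Speech API result objects (assumed shapes).
interface AlternativeLike { transcript: string }
interface ResultLike { isFinal: boolean; 0: AlternativeLike }

interface Transcript { finalText: string; interimText: string }

// Split recognized speech into text the engine has committed to (final)
// and text that may still change while the user keeps talking (interim).
function assembleTranscript(results: ArrayLike<ResultLike>): Transcript {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const best = results[i][0].transcript; // take the first alternative of each result
    if (results[i].isFinal) finalText += best;
    else interimText += best;
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

A UI would typically show `finalText` followed by `interimText` in a lighter style, then keep only `finalText` once recording stops.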
STT Visual Indicators
- Microphone icon - Default state, ready to record
- Red “Mic Off” icon - Currently recording
- Spinner - Processing speech (external STT)
Language Selection
For multilingual speech recognition:
- Go to Settings > Speech > STT
- Find the Language dropdown
- Select your preferred language
- Available languages depend on your selected engine
Commonly available languages include:
- English (US, UK, Australian, etc.)
- Spanish
- French
- German
- Chinese
- Japanese
- And many more (varies by engine)
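Speech engines usually identify languages by BCP 47 tags such as `en-US`. The sketch below shows how a selected language could be matched against what an engine supports. The tag table is illustrative (the region chosen for each language is an assumption), and `resolveLanguageTag` is a hypothetical helper, not a LibreChat function.

```typescript
// Example BCP 47 tags for the languages listed above; region choices are illustrative.
const LANGUAGE_TAGS: Record<string, string> = {
  "English (US)": "en-US",
  "English (UK)": "en-GB",
  "English (Australian)": "en-AU",
  Spanish: "es-ES",
  French: "fr-FR",
  German: "de-DE",
  Chinese: "zh-CN",
  Japanese: "ja-JP",
};

// Return the requested tag if the engine supports it, otherwise a supported tag
// with the same primary language (e.g. en-GB -> en-US), otherwise the fallback.
function resolveLanguageTag(requested: string, supported: string[], fallback = "en-US"): string {
  const wanted = requested.toLowerCase();
  const exact = supported.find((tag) => tag.toLowerCase() === wanted);
  if (exact) return exact;
  const primary = wanted.split("-")[0];
  const sameLanguage = supported.find((tag) => tag.toLowerCase().split("-")[0] === primary);
  return sameLanguage ?? fallback;
}
```

With the browser engine, the resolved tag is the kind of value assigned to the `lang` property of a `SpeechRecognition` instance.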
Advanced STT Settings
Auto-Transcribe Audio
Automatically transcribe when you finish speaking:
- Go to Settings > Speech > STT
- Toggle Auto-Transcribe Audio ON
- Now when you stop speaking, transcription happens automatically
How it works:
- Detects when you stop speaking (based on silence)
- Automatically finalizes the transcription
- You don’t need to click the mic button to stop
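Silence-based end-of-speech detection can be sketched as follows. This is an illustration of the general technique under stated assumptions, not LibreChat's implementation: the function name, the sampling interval, and the required silence duration are all hypothetical.

```typescript
// Returns the index of the reading at which speech is considered finished,
// or -1 if the speaker has not yet been silent for long enough.
function findEndOfSpeech(
  levelsDb: number[],       // one loudness reading per sampling interval
  thresholdDb: number,      // readings below this count as silence
  sampleIntervalMs: number, // time between readings
  silenceMs: number,        // how long silence must last before stopping
): number {
  let heardSpeech = false;
  let silentFor = 0;
  for (let i = 0; i < levelsDb.length; i++) {
    if (levelsDb[i] >= thresholdDb) {
      heardSpeech = true; // speech resets the silence timer
      silentFor = 0;
    } else if (heardSpeech) {
      // Only count silence after some speech, so leading quiet is ignored.
      silentFor += sampleIntervalMs;
      if (silentFor >= silenceMs) return i;
    }
  }
  return -1;
}
```

Short pauses between words reset the timer as soon as speech resumes, which is why pausing briefly mid-sentence does not end the recording.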
Auto-Send Text
Automatically send transcribed messages:
- Go to Settings > Speech > STT
- Enable Auto-Send Text
- Choose when to auto-send:
- After transcription complete - Sends immediately when you stop speaking
- On manual confirmation - Waits for you to confirm
Useful for:
- Hands-free conversation mode
- Quick voice queries
- Accessibility for users who can’t type
Decibel Threshold
Adjust microphone sensitivity:
- Go to Settings > Speech > STT
- Find the Decibel Threshold slider
- Lower values = more sensitive (picks up quieter sounds)
- Higher values = less sensitive (requires louder speech)
Suggested starting points:
- Quiet environment: -40 to -50 dB (mid-range)
- Noisy environment: -30 to -35 dB (less sensitive)
- Soft-spoken: -50 to -60 dB (more sensitive)
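To see why a lower threshold is more sensitive, it helps to look at how a loudness level is compared against it. The sketch below assumes the level is measured in dB relative to full scale (0 dB is the loudest possible signal, quieter sounds are more negative); the scale LibreChat's slider uses is not stated here, so treat that as an assumption. `levelDb` and `isSpeech` are hypothetical helpers.

```typescript
// Convert raw audio samples in the range [-1, 1] to a level in dB
// relative to full scale. An empty or all-zero buffer yields -Infinity.
function levelDb(samples: number[]): number {
  if (samples.length === 0) return -Infinity;
  const meanSquare = samples.reduce((sum, s) => sum + s * s, 0) / samples.length;
  return 10 * Math.log10(meanSquare); // same value as 20 * log10(rms)
}

// Audio counts as speech when its level reaches the configured threshold.
function isSpeech(samples: number[], thresholdDb: number): boolean {
  return levelDb(samples) >= thresholdDb;
}
```

A quiet voice measuring around -60 dB is ignored with a threshold of -45 dB but detected once the threshold is lowered to -65 dB, which matches the slider behavior described above.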
Text-to-Speech (TTS)
Have AI responses read aloud with natural-sounding voices.
Enabling Text-to-Speech
- Go to Settings > Speech > TTS
- Toggle Text to Speech switch to ON
- Volume icons appear next to AI messages
TTS Engine Selection
Browser TTS (Default)
Uses your browser’s built-in speech synthesis:
- Pros: No setup, works offline, free, decent quality
- Cons: Limited voice selection, quality varies by browser/OS
- Best for: Basic TTS needs, testing, offline use
External TTS
Connects to external TTS services (e.g., OpenAI TTS, ElevenLabs, Google Cloud TTS):
- Pros: High-quality voices, natural intonation, emotion, more languages
- Cons: Requires API keys, may have costs, needs internet
- Best for: Professional use, high-quality voice output, specific voice needs
To select a TTS engine:
- Go to Settings > Speech > TTS
- Find the Engine dropdown
- Select Browser or External
- If external, configure API credentials
- Save settings
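To give a sense of what an external TTS call involves, here is a sketch that builds (but does not send) a request for OpenAI's speech endpoint. How LibreChat itself issues such requests is not covered on this page, so this is only an illustration: `TtsRequest` and `buildOpenAiTtsRequest` are hypothetical names, and the `tts-1` model and `alloy` voice are example values.

```typescript
// Hypothetical description of an HTTP request; nothing is sent here.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildOpenAiTtsRequest(text: string, apiKey: string, voice = "alloy", speed = 1.0): TtsRequest {
  // The endpoint accepts speeds between 0.25 and 4.0; clamp to stay in range.
  const safeSpeed = Math.min(4.0, Math.max(0.25, speed));
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "tts-1", input: text, voice, speed: safeSpeed }),
  };
}
```

Requests like this should be made from a server rather than from the browser, so that the API key is never exposed to end users.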
Using Text-to-Speech
- Ensure TTS is enabled in settings
- Wait for the AI to respond to your message
- Click the volume icon next to the response
- Audio playback begins
- Click the icon again to stop playback
TTS Visual Indicators
- Volume icon - Ready to play, not currently playing
- Volume Mute icon - Currently playing (click to stop)
- Spinner - Loading/generating audio
Voice Selection
Browser Voices
- Go to Settings > Speech > TTS
- Find the Voice dropdown
- Select from available system voices
- Voices vary by operating system:
- Windows: Microsoft voices
- macOS: Siri voices, others
- Linux: eSpeak, Festival voices
- Mobile: System-dependent
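Because the available voices differ per system, a sensible default has to be picked at runtime. The sketch below shows one possible selection order: a saved voice name first, then an exact language match, then any voice in the same language, then the system default. `VoiceLike` is a simplified stand-in for the objects returned by the browser's `speechSynthesis.getVoices()`, and `pickVoice` is a hypothetical helper.

```typescript
// Simplified subset of the fields on a browser voice object (assumed shape).
interface VoiceLike { name: string; lang: string; default: boolean }

function pickVoice(voices: VoiceLike[], lang: string, preferredName?: string): VoiceLike | undefined {
  // 1. A previously saved voice wins if it is still installed.
  const byName = preferredName ? voices.find((v) => v.name === preferredName) : undefined;
  if (byName) return byName;

  // 2. Otherwise look for voices in the same primary language (e.g. "en").
  const wanted = lang.toLowerCase();
  const primary = wanted.split("-")[0];
  const sameLanguage = voices.filter((v) => v.lang.toLowerCase().split("-")[0] === primary);

  // 3. Prefer an exact regional match, then any same-language voice, then the system default.
  return sameLanguage.find((v) => v.lang.toLowerCase() === wanted)
    ?? sameLanguage[0]
    ?? voices.find((v) => v.default);
}
```

In browsers the voice list can load asynchronously, so it is common to call `getVoices()` again when the `voiceschanged` event fires.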
External Service Voices
- Configure external TTS service
- Go to Settings > Speech > TTS > Voice
- Select from the service’s voice library
- Options often include:
- Multiple languages
- Different genders
- Various speaking styles (news, conversational, etc.)
- Regional accents
Playback Controls
Playback Rate (Speed)
Adjust how fast the AI speaks:
- Go to Settings > Speech > TTS
- Find the Playback Rate slider
- Adjust speed:
- 0.5x - Half speed (slower)
- 1.0x - Normal speed
- 2.0x - Double speed (faster)
- Custom - Fine-tune to preference
Suggested rates by use case:
- Slower (0.7x-0.9x): Learning new languages, complex topics
- Normal (1.0x): Standard listening
- Faster (1.2x-1.5x): Efficiency, familiar content
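A slider like this typically maps to a bounded numeric rate. The sketch below clamps a requested value to the 0.5x to 2.0x range described above and snaps it to 0.1x steps. The helper name and the step size are assumptions for illustration.

```typescript
const MIN_RATE = 0.5;
const MAX_RATE = 2.0;

// Clamp to the supported range and round to one decimal place.
// Invalid input (NaN, Infinity) falls back to normal speed.
function normalizeRate(requested: number): number {
  if (!Number.isFinite(requested)) return 1.0;
  const clamped = Math.min(MAX_RATE, Math.max(MIN_RATE, requested));
  return Math.round(clamped * 10) / 10;
}
```

With browser TTS, the resulting number is the kind of value assigned to the `rate` property of a `SpeechSynthesisUtterance`; for audio returned by an external service, the equivalent is the `playbackRate` of an audio element.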
Global Audio Control
Manage playback across all messages:
- Pause all - Stop all currently playing audio
- Resume - Continue paused playback
- Skip - Move to next message in queue (if available)
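The three controls above can be modeled as operations on a queue of messages waiting to be read aloud. This is a minimal sketch of that idea, not LibreChat's implementation; `PlaybackQueue` and its method names are hypothetical, and it tracks state only, without playing any audio.

```typescript
class PlaybackQueue {
  private items: string[] = []; // message IDs, first item is the one playing
  private paused = false;

  enqueue(messageId: string): void {
    this.items.push(messageId);
  }

  current(): string | undefined {
    return this.items[0];
  }

  isPlaying(): boolean {
    return !this.paused && this.items.length > 0;
  }

  pauseAll(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
  }

  // Drop the current message and move on to the next one, if any.
  skip(): string | undefined {
    this.items.shift();
    return this.items[0];
  }
}
```

Skipping past the last message leaves the queue empty, which is why Skip is only meaningful when another message is waiting.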
Conversation Mode
Conversation mode enables continuous, hands-free voice interaction with the AI.
Enabling Conversation Mode
- Go to Settings > Speech
- Toggle Conversation Mode switch to ON
- Make sure both STT and TTS are enabled
How Conversation Mode Works
- You speak (STT transcribes)
- Message auto-sends when you stop speaking
- AI responds
- Response is automatically read aloud (TTS)
- After playback, mic activates for your next input
- Cycle repeats for continuous conversation
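The cycle above can be read as a small state machine. The sketch below models it with four phases and the events that move between them. The phase and event names are hypothetical and chosen only to mirror the steps listed; this is not how LibreChat names things internally.

```typescript
type Phase = "listening" | "sending" | "responding" | "speaking";
type ConversationEvent = "speechEnded" | "messageSent" | "responseReady" | "playbackEnded";

// Each phase reacts to exactly one event; the last phase loops back to the first.
const NEXT_PHASE: Record<Phase, Partial<Record<ConversationEvent, Phase>>> = {
  listening: { speechEnded: "sending" },      // you stop speaking, message auto-sends
  sending: { messageSent: "responding" },     // AI starts working on a reply
  responding: { responseReady: "speaking" },  // reply is read aloud
  speaking: { playbackEnded: "listening" },   // mic reactivates for the next turn
};

function nextPhase(phase: Phase, event: ConversationEvent): Phase {
  // Events that do not apply to the current phase are ignored.
  return NEXT_PHASE[phase][event] ?? phase;
}
```

Ignoring out-of-order events keeps the loop stable, for example if playback ends while the app is already listening again.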
Conversation Mode Requirements
- Speech-to-Text: Enabled
- Text-to-Speech: Enabled
- Auto-Transcribe: Recommended (auto-detects when you finish speaking)
- Auto-Send: Required (sends message automatically)
Starting a Conversation Mode Session
- Enable conversation mode in settings
- Start a new conversation or open existing one
- Click the microphone icon
- Begin speaking
- The conversation flows automatically from there
Ending Conversation Mode
- Click the microphone icon to stop listening
- Turn off conversation mode in settings
- Close the conversation
Conversation Mode Best Practices
- Speak clearly - Better transcription accuracy
- Pause between thoughts - Helps auto-transcribe detect when you’re done
- Quiet environment - Reduces background noise interference
- Test settings first - Adjust decibel threshold and playback rate before long sessions
Tips and Best Practices
Speech-to-Text
- Test your microphone - Ensure it’s working properly before starting
- Check browser permissions - Allow microphone access when prompted
- Speak naturally - No need to speak robotically
- Review transcriptions - Edit any errors before sending
- Use punctuation commands - Some engines support saying “period”, “comma”, etc.
- Background noise - Minimize for better accuracy
Text-to-Speech
- Choose appropriate voice - Match language and preference
- Adjust playback speed - Find comfortable listening pace
- Use headphones - Better audio quality and privacy
- Long responses - Consider playback rate adjustment for lengthy content
- Pause and review - Stop playback to read complex code or data
Performance Optimization
- Browser vs External - External services often have better quality but require internet
- Disable when not needed - Turn off TTS/STT to save resources
- Cache voices - Some browsers cache voices for faster loading
- Test different engines - Compare quality and performance
Accessibility
- Vision impairment - TTS enables accessing AI responses without reading
- Motor impairment - Voice input as alternative to typing
- Dyslexia - Hearing responses can aid comprehension
- Multitasking - Listen to responses while doing other tasks
Troubleshooting
Speech-to-Text Issues
Microphone not working:
- Check browser permissions for microphone access
- Verify microphone is connected and not muted
- Try a different browser
- Check system audio settings
Poor transcription accuracy:
- Speak more clearly and slowly
- Reduce background noise
- Adjust decibel threshold
- Try external STT engine
- Select correct language in settings
Speech not being detected:
- Lower decibel threshold (increase sensitivity)
- Check microphone volume levels
- Ensure microphone isn’t muted or blocked
- Try speaking louder or closer to mic
Recording picks up background noise or does not stop on its own:
- Increase decibel threshold
- Disable auto-transcribe and stop manually
- Check browser STT settings
Text-to-Speech Issues
No audio playback:
- Check browser audio permissions
- Verify system volume isn’t muted
- Try a different voice or engine
- Check internet connection (for external TTS)
- Look for browser console errors
Poor voice quality:
- Try a different voice in settings
- Consider using external TTS service for higher quality
- Check if your browser has premium voices installed
Playback too fast or too slow:
- Adjust playback rate in settings
- Reset to 1.0x if unsure
- Test incremental changes (0.1x adjustments)
Audio stutters or cuts out:
- Check internet connection (for external TTS)
- Close other tabs using audio
- Try browser TTS instead of external
- Reduce browser resource usage
General Speech Issues
Settings not saving:
- Check browser local storage permissions
- Try refreshing the page
- Verify you’re logged in
- Clear browser cache if persistent
Speech features not available:
- Speech features may be disabled by administrator
- Browser may not support Web Speech API
- External service may require configuration/API keys
Privacy and Security
Browser Speech APIs
- Processing happens on your device (generally)
- Some browsers may send audio to cloud services
- Check your browser’s privacy policy
- No API keys or external accounts needed
External Speech Services
- Audio sent to third-party service
- Subject to service provider’s privacy policy
- May be recorded or used for training
- Requires API credentials (stored securely)
- Consider sensitivity of content being spoken/read
Best Practices
- Review privacy policies of speech services you use
- Avoid speaking sensitive information if using cloud STT
- Prefer browser APIs for private/confidential conversations, but first confirm that your browser processes audio locally
- Disable speech features when not needed
- Check microphone permissions regularly
Keyboard Accessibility
Speech features are keyboard accessible:
- Tab - Navigate to microphone/speaker buttons
- Enter/Space - Activate speech recording or playback
- Escape - Stop recording or playback
- Navigate settings with keyboard
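The key bindings above can be sketched as a mapping from key names to actions. The strings match the values a browser reports in `KeyboardEvent.key` (the space bar is reported as a single space character); `SpeechAction` and `actionForKey` are hypothetical names used only for illustration.

```typescript
type SpeechAction = "toggle" | "stop" | null;

// Map a KeyboardEvent.key value to the action taken on a focused
// microphone or speaker button. Tab is left to the browser's focus handling.
function actionForKey(key: string): SpeechAction {
  switch (key) {
    case "Enter":
    case " ": // KeyboardEvent.key value for the space bar
      return "toggle";
    case "Escape":
      return "stop";
    default:
      return null;
  }
}
```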
Advanced Configuration
Custom External Services
Administrators can configure custom STT/TTS endpoints:
- API endpoint URLs
- Authentication methods
- Request/response formats
- Voice/model selection
Integration with Endpoints
Some LibreChat endpoints have native speech features:
- OpenAI’s Whisper (STT) and TTS
- Azure Speech Services
- Google Cloud Speech