Iqra AI supports 18 text-to-speech (TTS) providers through the ITTSService interface. All providers deliver audio in real-time streaming formats optimized for telephony and WebRTC channels.

Supported providers

The platform includes native integrations for:

  • ElevenLabs - Industry-leading voice cloning and multilingual synthesis
  • Azure Speech - Microsoft’s neural TTS with 400+ voices
  • Deepgram - Ultra-low latency streaming TTS
  • Cartesia - Expressive conversational voices
  • Google TTS - WaveNet and Neural2 voices
  • FishAudio - High-quality voice synthesis
  • Minimax - Advanced Chinese language support
  • HumeAI - Emotionally intelligent speech
  • Inworld - Character voices for gaming
  • Speechify - Natural reading voices
  • MurfAI - Studio-quality voiceovers
  • Neuphonic - Neural voice generation
  • ResembleAI - Real-time voice cloning
  • Rime - Expressive speech synthesis
  • Sarvam - Indic language specialist
  • UpliftAI - Enterprise TTS platform
  • HamsaAI - Arabic language optimization
  • Zyphra Zonos - Fast multilingual synthesis

ElevenLabs Text to Speech

Provider ID: ElevenLabsTextToSpeech
Implementation: ElevenLabsTTSService.cs
Industry-leading voice cloning with support for 30+ languages and ultra-realistic prosody.

Configuration fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | password | Yes | ElevenLabs API key from elevenlabs.io |
| voiceId | text | Yes | Voice identifier (e.g., 21m00Tcm4TlvDq8ikWAM) |
| modelId | select | No | Model: eleven_multilingual_v2, eleven_turbo_v2_5 |
| stability | number | No | Voice consistency (0.0-1.0, default: 0.5) |
| similarityBoost | number | No | Voice clarity (0.0-1.0, default: 0.75) |
| style | number | No | Exaggeration level (0.0-1.0) |
| useSpeakerBoost | boolean | No | Enhance clarity (recommended: true) |
| speed | number | No | Playback speed (0.5-2.0) |
| pronunciationDictionaryIds | array | No | Custom pronunciation dictionaries |
| applyTextNormalization | select | No | auto, on, or off |
{
  "voiceId": "21m00Tcm4TlvDq8ikWAM",
  "modelId": "eleven_turbo_v2_5",
  "stability": 0.5,
  "similarityBoost": 0.8,
  "useSpeakerBoost": true,
  "speed": 1.0,
  "applyTextNormalization": "auto"
}
Use eleven_turbo_v2_5 for real-time conversations (lowest latency) and eleven_multilingual_v2 for maximum voice quality in non-English languages.
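
The numeric ranges in the configuration table can be checked before a request is sent. The following is a minimal sketch, not the platform's own validation logic: the record and class names (ElevenLabsVoiceSettingsSketch, ElevenLabsSettingsValidatorSketch) are hypothetical, and only the ranges and defaults come from the table above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical settings holder; property names mirror the fields in the table above.
public sealed record ElevenLabsVoiceSettingsSketch(
    double Stability = 0.5,         // documented default
    double SimilarityBoost = 0.75,  // documented default
    double? Style = null,           // optional, no documented default
    double? Speed = null);          // optional, no documented default

public static class ElevenLabsSettingsValidatorSketch
{
    // Returns one message per out-of-range field; an empty list means the settings pass.
    public static IReadOnlyList<string> Validate(ElevenLabsVoiceSettingsSketch settings)
    {
        var errors = new List<string>();
        CheckRange(errors, "stability", settings.Stability, 0.0, 1.0);
        CheckRange(errors, "similarityBoost", settings.SimilarityBoost, 0.0, 1.0);
        if (settings.Style is double style) CheckRange(errors, "style", style, 0.0, 1.0);
        if (settings.Speed is double speed) CheckRange(errors, "speed", speed, 0.5, 2.0);
        return errors;
    }

    private static void CheckRange(List<string> errors, string name, double value, double min, double max)
    {
        if (double.IsNaN(value) || value < min || value > max)
        {
            errors.Add($"{name} must be between {min} and {max}, got {value}");
        }
    }
}
```

Calling Validate with default settings returns an empty list, while a value such as speed 3.0 produces one error entry for that field.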

Finding voice IDs

  1. Go to https://elevenlabs.io/voice-library
  2. Select a voice or clone your own
  3. Copy the voice ID from the URL or API settings

Pronunciation dictionaries

Create custom dictionaries in the ElevenLabs dashboard to handle:
  • Brand names and acronyms
  • Technical terminology
  • Non-standard pronunciations
  • Regional variations
Add dictionary IDs to the pronunciationDictionaryIds array.

Implementation details

Interface contract

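using System.IO;
using System.Threading;
using System.Threading.Tasks;

// FunctionReturnResult is a project type that wraps the outcome of each call.
public interface ITTSService
{
    // Sets up the provider before any synthesis call.
    Task<FunctionReturnResult> Initialize();

    // Synthesizes the text and returns the audio as a single buffer.
    Task<FunctionReturnResult<byte[]>> TextToSpeechAsync(string text, CancellationToken cancellationToken);

    // Synthesizes the text and returns the audio as a stream.
    Task<FunctionReturnResult<Stream>> TextToSpeechStreamAsync(string text, CancellationToken cancellationToken);
}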

Audio format handling

Iqra AI automatically handles format conversion:
  1. Provider native format - Each TTS service outputs in its preferred format
  2. Format detection - System identifies optimal format (PCM, μ-law, Opus, etc.)
  3. Automatic conversion - Converts to telephony format (8kHz μ-law) or WebRTC (16kHz Opus)
  4. Streaming delivery - Chunks audio for minimal latency
See TTSProviderManager.cs:1-50 for implementation.
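
To illustrate the conversion step, here is a minimal sketch of the standard G.711 μ-law companding of 16-bit linear PCM samples. It is not taken from TTSProviderManager.cs; the class name MuLawEncoderSketch is hypothetical, and the input is assumed to be already resampled to 8kHz mono (resampling is not shown).

```csharp
using System;

// Hypothetical helper: encodes 16-bit linear PCM samples as 8-bit G.711 mu-law.
public static class MuLawEncoderSketch
{
    private const int Bias = 0x84;   // 132, added before locating the segment
    private const int Clip = 32635;  // largest magnitude that still fits after adding the bias

    public static byte Encode(short sample)
    {
        int sign = (sample >> 8) & 0x80;      // 0x80 for negative samples, 0 otherwise
        int magnitude = sample;
        if (sign != 0) magnitude = -magnitude;
        if (magnitude > Clip) magnitude = Clip;
        magnitude += Bias;

        // Find the segment (exponent): position of the highest set bit, from 7 down to 0.
        int exponent = 7;
        for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1)
        {
            exponent--;
        }

        int mantissa = (magnitude >> (exponent + 3)) & 0x0F;

        // mu-law bytes are transmitted bit-inverted.
        return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF);
    }

    // Encodes a whole buffer, producing one output byte per input sample.
    public static byte[] EncodeBuffer(short[] pcm)
    {
        var output = new byte[pcm.Length];
        for (int i = 0; i < pcm.Length; i++)
        {
            output[i] = Encode(pcm[i]);
        }
        return output;
    }
}
```

Because each 16-bit sample becomes one byte, one second of 8kHz audio occupies 8,000 bytes after encoding, and digital silence (sample value 0) encodes to the byte 0xFF.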

Caching system

The TTSAudioCacheManager optimizes repeated phrases:
  • Cache key generation - Hash of text + voice + config
  • S3-compatible storage - Persistent cache in RustFS
  • TTL management - Configurable expiration
  • Cache invalidation - Automatic on config changes
A cache hit skips the provider call entirely, which reduces both latency and provider usage costs for common responses.
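
The "hash of text + voice + config" idea can be sketched as follows. This is an illustration under stated assumptions, not the TTSAudioCacheManager implementation: the class name TtsCacheKeySketch, the choice of SHA-256, and the representation of the config as a string dictionary are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derives a deterministic cache key from text, voice, and config.
public static class TtsCacheKeySketch
{
    public static string BuildKey(string text, string voiceId, IReadOnlyDictionary<string, string> config)
    {
        var builder = new StringBuilder();
        AppendField(builder, text);
        AppendField(builder, voiceId);

        // Sort config entries so the key does not depend on dictionary insertion order.
        foreach (var pair in config.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            AppendField(builder, pair.Key);
            AppendField(builder, pair.Value);
        }

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    // Length-prefix every field so ("ab", "c") and ("a", "bc") cannot collide.
    private static void AppendField(StringBuilder builder, string value)
    {
        builder.Append(value.Length).Append(':').Append(value).Append('|');
    }
}
```

Because the config values are part of the hashed input, any change to a setting such as stability yields a different key, which is one simple way to get the invalidation-on-config-change behavior listed above.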

Provider selection guide

Recommended providers for low-latency, real-time conversations:
  1. Deepgram - Sub-250ms first chunk
  2. ElevenLabs Turbo - ~300ms latency
  3. Cartesia - Optimized for streaming
Use μ-law encoding at 8kHz for telephony.
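
For a sense of scale, μ-law at 8kHz uses one byte per sample, so a 20 ms frame is 8000 × 0.020 = 160 bytes. The sketch below splits an encoded buffer into fixed-size frames for streaming. The 20 ms frame duration is an assumption (a common packetization interval for telephony), and the class name TelephonyFramingSketch is hypothetical rather than part of the platform.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: slices 8kHz mu-law audio into fixed-duration frames.
public static class TelephonyFramingSketch
{
    public const int SampleRateHz = 8000;  // telephony sample rate
    public const int BytesPerSample = 1;   // mu-law stores one byte per sample

    public static int FrameSizeBytes(int frameMs) =>
        SampleRateHz * BytesPerSample * frameMs / 1000;

    // Yields frames of equal size; the final frame is padded with mu-law silence (0xFF).
    public static IEnumerable<byte[]> SplitIntoFrames(byte[] muLawAudio, int frameMs = 20)
    {
        int frameSize = FrameSizeBytes(frameMs);
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameMs));

        for (int offset = 0; offset < muLawAudio.Length; offset += frameSize)
        {
            int length = Math.Min(frameSize, muLawAudio.Length - offset);
            var frame = new byte[frameSize];
            Array.Fill(frame, (byte)0xFF);
            Array.Copy(muLawAudio, offset, frame, 0, length);
            yield return frame;
        }
    }
}
```

Smaller frames reduce the delay before the first audio reaches the caller, at the cost of more packets per second.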

Adding custom providers

To integrate a new TTS provider:
  1. Add enum value in IqraCore/Entities/Interfaces/InterfaceTTSProviderEnum.cs
  2. Implement interface in IqraInfrastructure/Managers/TTS/Providers/
  3. Handle audio formats using TTSProviderAvailableAudioFormat
  4. Return streaming data via Stream or byte[]
  5. Restart application for auto-registration
See ElevenLabsTTSService.cs:19-71 for reference implementation.

Next steps

  • Configure STT - Add speech-to-text for input processing
  • Multi-language agents - Configure parallel language contexts
  • Voice settings - Fine-tune voice parameters per agent
  • Telephony integration - Deploy via phone providers
