Create production-ready voice AI applications with complete STT-LLM-TTS pipelines and calling integrations. These tools handle the hard parts of real-time audio so you can focus on the experience.
LiveKit Open-source real-time voice AI framework
Twilio Calling, SMS, and OTP integration
LiveKit
LiveKit is an open-source WebRTC framework for building real-time voice AI applications. It provides a complete STT-LLM-TTS pipeline with turn detection and interruption handling - the industry standard for voice-first applications.
Key features
Complete voice pipeline - STT → LLM → TTS with seamless integration
WebRTC-based streaming - Real-time, low-latency audio/video
Turn detection - Automatically detects when users stop speaking
Interruption handling - Users can interrupt the AI mid-response
Multi-participant rooms - Multiple users can join the same session
Provider plugins - OpenAI, AssemblyAI, Deepgram, ElevenLabs, and more
Python & JavaScript SDKs - Build agents in your preferred language
Self-hosted or cloud - Deploy anywhere
Architecture overview
LiveKit uses a room-based architecture where both users and AI agents join as participants:
┌─────────────┐           ┌──────────────┐
│   Browser   │◄─────────►│   LiveKit    │
│   (User)    │  WebRTC   │    Room      │
└─────────────┘           └──────┬───────┘
                                 │
                          ┌──────▼───────┐
                          │    Python    │
                          │    Agent     │
                          │  (STT→LLM    │
                          │   →TTS)      │
                          └──────────────┘
1. User joins LiveKit room - Browser or mobile app connects to a LiveKit room via WebRTC.
2. Python agent joins same room - Agent joins as a participant with audio capabilities.
3. Audio streams bidirectionally - User speech streams to the agent, agent responses stream back.
4. Agent processes in real-time - STT converts speech to text, LLM generates a response, TTS converts it to audio.
5. Events sync UI instantly - Structured events (transcripts, responses) update the interface in real-time.
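The steps above boil down to one data hand-off per conversational turn. A toy sketch of the STT → LLM → TTS flow with stubbed providers (the real SDK streams audio frames continuously; this just shows how the stages chain):

```python
# Toy STT -> LLM -> TTS hand-off. Each stage is a stub standing in for a
# provider plugin (Deepgram, OpenAI, ElevenLabs, ...).

def stt(audio: bytes) -> str:
    # A real plugin would stream audio frames to a transcription API.
    return audio.decode("utf-8")  # pretend the audio "contains" its transcript

def llm(transcript: str) -> str:
    # A real plugin would call a chat model with conversation history.
    return f"You said: {transcript}"

def tts(text: str) -> bytes:
    # A real plugin would synthesize speech and stream it back into the room.
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    # One conversational turn: speech in, speech out.
    return tts(llm(stt(audio)))

print(handle_turn(b"what are your hours?"))
# -> b'You said: what are your hours?'
```

The pipeline in the real SDK adds turn detection and interruption handling around exactly this chain.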
Provider plugins
LiveKit supports plugins for major AI providers:
Speech-to-Text
Deepgram - Fast, accurate transcription
AssemblyAI - High accuracy with punctuation
OpenAI Whisper - Open-source option
Google Cloud Speech - 120+ languages
Azure Speech - Enterprise support
Language Models
OpenAI - GPT-3.5, GPT-4, GPT-4o
Anthropic - Claude Sonnet, Opus
Google - Gemini Pro, Ultra
Local models - Ollama, LM Studio
Custom APIs - Any HTTP endpoint
Text-to-Speech
ElevenLabs - Ultra-realistic voices
OpenAI TTS - Natural-sounding speech
Azure TTS - Neural voices
Google Cloud TTS - WaveNet voices
Deepgram - Low-latency synthesis
Quick start
Install the LiveKit agents SDK:
pip install livekit livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-elevenlabs
Basic agent example
Simple voice agent
import asyncio
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import VoicePipeline
from livekit.plugins import openai, deepgram, elevenlabs

async def entrypoint(ctx: JobContext):
    # Connect to the room
    await ctx.connect()

    # Create voice pipeline
    pipeline = VoicePipeline(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4"),
        tts=elevenlabs.TTS(voice="Rachel"),
    )

    # Start the pipeline
    pipeline.start(ctx.room)

    # Keep the agent alive
    await asyncio.sleep(3600)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Advanced features
Turn detection and interruption
LiveKit automatically detects when users stop speaking (turn detection) and allows users to interrupt the AI mid-response. No manual configuration needed - it just works.
Multi-participant rooms
Multiple users can join the same room and talk to the same agent, or multiple agents can be in one room. Perfect for group conversations or agent collaboration.
Data channels
Send structured data (JSON) from agent to frontend for real-time UI updates:
import json

# Send custom data to the room
await ctx.room.local_participant.publish_data(
    json.dumps({"event": "booking_confirmed", "id": "12345"}),
    topic="app_events",
)
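Data-channel payloads are raw bytes, so it helps to standardize an event envelope on both sides. A minimal sketch — the `make_event`/`parse_event` helpers are our own convention, not part of the LiveKit SDK:

```python
import json
import time

def make_event(event: str, **fields) -> bytes:
    """Serialize an app event into bytes for publish_data()."""
    return json.dumps({"event": event, "ts": int(time.time()), **fields}).encode()

def parse_event(payload: bytes) -> dict:
    """Decode a payload received on the frontend or by another participant."""
    return json.loads(payload.decode())

payload = make_event("booking_confirmed", id="12345")
assert parse_event(payload)["id"] == "12345"
```

Keeping a single `event` key plus a timestamp makes it easy for the frontend to route every message through one handler.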
Use cases
Customer service bots - Handle support calls with AI agents that can look up orders, process refunds, or escalate to humans.
Interview automation - Conduct initial screening interviews, collect candidate information, and schedule follow-ups.
Voice ordering - Take orders over the phone for restaurants, retail, or services with natural conversation.
Real-time transcription - Live captioning for meetings, lectures, or accessibility features.
Industry-standard for voice AI - Used in production by companies building voice-first applications. Handles all the hard parts of real-time audio.
Twilio
Twilio provides APIs for calling, SMS, WhatsApp, and OTP verification. With a free tier offering 1 phone number and $15 credits, it’s perfect for adding telephony to your hackathon project.
Key features
Free tier - 1 free phone number + $15 credits for new accounts
Test credentials - Development mode without charges
Voice calling - Inbound and outbound phone calls
SMS messaging - Send and receive text messages
WhatsApp - Integrate WhatsApp messaging
OTP verification - Phone number verification codes
LiveKit integration - Connect calls to voice agents seamlessly
Pricing
Free tier
1 free phone number
$15 trial credits
Test credentials for development
Enough for entire hackathon
Pay-as-you-go
Voice: ~$0.013/min
SMS: ~$0.0075/message
Phone numbers: $1/month
No monthly minimums
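A quick back-of-envelope on how far the $15 trial credit stretches at the pay-as-you-go rates above:

```python
CREDIT = 15.00
VOICE_PER_MIN = 0.013   # ~$0.013 per voice minute
SMS_PER_MSG = 0.0075    # ~$0.0075 per SMS

print(f"Voice only: {CREDIT / VOICE_PER_MIN:.0f} minutes")  # ~1154 minutes
print(f"SMS only:   {CREDIT / SMS_PER_MSG:.0f} messages")   # 2000 messages

# A mixed demo-day budget: 60 minutes of calls + 200 texts
spend = 60 * VOICE_PER_MIN + 200 * SMS_PER_MSG
print(f"Demo day spend: ${spend:.2f} of ${CREDIT:.2f}")     # $2.28 of $15.00
```

Even a heavy demo day barely dents the trial credit, which is why the free tier is enough for an entire hackathon.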
Quick start
Install the Twilio SDK:
pip install twilio
Code examples
Make a call
from twilio.rest import Client

client = Client(account_sid, auth_token)

call = client.calls.create(
    to="+1234567890",
    from_="+0987654321",
    url="http://your-webhook.com/voice",  # TwiML instructions
    method="POST",
)

print(f"Call SID: {call.sid}")
print(f"Status: {call.status}")
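For OTP verification, Twilio's Verify API sends and checks the codes for you; the underlying flow looks like the pure-Python sketch below. This is illustration only — in production you would call Twilio Verify rather than generating and storing codes yourself:

```python
import secrets
import time

_pending = {}  # phone -> (code, expiry); use a real datastore in production

def send_otp(phone: str, ttl: int = 300) -> str:
    """Generate a 6-digit code; Twilio Verify would also deliver it via SMS."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    _pending[phone] = (code, time.time() + ttl)
    return code  # returned here only so the sketch is self-contained

def check_otp(phone: str, code: str) -> bool:
    """Verify a submitted code: single-use, constant-time compare, expiring."""
    entry = _pending.pop(phone, None)
    if entry is None:
        return False
    expected, expiry = entry
    return time.time() < expiry and secrets.compare_digest(code, expected)

code = send_otp("+1234567890")
assert check_otp("+1234567890", code)      # correct code passes
assert not check_otp("+1234567890", code)  # codes are single-use
```

Letting Twilio Verify own this flow also gives you rate limiting and fraud protection for free.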
Integrate Twilio with LiveKit
Connect phone calls to your LiveKit voice agents:
1. Twilio receives call - User calls your Twilio phone number.
2. Webhook creates LiveKit room - Your server creates a new LiveKit room and generates a join token.
3. Connect caller to LiveKit - Use TwiML <Stream> to send audio from Twilio to LiveKit.
4. Agent joins room - Your LiveKit agent joins the same room as the caller.
5. Bidirectional audio - Audio streams between caller and agent in real-time.
Twilio → LiveKit webhook
import os

from flask import Flask, request
from livekit import api
from twilio.twiml.voice_response import VoiceResponse, Connect, Stream

app = Flask(__name__)

@app.route("/voice", methods=["POST"])
def voice():
    # Create LiveKit room name from the call SID
    room_name = f"call-{request.values.get('CallSid')}"

    # Generate token for the caller
    token = api.AccessToken(
        os.environ["LIVEKIT_API_KEY"],
        os.environ["LIVEKIT_API_SECRET"],
    )
    token.with_identity(request.values.get("From"))
    token.with_grants(api.VideoGrants(
        room_join=True,
        room=room_name,
    ))

    # Create TwiML response
    response = VoiceResponse()
    response.say("Connecting you to our AI assistant.")

    # Stream audio to LiveKit
    connect = Connect()
    stream = Stream(
        url=f"wss://your-livekit-server.com?token={token.to_jwt()}"
    )
    connect.append(stream)
    response.append(connect)

    return str(response)
Use cases
AI phone agents - Set up a phone number that connects callers to your LiveKit voice agent. Perfect for customer support, order taking, or appointment scheduling.
SMS notifications - Send alerts, confirmations, or updates to users via text message. Great for delivery updates, appointment reminders, or emergency alerts.
Phone verification - Verify user phone numbers with OTP codes. Essential for security-critical apps or preventing fake accounts.
WhatsApp messaging - Build chatbots or send notifications through the WhatsApp Business API. Higher engagement than email for many demographics.
Free credits are enough - $15 credits last the entire hackathon. Test extensively without worrying about costs.
Best practices
Voice agent design
Keep responses concise - Voice UIs are different from text. Aim for 1-2 sentence responses. Users can’t scroll back.
Design for interruption - Users should be able to interrupt the agent naturally
Provide escape hatches - Always offer “say ‘operator’ to speak to a human”
Use conversation markers - “Got it”, “One moment”, “Let me check that”
Avoid long monologues - Break information into chunks with confirmations
Test latency - Aim for under 1 second response time or users get frustrated
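Two of these guidelines are easy to enforce in code: the escape hatch and the response-length cap. A minimal sketch — the function names and escape-word list are our own, adapt them to your agent:

```python
import re

ESCAPE_WORDS = {"operator", "human", "agent", "representative"}

def wants_human(transcript: str) -> bool:
    """Check a user turn for an escape-hatch request."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return bool(words & ESCAPE_WORDS)

def cap_sentences(text: str, max_sentences: int = 2) -> str:
    """Trim an LLM reply to the first N sentences before it reaches TTS."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

assert wants_human("Can I talk to an operator please?")
assert cap_sentences("Got it. One moment. Let me check that.") == "Got it. One moment."
```

Run `wants_human` on every final transcript before it hits the LLM, and `cap_sentences` on every LLM reply before it hits TTS.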
Production deployment
Monitor latency
Track STT, LLM, and TTS latency separately. Optimize the slowest component.
Handle errors gracefully
“Sorry, I didn’t catch that” is better than silence or errors.
Log conversations
Store transcripts for debugging and improving prompts.
Implement fallbacks
If agent fails, transfer to human or offer callback.
Test edge cases
Background noise, accents, interruptions, silence.
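"Monitor latency" per component can be as simple as timing each stage and comparing averages. A minimal sketch using a context manager (the stage names are just labels; the `sleep` calls stand in for real provider calls):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)

@contextmanager
def timed(stage: str):
    """Record wall-clock duration for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)

with timed("stt"):
    time.sleep(0.05)   # stand-in for the real STT call
with timed("llm"):
    time.sleep(0.10)   # stand-in for the real LLM call

for stage, samples in timings.items():
    print(f"{stage}: {sum(samples) / len(samples) * 1000:.0f} ms avg")
```

Once you can see per-stage averages, "optimize the slowest component" becomes a concrete task instead of guesswork.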
Cost optimization
Development
Use Twilio test credentials
LiveKit self-hosted (free)
Free STT: OpenAI Whisper
Free LLM: Local models
Demo day
Use trial credits
Cache common responses
Set call time limits
Monitor usage in real-time
Latency optimization
Target: Under 1 second end-to-end latency - From user finishing speech to agent starting response.
STT: Deepgram or AssemblyAI (fastest options)
LLM: GPT-3.5-turbo or Claude Instant (fast, good quality)
TTS: ElevenLabs Turbo v2 or OpenAI TTS-1 (lowest latency)
Streaming: Enable streaming for all components when possible
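Putting numbers on that 1-second target helps: a rough budget check across the stages (the per-stage figures are illustrative, not measured vendor benchmarks):

```python
# Illustrative per-stage latencies in seconds (not vendor benchmarks)
budget = {
    "stt_final_transcript": 0.20,
    "llm_first_token": 0.35,
    "tts_first_audio": 0.25,
    "network_overhead": 0.10,
}

total = sum(budget.values())
print(f"End-to-end: {total * 1000:.0f} ms -> {'OK' if total < 1.0 else 'too slow'}")
# End-to-end: 900 ms -> OK
```

Note the budget counts time to *first* token and *first* audio chunk, which is why enabling streaming on every component matters: without it you pay for full completions at each stage.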
Example: Complete voice agent
Put it all together - Twilio phone number connected to LiveKit agent:
# app.py - Flask webhook server
from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse, Connect, Stream

app = Flask(__name__)

@app.route("/voice", methods=["POST"])
def voice():
    """Handle incoming Twilio calls"""
    call_sid = request.values.get("CallSid")
    caller = request.values.get("From")

    response = VoiceResponse()
    response.say(
        "Thank you for calling. Connecting you to our AI assistant.",
        voice="Polly.Joanna",
    )

    # Stream audio to LiveKit
    connect = Connect()
    stream = Stream(
        url=f"wss://your-livekit.com?room=call-{call_sid}&caller={caller}"
    )
    connect.append(stream)
    response.append(connect)

    return str(response)

if __name__ == "__main__":
    app.run(port=5000)
# agent.py - LiveKit voice agent
import asyncio
import json

from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import VoicePipeline
from livekit.plugins import openai, deepgram, elevenlabs

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Room metadata is a string; here we assume it was set as JSON
    # when the room was created by the webhook server
    caller = json.loads(ctx.room.metadata or "{}").get("caller", "unknown")

    # Create voice pipeline with business logic
    pipeline = VoicePipeline(
        stt=deepgram.STT(language="en-US"),
        llm=openai.LLM(
            model="gpt-4",
            temperature=0.7,
            instructions=f"""
            You are a customer service agent for TechCorp.
            The caller's number is {caller}.
            Be friendly, professional, and helpful.
            Keep responses under 2 sentences.
            If asked about pricing, quotes start at $99.
            For technical issues, collect: name, email, and issue description.
            """,
        ),
        tts=elevenlabs.TTS(
            voice="Rachel",
            model="eleven_turbo_v2",
        ),
    )

    pipeline.start(ctx.room)
    await asyncio.sleep(600)  # 10 minute max call time

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Deploy this and you have a fully functional AI phone agent in under 100 lines of code!