Create production-ready voice AI applications with complete STT-LLM-TTS pipelines and calling integrations. These tools handle the hard parts of real-time audio so you can focus on the experience.

  • LiveKit - Open-source real-time voice AI framework
  • Twilio - Calling, SMS, and OTP integration

LiveKit

LiveKit is an open-source WebRTC framework for building real-time voice AI applications. It provides a complete STT-LLM-TTS pipeline with turn detection and interruption handling - the industry standard for voice-first applications.

Key features

  • Complete voice pipeline - STT → LLM → TTS with seamless integration
  • WebRTC-based streaming - Real-time, low-latency audio/video
  • Turn detection - Automatically detects when users stop speaking
  • Interruption handling - Users can interrupt the AI mid-response
  • Multi-participant rooms - Multiple users can join the same session
  • Provider plugins - OpenAI, AssemblyAI, Deepgram, ElevenLabs, and more
  • Python & JavaScript SDKs - Build agents in your preferred language
  • Self-hosted or cloud - Deploy anywhere

Architecture overview

LiveKit uses a room-based architecture where both users and AI agents join as participants:
┌─────────────┐         ┌──────────────┐
│   Browser   │◄───────►│  LiveKit     │
│   (User)    │  WebRTC │  Room        │
└─────────────┘         └──────┬───────┘
                               │
                        ┌──────▼───────┐
                        │  Python      │
                        │  Agent       │
                        │  (STT→LLM    │
                        │   →TTS)      │
                        └──────────────┘
  1. User joins LiveKit room - Browser or mobile app connects to the LiveKit room via WebRTC.
  2. Python agent joins the same room - The agent joins as a participant with audio capabilities.
  3. Audio streams bidirectionally - User speech streams to the agent; agent responses stream back.
  4. Agent processes in real time - STT converts speech to text, the LLM generates a response, and TTS converts it to audio.
  5. Events sync the UI instantly - Structured events (transcripts, responses) update the interface in real time.
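Step 1 requires your server to mint a join token for the client. You would normally use the `api.AccessToken` helper from the livekit-api SDK (shown in the Twilio integration below), but it helps to know the token is just an HS256 JWT. A stdlib sketch of the claim layout (claim names per LiveKit's access-token docs; treat as illustrative):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWT-style base64url without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_join_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,    # LiveKit API key
        "sub": identity,   # participant identity
        "nbf": now,
        "exp": now + 3600, # valid for one hour
        "video": {"roomJoin": True, "room": room},  # the room-join grant
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = make_join_token("devkey", "secret", "user-1", "demo-room")
```

The browser passes this token when connecting to the room; the server never exposes the API secret to clients.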

Provider plugins

LiveKit supports plugins for major speech-to-text providers (LLM and TTS providers like OpenAI and ElevenLabs work the same way):
  • Deepgram - Fast, accurate transcription
  • AssemblyAI - High accuracy with punctuation
  • OpenAI Whisper - Open-source option
  • Google Cloud Speech - 120+ languages
  • Azure Speech - Enterprise support

Quick start

Install the LiveKit agents SDK:
pip install livekit livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-elevenlabs

Basic agent example

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import openai, deepgram, elevenlabs

async def entrypoint(ctx: JobContext):
    # Connect to the room
    await ctx.connect()

    # Create the STT -> LLM -> TTS pipeline (livekit-agents 1.x API)
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4"),
        tts=elevenlabs.TTS(),
    )

    # Start the session; the job stays alive until the room closes
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant. Keep replies brief."),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Advanced features

LiveKit automatically detects when users stop speaking (turn detection) and allows users to interrupt the AI mid-response. No manual configuration needed - it just works.
Give your agent abilities like database lookups, API calls, or calculations:
# livekit-agents 1.x: tools are methods on an Agent, marked with @function_tool
from livekit.agents import Agent, RunContext, function_tool

class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant.")

    @function_tool
    async def get_weather(self, context: RunContext, location: str) -> str:
        """Get the current weather for a location."""
        # Call a real weather API here
        return f"The weather in {location} is sunny, 72°F."

# Hand the agent to your session:
# await session.start(room=ctx.room, agent=Assistant())
Multiple users can join the same room and talk to the same agent, or multiple agents can be in one room. Perfect for group conversations or agent collaboration.
Send structured data (JSON) from agent to frontend for real-time UI updates:
import json

# Send custom data to the room (received by listeners on the "app_events" topic)
await ctx.room.local_participant.publish_data(
    json.dumps({"event": "booking_confirmed", "id": "12345"}),
    topic="app_events"
)
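On the receiving side, the frontend subscribes to data messages on the `app_events` topic and dispatches on the `event` field. The wire format is just the JSON bytes published above; a minimal stdlib dispatcher (the handler behavior here is a hypothetical example):

```python
import json

def handle_app_event(payload: bytes) -> str:
    """Dispatch one data message received on the 'app_events' topic."""
    msg = json.loads(payload)
    if msg["event"] == "booking_confirmed":
        # e.g. show a confirmation banner in the UI
        return f"Booking {msg['id']} confirmed"
    return f"Unhandled event: {msg['event']}"

result = handle_app_event(b'{"event": "booking_confirmed", "id": "12345"}')
```

Keeping events as small, typed JSON objects makes it easy to add new UI behaviors without touching the audio pipeline.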

Use cases

Customer service bots

Handle support calls with AI agents that can look up orders, process refunds, or escalate to humans.

Interview automation

Conduct initial screening interviews, collect candidate information, and schedule follow-ups.

Voice ordering

Take orders over the phone for restaurants, retail, or services with natural conversation.

Real-time transcription

Live captioning for meetings, lectures, or accessibility features.
Industry-standard for voice AI - Used in production by companies building voice-first applications. Handles all the hard parts of real-time audio.

Twilio

Twilio provides APIs for calling, SMS, WhatsApp, and OTP verification. With a free tier offering 1 phone number and $15 credits, it’s perfect for adding telephony to your hackathon project.

Key features

  • Free tier - 1 free phone number + $15 credits for new accounts
  • Test credentials - Development mode without charges
  • Voice calling - Inbound and outbound phone calls
  • SMS messaging - Send and receive text messages
  • WhatsApp - Integrate WhatsApp messaging
  • OTP verification - Phone number verification codes
  • LiveKit integration - Connect calls to voice agents seamlessly

Pricing

Free tier

  • 1 free phone number
  • $15 trial credits
  • Test credentials for development
  • Enough for entire hackathon

Pay-as-you-go

  • Voice: ~$0.013/min
  • SMS: ~$0.0075/message
  • Phone numbers: $1/month
  • No monthly minimums
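At those rates, the $15 trial credit goes a long way; a quick back-of-envelope (ignoring the $1/month number fee):

```python
credits = 15.00
voice_per_min = 0.013   # ~$0.013/min voice
sms_per_msg = 0.0075    # ~$0.0075/message SMS

voice_minutes = credits / voice_per_min  # if spent entirely on calls
sms_messages = credits / sms_per_msg     # if spent entirely on SMS

print(f"~{voice_minutes:.0f} minutes of calling or ~{sms_messages:.0f} SMS messages")
```

Over a thousand minutes of calling is far more than a weekend of testing and demos will use.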

Quick start

Install the Twilio SDK:
pip install twilio

Code examples

import os

from twilio.rest import Client

# Read credentials from environment variables (find them in the Twilio console)
account_sid = os.environ["TWILIO_ACCOUNT_SID"]
auth_token = os.environ["TWILIO_AUTH_TOKEN"]
client = Client(account_sid, auth_token)

call = client.calls.create(
    to="+1234567890",         # destination number
    from_="+0987654321",      # your Twilio number
    url="http://your-webhook.com/voice",  # TwiML instructions
    method="POST"
)

print(f"Call SID: {call.sid}")
print(f"Status: {call.status}")

Integrate Twilio with LiveKit

Connect phone calls to your LiveKit voice agents:
  1. Twilio receives the call - A user calls your Twilio phone number.
  2. Webhook creates a LiveKit room - Your server creates a new LiveKit room and generates a join token.
  3. Caller connects to LiveKit - TwiML <Stream> sends audio from Twilio to LiveKit.
  4. Agent joins the room - Your LiveKit agent joins the same room as the caller.
  5. Bidirectional audio - Audio streams between caller and agent in real time.

from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse, Connect, Stream
from livekit import api
import os

app = Flask(__name__)

@app.route("/voice", methods=['POST'])
def voice():
    # Create LiveKit room and token
    room_name = f"call-{request.values.get('CallSid')}"
    
    # Generate token for the caller
    token = api.AccessToken(
        os.environ['LIVEKIT_API_KEY'],
        os.environ['LIVEKIT_API_SECRET']
    )
    token.with_identity(request.values.get('From'))
    token.with_grants(api.VideoGrants(
        room_join=True,
        room=room_name
    ))
    
    # Create TwiML response
    response = VoiceResponse()
    response.say("Connecting you to our AI assistant.")
    
    # Stream audio toward LiveKit (simplified sketch; production Twilio-LiveKit
    # bridging typically goes through LiveKit's SIP integration instead)
    connect = Connect()
    stream = Stream(
        url=f"wss://your-livekit-server.com?token={token.to_jwt()}"
    )
    connect.append(stream)
    response.append(connect)
    
    return str(response)
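For reference, the TwiML that the route returns looks like this. Built here with stdlib `xml.etree` purely to show the shape; the `twilio` helper classes above generate the same structure:

```python
import xml.etree.ElementTree as ET

# Mirror of what VoiceResponse/Connect/Stream serialize to
response = ET.Element("Response")
ET.SubElement(response, "Say").text = "Connecting you to our AI assistant."
connect = ET.SubElement(response, "Connect")
ET.SubElement(connect, "Stream", url="wss://your-livekit-server.com?token=...")

twiml = ET.tostring(response, encoding="unicode")
print(twiml)
```

Twilio fetches this XML from your webhook and executes it top to bottom: speak the greeting, then open the media stream.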

Use cases

AI phone support

Set up a phone number that connects callers to your LiveKit voice agent. Perfect for customer support, order taking, or appointment scheduling.

SMS notifications

Send alerts, confirmations, or updates to users via text message. Great for delivery updates, appointment reminders, or emergency alerts.

OTP verification

Verify user phone numbers with OTP codes. Essential for security-critical apps or preventing fake accounts.

WhatsApp messaging

Build chatbots or send notifications through the WhatsApp Business API. Higher engagement than email for many demographics.

Free credits are enough - $15 in credits lasts the entire hackathon. Test extensively without worrying about costs.
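Twilio's Verify API handles OTP delivery and checking for you; conceptually it boils down to the following (a stdlib sketch of the idea, not the Twilio API):

```python
import hmac
import secrets
import time

def generate_otp() -> tuple[str, float]:
    """Return a 6-digit code plus its expiry timestamp (5 minutes out)."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    return code, time.time() + 300

def verify_otp(submitted: str, code: str, expires_at: float) -> bool:
    # Constant-time comparison; reject expired codes
    return time.time() < expires_at and hmac.compare_digest(submitted, code)

code, expires_at = generate_otp()
```

In practice, prefer Verify over rolling your own: it also handles delivery, rate limiting, and retry logic.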

Best practices

Voice agent design

  1. Keep responses concise - Voice UIs differ from text. Aim for 1-2 sentence responses; users can’t scroll back.
  2. Design for interruption - Users should be able to interrupt the agent naturally.
  3. Provide escape hatches - Always offer “say ‘operator’ to speak to a human”.
  4. Use conversation markers - “Got it”, “One moment”, “Let me check that”.
  5. Avoid long monologues - Break information into chunks with confirmations.
  6. Test latency - Aim for under 1 second of response time, or users get frustrated.
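Escape hatches are easy to enforce before a transcript ever reaches the LLM: scan each user turn for handoff phrases first. A minimal sketch (the `transfer_to_human` hook is hypothetical):

```python
ESCAPE_PHRASES = ("operator", "human", "representative", "real person")

def wants_human(transcript: str) -> bool:
    """Check a user turn for an escape phrase before sending it to the LLM."""
    text = transcript.lower()
    return any(phrase in text for phrase in ESCAPE_PHRASES)

# In your agent loop:
# if wants_human(user_turn):
#     transfer_to_human(call)  # hypothetical handoff hook
```

Doing this outside the LLM guarantees the handoff works even when the model misbehaves.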

Production deployment

  1. Monitor latency - Track STT, LLM, and TTS latency separately; optimize the slowest component.
  2. Handle errors gracefully - “Sorry, I didn’t catch that” is better than silence or errors.
  3. Log conversations - Store transcripts for debugging and improving prompts.
  4. Implement fallbacks - If the agent fails, transfer to a human or offer a callback.
  5. Test edge cases - Background noise, accents, interruptions, silence.
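To monitor latency per stage, time each component separately so you know which one to optimize. A small stdlib helper (the `time.sleep` call stands in for a real STT request):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_times: dict[str, list[float]] = defaultdict(list)

@contextmanager
def timed(stage: str):
    """Record wall-clock time for one pipeline stage ('stt', 'llm', or 'tts')."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_times[stage].append(time.perf_counter() - start)

with timed("stt"):
    time.sleep(0.01)  # stand-in for the real STT call

print({s: f"{sum(v) / len(v) * 1000:.0f}ms avg" for s, v in stage_times.items()})
```

Wrap each real STT/LLM/TTS call the same way and ship the averages to your logs or dashboard.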

Cost optimization

Development

  • Use Twilio test credentials
  • LiveKit self-hosted (free)
  • Free STT: OpenAI Whisper
  • Free LLM: Local models

Demo day

  • Use trial credits
  • Cache common responses
  • Set call time limits
  • Monitor usage in real-time

Latency optimization

Target: Under 1 second end-to-end latency - From user finishing speech to agent starting response.
  • STT: Deepgram or AssemblyAI (fastest options)
  • LLM: GPT-4o mini or Claude 3.5 Haiku (fast, good quality)
  • TTS: ElevenLabs Turbo v2 or OpenAI TTS-1 (lowest latency)
  • Streaming: Enable streaming for all components when possible
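A practical way to apply the 1-second target is a per-stage budget that the component latencies above must fit into (the figures here are illustrative, not vendor benchmarks):

```python
# Rough per-stage budget in milliseconds (illustrative numbers)
budget_ms = {
    "endpointing": 200,      # turn detection deciding the user is done
    "stt_final": 150,        # final transcript after end of speech
    "llm_first_token": 350,  # time to first streamed LLM token
    "tts_first_audio": 200,  # time to first synthesized audio chunk
}

total = sum(budget_ms.values())
print(f"budgeted end-to-end: {total}ms")
assert total <= 1000, "over the 1-second target"
```

With streaming enabled, only time-to-first-token and time-to-first-audio count against the budget, which is why it can fit under a second.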

Example: Complete voice agent

Put it all together - Twilio phone number connected to LiveKit agent:
# app.py - Flask webhook server
from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse, Connect, Stream
import os

app = Flask(__name__)

@app.route("/voice", methods=['POST'])
def voice():
    """Handle incoming Twilio calls"""
    call_sid = request.values.get('CallSid')
    caller = request.values.get('From')
    
    response = VoiceResponse()
    response.say(
        "Thank you for calling. Connecting you to our AI assistant.",
        voice="Polly.Joanna"
    )
    
    # Stream audio toward LiveKit (simplified; production setups typically
    # bridge Twilio calls via LiveKit's SIP integration)
    connect = Connect()
    stream = Stream(
        url=f"wss://your-livekit.com?room=call-{call_sid}&caller={caller}"
    )
    connect.append(stream)
    response.append(connect)
    
    return str(response)

if __name__ == "__main__":
    app.run(port=5000)

# agent.py - LiveKit voice agent
import json
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import openai, deepgram, elevenlabs

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Room metadata is a JSON string; have your webhook set it when creating the room
    metadata = json.loads(ctx.room.metadata or "{}")
    caller = metadata.get("caller", "unknown")

    # Voice pipeline with business logic (livekit-agents 1.x API)
    session = AgentSession(
        stt=deepgram.STT(language="en-US"),
        llm=openai.LLM(model="gpt-4", temperature=0.7),
        tts=elevenlabs.TTS(),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(
            instructions=f"""
            You are a customer service agent for TechCorp.
            The caller's number is {caller}.

            Be friendly, professional, and helpful.
            Keep responses under 2 sentences.
            If asked about pricing, quotes start at $99.
            For technical issues, collect: name, email, and issue description.
            """
        ),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Deploy this and you have a fully functional AI phone agent in under 100 lines of code!
