Create production-ready voice AI applications with complete STT-LLM-TTS pipelines and calling integrations. These tools handle the hard parts of real-time audio so you can focus on the experience.
LiveKit Open-source real-time voice AI framework
Twilio Calling, SMS, and OTP integration
LiveKit
LiveKit is an open-source WebRTC framework for building real-time voice AI applications. It provides a complete STT-LLM-TTS pipeline with turn detection and interruption handling - the industry standard for voice-first applications.
Key features
Complete voice pipeline - STT → LLM → TTS with seamless integration
WebRTC-based streaming - Real-time, low-latency audio/video
Turn detection - Automatically detects when users stop speaking
Interruption handling - Users can interrupt the AI mid-response
Multi-participant rooms - Multiple users can join the same session
Provider plugins - OpenAI, AssemblyAI, Deepgram, ElevenLabs, and more
Python & JavaScript SDKs - Build agents in your preferred language
Self-hosted or cloud - Deploy anywhere
Architecture overview
LiveKit uses a room-based architecture where both users and AI agents join as participants:
┌─────────────┐           ┌──────────────┐
│   Browser   │◄─────────►│   LiveKit    │
│   (User)    │  WebRTC   │    Room      │
└─────────────┘           └──────┬───────┘
                                 │
                          ┌──────▼───────┐
                          │    Python    │
                          │    Agent     │
                          │  (STT→LLM    │
                          │   →TTS)      │
                          └──────────────┘
1. User joins LiveKit room - Browser or mobile app connects to a LiveKit room via WebRTC.
2. Python agent joins same room - Agent joins as a participant with audio capabilities.
3. Audio streams bidirectionally - User speech streams to the agent, agent responses stream back.
4. Agent processes in real-time - STT converts speech to text, LLM generates a response, TTS converts it to audio.
5. Events sync UI instantly - Structured events (transcripts, responses) update the interface in real-time.
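The steps above boil down to one data hand-off per conversational turn. A toy sketch of the STT → LLM → TTS flow with stubbed providers (the real SDK streams audio frames continuously; this just shows how the stages chain):

```python
# Toy STT -> LLM -> TTS hand-off. Each stage is a stub standing in for a
# provider plugin (Deepgram, OpenAI, ElevenLabs, ...).

def stt(audio: bytes) -> str:
    # A real plugin would stream audio frames to a transcription API.
    return audio.decode("utf-8")  # pretend the audio "contains" its transcript

def llm(transcript: str) -> str:
    # A real plugin would call a chat model with conversation history.
    return f"You said: {transcript}"

def tts(text: str) -> bytes:
    # A real plugin would synthesize speech and stream it back into the room.
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    # One conversational turn: speech in, speech out.
    return tts(llm(stt(audio)))

print(handle_turn(b"what are your hours?"))
# -> b'You said: what are your hours?'
```

The pipeline in the real SDK adds turn detection and interruption handling around exactly this chain.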
Provider plugins
LiveKit supports plugins for major AI providers:
Speech-to-Text
Deepgram - Fast, accurate transcription
AssemblyAI - High accuracy with punctuation
OpenAI Whisper - Open-source option
Google Cloud Speech - 120+ languages
Azure Speech - Enterprise support
Language Models
OpenAI - GPT-3.5, GPT-4, GPT-4o
Anthropic - Claude Sonnet, Opus
Google - Gemini Pro, Ultra
Local models - Ollama, LM Studio
Custom APIs - Any HTTP endpoint
Text-to-Speech
ElevenLabs - Ultra-realistic voices
OpenAI TTS - Natural-sounding speech
Azure TTS - Neural voices
Google Cloud TTS - WaveNet voices
Deepgram - Low-latency synthesis
Quick start
Install the LiveKit agents SDK:
pip install livekit livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-elevenlabs
Basic agent example
Simple voice agent
import asyncio
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import VoicePipeline
from livekit.plugins import openai, deepgram, elevenlabs

async def entrypoint(ctx: JobContext):
    # Connect to the room
    await ctx.connect()

    # Create voice pipeline
    pipeline = VoicePipeline(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4"),
        tts=elevenlabs.TTS(voice="Rachel"),
    )

    # Start the pipeline
    pipeline.start(ctx.room)

    # Keep the agent alive
    await asyncio.sleep(3600)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Advanced features
Turn detection and interruption
LiveKit automatically detects when users stop speaking (turn detection) and allows users to interrupt the AI mid-response. No manual configuration needed - it just works.
Multi-participant rooms
Multiple users can join the same room and talk to the same agent, or multiple agents can be in one room. Perfect for group conversations or agent collaboration.
Data channels
Send structured data (JSON) from agent to frontend for real-time UI updates:
import json

# Send custom data to the room
await ctx.room.local_participant.publish_data(
    json.dumps({"event": "booking_confirmed", "id": "12345"}),
    topic="app_events",
)
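Data-channel payloads are raw bytes, so it helps to standardize an event envelope on both sides. A minimal sketch — the `make_event`/`parse_event` helpers are our own convention, not part of the LiveKit SDK:

```python
import json
import time

def make_event(event: str, **fields) -> bytes:
    """Serialize an app event into bytes for publish_data()."""
    return json.dumps({"event": event, "ts": int(time.time()), **fields}).encode()

def parse_event(payload: bytes) -> dict:
    """Decode a payload received on the frontend or by another participant."""
    return json.loads(payload.decode())

payload = make_event("booking_confirmed", id="12345")
assert parse_event(payload)["id"] == "12345"
```

Keeping a single `event` key plus a timestamp makes it easy for the frontend to route every message through one handler.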
Use cases
Customer service bots - Handle support calls with AI agents that can look up orders, process refunds, or escalate to humans.
Interview automation - Conduct initial screening interviews, collect candidate information, and schedule follow-ups.
Voice ordering - Take orders over the phone for restaurants, retail, or services with natural conversation.
Real-time transcription - Live captioning for meetings, lectures, or accessibility features.
Industry-standard for voice AI - Used in production by companies building voice-first applications. Handles all the hard parts of real-time audio.
Twilio
Twilio provides APIs for calling, SMS, WhatsApp, and OTP verification. With a free tier offering 1 phone number and $15 credits, it’s perfect for adding telephony to your hackathon project.
Key features
Free tier - 1 free phone number + $15 credits for new accounts
Test credentials - Development mode without charges
Voice calling - Inbound and outbound phone calls
SMS messaging - Send and receive text messages
WhatsApp - Integrate WhatsApp messaging
OTP verification - Phone number verification codes
LiveKit integration - Connect calls to voice agents seamlessly
Pricing
Free tier
1 free phone number
$15 trial credits
Test credentials for development
Enough for entire hackathon
Pay-as-you-go
Voice: ~$0.013/min
SMS: ~$0.0075/message
Phone numbers: $1/month
No monthly minimums
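A quick back-of-envelope on how far the $15 trial credit stretches at the pay-as-you-go rates above:

```python
CREDIT = 15.00
VOICE_PER_MIN = 0.013   # ~$0.013 per voice minute
SMS_PER_MSG = 0.0075    # ~$0.0075 per SMS

print(f"Voice only: {CREDIT / VOICE_PER_MIN:.0f} minutes")  # ~1154 minutes
print(f"SMS only:   {CREDIT / SMS_PER_MSG:.0f} messages")   # 2000 messages

# A mixed demo-day budget: 60 minutes of calls + 200 texts
spend = 60 * VOICE_PER_MIN + 200 * SMS_PER_MSG
print(f"Demo day spend: ${spend:.2f} of ${CREDIT:.2f}")     # $2.28 of $15.00
```

Even a heavy demo day barely dents the trial credit, which is why the free tier is enough for an entire hackathon.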
Quick start
Install the Twilio SDK:
pip install twilio
Code examples
Make a call
from twilio.rest import Client

client = Client(account_sid, auth_token)

call = client.calls.create(
    to="+1234567890",
    from_="+0987654321",
    url="http://your-webhook.com/voice",  # TwiML instructions
    method="POST",
)

print(f"Call SID: {call.sid}")
print(f"Status: {call.status}")
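For OTP verification, Twilio's Verify API sends and checks the codes for you; the underlying flow looks like the pure-Python sketch below. This is illustration only — in production you would call Twilio Verify rather than generating and storing codes yourself:

```python
import secrets
import time

_pending = {}  # phone -> (code, expiry); use a real datastore in production

def send_otp(phone: str, ttl: int = 300) -> str:
    """Generate a 6-digit code; Twilio Verify would also deliver it via SMS."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    _pending[phone] = (code, time.time() + ttl)
    return code  # returned here only so the sketch is self-contained

def check_otp(phone: str, code: str) -> bool:
    """Verify a submitted code: single-use, constant-time compare, expiring."""
    entry = _pending.pop(phone, None)
    if entry is None:
        return False
    expected, expiry = entry
    return time.time() < expiry and secrets.compare_digest(code, expected)

code = send_otp("+1234567890")
assert check_otp("+1234567890", code)      # correct code passes
assert not check_otp("+1234567890", code)  # codes are single-use
```

Letting Twilio Verify own this flow also gives you rate limiting and fraud protection for free.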
Integrate Twilio with LiveKit
Connect phone calls to your LiveKit voice agents:
1. Twilio receives call - User calls your Twilio phone number.
2. Webhook creates LiveKit room - Your server creates a new LiveKit room and generates a join token.
3. Connect caller to LiveKit - Use TwiML <Stream> to send audio from Twilio to LiveKit.
4. Agent joins room - Your LiveKit agent joins the same room as the caller.
5. Bidirectional audio - Audio streams between caller and agent in real-time.
Twilio → LiveKit webhook
import os

from flask import Flask, request
from livekit import api
from twilio.twiml.voice_response import VoiceResponse, Connect, Stream

app = Flask(__name__)

@app.route("/voice", methods=["POST"])
def voice():
    # Create LiveKit room name from the call SID
    room_name = f"call-{request.values.get('CallSid')}"

    # Generate token for the caller
    token = api.AccessToken(
        os.environ["LIVEKIT_API_KEY"],
        os.environ["LIVEKIT_API_SECRET"],
    )
    token.with_identity(request.values.get("From"))
    token.with_grants(api.VideoGrants(
        room_join=True,
        room=room_name,
    ))

    # Create TwiML response
    response = VoiceResponse()
    response.say("Connecting you to our AI assistant.")

    # Stream audio to LiveKit
    connect = Connect()
    stream = Stream(
        url=f"wss://your-livekit-server.com?token={token.to_jwt()}"
    )
    connect.append(stream)
    response.append(connect)

    return str(response)
Use cases
AI phone agents - Set up a phone number that connects callers to your LiveKit voice agent. Perfect for customer support, order taking, or appointment scheduling.
SMS notifications - Send alerts, confirmations, or updates to users via text message. Great for delivery updates, appointment reminders, or emergency alerts.
Phone verification - Verify user phone numbers with OTP codes. Essential for security-critical apps or preventing fake accounts.
WhatsApp messaging - Build chatbots or send notifications through the WhatsApp Business API. Higher engagement than email for many demographics.
Free credits are enough - $15 credits last the entire hackathon. Test extensively without worrying about costs.
Best practices
Voice agent design
Keep responses concise - Voice UIs are different from text. Aim for 1-2 sentence responses. Users can’t scroll back.
Design for interruption - Users should be able to interrupt the agent naturally
Provide escape hatches - Always offer “say ‘operator’ to speak to a human”
Use conversation markers - “Got it”, “One moment”, “Let me check that”
Avoid long monologues - Break information into chunks with confirmations
Test latency - Aim for under 1 second response time or users get frustrated
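Two of these guidelines are easy to enforce in code: the escape hatch and the response-length cap. A minimal sketch — the function names and escape-word list are our own, adapt them to your agent:

```python
import re

ESCAPE_WORDS = {"operator", "human", "agent", "representative"}

def wants_human(transcript: str) -> bool:
    """Check a user turn for an escape-hatch request."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return bool(words & ESCAPE_WORDS)

def cap_sentences(text: str, max_sentences: int = 2) -> str:
    """Trim an LLM reply to the first N sentences before it reaches TTS."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

assert wants_human("Can I talk to an operator please?")
assert cap_sentences("Got it. One moment. Let me check that.") == "Got it. One moment."
```

Run `wants_human` on every final transcript before it hits the LLM, and `cap_sentences` on every LLM reply before it hits TTS.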
Production deployment
Monitor latency
Track STT, LLM, and TTS latency separately. Optimize the slowest component.
Handle errors gracefully
“Sorry, I didn’t catch that” is better than silence or errors.
Log conversations
Store transcripts for debugging and improving prompts.
Implement fallbacks
If agent fails, transfer to human or offer callback.
Test edge cases
Background noise, accents, interruptions, silence.
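"Monitor latency" per component can be as simple as timing each stage and comparing averages. A minimal sketch using a context manager (the stage names are just labels; the `sleep` calls stand in for real provider calls):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)

@contextmanager
def timed(stage: str):
    """Record wall-clock duration for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)

with timed("stt"):
    time.sleep(0.05)   # stand-in for the real STT call
with timed("llm"):
    time.sleep(0.10)   # stand-in for the real LLM call

for stage, samples in timings.items():
    print(f"{stage}: {sum(samples) / len(samples) * 1000:.0f} ms avg")
```

Once you can see per-stage averages, "optimize the slowest component" becomes a concrete task instead of guesswork.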
Cost optimization
Development
Use Twilio test credentials
LiveKit self-hosted (free)
Free STT: OpenAI Whisper
Free LLM: Local models
Demo day
Use trial credits
Cache common responses
Set call time limits
Monitor usage in real-time
Latency optimization
Target: Under 1 second end-to-end latency - From user finishing speech to agent starting response.
STT: Deepgram or AssemblyAI (fastest options)
LLM: GPT-3.5-turbo or Claude Instant (fast, good quality)
TTS: ElevenLabs Turbo v2 or OpenAI TTS-1 (lowest latency)
Streaming: Enable streaming for all components when possible
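Putting numbers on that 1-second target helps: a rough budget check across the stages (the per-stage figures are illustrative, not measured vendor benchmarks):

```python
# Illustrative per-stage latencies in seconds (not vendor benchmarks)
budget = {
    "stt_final_transcript": 0.20,
    "llm_first_token": 0.35,
    "tts_first_audio": 0.25,
    "network_overhead": 0.10,
}

total = sum(budget.values())
print(f"End-to-end: {total * 1000:.0f} ms -> {'OK' if total < 1.0 else 'too slow'}")
# End-to-end: 900 ms -> OK
```

Note the budget counts time to *first* token and *first* audio chunk, which is why enabling streaming on every component matters: without it you pay for full completions at each stage.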
Example: Complete voice agent
Put it all together - Twilio phone number connected to LiveKit agent:
# app.py - Flask webhook server
from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse, Connect, Stream

app = Flask(__name__)

@app.route("/voice", methods=["POST"])
def voice():
    """Handle incoming Twilio calls"""
    call_sid = request.values.get("CallSid")
    caller = request.values.get("From")

    response = VoiceResponse()
    response.say(
        "Thank you for calling. Connecting you to our AI assistant.",
        voice="Polly.Joanna",
    )

    # Stream audio to LiveKit
    connect = Connect()
    stream = Stream(
        url=f"wss://your-livekit.com?room=call-{call_sid}&caller={caller}"
    )
    connect.append(stream)
    response.append(connect)

    return str(response)

if __name__ == "__main__":
    app.run(port=5000)
# agent.py - LiveKit voice agent
import asyncio
import json

from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import VoicePipeline
from livekit.plugins import openai, deepgram, elevenlabs

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Room metadata is a string; here we assume it was set as JSON
    # when the room was created by the webhook server
    caller = json.loads(ctx.room.metadata or "{}").get("caller", "unknown")

    # Create voice pipeline with business logic
    pipeline = VoicePipeline(
        stt=deepgram.STT(language="en-US"),
        llm=openai.LLM(
            model="gpt-4",
            temperature=0.7,
            instructions=f"""
            You are a customer service agent for TechCorp.
            The caller's number is {caller}.
            Be friendly, professional, and helpful.
            Keep responses under 2 sentences.
            If asked about pricing, quotes start at $99.
            For technical issues, collect: name, email, and issue description.
            """,
        ),
        tts=elevenlabs.TTS(
            voice="Rachel",
            model="eleven_turbo_v2",
        ),
    )

    pipeline.start(ctx.room)
    await asyncio.sleep(600)  # 10 minute max call time

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Deploy this and you have a fully functional AI phone agent in under 100 lines of code!