Overview
The Live API provides WebSocket-based streaming for real-time multimodal conversations with sub-second latency:

Real-Time Audio
Stream audio input and receive natural speech responses with native audio processing
Video Streaming
Send video frames for visual understanding in real-time conversations
Low Latency
Sub-second response times for natural, interactive experiences
Function Calling
Integrate tools and APIs during live conversations
Key Features
- Bidirectional streaming: Send and receive data simultaneously
- Native audio processing: PCM audio at 24kHz sampling rate
- Barge-in support: Interrupt model responses naturally
- Multimodal input: Combine text, audio, and video in the same session
- Tool integration: Call functions during conversations
Getting Started
Session Establishment
The Live API follows a strict WebSocket sub-protocol with four phases:

1. Handshake
Establish the WebSocket connection with OAuth 2.0 authentication.

2. Setup
Configure the session with model parameters.

3. Session Loop
Run bidirectional send and receive loops concurrently.

4. Termination
Close the WebSocket connection.

Message Types
Client Messages
- Text Input
- Audio Streaming
- Video Streaming
- Tool Response
Send text messages to the model:
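As a sketch of what these client messages can look like, the helpers below build text-input and tool-response payloads as JSON strings ready to pass to `ws.send()`. The wrapper keys (`clientContent`, `turns`, `turnComplete`, `toolResponse`, `functionResponses`) are assumptions modeled on typical Live-style schemas; verify them against the service's message reference.

```python
import json


def text_message(text: str, turn_complete: bool = True) -> str:
    """Build a client text-input message.

    The field names are illustrative -- check the service's schema
    for the exact wrapper keys.
    """
    return json.dumps({
        "clientContent": {
            "turns": [{"role": "user", "parts": [{"text": text}]}],
            "turnComplete": turn_complete,
        }
    })


def tool_response_message(call_id: str, name: str, result: dict) -> str:
    """Build a tool/function response message (hypothetical schema)."""
    return json.dumps({
        "toolResponse": {
            "functionResponses": [
                {"id": call_id, "name": name, "response": result}
            ]
        }
    })


# Payloads like these are sent over the open WebSocket with ws.send(...):
msg = text_message("What's the weather in Paris?")
```

Audio and video frames follow the same pattern, typically as base64-encoded binary payloads in a streaming-input wrapper rather than `clientContent`.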
Server Messages
Handle different types of responses from the server.

Complete Example: Text to Speech
A simple text-to-speech example:
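A minimal end-to-end sketch of the four phases described under Session Establishment (handshake, setup, session loop, termination), assuming the third-party `websockets` package and a hypothetical endpoint, token, and message schema — substitute the real values from the service's reference:

```python
import asyncio
import base64
import json

# Hypothetical values -- replace with your real endpoint, token, and model.
URL = "wss://example.com/v1/live"
TOKEN = "YOUR_OAUTH_TOKEN"
MODEL = "gemini-live-2.5-flash-native-audio"


def extract_audio(message: dict) -> bytes:
    """Pull base64-encoded PCM out of a server message, if present.

    The serverContent/modelTurn/inlineData field names are assumptions.
    """
    parts = (message.get("serverContent", {})
                    .get("modelTurn", {})
                    .get("parts", []))
    return b"".join(
        base64.b64decode(p["inlineData"]["data"])
        for p in parts if "inlineData" in p
    )


async def text_to_speech(prompt: str) -> bytes:
    # pip install websockets (imported here so the pure helper above
    # works without the dependency; older versions use extra_headers)
    import websockets

    # 1. Handshake: open the WebSocket with an OAuth bearer token.
    async with websockets.connect(
        URL, additional_headers={"Authorization": f"Bearer {TOKEN}"}
    ) as ws:
        # 2. Setup: configure the model and response modality.
        await ws.send(json.dumps({
            "setup": {
                "model": MODEL,
                "generationConfig": {"responseModalities": ["AUDIO"]},
            }
        }))
        await ws.recv()  # wait for the setup acknowledgement

        # 3. Session loop: send one text turn, collect audio until done.
        await ws.send(json.dumps({
            "clientContent": {
                "turns": [{"role": "user", "parts": [{"text": prompt}]}],
                "turnComplete": True,
            }
        }))
        pcm = b""
        async for raw in ws:
            msg = json.loads(raw)
            pcm += extract_audio(msg)
            if msg.get("serverContent", {}).get("turnComplete"):
                break
        return pcm
    # 4. Termination: the context manager closes the connection.


# asyncio.run(text_to_speech("Hello!"))
```

The returned buffer is raw PCM16 audio (24 kHz, mono, per the format requirements below), ready to write to a WAV file or feed to a playback device.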
Integrate tools and APIs during conversations.

Use Cases
Voice Assistants
Build natural voice interfaces for customer support, information retrieval, and task automation
Real-Time Translation
Provide live translation services with audio input and output
Gaming NPCs
Create interactive game characters with natural voice responses
Visual Q&A
Answer questions about live video feeds or camera input
Customer Service
Handle customer inquiries with voice and screen sharing
Education
Interactive tutoring with multimodal explanations
Best Practices
Audio Format Requirements
- Format: PCM16 (Linear 16-bit PCM)
- Sample rate: 24kHz
- Channels: Mono (1 channel)
- Chunk size: ~20ms recommended (480 samples)
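The recommended chunk size follows directly from the format above: 24,000 samples/s × 0.02 s = 480 samples, and at 2 bytes per PCM16 sample that is 960 bytes per mono chunk. A small helper to split a capture buffer accordingly:

```python
# Derive the recommended chunk size from the Live API audio format.
SAMPLE_RATE = 24_000    # 24 kHz
CHANNELS = 1            # mono
BYTES_PER_SAMPLE = 2    # PCM16 = 16-bit samples
CHUNK_MS = 20           # ~20 ms per chunk

samples_per_chunk = SAMPLE_RATE * CHUNK_MS // 1000                  # 480
bytes_per_chunk = samples_per_chunk * BYTES_PER_SAMPLE * CHANNELS   # 960


def chunk_pcm(pcm: bytes, size: int = bytes_per_chunk):
    """Split a PCM16 buffer into ~20 ms chunks for streaming."""
    for i in range(0, len(pcm), size):
        yield pcm[i:i + size]
```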
Performance Tips
- Use asyncio: Run send and receive loops concurrently for lowest latency
- Buffer management: Keep audio buffers small to minimize delay
- Error handling: Implement reconnection logic for network issues
- Token expiration: Refresh OAuth tokens before they expire (default 60 minutes)
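The reconnection advice can be sketched with exponential backoff plus jitter. Here `session_coro_factory` is a hypothetical zero-argument callable returning a coroutine that runs one full session; only the backoff pattern itself is the point:

```python
import asyncio
import random


def backoff_delays(base: float = 0.5, cap: float = 30.0, retries: int = 6):
    """Exponential backoff schedule: base * 2**n seconds, capped."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


async def run_with_reconnect(session_coro_factory, retries: int = 6):
    """Re-run a session after network failures, backing off between tries."""
    for delay in backoff_delays(retries=retries):
        try:
            await session_coro_factory()
            return  # session ended cleanly
        except (ConnectionError, OSError):
            # Add jitter so many clients don't reconnect in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("gave up after repeated connection failures")
```

Refreshing the OAuth token inside `session_coro_factory` before each attempt also covers the token-expiration tip above.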
Supported Models
The Live API supports specific Gemini models optimized for real-time interaction:

- gemini-live-2.5-flash-native-audio: Best for voice interactions
- gemini-2.0-flash-exp: Experimental, with multimodal support
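As an illustration of selecting between these models at session setup, a hypothetical payload builder (the `setup`/`generationConfig` wrapper keys are assumptions; check the service's schema):

```python
import json


def setup_message(model: str, modalities: list) -> str:
    """Build a session setup message for a given model (assumed schema)."""
    return json.dumps({
        "setup": {
            "model": model,
            "generationConfig": {"responseModalities": modalities},
        }
    })


# Voice-first session vs. experimental multimodal session:
voice = setup_message("gemini-live-2.5-flash-native-audio", ["AUDIO"])
multimodal = setup_message("gemini-2.0-flash-exp", ["TEXT", "AUDIO"])
```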
Next Steps
WebSocket Demo App
Complete reference implementation with React frontend
Native Audio SDK
Higher-level SDK for audio interactions
Function Calling
Learn more about integrating tools
Pricing
View Live API pricing details