Realtime Audio Input and Output

Attendee supports bidirectional realtime audio streaming through websockets. You can receive audio from meetings and have your bot output audio into meetings in real-time.

Setup

To enable realtime audio streaming, configure the websocket_settings.audio parameter when creating a bot:

{
  "meeting_url": "https://meet.google.com/abc-def-ghi",
  "bot_name": "Audio Bot",
  "websocket_settings": {
    "audio": {
      "url": "wss://your-server.com/attendee-websocket",
      "sample_rate": 16000
    }
  }
}

The sample_rate can be 8000, 16000, or 24000 and defaults to 16000. It determines the sample rate of the audio chunks you receive from Attendee.

Websocket Message Format

Outgoing Audio (Attendee → Your Websocket Server)

Your WebSocket server will receive messages in this format.

{
  "bot_id": "bot_12345abcdef",
  "trigger": "realtime_audio.mixed",
  "data": {
    "chunk": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAABACAAAGRLVEE...",
    "sample_rate": 16000,
    "timestamp_ms": 1703123456789
  }
}

The chunk field is base64-encoded 16-bit single channel PCM audio data at the frequency specified in the sample_rate field.

Incoming Audio (Your Websocket Server → Attendee)

When you want the bot to speak audio in the meeting, send a message in this format.

{
  "trigger": "realtime_audio.bot_output",
  "data": {
    "chunk": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAABACAAAGRLVEE...",
    "sample_rate": 16000
  }
}

The chunk field is base64-encoded 16-bit single-channel PCM audio data. The sample rate can be 8000, 16000 or 24000.

Integration with Voice Agent APIs

The realtime audio streaming can be easily integrated with voice agent APIs to bring voice agents into meetings.

Deepgram Voice Agent API
OpenAI Realtime API

Connect directly to Deepgram’s voice agent WebSocket API by forwarding audio chunks. Set an output sample rate of 16000 to be compatible with Deepgram’s real-time streaming requirements.See an example app showing how to integrate with Deepgram’s voice agent API here.

Connect directly to OpenAI’s realtime API by forwarding audio chunks. Set an output sample rate of 24000 to be compatible with OpenAI’s real-time streaming requirements.

Code Samples

A simple example app showing how to integrate with Deepgram’s voice agent API: https://github.com/attendee-labs/voice-agent-example

Retries on Websocket Connections

Attendee will automatically retry to connect to your websocket server if the connection is lost or the initial connection attempt fails. We will retry up to 30 times with a 2 second delay between retries.

Error Messages

Currently, we don’t give any feedback on errors with the websocket connection or invalid message formats. We plan to improve this in the future.

Getting Started

Core Concepts

Guides

Platform Support

Self-Hosting

Realtime Audio Input and Output

Setup

Websocket Message Format

Outgoing Audio (Attendee → Your Websocket Server)

Incoming Audio (Your Websocket Server → Attendee)

Integration with Voice Agent APIs

Code Samples

Retries on Websocket Connections

Error Messages

Build docs developers (and LLMs) love

Getting Started

Core Concepts

Guides

Platform Support

Self-Hosting

​Setup

​Websocket Message Format

​Outgoing Audio (Attendee → Your Websocket Server)

​Incoming Audio (Your Websocket Server → Attendee)

​Integration with Voice Agent APIs

​Code Samples

​Retries on Websocket Connections

​Error Messages

Build docs developers (and LLMs) love

Setup

Websocket Message Format

Outgoing Audio (Attendee → Your Websocket Server)

Incoming Audio (Your Websocket Server → Attendee)

Integration with Voice Agent APIs

Code Samples

Retries on Websocket Connections

Error Messages