Attendee offers two methods for real-time meeting transcription: Third-party-based Transcription and Closed Caption-based Transcription. Both methods allow you to receive real-time updates via webhooks and have perfect speaker identification (diarization).
For an example of a simple web application that uses Attendee to transcribe meeting audio in real time, see the real time transcription example repository.

Third-party-based Transcription

This method relies on access to a per-speaker audio stream for each participant. Attendee identifies when a participant starts and stops speaking. When a participant pauses for a few seconds, the audio segment is sent to a third-party transcription provider for processing.
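The pause-based segmentation described above can be illustrated with a minimal sketch. This is not Attendee's implementation; the `SpeechInterval` type, the `group_into_segments` function, and the 2-second `pause_threshold_s` default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeechInterval:
    start_s: float  # when the participant started speaking
    end_s: float    # when the participant stopped speaking


def group_into_segments(intervals, pause_threshold_s=2.0):
    """Merge speech intervals that are separated by short gaps.

    A new segment starts whenever the silence between two intervals is
    at least pause_threshold_s (a hypothetical value, not Attendee's).
    """
    segments = []
    for interval in sorted(intervals, key=lambda i: i.start_s):
        if segments and interval.start_s - segments[-1].end_s < pause_threshold_s:
            # Short gap: extend the current segment.
            segments[-1].end_s = max(segments[-1].end_s, interval.end_s)
        else:
            # Long pause (or first interval): start a new segment.
            segments.append(SpeechInterval(interval.start_s, interval.end_s))
    return segments
```

Under this sketch, each returned segment corresponds to one chunk of audio that would be sent to the provider, which is why a long uninterrupted turn produces one large segment.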

Latency

The latency of third-party-based transcription is dependent on two main factors: the time it takes for the third-party provider to transcribe the audio, and the size of the audio segment itself. If a participant speaks for a long time without pausing, the audio segment sent for transcription will be large, increasing processing time. These two factors mean that third-party-based transcription generally has higher latency compared to closed caption-based transcription.

Quality

Third-party-based transcription is generally of higher quality than closed caption-based transcription.

Supported providers

  • Deepgram
  • OpenAI
  • Gladia
  • Assembly AI
  • Sarvam
  • ElevenLabs
  • Kyutai Labs
  • Custom Async (Bring Your Own Platform)
See the Create Bot API reference for supported parameters for configuring the transcription providers.

Cost

Third-party-based transcription incurs costs from the transcription provider. Attendee calls the provider using the API key that you supply in the Credentials section of the dashboard.

Closed Caption-based Transcription

This method takes advantage of the built-in closed captioning feature of the meeting platform. Attendee captures these captions as they are generated by the platform.

Latency

This method offers lower latency. Captions are captured as soon as they are generated by the platform.

Cost

Closed caption-based transcription is free.

Choosing the Right Method

| Feature | Third-party-based Transcription | Closed Caption-based Transcription |
| --- | --- | --- |
| Source | Per-participant audio segments | Built-in captions from the meeting platform (Zoom, Google Meet) |
| Transcription Quality | High (depends on the provider, e.g., OpenAI, Deepgram) | Generally lower than third-party-based transcription |
| Word-level timestamps | Supported by all providers except OpenAI | No. |
| Speaker Diarization | Yes, perfect speaker identification. | Yes, perfect speaker identification. |
| Latency | Higher latency due to provider processing and segment size. | Lower latency, near-instantaneous. |
| Cost | Incurs costs from third-party transcription providers. | No additional costs. |
| Setup | Requires configuring a third-party transcription provider. | No setup required. |

Adding transcription providers in the dashboard

For third-party-based transcription, you need to add your API key for a provider such as Deepgram, OpenAI, Gladia, or Assembly AI on the Settings > Credentials page.

Transcription errors

If you are using third-party-based transcription, you may encounter errors from the transcription provider. These errors are visible in the bot detail page in the dashboard, in the transcription section. Additionally, the post-processing complete bot event will contain a list of transcription errors in the event metadata.

Configuring transcription in the API call

You can configure transcription settings when creating a bot. This includes selecting the transcription provider and provider-specific options like language, model, etc. See the Create Bot API reference for details. You will set the parameters in the transcription_settings object of the create bot request body. It will have the form:
{
    "chosen transcription provider": {
        "provider-specific parameters"
    }
}
For example, to use Deepgram with English and the nova-2 model, set transcription_settings to:
{
    "deepgram": {
        "language": "en-US",
        "model": "nova-2"
    }
}
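As a minimal sketch, the request body could be assembled in Python as shown below. Only the transcription_settings object comes from this page; the `meeting_url` and `bot_name` field names, the placeholder meeting URL, and the `build_create_bot_body` helper are assumptions for illustration, so check the Create Bot API reference for the actual schema.

```python
import json


def build_create_bot_body(meeting_url, bot_name, provider, **provider_options):
    """Build a create bot request body with transcription_settings.

    meeting_url and bot_name are assumed field names; only the
    transcription_settings shape is taken from the documentation.
    """
    return {
        "meeting_url": meeting_url,
        "bot_name": bot_name,
        # {"chosen transcription provider": {provider-specific parameters}}
        "transcription_settings": {provider: dict(provider_options)},
    }


body = build_create_bot_body(
    "https://meet.google.com/abc-defg-hij",  # placeholder meeting URL
    "Transcription Bot",
    "deepgram",
    language="en-US",
    model="nova-2",
)
print(json.dumps(body, indent=4))
```

Passing the provider name as a key keeps the helper independent of any one provider: switching to another provider only changes the `provider` argument and its options.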

Setting up webhooks for real time transcription

1. Navigate to webhooks

Go to the Settings > Webhooks page and click the ‘Create Webhook’ button.

2. Enable the transcript.update trigger

Make sure the transcript.update trigger is enabled for your webhook. This will fire a webhook event every time a new utterance is added to the transcript.
See the webhooks page for more details on the webhook payload.
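On the receiving side, a handler might filter for the transcript.update trigger and pull out the new utterance. The sketch below is framework-agnostic and works on the raw request body. The payload field names (`trigger`, `data`, `speaker_name`, `transcription`, `transcript`) and the `extract_utterance` helper are assumptions, not confirmed by this page; verify them against the webhooks page before relying on them.

```python
import json


def extract_utterance(raw_body):
    """Return (speaker, text) for a transcript.update event, else None.

    All payload field names used here are assumptions; check the
    webhooks page for the actual payload format.
    """
    event = json.loads(raw_body)
    if event.get("trigger") != "transcript.update":
        return None  # ignore other webhook triggers
    data = event.get("data") or {}
    speaker = data.get("speaker_name", "unknown")
    text = (data.get("transcription") or {}).get("transcript", "")
    return speaker, text
```

In a web framework, you would call this function with the body of the incoming POST request and respond with a 2xx status once the utterance has been processed.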

Fetching transcripts during and after the meeting

You can fetch transcripts during and after the meeting by calling the /transcript endpoint. See the Get Transcript API reference for details.
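A request to the /transcript endpoint could be built as in the sketch below, using only the Python standard library. The full path (`/api/v1/bots/{bot_id}/transcript`), the `Token` authorization scheme, and the `build_transcript_request` helper are assumptions; the Get Transcript API reference is the authority on the actual URL and authentication.

```python
import urllib.request


def build_transcript_request(base_url, bot_id, api_key):
    """Build (but do not send) a GET request for a bot's transcript.

    The URL path and the "Token" auth scheme are assumptions; confirm
    both in the Get Transcript API reference.
    """
    url = f"{base_url.rstrip('/')}/api/v1/bots/{bot_id}/transcript"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {api_key}"},
        method="GET",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the response body as JSON. Calling it repeatedly during the meeting is one way to poll for new utterances if webhooks are not an option.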

Multilingual transcription

Both transcription methods can transcribe audio in different languages, but the set of supported languages differs between them. See the Create Bot API reference for details on how to specify the language. All third-party transcription providers support automatic language detection, but closed caption-based transcription does not. Some third-party providers can transcribe audio in which the speaker switches languages in the middle of a sentence; see the list below for details.

Choosing the right transcription provider

Deepgram

Cheap, good quality, and fast. The only downside is that it doesn’t support as many languages as some of the other providers. Can transcribe audio where the speaker is switching languages in the middle of a sentence. $200 in free credits for new users.

Gladia

Similar to Deepgram, but more expensive and supports more languages. Can transcribe audio where the speaker is switching languages in the middle of a sentence. 10 hours of free transcription each month.

Assembly AI

Similar to Deepgram in price and quality but lacks the ability to transcribe audio where the speaker is switching languages in the middle of a sentence. Very accurate word-level timestamps. $50 in free credits for new users.

OpenAI

Cheaper than the other providers, but less accurate and often chooses the wrong language when the language is not specified in advance. Can transcribe audio where the speaker is switching languages in the middle of a sentence. Lacks word-level timestamps.
Note on custom OpenAI proxy servers: To use a custom OpenAI-compatible endpoint (such as a proxy server or alternative model provider), set these environment variables:
  • OPENAI_BASE_URL: The base URL for your custom endpoint (default: https://api.openai.com/v1)
  • OPENAI_MODEL_NAME: The model name to use for transcription (default: gpt-4o-transcribe)
Example: OPENAI_BASE_URL=https://your-proxy.com/v1 and OPENAI_MODEL_NAME=whisper-large-v3
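The way these two variables fall back to their defaults can be sketched as follows. This only illustrates the documented defaults and is not Attendee's own code; the `resolve_openai_settings` helper is a hypothetical name.

```python
import os

# Defaults as documented above.
DEFAULT_BASE_URL = "https://api.openai.com/v1"
DEFAULT_MODEL_NAME = "gpt-4o-transcribe"


def resolve_openai_settings(env=None):
    """Return (base_url, model_name), falling back to the defaults.

    env defaults to the process environment; passing a dict makes the
    lookup easy to exercise without changing real variables.
    """
    env = os.environ if env is None else env
    return (
        env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
        env.get("OPENAI_MODEL_NAME", DEFAULT_MODEL_NAME),
    )
```

With neither variable set, the function returns the two defaults; setting only one of them overrides just that value.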

Custom Async (Bring Your Own Platform)

For Attendee self-hosters only. Lets you use your own self-hosted transcription service. See the Custom Async Transcription page for more details.
