Third-party-based Transcription
This method relies on access to a per-speaker audio stream for each participant. Attendee identifies when a participant starts and stops speaking. When a participant pauses for a few seconds, the audio segment is sent to a third-party transcription provider for processing.
Latency
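Here is a minimal sketch of splitting one participant's audio into segments, which shows how the pause-based flow affects latency. It assumes the audio has already been reduced to per-frame voice-activity flags. The 20 ms frame size and 2-second pause threshold are illustrative values, not Attendee's actual settings, and `segment_by_pause` and `first_word_latency_ms` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_ms: int  # when the participant started speaking
    end_ms: int    # end of the last voiced frame
    sent_ms: int   # when the segment would be handed to the provider


def segment_by_pause(voiced, frame_ms=20, pause_ms=2000):
    """Hypothetical helper: split per-frame voice-activity flags into segments.

    A segment is closed once `pause_ms` of continuous silence follows speech.
    frame_ms and pause_ms are illustrative, not Attendee's real settings.
    """
    segments = []
    start = None         # start of the open segment, or None if idle
    last_voiced_end = 0  # end time of the most recent voiced frame
    silence = 0          # silence accumulated since the last voiced frame
    for i, is_voiced in enumerate(voiced):
        t = i * frame_ms
        if is_voiced:
            if start is None:
                start = t
            last_voiced_end = t + frame_ms
            silence = 0
        elif start is not None:
            silence += frame_ms
            if silence >= pause_ms:
                # Pause is long enough: close the segment at the end of this frame.
                segments.append(Segment(start, last_voiced_end, t + frame_ms))
                start = None
                silence = 0
    if start is not None:
        # Stream ended mid-segment (e.g. the meeting ended): flush it.
        segments.append(Segment(start, last_voiced_end, len(voiced) * frame_ms))
    return segments


def first_word_latency_ms(segment, provider_ms):
    """Delay from the first spoken word until the transcript is available."""
    return (segment.sent_ms - segment.start_ms) + provider_ms


frames = [True] * 3 + [False] * 2 + [True] * 10 + [False] * 2  # 1 s frames
for seg in segment_by_pause(frames, frame_ms=1000, pause_ms=2000):
    print(seg, first_word_latency_ms(seg, provider_ms=1000))
```

With one-second frames, a 3-second utterance and a 10-second utterance yield first-word latencies of 6 s and 13 s when the provider takes a fixed 1 s. In practice, provider time also grows with segment size, so the gap widens further.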
The latency of third-party-based transcription depends on two main factors: how long the third-party provider takes to transcribe the audio, and the size of the audio segment itself. If a participant speaks for a long time without pausing, the segment sent for transcription will be large, which increases processing time. Because a segment is only sent once the participant pauses, words spoken early in a long segment must also wait for the rest of it to be spoken. For these reasons, third-party-based transcription generally has higher latency than closed caption-based transcription.
Quality
Third-party-based transcription is generally of higher quality than closed caption-based transcription.
Supported providers
- Deepgram
- OpenAI
- Gladia
- Assembly AI
- Sarvam
- ElevenLabs
- Kyutai Labs
- Custom Async (Bring Your Own Platform)
Cost
Third-party-based transcription incurs costs from the transcription provider. Attendee calls the provider using the API key that you add in the Credentials section of the dashboard.
Closed Caption-based Transcription
This method takes advantage of the built-in closed captioning feature of the meeting platform. Attendee captures these captions as they are generated by the platform.
Latency
This method offers lower latency, because captions are captured as soon as the platform generates them.
Cost
Closed caption-based transcription is free.
Choosing the Right Method
| Feature | Third-party-based Transcription | Closed Caption-based Transcription |
|---|---|---|
| Source | Per-participant audio segments | Built-in captions from the meeting platform (Zoom, Google Meet) |
| Transcription Quality | High (depends on the provider, e.g., OpenAI, Deepgram) | Generally lower than third-party-based transcription |
| Word-level timestamps | Supported by all providers except OpenAI | No. |
| Speaker Diarization | Yes, perfect speaker identification. | Yes, perfect speaker identification. |
| Latency | Higher latency due to provider processing and segment size. | Lower latency, near-instantaneous. |
| Cost | Incurs costs from third-party transcription providers. | No additional costs. |
| Setup | Requires configuring a third-party transcription provider. | No setup required. |
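The table can be read as a small decision procedure. The sketch below encodes it as a hypothetical helper, `choose_transcription_method`, which is not part of Attendee. Word-level timestamps and automatic language detection (see Multilingual transcription) rule out closed captions. Avoiding provider costs rules out third-party providers. Otherwise, the choice trades latency against quality.

```python
def choose_transcription_method(*, word_timestamps=False, language_autodetect=False,
                                no_provider_cost=False, prefer_low_latency=False):
    """Hypothetical helper encoding the comparison table.

    Returns "third_party" or "closed_captions".
    """
    # Only third-party providers offer word-level timestamps (all except
    # OpenAI) and automatic language detection.
    needs_third_party = word_timestamps or language_autodetect
    if needs_third_party and no_provider_cost:
        raise ValueError("these features need a third-party provider, which incurs provider costs")
    if needs_third_party:
        return "third_party"
    # Closed captions are free, need no setup, and have lower latency.
    if no_provider_cost or prefer_low_latency:
        return "closed_captions"
    # Otherwise favor the method with generally higher transcription quality.
    return "third_party"


print(choose_transcription_method(prefer_low_latency=True))
```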
Adding transcription providers in the dashboard
For third-party-based transcription, add the API key for your provider (for example Deepgram, OpenAI, Gladia, or Assembly AI) on the Settings > Credentials page.
Transcription errors
If you use third-party-based transcription, you may encounter errors from the transcription provider. These errors appear in the transcription section of the bot detail page in the dashboard. In addition, the post-processing complete bot event includes a list of transcription errors in its event metadata.
Configuring transcription in the API call
You can configure transcription settings when creating a bot, including the transcription provider and provider-specific options such as the language and model. See the Create Bot API reference for details. Set these parameters in the `transcription_settings` object of the Create Bot request body. It will have the form:
transcription_settings to:
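The sketch below builds a Create Bot request with Python's standard library. Several details are assumptions for illustration: the base URL `https://app.attendee.dev/api/v1`, the `Token` authorization scheme, and the option names inside `transcription_settings` (`deepgram`, `language`, `meeting_closed_captions`). Confirm them against the Create Bot API reference before relying on them.

```python
import json
import urllib.request

# Assumed base URL; check the Create Bot API reference.
API_BASE = "https://app.attendee.dev/api/v1"


def build_create_bot_request(api_key, meeting_url, transcription_settings):
    """Build (but do not send) a Create Bot request. Field names are assumed."""
    body = {
        "meeting_url": meeting_url,
        "bot_name": "Notetaker",
        "transcription_settings": transcription_settings,
    }
    return urllib.request.Request(
        f"{API_BASE}/bots",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# Third-party provider (assumed shape): the provider name keys its options.
deepgram_settings = {"deepgram": {"language": "en"}}
# Closed captions from the meeting platform (assumed key name).
closed_caption_settings = {"meeting_closed_captions": {}}

req = build_create_bot_request("YOUR_API_KEY", "https://meet.google.com/abc-defg-hij", deepgram_settings)
# urllib.request.urlopen(req) would send it; it is not called here.
```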
Setting up webhooks for real-time transcription
See the webhooks page for more details on the webhook payload.
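A receiver for real-time transcript webhooks mostly needs to parse each delivery, ignore duplicates, and route transcript updates. The sketch below assumes a JSON payload with `idempotency_key`, `bot_id`, `trigger`, and `data` fields and a `transcript.update` trigger value. These names are assumptions, so check the webhooks page for the actual payload. `WebhookHandler` is hypothetical and omits both signature verification and the HTTP server itself.

```python
import json


class WebhookHandler:
    """Hypothetical receiver logic for transcript webhooks.

    Payload field names are assumptions; see the webhooks page.
    """

    def __init__(self, on_transcript):
        self._on_transcript = on_transcript  # callback(bot_id, data)
        self._seen = set()                   # idempotency keys already handled

    def handle(self, raw_body: bytes) -> bool:
        """Process one delivery; return True if a transcript update was forwarded."""
        payload = json.loads(raw_body)
        key = payload.get("idempotency_key")
        if key is not None and key in self._seen:
            return False  # duplicate delivery, e.g. a retry
        if payload.get("trigger") != "transcript.update":
            return False  # other event types are ignored in this sketch
        self._on_transcript(payload.get("bot_id"), payload.get("data", {}))
        if key is not None:
            self._seen.add(key)  # record only after the callback succeeded
        return True


events = []
handler = WebhookHandler(lambda bot_id, data: events.append((bot_id, data)))
delivery = json.dumps({
    "idempotency_key": "abc123",
    "bot_id": "bot_123",
    "trigger": "transcript.update",
    "data": {"speaker_name": "Ada", "transcription": {"transcript": "Hello"}},
}).encode("utf-8")
handler.handle(delivery)  # forwarded to the callback
handler.handle(delivery)  # ignored as a duplicate
```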
Fetching transcripts during and after the meeting
You can fetch transcripts during and after the meeting by calling the `/transcript` endpoint. See the Get Transcript API reference for details.
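The following is a minimal sketch of fetching and printing a transcript. Several details are assumptions: the URL pattern `/bots/{bot_id}/transcript`, the base URL, the `Token` authorization header, and the entry fields `speaker_name`, `timestamp_ms` (treated here as milliseconds from the start of the recording), and `transcription.transcript`. The Get Transcript API reference is authoritative.

```python
import json
import urllib.request

API_BASE = "https://app.attendee.dev/api/v1"  # assumed base URL


def build_transcript_request(api_key, bot_id):
    """Build a GET request for a bot's transcript (URL pattern assumed)."""
    return urllib.request.Request(
        f"{API_BASE}/bots/{bot_id}/transcript",
        headers={"Authorization": f"Token {api_key}"},
        method="GET",
    )


def fetch_transcript(api_key, bot_id):
    """Send the request and decode the JSON list of utterances."""
    with urllib.request.urlopen(build_transcript_request(api_key, bot_id)) as resp:
        return json.load(resp)


def format_transcript(entries):
    """Render utterances as 'mm:ss Speaker: text' lines (field names assumed)."""
    lines = []
    for e in sorted(entries, key=lambda e: e["timestamp_ms"]):
        minutes, seconds = divmod(e["timestamp_ms"] // 1000, 60)
        text = (e.get("transcription") or {}).get("transcript", "")
        lines.append(f"{minutes:02d}:{seconds:02d} {e['speaker_name']}: {text}")
    return "\n".join(lines)


sample = [
    {"speaker_name": "Bo", "timestamp_ms": 65000, "transcription": {"transcript": "Sounds good."}},
    {"speaker_name": "Al", "timestamp_ms": 1000, "transcription": {"transcript": "Hi, everyone."}},
]
print(format_transcript(sample))
# 00:01 Al: Hi, everyone.
# 01:05 Bo: Sounds good.
```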
Multilingual transcription
All transcription methods can transcribe audio in multiple languages, but the set of supported languages differs between methods. See the Create Bot API reference for how to specify the language. All third-party transcription providers support automatic language detection; closed caption-based transcription does not. Some third-party providers can also transcribe audio in which the speaker switches languages mid-sentence; see the provider list below for details.
Choosing the right transcription provider
Deepgram
Low price, good quality, and fast; the only downside is that it doesn't support as many languages as some of the other providers. Can transcribe audio where the speaker switches languages mid-sentence. $200 in free credits for new users.
Gladia
Similar to Deepgram, but more expensive and supports more languages. Can transcribe audio where the speaker switches languages mid-sentence. 10 hours of free transcription each month.
Assembly AI
Similar to Deepgram in price and quality, but cannot transcribe audio where the speaker switches languages mid-sentence. Very accurate word-level timestamps. $50 in free credits for new users.
OpenAI
Cheaper than the other providers, but less accurate, and it often chooses the wrong language when the language is not specified in advance. Can transcribe audio where the speaker switches languages mid-sentence. Lacks word-level timestamps.
Note on custom OpenAI proxy servers
To use a custom OpenAI-compatible endpoint (such as a proxy server or alternative model provider), set these environment variables:
- `OPENAI_BASE_URL`: The base URL for your custom endpoint (default: `https://api.openai.com/v1`)
- `OPENAI_MODEL_NAME`: The model name to use for transcription (default: `gpt-4o-transcribe`)
For example: `OPENAI_BASE_URL=https://your-proxy.com/v1` and `OPENAI_MODEL_NAME=whisper-large-v3`.
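The provider descriptions above can be condensed into a lookup table for shortlisting. This sketch covers only the four providers described in this section. The capability flags come from those descriptions and from the comparison table, which states that all providers except OpenAI support word-level timestamps. `Provider` and `shortlist` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    code_switching: bool   # handles mid-sentence language switches
    word_timestamps: bool  # returns word-level timestamps
    notes: str


# Capabilities as described in the provider comparison.
PROVIDERS = [
    Provider("Deepgram", True, True, "low price, good quality, fast; fewer languages"),
    Provider("Gladia", True, True, "more expensive than Deepgram; more languages"),
    Provider("Assembly AI", False, True, "Deepgram-like price and quality; very accurate word timestamps"),
    Provider("OpenAI", True, False, "cheapest; less accurate; may pick the wrong language"),
]


def shortlist(need_code_switching=False, need_word_timestamps=False):
    """Return provider names that satisfy every required capability."""
    return [
        p.name
        for p in PROVIDERS
        if (p.code_switching or not need_code_switching)
        and (p.word_timestamps or not need_word_timestamps)
    ]


print(shortlist(need_code_switching=True, need_word_timestamps=True))
```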