System Architecture

Overview

The AssemblyAI Real-Time Transcription Browser Example uses a three-tier architecture that separates concerns between the backend server, frontend client, and AssemblyAI’s streaming service.

Architecture Components

1. Express Server (Backend)

The Express server acts as a security layer and token provider:

server.js

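const express = require("express");
const path = require("path");
const { generateTempToken } = require("./tokenGenerator");

const app = express();
const PORT = 8000;

// Serve the static frontend files from ./public
app.use(express.static(path.join(__dirname, "public")));

// Issue a short-lived token so the browser never needs the API key
app.get("/token", async (req, res) => {
  try {
    const token = await generateTempToken(60); // expires after 60 seconds
    res.json({ token });
  } catch (error) {
    console.error("Token generation failed:", error);
    res.status(500).json({ error: "Failed to generate token" });
  }
});

// Start the HTTP server
app.listen(PORT, () => {
  console.log(`Server listening on http://localhost:${PORT}`);
});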

The server’s primary responsibility is generating temporary tokens for secure client-side connections to AssemblyAI. It never exposes your API key to the browser.
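
The ./tokenGenerator module is not shown here. The following is a minimal sketch of what generateTempToken might look like, assuming AssemblyAI's v3 temporary-token endpoint (GET https://streaming.assemblyai.com/v3/token?expires_in_seconds=..., with the API key in the Authorization header and a JSON body of the form { token }) and an ASSEMBLYAI_API_KEY environment variable. Both the endpoint and the variable name are assumptions to verify against AssemblyAI's documentation; the fetchImpl option is a hypothetical hook added only so the function can be tested without network access.

```javascript
// tokenGenerator.js: hypothetical sketch, not the example's actual module.
// Assumes Node 18+ (global fetch and URL) and AssemblyAI's v3 token endpoint.
const TOKEN_URL = "https://streaming.assemblyai.com/v3/token"; // assumed endpoint

async function generateTempToken(
  expiresInSeconds,
  { apiKey = process.env.ASSEMBLYAI_API_KEY, fetchImpl = fetch } = {}
) {
  if (!apiKey) {
    throw new Error("ASSEMBLYAI_API_KEY is not set");
  }

  const url = new URL(TOKEN_URL);
  url.searchParams.set("expires_in_seconds", String(expiresInSeconds));

  // The API key travels only from this server to AssemblyAI, never to the browser
  const response = await fetchImpl(url.toString(), {
    headers: { Authorization: apiKey },
  });
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }

  const { token } = await response.json();
  return token;
}

module.exports = { generateTempToken };
```

Reading the key from the environment keeps it out of source control as well as out of the browser.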

2. Browser Client (Frontend)

The client handles three main responsibilities:

Audio capture using the Web Audio API and AudioWorklet
Token retrieval from the Express server
WebSocket communication with AssemblyAI’s real-time service
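
The Web Audio API delivers Float32 samples in the range [-1, 1] at the AudioContext's sample rate (often 44.1 or 48 kHz), while the endpoint used below declares sample_rate=16000. A minimal sketch of the conversion the audio pipeline could perform, assuming the service expects mono 16-bit signed PCM: it downsamples by averaging each window of input samples, then clamps and scales to the Int16 range. floatToPcm16 is a hypothetical name, and the averaging is a simple stand-in for a properly filtered resampler.

```javascript
// Hypothetical helper: convert Float32 audio from the Web Audio API into
// 16-bit PCM at a lower sample rate. Assumes mono input and outputRate <= inputRate.
function floatToPcm16(samples, inputRate, outputRate = 16000) {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(samples.length / ratio);
  const pcm = new Int16Array(outLength);

  for (let i = 0; i < outLength; i++) {
    // Average the input samples that fall into this output sample's window
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    const avg = end > start ? sum / (end - start) : 0;

    // Clamp to [-1, 1] and scale to the signed 16-bit range
    const clamped = Math.max(-1, Math.min(1, avg));
    pcm[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return pcm;
}
```

In an AudioWorkletProcessor, inputs[0][0] holds the first channel of each 128-frame render quantum and sampleRate is available as a global; converting each quantum separately would drop leftover samples whenever 128 is not a multiple of the ratio, so buffering a larger chunk before conversion is the safer design. The resulting pcm.buffer can be posted to the main thread and passed to ws.send().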

index.js

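let microphone; // microphone helper returned by createMicrophone()
let ws;         // WebSocket connection to AssemblyAI

async function run() {
  microphone = createMicrophone();
  await microphone.requestPermission();

  // Get a temporary token from the Express server
  const response = await fetch("http://localhost:8000/token");
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const data = await response.json();

  // Connect to AssemblyAI with the token (URL-encoded in case it contains reserved characters)
  const endpoint = `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${encodeURIComponent(data.token)}`;
  ws = new WebSocket(endpoint);
}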
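
Building the streaming URL with URLSearchParams keeps the query parameters readable and percent-encodes the token automatically. This is a sketch: the parameter names are the ones used in the endpoint above, and buildStreamingUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: build the AssemblyAI streaming URL from a token and
// query parameters, letting URLSearchParams handle the encoding.
const STREAMING_BASE_URL = "wss://streaming.assemblyai.com/v3/ws";

function buildStreamingUrl(token, { sampleRate = 16000, formattedFinals = true } = {}) {
  // URLSearchParams preserves insertion order and encodes reserved characters
  const params = new URLSearchParams({
    sample_rate: String(sampleRate),
    formatted_finals: String(formattedFinals),
    token,
  });
  return `${STREAMING_BASE_URL}?${params.toString()}`;
}
```

With this helper, the connection line becomes ws = new WebSocket(buildStreamingUrl(data.token));.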

3. AssemblyAI Streaming Service

The AssemblyAI service receives audio data over the WebSocket and returns transcripts in real time as turn-based messages.
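
Besides Turn messages, the v3 stream is generally described as sending a Begin message when the session opens and a Termination message when it ends; treat those two type names as assumptions to check against AssemblyAI's message reference. A minimal sketch of a dispatcher that routes parsed messages by their type field; createMessageRouter and the handler names are hypothetical.

```javascript
// Hypothetical dispatcher: parse a WebSocket message and route it by its "type" field.
// Type names other than "Turn" are assumptions to verify against the API docs.
function createMessageRouter(handlers) {
  return function route(rawData) {
    let msg;
    try {
      msg = JSON.parse(rawData);
    } catch {
      return false; // ignore payloads that are not JSON
    }
    if (!msg || typeof msg !== "object") return false;

    // Only dispatch to handlers defined directly on the handlers object
    if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) return false;
    const handler = handlers[msg.type];
    if (typeof handler !== "function") return false;

    handler(msg);
    return true; // message was handled
  };
}
```

In the client this would be wired up as ws.onmessage = (event) => route(event.data);, with one handler per message type, which keeps the Turn rendering logic separate from session bookkeeping.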

Data Flow

Turn-Based Transcription

AssemblyAI returns transcripts as “turns”: natural speech segments, each corresponding to one speaker turn. Each Turn message carries a turn_order index and the transcript text for that turn:

index.js

const turns = {}; // keyed by turn_order

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "Turn") {
    const { turn_order, transcript } = msg;
    turns[turn_order] = transcript;

    // Display turns in order
    const orderedTurns = Object.keys(turns)
      .sort((a, b) => Number(a) - Number(b))
      .map((k) => turns[k])
      .join(" ");

    messageEl.innerText = orderedTurns;
  }
};

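AssemblyAI can send several Turn messages for the same turn_order as the transcript for that turn is refined. The application therefore stores only the latest transcript for each turn_order in an object, overwriting earlier versions, and sorts the keys numerically before display, so the rendered text stays in turn order regardless of the order in which messages are handled.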
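
The same bookkeeping can be moved out of the onmessage handler into a small store with no DOM dependency, which makes the overwrite-then-sort behavior easy to test. A sketch: createTurnStore is a hypothetical name, and it uses a Map with numeric keys instead of the plain object used above.

```javascript
// Hypothetical turn store: keeps the latest transcript per turn_order
// and renders all turns in ascending turn_order.
function createTurnStore() {
  const turns = new Map(); // turn_order (number) -> latest transcript

  return {
    update({ turn_order, transcript }) {
      // A later message for the same turn replaces the earlier text
      turns.set(Number(turn_order), transcript);
    },
    text() {
      return [...turns.keys()]
        .sort((a, b) => a - b) // numeric sort, so 10 comes after 2
        .map((order) => turns.get(order))
        .join(" ");
    },
  };
}
```

Inside the handler, the Turn branch then reduces to store.update(msg); messageEl.innerText = store.text();.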

Security Architecture

Because temporary tokens are generated server-side, your AssemblyAI API key never leaves the server. Even if the client code is inspected or compromised, an attacker can obtain at most a short-lived token, not the API key itself.

The token-based security model ensures:

API keys remain secret on the server
Clients receive time-limited access tokens
Tokens expire automatically (60 seconds in this example)
Each client session requires a new token
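
Since each session should start with a fresh token, it can also help to stop browsers and intermediaries from caching the /token response. A sketch of the route handler factored into a testable function, assuming Express's res.set, res.status, and res.json methods; createTokenHandler and the Cache-Control: no-store header are additions for illustration, not part of the original example.

```javascript
// Hypothetical factory for the /token route handler. The token generator is
// injected so the handler can be tested without calling AssemblyAI.
function createTokenHandler(generateToken, expiresInSeconds = 60) {
  return async function tokenHandler(req, res) {
    // Tokens are short-lived and meant for one session: never cache them
    res.set("Cache-Control", "no-store");
    try {
      const token = await generateToken(expiresInSeconds);
      res.json({ token });
    } catch (error) {
      console.error("Token generation failed:", error);
      res.status(500).json({ error: "Failed to generate token" });
    }
  };
}
```

Registered as app.get("/token", createTokenHandler(generateTempToken)), it behaves like the inline route in server.js apart from the added header.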

Connection Lifecycle

Initialization: The user clicks the “Record” button.
Token Request: The client fetches a temporary token from the Express server.
WebSocket Connection: The client connects to AssemblyAI using the token.
Audio Streaming: The AudioWorklet processes captured audio into chunks, which the client sends over the WebSocket.
Transcription: AssemblyAI returns Turn messages containing transcripts.
Termination: The user clicks “Stop”; the client sends a Terminate message and closes the connection.

The application coordinates these components through shared state: a boolean flag (isRecording) and object references (ws, microphone).
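
The Termination step and the shared state can be sketched as a single stop function. This sketch assumes the Terminate message is the JSON object {"type": "Terminate"} (check the exact shape in AssemblyAI's documentation) and that the microphone helper exposes a stopRecording() method, which is a hypothetical name. State is passed in an explicit object rather than module-level variables so the function can be exercised with fakes.

```javascript
// WebSocket.OPEN is 1 in browsers; defined here so the sketch also runs in Node.
const WS_OPEN = 1;

// Hypothetical stop routine: ends the AssemblyAI session and releases the microphone.
function stopSession(state) {
  if (!state.isRecording) return false; // nothing to stop

  if (state.ws && state.ws.readyState === WS_OPEN) {
    // Tell the service the session is over, then close the socket
    state.ws.send(JSON.stringify({ type: "Terminate" }));
    state.ws.close();
  }
  if (state.microphone) {
    state.microphone.stopRecording(); // assumed method on the microphone helper
  }

  // Reset shared state so a new session starts from a clean slate
  state.ws = null;
  state.microphone = null;
  state.isRecording = false;
  return true;
}
```

Closing immediately after sending Terminate matches the lifecycle above but discards any messages that arrive afterwards; if the final Termination message matters, close the socket from that message's handler instead.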
