Before running this app, you need to upgrade your AssemblyAI account. The real-time API is only available to upgraded accounts at this time. Running the app before upgrading will cause an error with a 402 status code. To upgrade your account, add a card in your AssemblyAI dashboard.

Getting Started

This guide will walk you through setting up and running the AssemblyAI Real-Time Transcription Browser Example. This application captures audio from your microphone and sends it over a WebSocket to AssemblyAI for real-time transcription using Express and the AudioWorklet API.
1. Clone the repository

Clone the repository to your local machine:
git clone https://github.com/AssemblyAI/realtime-transcription-browser-js-example.git
cd realtime-transcription-browser-js-example
2. Install dependencies

Install the required dependencies using your preferred package manager:
npm install
The project requires the following dependencies:
  • express - Web server framework
  • axios - HTTP client for API requests
  • dotenv - Environment variable management
3. Configure your API key

Create a .env file in the root directory and add your AssemblyAI API key:
ASSEMBLYAI_API_KEY=YOUR_API_KEY
You can find your API key in your AssemblyAI dashboard. The API key is used to generate temporary tokens for secure WebSocket connections.
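Because a missing or misspelled key only surfaces later as a failed token request, it can help to fail fast at startup. Here is a minimal sketch; the `requireEnv` helper is illustrative and not part of the example repo:

```javascript
// Hypothetical helper (not in the example repo): throw immediately when a
// required environment variable is missing, so a forgotten .env file is
// caught at startup instead of on the first /token request.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In server.js this would run after require("dotenv").config():
// const apiKey = requireEnv("ASSEMBLYAI_API_KEY");
```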
4. Start the server

Start the Express server:
npm run serve
The server will start on port 8000. You should see:
Server is running at http://localhost:8000
Open your browser and navigate to http://localhost:8000.
5. Start recording

Click the Start button on the page to begin recording. Here’s what happens:
  1. Microphone permission request: The browser will ask for permission to access your microphone
  2. Token generation: The client fetches a temporary token from the server:
    const response = await fetch("http://localhost:8000/token");
    const data = await response.json();
    
  3. WebSocket connection: A WebSocket connection is established with AssemblyAI:
    const endpoint = `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${data.token}`;
    ws = new WebSocket(endpoint);
    
  4. Audio streaming: Audio is captured using the AudioWorklet API at 16kHz sample rate and sent to AssemblyAI in real-time:
    microphone.startRecording((audioChunk) => {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(audioChunk);
      }
    });
    
  5. Transcription display: As AssemblyAI transcribes your speech, the transcription appears on the page:
    ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "Turn") {
        const { turn_order, transcript } = msg;
        turns[turn_order] = transcript;
        // Display ordered transcripts
      }
    };
    
Click Stop to end the recording and close the WebSocket connection.
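The transcript-ordering logic from step 5 can be sketched as two small pure functions (the names here are illustrative, not from the repo). Keying turns by `turn_order` means the display stays correct even if messages are processed out of order:

```javascript
// Illustrative sketch of the step-5 display logic: store each Turn
// message's transcript under its turn_order key...
function applyTurn(turns, msg) {
  if (msg.type === "Turn") {
    turns[msg.turn_order] = msg.transcript;
  }
  return turns;
}

// ...then render the stored transcripts in ascending turn order.
function renderTranscript(turns) {
  return Object.keys(turns)
    .map(Number)
    .sort((a, b) => a - b)
    .map((order) => turns[order])
    .join(" ");
}
```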

How It Works

Token Generation

The server generates temporary tokens for secure authentication with AssemblyAI’s real-time API:
server.js
app.get("/token", async (req, res) => {
  try {
    const token = await generateTempToken(60); // Max value 600
    res.json({ token });
  } catch (error) {
    res.status(500).json({ error: "Failed to generate token" });
  }
});
The generateTempToken function calls AssemblyAI’s token API:
tokenGenerator.js
async function generateTempToken(expiresInSeconds) {
  const url = `https://streaming.assemblyai.com/v3/token?expires_in_seconds=${expiresInSeconds}`;

  const response = await axios.get(url, {
    headers: {
      Authorization: process.env.ASSEMBLYAI_API_KEY,
    },
  });
  return response.data.token;
}
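Since the token lifetime is capped at 600 seconds (per the comment in the route above), a caller could validate the value before requesting. This guard is a hypothetical addition, not part of the example repo:

```javascript
// Hypothetical guard (not in the repo): keep the requested token lifetime
// positive and within the documented 600-second maximum before calling
// generateTempToken.
const MAX_TOKEN_LIFETIME_SECONDS = 600;

function clampTokenLifetime(seconds) {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error("Token lifetime must be a positive number of seconds");
  }
  return Math.min(seconds, MAX_TOKEN_LIFETIME_SECONDS);
}
```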

Audio Processing

The application uses the Web Audio API’s AudioWorklet to process audio at the correct sample rate (16kHz) before sending it to AssemblyAI. Audio chunks are buffered and sent every 100ms to balance latency and network efficiency.
The AudioWorklet runs audio processing in a separate thread, preventing audio glitches and ensuring smooth recording even when the main thread is busy.
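At 16 kHz, a 100 ms chunk is 1,600 samples. The buffering described above can be sketched as a simple accumulator; this is a simplification of what the worklet does, and the function names are ours:

```javascript
// Sketch of the worklet's buffering strategy: accumulate incoming sample
// frames and emit fixed-size chunks (100 ms at 16 kHz = 1600 samples).
const SAMPLE_RATE = 16000;
const CHUNK_SIZE = SAMPLE_RATE * 0.1; // 1600 samples per 100 ms

function createChunker(onChunk, chunkSize = CHUNK_SIZE) {
  let buffer = [];
  return function push(samples) {
    buffer.push(...samples);
    // Emit as many full chunks as the buffer holds; keep the remainder.
    while (buffer.length >= chunkSize) {
      onChunk(buffer.slice(0, chunkSize));
      buffer = buffer.slice(chunkSize);
    }
  };
}
```

In the real worklet the frames arrive as `Float32Array`s from `process()` and the chunks are posted to the main thread, but the accumulate-and-slice logic is the same.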

Next Steps

Now that you have the basic example running, you can:
  • Explore the source code to understand the WebSocket message handling
  • Customize the UI and add additional features
  • Learn more about AssemblyAI’s Real-Time API
  • Experiment with different audio configurations and sample rates
