Get started with Voice Agent AI SDK by installing the package and configuring your environment.

Prerequisites

Before installing the SDK, ensure you have:
  • Node.js 20 or higher - The SDK uses modern JavaScript features
  • pnpm (recommended) or npm/yarn - Package manager of your choice
  • OpenAI API key - Required for LLM, transcription, and speech models
While this SDK is optimized for OpenAI models, it works with any provider compatible with the Vercel AI SDK.
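To confirm the runtime requirement is met, you can print your installed Node.js version before proceeding:

```shell
# Print the installed Node.js version; the major version should be 20 or higher.
node --version
```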

Install the Package

Install voice-agent-ai-sdk and its peer dependencies:
npm install voice-agent-ai-sdk ai @ai-sdk/openai

Package Details

  • voice-agent-ai-sdk - The core SDK (version 1.0.1+)
  • ai - Vercel AI SDK (peer dependency, version 6.0.0+)
  • @ai-sdk/openai - OpenAI provider for AI SDK
The SDK also includes TypeScript definitions out of the box - no need to install separate @types packages.
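The verification script later on this page uses ES modules and top-level await, so your TypeScript configuration should target a modern module system. If your project doesn't have a tsconfig.json yet, a minimal sketch might look like this (an illustration; adjust to your setup):

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true
  }
}
```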

Environment Setup

Create a .env file in your project root to store API keys and configuration:
.env
# Required: OpenAI API key for chat, transcription, and speech models
OPENAI_API_KEY=sk-proj-...

# Optional: WebSocket endpoint for voice transport
# Only needed if you're using real-time voice features
VOICE_WS_ENDPOINT=ws://localhost:8080
Security: Never commit your .env file to version control. Add it to .gitignore to keep your API keys secure.

Loading Environment Variables

Load environment variables at the start of your application:
import "dotenv/config";

// Now process.env.OPENAI_API_KEY is available
This import requires the dotenv package, so install it first:
npm install dotenv
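For illustration, here is roughly what dotenv does with the file contents: each KEY=VALUE line becomes an entry on process.env. The parseEnv function below is a hypothetical helper written for this sketch, not dotenv's public API — use dotenv itself in real code.

```typescript
// Sketch of .env parsing: KEY=VALUE lines become string pairs.
// (parseEnv is illustrative; use dotenv itself in real code.)
function parseEnv(src: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const line of src.split("\n")) {
    // Skip comments and blank lines; capture KEY=VALUE pairs.
    const match = line.match(/^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$/);
    if (match) vars[match[1]] = match[2].trim();
  }
  return vars;
}

const vars = parseEnv("# comment\nOPENAI_API_KEY=sk-test\nVOICE_WS_ENDPOINT=ws://localhost:8080");
console.log(vars.OPENAI_API_KEY); // "sk-test"
```

As an alternative to dotenv, Node.js 20.6+ can load the file itself: run your app with `node --env-file=.env app.js` and skip the import entirely.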

Get Your OpenAI API Key

If you don’t have an OpenAI API key yet:
  1. Go to platform.openai.com
  2. Sign in or create an account
  3. Navigate to API Keys in your account settings
  4. Click Create new secret key
  5. Copy the key and add it to your .env file
Make sure you have credits in your OpenAI account. The SDK uses:
  • Chat models (e.g., gpt-4o) for text generation
  • whisper-1 for audio transcription
  • Speech models (e.g., gpt-4o-mini-tts) for text-to-speech

Additional Dependencies (Optional)

Depending on your use case, you may need:

For Tool Calling

pnpm add zod
The SDK uses Zod for type-safe tool input schemas.

For WebSocket Server

pnpm add ws @types/ws
Needed if you’re building a WebSocket server to handle voice connections.

Verify Installation

Create a simple test file to verify everything is set up correctly:
test.ts
import "dotenv/config";
import { VoiceAgent } from "voice-agent-ai-sdk";
import { openai } from "@ai-sdk/openai";

const agent = new VoiceAgent({
  model: openai("gpt-4o"),
  instructions: "You are a helpful assistant.",
});

agent.on("text", ({ role, text }) => {
  console.log(`${role}: ${text}`);
});

await agent.sendText("Hello!");

console.log("✓ SDK is working!");
Run it:
npx tsx test.ts
If you see the assistant’s response, you’re all set!

Next Steps

Quickstart Guide

Build your first voice agent with streaming text and speech
