Control your AI coding agent with voice using the built-in voice assistant powered by ElevenLabs Conversational AI.

Overview

The voice assistant lets you:

  • Talk to Your Agent: Ask questions, give instructions, and request code changes hands-free
  • Approve by Voice: Say “yes” or “no” to approve or deny permission requests
  • Monitor Progress: Receive spoken updates when tasks complete or errors occur
The assistant bridges voice communication with your active coding agent (Claude Code, Codex, Gemini, or OpenCode), relaying your requests and summarizing responses in natural speech.

Prerequisites

You need an ElevenLabs account with API access.
ElevenLabs offers a free tier with limited usage. Paid plans provide more minutes and better voice quality.

Setup

1. Get an API Key

1. Sign up or log in at elevenlabs.io
2. Go to API Keys in your account settings
3. Create a new API key and copy it

2. Configure the Hub

Set the environment variable before starting the hub:
export ELEVENLABS_API_KEY="your-api-key"
hapi hub --relay
The hub automatically creates a “Hapi Voice Assistant” agent in your ElevenLabs account on first use.
The --relay flag is optional but recommended for better connectivity.
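Because the hub reads the key from the environment, a quick preflight check can catch a missing key before launch. This is only a sketch; the `hapi hub --relay` command comes from the docs above, and `"your-api-key"` is a placeholder, not a real key:

```shell
# Preflight sketch: confirm the key is set before launching the hub.
export ELEVENLABS_API_KEY="your-api-key"

if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
  status="key configured"
else
  status="key missing"
fi
echo "$status"
# With the check passing, you would then run: hapi hub --relay
```

A check like this is useful in launch scripts, since the hub will otherwise start without voice support when the key is absent.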

3. (Optional) Use Custom Agent

If you want to use your own ElevenLabs agent instead of the auto-created one:
export ELEVENLABS_AGENT_ID="your-agent-id"
You can find agent IDs in your ElevenLabs dashboard under Conversational AI agents.

Configuration

| Variable | Required | Description |
| --- | --- | --- |
| ELEVENLABS_API_KEY | Yes | ElevenLabs API key for the voice assistant |
| ELEVENLABS_AGENT_ID | No | Custom agent ID (auto-created if not set) |
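Both variables can be kept together in a small env file and sourced before starting the hub. The filename below is illustrative, not a convention the hub requires:

```shell
# Sketch: store both settings in one env file (filename is illustrative).
cat > hapi-voice.env <<'EOF'
export ELEVENLABS_API_KEY="your-api-key"
export ELEVENLABS_AGENT_ID="your-agent-id"
EOF

# Load the settings into the current shell before launching the hub.
. ./hapi-voice.env
echo "$ELEVENLABS_AGENT_ID"
```

Sourcing a file like this keeps secrets out of your shell history and makes the configuration easy to reuse across terminals.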

Usage

Starting a Voice Session

1. Open a session in the HAPI web app
2. Click the microphone button in the composer (or the send button when empty)
3. Grant microphone permission when prompted by your browser
4. Start speaking; the assistant is listening
The voice assistant is only available in the web app, not in the Telegram Mini App or the terminal.

Voice Commands

The assistant understands natural language. Here are common patterns:
| Say This | What Happens |
| --- | --- |
| “Ask Claude to…” / “Have it…” | Sends your request to the coding agent |
| “Refactor the auth module” | Coding requests are forwarded automatically |
| “Yes” / “Allow” / “Go ahead” | Approves pending permission requests |
| “No” / “Deny” / “Cancel” | Denies pending permission requests |
| Direct questions | The voice assistant answers itself if it can |
You don’t need special command syntax — just speak naturally.

How It Works

Context Synchronization

The voice assistant automatically receives updates when:
  • You focus on a session (full history is loaded)
  • The agent sends messages or uses tools
  • Permission requests arrive
  • Tasks complete
You don’t need to ask for status updates — the assistant proactively summarizes relevant changes.

Tools

The voice assistant has two tools to interact with your coding agent:

  • messageCodingAgent: Forwards your requests to the active agent
  • processPermissionRequest: Handles permission approvals and denials
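The exact tool-call schema is not documented here, but a hypothetical payload might look like the following. The field names (`tool`, `params`, `approve`) are assumptions for illustration only:

```shell
# Hypothetical tool-call payload (field names are assumptions, not the
# documented schema); extract the tool name the way the hub might route on it.
tool_call='{"tool":"processPermissionRequest","params":{"approve":true}}'
tool=$(printf '%s' "$tool_call" | sed -n 's/.*"tool":"\([^"]*\)".*/\1/p')
echo "$tool"
```

However the payload is actually shaped, the hub's job is the same: identify which of the two tools was called and forward the arguments to the focused session.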

Architecture

Browser → WebRTC → ElevenLabs ConvAI → Voice Assistant → HAPI Hub → Coding Agent
  • Browser captures audio via WebRTC
  • ElevenLabs handles speech-to-text and text-to-speech
  • Voice Assistant interprets intent and calls tools
  • HAPI Hub routes tool calls to the coding agent
  • Coding Agent executes the request
The voice connection uses WebRTC for low-latency audio streaming. The HAPI hub provides conversation tokens and handles authentication.
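The relay chain above can be mimicked with a toy pipeline, where each shell function stands in for a real component. All function names and strings are illustrative only:

```shell
# Toy sketch of the relay chain; each function stands in for a component.
speech_to_text()   { echo "approve the permission request"; }   # ElevenLabs STT
interpret_intent() { echo "processPermissionRequest:approve"; } # Voice Assistant
route_to_agent()   { echo "routed:$1"; }                        # HAPI Hub

transcript=$(speech_to_text)
tool_call=$(interpret_intent "$transcript")
result=$(route_to_agent "$tool_call")
echo "$result"
```

The real chain is asynchronous and streamed over WebRTC rather than sequential function calls, but the data flow (audio, then transcript, then tool call, then routed action) is the same.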

Use Cases

Hands-Free Coding

Code while your hands are busy:
You: "Ask Claude to refactor the authentication module"
Assistant: "I'll relay that to Claude."
*Claude processes*
Assistant: "Claude has refactored the auth module and is
           requesting permission to edit 3 files."
You: "Yes, allow it."
Assistant: "Permission approved. Claude is writing the changes."

Quick Permission Approvals

Approve permissions while away from keyboard:
Assistant: "Claude is requesting permission to run npm install."
You: "Yes."
Assistant: "Permission approved."

Progress Monitoring

Get spoken updates without looking at the screen:
Assistant: "Claude has completed the refactoring.
           All tests are passing. Ready for your next instruction."

Tips for Best Results

Clear, complete requests get better results:
  • ✅ “Refactor the user authentication module to use JWT tokens”
  • ❌ “Fix that thing”
The assistant stays silent while the agent works, then summarizes results. Don’t interrupt while processing.
No special command syntax needed:
  • ✅ “Can you have Claude add error handling to the API?”
  • ✅ “Tell it to fix the bug in utils.ts”
  • ✅ “Yes, go ahead”
Use one active session at a time for clearest context. The assistant tracks the currently focused session.

Audio Quality

For best audio experience:
  • Use a headset to avoid echo
  • Reduce background noise for better recognition
  • Stable internet for low-latency streaming
  • Chrome/Edge recommended (best WebRTC support)

Troubleshooting

Voice assistant not available
Solution: Set ELEVENLABS_API_KEY in your environment and restart the hub:
export ELEVENLABS_API_KEY="your-api-key"
hapi hub
Microphone not working
Possible causes:
  • Check browser permissions for microphone access
  • Ensure no other app is using the microphone
  • Try refreshing the page
  • HTTPS is required (some browsers block microphone access on HTTP)
Voice session won’t connect
Check these:
  • Verify the session is connected (green dot in the status bar)
  • Check that the voice status shows “connecting” or a connected state
  • Ensure you have a stable internet connection
  • Look for errors in the browser console (F12)
ElevenLabs API errors
Solutions:
  • Verify your API key is valid
  • Check that your ElevenLabs account has available quota
  • Try setting a custom ELEVENLABS_AGENT_ID from your dashboard
Poor audio quality
Improvements:
  • Use a headset to avoid echo
  • Reduce background noise
  • Check your internet connection stability
  • Upgrade to a paid ElevenLabs plan for better voice quality
Assistant misunderstands requests
Tips:
  • Speak clearly and at a moderate pace
  • Use complete sentences
  • Be explicit: “Ask Claude to…” rather than just describing the task
  • Check that the session has an active agent

Limitations

The voice assistant has some limitations:
  • Session focus: Only works with the currently focused session
  • Browser support: Requires WebRTC (Chrome/Edge recommended)
  • Network: Requires stable internet for real-time streaming
  • Cost: Uses ElevenLabs API quota

Browser Support

| Browser | Platform | Support |
| --- | --- | --- |
| Chrome | Desktop/Android | ✅ Full support |
| Edge | Desktop/Android | ✅ Full support |
| Safari | macOS/iOS | ⚠️ Limited WebRTC support |
| Firefox | All | ⚠️ Partial support |

Privacy & Security

Audio Processing

  • Audio is streamed to ElevenLabs via WebRTC
  • ElevenLabs processes speech-to-text
  • Transcripts are sent to the voice assistant
  • Voice responses are generated by ElevenLabs

Data Handling

  • Audio is not stored by HAPI
  • Transcripts are logged for debugging (can be disabled)
  • ElevenLabs has its own data retention policies
  • Check ElevenLabs Privacy Policy for details

Related pages:
  • Remote Control: Control sessions from anywhere
  • Permissions: Approve agent actions
  • PWA: Install HAPI on your phone
