Control your AI coding agent with voice using the built-in voice assistant powered by ElevenLabs Conversational AI.

Overview

The voice assistant lets you:

  • Talk to Your Agent: Ask questions, give instructions, and request code changes hands-free
  • Approve by Voice: Say “yes” or “no” to approve or deny permission requests
  • Monitor Progress: Receive spoken updates when tasks complete or errors occur
The assistant bridges voice communication with your active coding agent (Claude Code, Codex, Gemini, or OpenCode), relaying your requests and summarizing responses in natural speech.

Prerequisites

You need an ElevenLabs account with API access.
ElevenLabs offers a free tier with limited usage. Paid plans provide more minutes and better voice quality.

Setup

1. Get an API Key

1. Sign up or log in at elevenlabs.io
2. Go to API Keys in your account settings
3. Create a new API key and copy it

2. Configure the Hub

Set the environment variable before starting the hub:
export ELEVENLABS_API_KEY="your-api-key"
hapi hub --relay
The hub automatically creates a “Hapi Voice Assistant” agent in your ElevenLabs account on first use.
The --relay flag is optional but recommended for better connectivity.
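Because the hub reads the key from the environment, a quick preflight check can catch a missing key before launch. This is only a sketch; the `hapi hub --relay` command comes from the docs above, and `"your-api-key"` is a placeholder, not a real key:

```shell
# Preflight sketch: confirm the key is set before launching the hub.
export ELEVENLABS_API_KEY="your-api-key"

if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
  status="key configured"
else
  status="key missing"
fi
echo "$status"
# With the check passing, you would then run: hapi hub --relay
```

A check like this is useful in launch scripts, since the hub will otherwise start without voice support when the key is absent.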

3. (Optional) Use Custom Agent

If you want to use your own ElevenLabs agent instead of the auto-created one:
export ELEVENLABS_AGENT_ID="your-agent-id"
You can find agent IDs in your ElevenLabs dashboard under Conversational AI agents.

Configuration

| Variable | Required | Description |
| --- | --- | --- |
| ELEVENLABS_API_KEY | Yes | ElevenLabs API key for the voice assistant |
| ELEVENLABS_AGENT_ID | No | Custom agent ID (auto-created if not set) |
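Both variables can be kept together in a small env file and sourced before starting the hub. The filename below is illustrative, not a convention the hub requires:

```shell
# Sketch: store both settings in one env file (filename is illustrative).
cat > hapi-voice.env <<'EOF'
export ELEVENLABS_API_KEY="your-api-key"
export ELEVENLABS_AGENT_ID="your-agent-id"
EOF

# Load the settings into the current shell before launching the hub.
. ./hapi-voice.env
echo "$ELEVENLABS_AGENT_ID"
```

Sourcing a file like this keeps secrets out of your shell history and makes the configuration easy to reuse across terminals.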

Usage

Starting a Voice Session

1. Open a session in the HAPI web app
2. Click the microphone button in the composer (or the send button when empty)
3. Grant microphone permission when prompted by your browser
4. Start speaking; the assistant is listening
The voice assistant is only available in the web app, not in the Telegram Mini App or the terminal.

Voice Commands

The assistant understands natural language. Here are common patterns:
| Say This | What Happens |
| --- | --- |
| “Ask Claude to…” / “Have it…” | Sends your request to the coding agent |
| “Refactor the auth module” | Coding requests are forwarded automatically |
| “Yes” / “Allow” / “Go ahead” | Approves pending permission requests |
| “No” / “Deny” / “Cancel” | Denies pending permission requests |
| Direct questions | The voice assistant answers itself if it can |
You don’t need special command syntax — just speak naturally.

How It Works

Context Synchronization

The voice assistant automatically receives updates when:
  • You focus on a session (full history is loaded)
  • The agent sends messages or uses tools
  • Permission requests arrive
  • Tasks complete
You don’t need to ask for status updates — the assistant proactively summarizes relevant changes.

Tools

The voice assistant has two tools to interact with your coding agent:

  • messageCodingAgent: Forwards your requests to the active agent
  • processPermissionRequest: Handles permission approvals and denials
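The exact tool-call schema is not documented here, but a hypothetical payload might look like the following. The field names (`tool`, `params`, `approve`) are assumptions for illustration only:

```shell
# Hypothetical tool-call payload (field names are assumptions, not the
# documented schema); extract the tool name the way the hub might route on it.
tool_call='{"tool":"processPermissionRequest","params":{"approve":true}}'
tool=$(printf '%s' "$tool_call" | sed -n 's/.*"tool":"\([^"]*\)".*/\1/p')
echo "$tool"
```

However the payload is actually shaped, the hub's job is the same: identify which of the two tools was called and forward the arguments to the focused session.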

Architecture

Browser → WebRTC → ElevenLabs ConvAI → Voice Assistant → HAPI Hub → Coding Agent
  • Browser captures audio via WebRTC
  • ElevenLabs handles speech-to-text and text-to-speech
  • Voice Assistant interprets intent and calls tools
  • HAPI Hub routes tool calls to the coding agent
  • Coding Agent executes the request
The voice connection uses WebRTC for low-latency audio streaming. The HAPI hub provides conversation tokens and handles authentication.
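The relay chain above can be mimicked with a toy pipeline, where each shell function stands in for a real component. All function names and strings are illustrative only:

```shell
# Toy sketch of the relay chain; each function stands in for a component.
speech_to_text()   { echo "approve the permission request"; }   # ElevenLabs STT
interpret_intent() { echo "processPermissionRequest:approve"; } # Voice Assistant
route_to_agent()   { echo "routed:$1"; }                        # HAPI Hub

transcript=$(speech_to_text)
tool_call=$(interpret_intent "$transcript")
result=$(route_to_agent "$tool_call")
echo "$result"
```

The real chain is asynchronous and streamed over WebRTC rather than sequential function calls, but the data flow (audio, then transcript, then tool call, then routed action) is the same.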

Use Cases

Hands-Free Coding

Code while your hands are busy:
You: "Ask Claude to refactor the authentication module"
Assistant: "I'll relay that to Claude."
*Claude processes*
Assistant: "Claude has refactored the auth module and is
           requesting permission to edit 3 files."
You: "Yes, allow it."
Assistant: "Permission approved. Claude is writing the changes."

Quick Permission Approvals

Approve permissions while away from keyboard:
Assistant: "Claude is requesting permission to run npm install."
You: "Yes."
Assistant: "Permission approved."

Progress Monitoring

Get spoken updates without looking at the screen:
Assistant: "Claude has completed the refactoring.
           All tests are passing. Ready for your next instruction."

Tips for Best Results

Clear, complete requests get better results:
  • ✅ “Refactor the user authentication module to use JWT tokens”
  • ❌ “Fix that thing”
The assistant stays silent while the agent works, then summarizes results. Don’t interrupt while processing.
No special command syntax needed:
  • ✅ “Can you have Claude add error handling to the API?”
  • ✅ “Tell it to fix the bug in utils.ts”
  • ✅ “Yes, go ahead”
Use one active session at a time for clearest context. The assistant tracks the currently focused session.

Audio Quality

For best audio experience:
  • Use a headset to avoid echo
  • Reduce background noise for better recognition
  • Stable internet for low-latency streaming
  • Chrome/Edge recommended (best WebRTC support)

Troubleshooting

Voice assistant not available
Solution: Set ELEVENLABS_API_KEY in your environment and restart the hub:
export ELEVENLABS_API_KEY="your-api-key"
hapi hub
Microphone not working
Possible causes:
  • Check browser permissions for microphone access
  • Ensure no other app is using the microphone
  • Try refreshing the page
  • HTTPS is required (some browsers block microphone access on HTTP)
Voice session won’t connect
Check these:
  • Verify the session is connected (green dot in the status bar)
  • Check that the voice status shows “connecting” or a connected state
  • Ensure you have a stable internet connection
  • Look for errors in the browser console (F12)
ElevenLabs API errors
Solutions:
  • Verify your API key is valid
  • Check that your ElevenLabs account has available quota
  • Try setting a custom ELEVENLABS_AGENT_ID from your dashboard
Poor audio quality
Improvements:
  • Use a headset to avoid echo
  • Reduce background noise
  • Check your internet connection stability
  • Upgrade to a paid ElevenLabs plan for better voice quality
Assistant misunderstands requests
Tips:
  • Speak clearly and at a moderate pace
  • Use complete sentences
  • Be explicit: “Ask Claude to…” rather than just describing the task
  • Check that the session has an active agent

Limitations

The voice assistant has some limitations:
  • Session focus: Only works with the currently focused session
  • Browser support: Requires WebRTC (Chrome/Edge recommended)
  • Network: Requires stable internet for real-time streaming
  • Cost: Uses ElevenLabs API quota

Browser Support

| Browser | Platform | Support |
| --- | --- | --- |
| Chrome | Desktop/Android | ✅ Full support |
| Edge | Desktop/Android | ✅ Full support |
| Safari | macOS/iOS | ⚠️ Limited WebRTC support |
| Firefox | All | ⚠️ Partial support |

Privacy & Security

Audio Processing

  • Audio is streamed to ElevenLabs via WebRTC
  • ElevenLabs processes speech-to-text
  • Transcripts are sent to the voice assistant
  • Voice responses are generated by ElevenLabs

Data Handling

  • Audio is not stored by HAPI
  • Transcripts are logged for debugging (can be disabled)
  • ElevenLabs has its own data retention policies
  • Check ElevenLabs Privacy Policy for details

Related pages:
  • Remote Control: Control sessions from anywhere
  • Permissions: Approve agent actions
  • PWA: Install HAPI on your phone
