
Overview

Echoes of the Past uses Vapi for real-time voice conversations with historical figures. The integration includes:
  • Voice synthesis via ElevenLabs
  • Speech recognition for user input
  • Real-time transcription and message streaming
  • Audio level monitoring for visual feedback
  • Assistant configuration with OpenAI models

Vapi SDK Setup

The Vapi client is initialized in lib/vapi.ts:
lib/vapi.ts
import Vapi from '@vapi-ai/web'

if (!process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN) {
  throw new Error('NEXT_PUBLIC_VAPI_WEB_TOKEN environment variable is required')
}

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN)
Environment setup:
.env.local
NEXT_PUBLIC_VAPI_WEB_TOKEN=your_vapi_token_here
NEXT_PUBLIC_SERVER_URL=https://your-domain.com/api/webhook
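Note that NEXT_PUBLIC_ variables are inlined into the client bundle at build time, so this must be Vapi's public web token, never a private API key.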

useVapi Hook

The useVapi hook encapsulates all Vapi integration logic:

Hook Interface

features/call/hooks/useVapi.ts
import { useEffect, useState } from 'react'
import { CreateAssistantDTO } from '@vapi-ai/web/dist/api'
import { HistoricalFigure } from '@/types'
import { vapi } from '@/lib/vapi'
import {
  Message,
  MessageTypeEnum,
  TranscriptMessage,
  TranscriptMessageTypeEnum,
} from '@/types/conversation.type'
import { CALL_STATUS } from '../types'

type UseVapiProps = {
  character: HistoricalFigure
  systemPrompt: string
  firstMessage: string
}

export function useVapi({ character, systemPrompt, firstMessage }: UseVapiProps) {
  const [isSpeechActive, setIsSpeechActive] = useState(false)
  const [callStatus, setCallStatus] = useState<CALL_STATUS>(CALL_STATUS.INACTIVE)
  const [messages, setMessages] = useState<Message[]>([])
  const [activeTranscript, setActiveTranscript] = useState<TranscriptMessage | null>(null)
  const [audioLevel, setAudioLevel] = useState(0)

  // ... implementation

  return {
    isSpeechActive,
    callStatus,
    audioLevel,
    activeTranscript,
    messages,
    start,
    stop,
    toggleCall,
    askQuestion,
  }
}
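The hook references a CALL_STATUS enum from features/call/types, which isn't shown in this guide. A plausible definition, inferred from the statuses used throughout this page:
features/call/types.ts
// Assumed shape — inferred from CALL_STATUS usage in this guide
export enum CALL_STATUS {
  INACTIVE = 'inactive',
  LOADING = 'loading',
  ACTIVE = 'active',
}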

Assistant Configuration

The assistant object defines the AI’s behavior, voice, and model:
features/call/hooks/useVapi.ts
const assistant: Omit<CreateAssistantDTO, 'clientMessages' | 'serverMessages'> = {
  name: character.name,
  firstMessage,
  
  // AI Model Configuration
  model: {
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    temperature: 0.7,
    messages: [
      {
        role: 'system',
        content: systemPrompt,
      },
    ],
  },
  
  // Idle Message Handling
  messagePlan: {
    idleMessages: [
      'If you have a question, feel free to ask',
      'Are you there?',
      'What are you thinking? I can help you!',
      "I'm here whenever you're ready to continue"
    ],
    idleTimeoutSeconds: 15,
    idleMessageMaxSpokenCount: 3,
    idleMessageResetCountOnUserSpeechEnabled: true,
  },
  
  // Audio Processing
  backgroundDenoisingEnabled: true,
  
  // Voice Configuration (ElevenLabs)
  voice: {
    provider: '11labs',
    voiceId: character.voiceId,
    stability: 0.4,
    similarityBoost: 0.8,
    speed: 1,
    style: 0.5,
    useSpeakerBoost: true,
  },
  
  // Server Webhook
  server: {
    url: process.env.NEXT_PUBLIC_SERVER_URL || 
      'https://your-backend.ngrok-free.app/api/webhook',
  },
}
ElevenLabs voice settings:
  • voiceId: Character-specific voice from ElevenLabs
  • stability: 0-1, higher = more consistent but less expressive (0.4 recommended)
  • similarityBoost: 0-1, higher = closer to original voice (0.8 recommended)
  • speed: Playback speed multiplier (1 = normal)
  • style: 0-1, voice style intensity (0.5 = balanced)
  • useSpeakerBoost: Enhances clarity for different speakers
Tuning tips:
  • Lower stability (0.3-0.5) for more dynamic conversations
  • Higher stability (0.6-0.8) for formal or educational content
  • Adjust speed based on character personality (slower for contemplative figures); one way to encode these presets is sketched below
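These tips can be encoded as per-character presets. A minimal sketch; the Temperament type and the specific values are assumptions, not fields of HistoricalFigure:
// Hypothetical presets mapping a character's temperament to ElevenLabs settings
type Temperament = 'contemplative' | 'formal' | 'dynamic'

function voiceSettingsFor(temperament: Temperament) {
  switch (temperament) {
    case 'contemplative':
      // Slower, steadier delivery for reflective figures
      return { stability: 0.6, similarityBoost: 0.8, speed: 0.9, style: 0.4 }
    case 'formal':
      // Higher stability for formal or educational content
      return { stability: 0.7, similarityBoost: 0.8, speed: 1, style: 0.3 }
    case 'dynamic':
      // Lower stability for lively, expressive conversation
      return { stability: 0.35, similarityBoost: 0.8, speed: 1.05, style: 0.6 }
  }
}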
Idle message behavior:
messagePlan: {
  idleMessages: ['Are you there?', 'Feel free to ask'],
  idleTimeoutSeconds: 15,          // Wait 15s before idle message
  idleMessageMaxSpokenCount: 3,    // Stop after 3 idle messages
  idleMessageResetCountOnUserSpeechEnabled: true,
}
Purpose: Keeps the conversation engaging without becoming intrusive. After the user speaks, the idle counter resets (since idleMessageResetCountOnUserSpeechEnabled is true).

Event Handling

Vapi emits events throughout the call lifecycle:

Event Listeners Setup

features/call/hooks/useVapi.ts
useEffect(() => {
  // Speech Events
  const onSpeechStart = () => setIsSpeechActive(true)
  const onSpeechEnd = () => {
    console.log('Speech has ended')
    setIsSpeechActive(false)
  }

  // Call Lifecycle
  const onCallStartHandler = () => {
    console.log('Call has started')
    setCallStatus(CALL_STATUS.ACTIVE)
  }

  const onCallEnd = () => {
    console.log('Call has stopped')
    setCallStatus(CALL_STATUS.INACTIVE)
  }

  // Audio Level for Visualization
  const onVolumeLevel = (volume: number) => {
    setAudioLevel(volume)
  }

  // Message Updates (Transcription)
  const onMessageUpdate = (message: Message) => {
    if (message.type === MessageTypeEnum.TRANSCRIPT && 
        message.transcriptType === TranscriptMessageTypeEnum.PARTIAL) {
      // Live transcription (user still speaking)
      setActiveTranscript(message)
    } else {
      // Final message (completed)
      setMessages((prev) => [...prev, message])
      setActiveTranscript(null)
    }
  }

  // Error Handling
  const onError = (e: Error) => {
    setCallStatus(CALL_STATUS.INACTIVE)
    console.error(e)
  }

  // Register all listeners
  vapi.on('speech-start', onSpeechStart)
  vapi.on('speech-end', onSpeechEnd)
  vapi.on('call-start', onCallStartHandler)
  vapi.on('call-end', onCallEnd)
  vapi.on('volume-level', onVolumeLevel)
  vapi.on('message', onMessageUpdate)
  vapi.on('error', onError)

  // Cleanup on unmount
  return () => {
    vapi.off('speech-start', onSpeechStart)
    vapi.off('speech-end', onSpeechEnd)
    vapi.off('call-start', onCallStartHandler)
    vapi.off('call-end', onCallEnd)
    vapi.off('volume-level', onVolumeLevel)
    vapi.off('message', onMessageUpdate)
    vapi.off('error', onError)
  }
}, [])

Event Types Reference

speech-start / speech-end

Purpose: Track when the AI assistant is speaking.

Use cases:
  • Show “speaking” indicator
  • Animate avatar
  • Disable input during AI speech
const [isSpeechActive, setIsSpeechActive] = useState(false)

vapi.on('speech-start', () => setIsSpeechActive(true))
vapi.on('speech-end', () => setIsSpeechActive(false))

// In component
<div className={isSpeechActive ? 'pulsing' : ''}>
  {character.name} {isSpeechActive && 'is speaking...'}
</div>
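The volume-level event can drive audio visualizations in the same way. A minimal sketch; the CSS mapping is illustrative:
const [audioLevel, setAudioLevel] = useState(0)

// Vapi reports volume as a value between 0 and 1
vapi.on('volume-level', (volume: number) => setAudioLevel(volume))

// In component: scale a bar with the live level
<div className="voice-bar" style={{ transform: `scaleY(${0.2 + audioLevel * 0.8})` }} />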

Call Control Methods

Starting a Call

features/call/hooks/useVapi.ts
const start = async () => {
  setCallStatus(CALL_STATUS.LOADING)
  await vapi.start(assistant as CreateAssistantDTO)
  // Status moves to ACTIVE when the 'call-start' event fires
}

// In component
useEffect(() => {
  start() // Auto-start call on mount
}, [])

Stopping a Call

const stop = () => {
  setCallStatus(CALL_STATUS.LOADING)
  vapi.stop()
}

Toggle Call State

const toggleCall = () => {
  if (callStatus === CALL_STATUS.ACTIVE) {
    stop()
  } else {
    start()
  }
}

// Usage
<AssistantButton onClick={toggleCall} callStatus={callStatus} />
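AssistantButton itself isn't defined in this guide; a minimal sketch with assumed props and labels:
import { CALL_STATUS } from '../types'

type AssistantButtonProps = {
  onClick: () => void
  callStatus: CALL_STATUS
}

export function AssistantButton({ onClick, callStatus }: AssistantButtonProps) {
  const label =
    callStatus === CALL_STATUS.ACTIVE ? 'End Call'
    : callStatus === CALL_STATUS.LOADING ? 'Connecting…'
    : 'Start Call'

  return (
    // Disable during LOADING to prevent double starts
    <button onClick={onClick} disabled={callStatus === CALL_STATUS.LOADING}>
      {label}
    </button>
  )
}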

Programmatic Messages

Send system messages to the assistant during a call:
features/call/hooks/useVapi.ts
const askQuestion = (question: string) => {
  console.log('question:', question)
  vapi.send({
    type: 'add-message',
    message: {
      role: 'system',
      content: `The user has pressed a button for you to ask them: ${question}.`,
    },
  })
}

// Example: Helper text options
const handleOptionClick = (option: string) => {
  switch (option) {
    case 'Ask a question':
      askQuestion("If you've a question? feel free to ask")
      break
    case 'Request Explanation':
      askQuestion("I'd be happy to explain anything about me or my work.")
      break
    case 'Topic Discussion': {
      const topics = character.notableWork.split(',').slice(0, 2)
      askQuestion(`Discuss these topics: ${topics.join(', ')}`)
      break
    }
  }
}

Real-Time Transcription

Message Types

types/conversation.type.ts
export interface TranscriptMessage extends BaseMessage {
  type: MessageTypeEnum.TRANSCRIPT
  role: MessageRoleEnum  // 'user' | 'assistant' | 'system'
  transcriptType: TranscriptMessageTypeEnum  // 'partial' | 'final'
  transcript: string
}

export enum MessageRoleEnum {
  USER = 'user',
  SYSTEM = 'system',
  ASSISTANT = 'assistant',
}

Displaying Transcripts

features/call/components/chat-view.tsx
'use client'

import { HistoricalFigure } from '@/types'
import { useVapi } from '../hooks/useVapi'
import { TranscriptView } from './TranscriptView'

type ChatViewProps = {
  character: HistoricalFigure
  userImage: string
  systemPrompt: string
  firstMessage: string
}

function ChatView({ character, userImage, systemPrompt, firstMessage }: ChatViewProps) {
  const { messages, activeTranscript } = useVapi({
    character,
    systemPrompt,
    firstMessage
  })

  return (
    <TranscriptView
      messages={messages}
      activeTranscript={activeTranscript}
      character={character}
      userImage={userImage}
    />
  )
}
TranscriptView component (sketched after this list):
  • Renders completed messages from messages array
  • Shows live activeTranscript with visual indicator (e.g., typing dots)
  • Distinguishes between user and assistant messages
  • Auto-scrolls to latest message
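A minimal sketch of TranscriptView; the props mirror the usage above, while the markup and styling are assumptions:
import { useEffect, useRef } from 'react'
import { HistoricalFigure } from '@/types'
import {
  Message,
  MessageTypeEnum,
  TranscriptMessage,
} from '@/types/conversation.type'

type TranscriptViewProps = {
  messages: Message[]
  activeTranscript: TranscriptMessage | null
  character: HistoricalFigure
  userImage: string
}

export function TranscriptView({
  messages,
  activeTranscript,
  character, // would drive the assistant's per-message avatar (omitted here)
  userImage, // would drive the user's per-message avatar (omitted here)
}: TranscriptViewProps) {
  const bottomRef = useRef<HTMLDivElement>(null)

  // Auto-scroll whenever a message is finalized or the live transcript changes
  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' })
  }, [messages, activeTranscript])

  return (
    <div className="overflow-y-auto">
      {messages
        .filter((m): m is TranscriptMessage => m.type === MessageTypeEnum.TRANSCRIPT)
        .map((m, i) => (
          <p key={i} className={m.role === 'user' ? 'text-right' : 'text-left'}>
            {m.transcript}
          </p>
        ))}
      {/* Live partial transcript with a simple "still speaking" indicator */}
      {activeTranscript && (
        <p className="italic opacity-60">{activeTranscript.transcript}…</p>
      )}
      <div ref={bottomRef} />
    </div>
  )
}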

Call Interface Example

A condensed implementation of the voice call UI (presentational components such as Avatar, Siri, and AssistantButton are imported from elsewhere in the app):
features/call/components/call-interface.tsx
'use client'

import { useEffect, useState } from 'react'
import { HistoricalFigure } from '@/types'
import { useVapi } from '../hooks/useVapi'
import { CALL_STATUS } from '../types'

type CallInterfaceProps = {
  character: HistoricalFigure
  systemPrompt: string
  firstMessage: string
  backHref: string
}

export const CallInterface = ({
  character,
  systemPrompt,
  firstMessage,
  backHref
}: CallInterfaceProps) => {
  const [callDuration, setCallDuration] = useState(0)
  const { toggleCall, callStatus, audioLevel, messages } = useVapi({
    character,
    systemPrompt,
    firstMessage
  })

  // Auto-start call
  useEffect(() => {
    toggleCall()
  }, [])

  // Track duration
  useEffect(() => {
    let interval: NodeJS.Timeout | undefined
    if (callStatus === CALL_STATUS.ACTIVE) {
      interval = setInterval(() => {
        setCallDuration(prev => prev + 1)
      }, 1000)
    }
    return () => clearInterval(interval)
  }, [callStatus])

  const formatTime = (seconds: number) => {
    const mins = Math.floor(seconds / 60)
    const secs = seconds % 60
    return `${mins.toString().padStart(2, '0')}:${secs.toString().padStart(2, '0')}`
  }

  return (
    <div className="min-h-screen">
      {/* Character Info */}
      <Avatar className="w-32 h-32">
        <AvatarImage src={character.imageUrl} alt={character.name} />
      </Avatar>
      <h1>{character.name}</h1>

      {/* Call Status */}
      {callStatus === CALL_STATUS.LOADING && <p>Connecting...</p>}
      {callStatus === CALL_STATUS.ACTIVE && (
        <>
          <p>Connected</p>
          <p>{formatTime(callDuration)}</p>
        </>
      )}
      {callStatus === CALL_STATUS.INACTIVE && <p>Call ended</p>}

      {/* Audio Visualization */}
      <Siri theme="ios9" audioLevel={audioLevel} callStatus={callStatus} />

      {/* Controls */}
      <AssistantButton onClick={toggleCall} callStatus={callStatus} />
    </div>
  )
}
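Note: in React 18 development builds with StrictMode, effects run twice on mount, so the auto-start effect above can start the call twice. A ref guard is a common fix (not specific to Vapi; requires useRef in the react import):
const startedRef = useRef(false)

useEffect(() => {
  if (startedRef.current) return // skip StrictMode's second dev invocation
  startedRef.current = true
  toggleCall()
}, [])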

Webhook Configuration

For server-side message processing, configure a webhook endpoint:
app/api/webhook/route.ts
import { NextRequest, NextResponse } from 'next/server'

export async function POST(request: NextRequest) {
  const body = await request.json()

  // Handle different message types
  switch (body.message?.type) {
    case 'function-call': {
      // Execute server-side function (handleFunctionCall is app-specific)
      const result = await handleFunctionCall(body.message.functionCall)
      return NextResponse.json({ result })
    }

    case 'end-of-call-report':
      // Save conversation data (saveCallReport is app-specific)
      await saveCallReport(body)
      return NextResponse.json({ success: true })

    default:
      return NextResponse.json({ received: true })
  }
}
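handleFunctionCall and saveCallReport are app-specific helpers not shown here; illustrative stubs might look like:
// Illustrative stubs — wire these to your own function registry and storage
async function handleFunctionCall(functionCall: { name: string; parameters: unknown }) {
  switch (functionCall.name) {
    // case 'lookupFact': return lookupFact(functionCall.parameters)
    default:
      return { error: `Unknown function: ${functionCall.name}` }
  }
}

async function saveCallReport(body: unknown) {
  // e.g. persist the transcript and summary from the end-of-call report
  console.log('end-of-call-report received')
}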

Best Practices

Always clean up event listeners to prevent memory leaks:
useEffect(() => {
  const handler = () => { /* ... */ }
  vapi.on('event', handler)

  return () => {
    vapi.off('event', handler)  // Critical!
  }
}, [])
Missing cleanup causes:
  • Duplicate event handlers
  • Memory leaks
  • Stale state references
Use LOADING state during transitions:
const start = async () => {
  setCallStatus(CALL_STATUS.LOADING)  // Disable UI
  await vapi.start(assistant)
  // Status updates to ACTIVE via 'call-start' event
}
Prevents:
  • Multiple simultaneous calls
  • Button spam
  • Race conditions
Throttle volume-level updates for performance:
import { throttle } from 'lodash'

const onVolumeLevel = throttle((volume: number) => {
  setAudioLevel(volume)
}, 50)  // Max 20 updates/second

vapi.on('volume-level', onVolumeLevel)
Volume events fire very frequently; throttling prevents excessive re-renders.
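If the listener is registered inside a useEffect, also cancel the throttled handler during cleanup; lodash's throttle exposes cancel() for this:
return () => {
  vapi.off('volume-level', onVolumeLevel)
  onVolumeLevel.cancel() // drop any pending trailing invocation
}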

Troubleshooting

Symptoms: Call connects but no voice is heard.

Checklist:
  1. Verify voiceId is valid ElevenLabs voice
  2. Check browser audio permissions
  3. Ensure NEXT_PUBLIC_VAPI_WEB_TOKEN is set
  4. Test with different voice settings (adjust stability/speed)
  5. Check browser console for WebRTC errors
Debug code:
vapi.on('call-start', () => {
  console.log('Call started, checking audio...')
})
vapi.on('speech-start', () => {
  console.log('Speech started - audio should play')
})
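An error listener alongside these lifecycle logs often surfaces the root cause:
vapi.on('error', (e) => {
  // Common culprits: invalid web token, blocked microphone, failed WebRTC negotiation
  console.error('Vapi error:', e)
})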
