
Overview

Echoes of the Past uses Vapi for real-time voice conversations with historical figures. The integration includes:
  • Voice synthesis via ElevenLabs
  • Speech recognition for user input
  • Real-time transcription and message streaming
  • Audio level monitoring for visual feedback
  • Assistant configuration with OpenAI models

Vapi SDK Setup

The Vapi client is initialized in lib/vapi.ts:
lib/vapi.ts
import Vapi from '@vapi-ai/web'

if (!process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN) {
  throw new Error('NEXT_PUBLIC_VAPI_WEB_TOKEN environment variable is required')
}

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN)
Environment setup:
.env.local
NEXT_PUBLIC_VAPI_WEB_TOKEN=your_vapi_token_here
NEXT_PUBLIC_SERVER_URL=https://your-domain.com/api/webhook
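Note that NEXT_PUBLIC_ variables are inlined into the client bundle at build time, so this must be Vapi's public web token, never a private API key.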

useVapi Hook

The useVapi hook encapsulates all Vapi integration logic:

Hook Interface

features/call/hooks/useVapi.ts
import { useEffect, useState } from 'react'
import { CreateAssistantDTO } from '@vapi-ai/web/dist/api'
import { HistoricalFigure } from '@/types'
import { vapi } from '@/lib/vapi'
import {
  Message,
  MessageTypeEnum,
  TranscriptMessage,
  TranscriptMessageTypeEnum,
} from '@/types/conversation.type'
import { CALL_STATUS } from '../types'

type UseVapiProps = {
  character: HistoricalFigure
  systemPrompt: string
  firstMessage: string
}

export function useVapi({ character, systemPrompt, firstMessage }: UseVapiProps) {
  const [isSpeechActive, setIsSpeechActive] = useState(false)
  const [callStatus, setCallStatus] = useState<CALL_STATUS>(CALL_STATUS.INACTIVE)
  const [messages, setMessages] = useState<Message[]>([])
  const [activeTranscript, setActiveTranscript] = useState<TranscriptMessage | null>(null)
  const [audioLevel, setAudioLevel] = useState(0)

  // ... implementation

  return {
    isSpeechActive,
    callStatus,
    audioLevel,
    activeTranscript,
    messages,
    start,
    stop,
    toggleCall,
    askQuestion,
  }
}
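The hook references a CALL_STATUS enum from features/call/types, which isn't shown in this guide. A plausible definition, inferred from the statuses used throughout this page:
features/call/types.ts
// Assumed shape — inferred from CALL_STATUS usage in this guide
export enum CALL_STATUS {
  INACTIVE = 'inactive',
  LOADING = 'loading',
  ACTIVE = 'active',
}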

Assistant Configuration

The assistant object defines the AI’s behavior, voice, and model:
features/call/hooks/useVapi.ts
const assistant: Omit<CreateAssistantDTO, 'clientMessages' | 'serverMessages'> = {
  name: character.name,
  firstMessage,
  
  // AI Model Configuration
  model: {
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    temperature: 0.7,
    messages: [
      {
        role: 'system',
        content: systemPrompt,
      },
    ],
  },
  
  // Idle Message Handling
  messagePlan: {
    idleMessages: [
      'If you have a question, feel free to ask',
      'Are you there?',
      'What are you thinking? I can help you!',
      "I'm here whenever you're ready to continue"
    ],
    idleTimeoutSeconds: 15,
    idleMessageMaxSpokenCount: 3,
    idleMessageResetCountOnUserSpeechEnabled: true,
  },
  
  // Audio Processing
  backgroundDenoisingEnabled: true,
  
  // Voice Configuration (ElevenLabs)
  voice: {
    provider: '11labs',
    voiceId: character.voiceId,
    stability: 0.4,
    similarityBoost: 0.8,
    speed: 1,
    style: 0.5,
    useSpeakerBoost: true,
  },
  
  // Server Webhook
  server: {
    url: process.env.NEXT_PUBLIC_SERVER_URL || 
      'https://your-backend.ngrok-free.app/api/webhook',
  },
}
ElevenLabs voice settings:
  • voiceId: Character-specific voice from ElevenLabs
  • stability: 0-1, higher = more consistent but less expressive (0.4 recommended)
  • similarityBoost: 0-1, higher = closer to original voice (0.8 recommended)
  • speed: Playback speed multiplier (1 = normal)
  • style: 0-1, voice style intensity (0.5 = balanced)
  • useSpeakerBoost: Enhances clarity for different speakers
Tuning tips:
  • Lower stability (0.3-0.5) for more dynamic conversations
  • Higher stability (0.6-0.8) for formal or educational content
  • Adjust speed based on character personality (slower for contemplative figures); one way to encode these presets is sketched below
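These tips can be encoded as per-character presets. A minimal sketch; the Temperament type and the specific values are assumptions, not fields of HistoricalFigure:
// Hypothetical presets mapping a character's temperament to ElevenLabs settings
type Temperament = 'contemplative' | 'formal' | 'dynamic'

function voiceSettingsFor(temperament: Temperament) {
  switch (temperament) {
    case 'contemplative':
      // Slower, steadier delivery for reflective figures
      return { stability: 0.6, similarityBoost: 0.8, speed: 0.9, style: 0.4 }
    case 'formal':
      // Higher stability for formal or educational content
      return { stability: 0.7, similarityBoost: 0.8, speed: 1, style: 0.3 }
    case 'dynamic':
      // Lower stability for lively, expressive conversation
      return { stability: 0.35, similarityBoost: 0.8, speed: 1.05, style: 0.6 }
  }
}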
Idle message behavior:
messagePlan: {
  idleMessages: ['Are you there?', 'Feel free to ask'],
  idleTimeoutSeconds: 15,          // Wait 15s before idle message
  idleMessageMaxSpokenCount: 3,    // Stop after 3 idle messages
  idleMessageResetCountOnUserSpeechEnabled: true,
}
Purpose: Keeps the conversation engaging without becoming intrusive. After the user speaks, the idle counter resets (since idleMessageResetCountOnUserSpeechEnabled is true).

Event Handling

Vapi emits events throughout the call lifecycle:

Event Listeners Setup

features/call/hooks/useVapi.ts
useEffect(() => {
  // Speech Events
  const onSpeechStart = () => setIsSpeechActive(true)
  const onSpeechEnd = () => {
    console.log('Speech has ended')
    setIsSpeechActive(false)
  }

  // Call Lifecycle
  const onCallStartHandler = () => {
    console.log('Call has started')
    setCallStatus(CALL_STATUS.ACTIVE)
  }

  const onCallEnd = () => {
    console.log('Call has stopped')
    setCallStatus(CALL_STATUS.INACTIVE)
  }

  // Audio Level for Visualization
  const onVolumeLevel = (volume: number) => {
    setAudioLevel(volume)
  }

  // Message Updates (Transcription)
  const onMessageUpdate = (message: Message) => {
    if (message.type === MessageTypeEnum.TRANSCRIPT && 
        message.transcriptType === TranscriptMessageTypeEnum.PARTIAL) {
      // Live transcription (user still speaking)
      setActiveTranscript(message)
    } else {
      // Final message (completed)
      setMessages((prev) => [...prev, message])
      setActiveTranscript(null)
    }
  }

  // Error Handling
  const onError = (e: Error) => {
    setCallStatus(CALL_STATUS.INACTIVE)
    console.error(e)
  }

  // Register all listeners
  vapi.on('speech-start', onSpeechStart)
  vapi.on('speech-end', onSpeechEnd)
  vapi.on('call-start', onCallStartHandler)
  vapi.on('call-end', onCallEnd)
  vapi.on('volume-level', onVolumeLevel)
  vapi.on('message', onMessageUpdate)
  vapi.on('error', onError)

  // Cleanup on unmount
  return () => {
    vapi.off('speech-start', onSpeechStart)
    vapi.off('speech-end', onSpeechEnd)
    vapi.off('call-start', onCallStartHandler)
    vapi.off('call-end', onCallEnd)
    vapi.off('volume-level', onVolumeLevel)
    vapi.off('message', onMessageUpdate)
    vapi.off('error', onError)
  }
}, [])

Event Types Reference

speech-start / speech-end

Purpose: Track when the AI assistant is speaking.

Use cases:
  • Show “speaking” indicator
  • Animate avatar
  • Disable input during AI speech
const [isSpeechActive, setIsSpeechActive] = useState(false)

vapi.on('speech-start', () => setIsSpeechActive(true))
vapi.on('speech-end', () => setIsSpeechActive(false))

// In component
<div className={isSpeechActive ? 'pulsing' : ''}>
  {character.name} {isSpeechActive && 'is speaking...'}
</div>
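The volume-level event can drive audio visualizations in the same way. A minimal sketch; the CSS mapping is illustrative:
const [audioLevel, setAudioLevel] = useState(0)

// Vapi reports volume as a value between 0 and 1
vapi.on('volume-level', (volume: number) => setAudioLevel(volume))

// In component: scale a bar with the live level
<div className="voice-bar" style={{ transform: `scaleY(${0.2 + audioLevel * 0.8})` }} />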

Call Control Methods

Starting a Call

features/call/hooks/useVapi.ts
const start = async () => {
  setCallStatus(CALL_STATUS.LOADING)
  await vapi.start(assistant as CreateAssistantDTO)
  // Status moves to ACTIVE when the 'call-start' event fires
}

// In component
useEffect(() => {
  start() // Auto-start call on mount
}, [])

Stopping a Call

const stop = () => {
  setCallStatus(CALL_STATUS.LOADING)
  vapi.stop()
}

Toggle Call State

const toggleCall = () => {
  if (callStatus === CALL_STATUS.ACTIVE) {
    stop()
  } else {
    start()
  }
}

// Usage
<AssistantButton onClick={toggleCall} callStatus={callStatus} />
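AssistantButton itself isn't defined in this guide; a minimal sketch with assumed props and labels:
import { CALL_STATUS } from '../types'

type AssistantButtonProps = {
  onClick: () => void
  callStatus: CALL_STATUS
}

export function AssistantButton({ onClick, callStatus }: AssistantButtonProps) {
  const label =
    callStatus === CALL_STATUS.ACTIVE ? 'End Call'
    : callStatus === CALL_STATUS.LOADING ? 'Connecting…'
    : 'Start Call'

  return (
    // Disable during LOADING to prevent double starts
    <button onClick={onClick} disabled={callStatus === CALL_STATUS.LOADING}>
      {label}
    </button>
  )
}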

Programmatic Messages

Send system messages to the assistant during a call:
features/call/hooks/useVapi.ts
const askQuestion = (question: string) => {
  console.log('question:', question)
  vapi.send({
    type: 'add-message',
    message: {
      role: 'system',
      content: `The user has pressed a button for you to ask them: ${question}.`,
    },
  })
}

// Example: Helper text options
const handleOptionClick = (option: string) => {
  switch (option) {
    case 'Ask a question':
      askQuestion("If you've a question? feel free to ask")
      break
    case 'Request Explanation':
      askQuestion("I'd be happy to explain anything about me or my work.")
      break
    case 'Topic Discussion': {
      const topics = character.notableWork.split(',').slice(0, 2)
      askQuestion(`Discuss these topics: ${topics.join(', ')}`)
      break
    }
  }
}

Real-Time Transcription

Message Types

types/conversation.type.ts
export interface TranscriptMessage extends BaseMessage {
  type: MessageTypeEnum.TRANSCRIPT
  role: MessageRoleEnum  // 'user' | 'assistant' | 'system'
  transcriptType: TranscriptMessageTypeEnum  // 'partial' | 'final'
  transcript: string
}

export enum MessageRoleEnum {
  USER = 'user',
  SYSTEM = 'system',
  ASSISTANT = 'assistant',
}

Displaying Transcripts

features/call/components/chat-view.tsx
'use client'

import { HistoricalFigure } from '@/types'
import { useVapi } from '../hooks/useVapi'
import { TranscriptView } from './TranscriptView'

type ChatViewProps = {
  character: HistoricalFigure
  userImage: string
  systemPrompt: string
  firstMessage: string
}

function ChatView({ character, userImage, systemPrompt, firstMessage }: ChatViewProps) {
  const { messages, activeTranscript } = useVapi({
    character,
    systemPrompt,
    firstMessage
  })

  return (
    <TranscriptView
      messages={messages}
      activeTranscript={activeTranscript}
      character={character}
      userImage={userImage}
    />
  )
}
TranscriptView component (sketched after this list):
  • Renders completed messages from messages array
  • Shows live activeTranscript with visual indicator (e.g., typing dots)
  • Distinguishes between user and assistant messages
  • Auto-scrolls to latest message
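A minimal sketch of TranscriptView; the props mirror the usage above, while the markup and styling are assumptions:
import { useEffect, useRef } from 'react'
import { HistoricalFigure } from '@/types'
import {
  Message,
  MessageTypeEnum,
  TranscriptMessage,
} from '@/types/conversation.type'

type TranscriptViewProps = {
  messages: Message[]
  activeTranscript: TranscriptMessage | null
  character: HistoricalFigure
  userImage: string
}

export function TranscriptView({
  messages,
  activeTranscript,
  character, // would drive the assistant's per-message avatar (omitted here)
  userImage, // would drive the user's per-message avatar (omitted here)
}: TranscriptViewProps) {
  const bottomRef = useRef<HTMLDivElement>(null)

  // Auto-scroll whenever a message is finalized or the live transcript changes
  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' })
  }, [messages, activeTranscript])

  return (
    <div className="overflow-y-auto">
      {messages
        .filter((m): m is TranscriptMessage => m.type === MessageTypeEnum.TRANSCRIPT)
        .map((m, i) => (
          <p key={i} className={m.role === 'user' ? 'text-right' : 'text-left'}>
            {m.transcript}
          </p>
        ))}
      {/* Live partial transcript with a simple "still speaking" indicator */}
      {activeTranscript && (
        <p className="italic opacity-60">{activeTranscript.transcript}…</p>
      )}
      <div ref={bottomRef} />
    </div>
  )
}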

Call Interface Example

A condensed implementation of the voice call UI (presentational components such as Avatar, Siri, and AssistantButton are imported from elsewhere in the app):
features/call/components/call-interface.tsx
'use client'

import { useEffect, useState } from 'react'
import { HistoricalFigure } from '@/types'
import { useVapi } from '../hooks/useVapi'
import { CALL_STATUS } from '../types'

type CallInterfaceProps = {
  character: HistoricalFigure
  systemPrompt: string
  firstMessage: string
  backHref: string
}

export const CallInterface = ({
  character,
  systemPrompt,
  firstMessage,
  backHref
}: CallInterfaceProps) => {
  const [callDuration, setCallDuration] = useState(0)
  const { toggleCall, callStatus, audioLevel, messages } = useVapi({
    character,
    systemPrompt,
    firstMessage
  })

  // Auto-start call
  useEffect(() => {
    toggleCall()
  }, [])

  // Track duration
  useEffect(() => {
    let interval: NodeJS.Timeout | undefined
    if (callStatus === CALL_STATUS.ACTIVE) {
      interval = setInterval(() => {
        setCallDuration(prev => prev + 1)
      }, 1000)
    }
    return () => clearInterval(interval)
  }, [callStatus])

  const formatTime = (seconds: number) => {
    const mins = Math.floor(seconds / 60)
    const secs = seconds % 60
    return `${mins.toString().padStart(2, '0')}:${secs.toString().padStart(2, '0')}`
  }

  return (
    <div className="min-h-screen">
      {/* Character Info */}
      <Avatar className="w-32 h-32">
        <AvatarImage src={character.imageUrl} alt={character.name} />
      </Avatar>
      <h1>{character.name}</h1>

      {/* Call Status */}
      {callStatus === CALL_STATUS.LOADING && <p>Connecting...</p>}
      {callStatus === CALL_STATUS.ACTIVE && (
        <>
          <p>Connected</p>
          <p>{formatTime(callDuration)}</p>
        </>
      )}
      {callStatus === CALL_STATUS.INACTIVE && <p>Call ended</p>}

      {/* Audio Visualization */}
      <Siri theme="ios9" audioLevel={audioLevel} callStatus={callStatus} />

      {/* Controls */}
      <AssistantButton onClick={toggleCall} callStatus={callStatus} />
    </div>
  )
}
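Note: in React 18 development builds with StrictMode, effects run twice on mount, so the auto-start effect above can start the call twice. A ref guard is a common fix (not specific to Vapi; requires useRef in the react import):
const startedRef = useRef(false)

useEffect(() => {
  if (startedRef.current) return // skip StrictMode's second dev invocation
  startedRef.current = true
  toggleCall()
}, [])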

Webhook Configuration

For server-side message processing, configure a webhook endpoint:
app/api/webhook/route.ts
import { NextRequest, NextResponse } from 'next/server'

export async function POST(request: NextRequest) {
  const body = await request.json()

  // Handle different message types
  switch (body.message?.type) {
    case 'function-call': {
      // Execute server-side function (handleFunctionCall is app-specific)
      const result = await handleFunctionCall(body.message.functionCall)
      return NextResponse.json({ result })
    }

    case 'end-of-call-report':
      // Save conversation data (saveCallReport is app-specific)
      await saveCallReport(body)
      return NextResponse.json({ success: true })

    default:
      return NextResponse.json({ received: true })
  }
}
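handleFunctionCall and saveCallReport are app-specific helpers not shown here; illustrative stubs might look like:
// Illustrative stubs — wire these to your own function registry and storage
async function handleFunctionCall(functionCall: { name: string; parameters: unknown }) {
  switch (functionCall.name) {
    // case 'lookupFact': return lookupFact(functionCall.parameters)
    default:
      return { error: `Unknown function: ${functionCall.name}` }
  }
}

async function saveCallReport(body: unknown) {
  // e.g. persist the transcript and summary from the end-of-call report
  console.log('end-of-call-report received')
}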

Best Practices

Always clean up event listeners to prevent memory leaks:
useEffect(() => {
  const handler = () => { /* ... */ }
  vapi.on('event', handler)

  return () => {
    vapi.off('event', handler)  // Critical!
  }
}, [])
Missing cleanup causes:
  • Duplicate event handlers
  • Memory leaks
  • Stale state references
Use LOADING state during transitions:
const start = async () => {
  setCallStatus(CALL_STATUS.LOADING)  // Disable UI
  await vapi.start(assistant)
  // Status updates to ACTIVE via 'call-start' event
}
Prevents:
  • Multiple simultaneous calls
  • Button spam
  • Race conditions
Throttle volume-level updates for performance:
import { throttle } from 'lodash'

const onVolumeLevel = throttle((volume: number) => {
  setAudioLevel(volume)
}, 50)  // Max 20 updates/second

vapi.on('volume-level', onVolumeLevel)
Volume events fire very frequently; throttling prevents excessive re-renders.
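If the listener is registered inside a useEffect, also cancel the throttled handler during cleanup; lodash's throttle exposes cancel() for this:
return () => {
  vapi.off('volume-level', onVolumeLevel)
  onVolumeLevel.cancel() // drop any pending trailing invocation
}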

Troubleshooting

Symptoms: Call connects but no voice is heard.

Checklist:
  1. Verify voiceId is valid ElevenLabs voice
  2. Check browser audio permissions
  3. Ensure NEXT_PUBLIC_VAPI_WEB_TOKEN is set
  4. Test with different voice settings (adjust stability/speed)
  5. Check browser console for WebRTC errors
Debug code:
vapi.on('call-start', () => {
  console.log('Call started, checking audio...')
})
vapi.on('speech-start', () => {
  console.log('Speech started - audio should play')
})
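An error listener alongside these lifecycle logs often surfaces the root cause:
vapi.on('error', (e) => {
  // Common culprits: invalid web token, blocked microphone, failed WebRTC negotiation
  console.error('Vapi error:', e)
})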
