Drift uses ElevenLabs to transform simulation results into engaging audio briefings, making financial data accessible through natural voice narration.

Features

  • Automatic voice selection based on your success probability
  • Natural number pronunciation ($1.5M → "one point five million dollars")
  • Streaming & buffered playback for instant or progressive loading
  • Speech-to-text transcription for voice input
  • 6 professional voices with distinct personalities

Voice Options

From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:5-12:
const VOICE_OPTIONS = {
  josh: 'TxGEqnHWrfWFTfGW9XjX',   // Josh - friendly, energetic (DEFAULT)
  adam: 'pNInz6obpgDQGcFmaJgB',   // Adam - deep, authoritative
  rachel: '21m00Tcm4TlvDq8ikWAM', // Rachel - warm, professional
  bella: 'EXAVITQu4vr4xnSDxMaL',  // Bella - soft, reassuring
  antoni: 'ErXwobaYiN019PkySvjV', // Antoni - confident, punchy
  domi: 'AZnzlk1XvdvUeBnXmlld',   // Domi - strong, bold
}
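The optional ELEVENLABS_VOICE_ID environment variable (see Setup below) can override the default. A sketch of how that lookup might be wired (assumed; the constructor code isn't shown in the excerpt, and the map is abridged here):

```typescript
// Sketch (assumed wiring, not from the repo): resolve ELEVENLABS_VOICE_ID
// to an ID from the VOICE_OPTIONS table, defaulting to Josh.
const VOICE_OPTIONS: Record<string, string> = {
  josh: 'TxGEqnHWrfWFTfGW9XjX',
  adam: 'pNInz6obpgDQGcFmaJgB',
  // ...remaining voices as in the table above
}

function resolveDefaultVoice(name?: string): string {
  const key = (name ?? 'josh').toLowerCase()
  return VOICE_OPTIONS[key] ?? VOICE_OPTIONS.josh
}
```

An unknown or missing name falls back to Josh rather than failing, matching the service's "josh is the default" behavior.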

Dynamic Voice Selection

The service automatically selects voices based on simulation outcomes: From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:186-197:
selectVoiceByOutcome(successProbability: number): VoiceName {
  if (successProbability >= 0.75) {
    // Great news! Excited, celebratory
    return 'josh'
  } else if (successProbability >= 0.50) {
    // Decent odds, encouraging and confident
    return 'adam'
  } else {
    // Tough situation, empathetic and supportive
    return 'bella'
  }
}
Result:
  • ≥75% success: Josh (energetic, celebratory)
  • 50-75% success: Adam (confident, encouraging)
  • Below 50% success: Bella (empathetic, supportive)

Number-to-Words Conversion

ElevenLabs pronounces formatted numbers more naturally when converted to words: From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:199-246:
private numbersToWords(text: string): string {
  let result = text
  
  // Handle full words: "$30 billion", "$30 million"
  result = result.replace(/\$(\d+(?:\.\d+)?)\s*billion/gi, (_, num) => {
    return this.numberToSpoken(parseFloat(num), 'billion')
  })
  result = result.replace(/\$(\d+(?:\.\d+)?)\s*million/gi, (_, num) => {
    return this.numberToSpoken(parseFloat(num), 'million')
  })
  
  // Handle abbreviations: $1.5M, $25K, $100B
  result = result.replace(/\$(\d+(?:\.\d+)?)\s*M(?![a-z])/gi, (_, num) => {
    return this.numberToSpoken(parseFloat(num), 'million')
  })
  result = result.replace(/\$(\d+(?:\.\d+)?)\s*K(?![a-z])/gi, (_, num) => {
    return this.numberToSpoken(parseFloat(num), 'thousand')
  })
  
  // Plain $XX,XXX patterns (with commas)
  result = result.replace(/\$(\d{1,3}(?:,\d{3})+)/g, (_, num) => {
    const value = parseInt(num.replace(/,/g, ''))
    return this.dollarAmountToSpoken(value)
  })
  
  // Percentages: 73% → "seventy-three percent"
  result = result.replace(/(\d+(?:\.\d+)?)\s*%/g, (_, num) => {
    const value = parseFloat(num)
    if (Number.isInteger(value)) {
      return `${this.intToWords(value)} percent`
    }
    return `${value} percent`
  })
  
  return result
}

Conversion Examples

Input     Output (Spoken)
$1.5M     "one point five million dollars"
$25,000   "twenty-five thousand dollars"
$100B     "one hundred billion dollars"
73%       "seventy-three percent"
$500K     "five hundred thousand dollars"
From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:280-296:
private intToWords(num: number): string {
  if (num === 0) return 'zero'
  
  const ones = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
    'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
  const tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
  
  if (num < 20) return ones[num]
  if (num < 100) {
    return tens[Math.floor(num / 10)] + (num % 10 ? '-' + ones[num % 10] : '')
  }
  if (num < 1000) {
    return ones[Math.floor(num / 100)] + ' hundred' + (num % 100 ? ' ' + this.intToWords(num % 100) : '')
  }
  return num.toString()  // TTS handles larger numbers
}
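The passes above can be combined into a small runnable sketch (simplified, with hypothetical free-function names; the real service also handles full words, comma-grouped amounts, and the $…B abbreviation) that reproduces several rows of the table:

```typescript
// Simplified sketch of the conversion pipeline above (function names hypothetical).
const ONES = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
  'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
const TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']

function intToWords(n: number): string {
  if (n === 0) return 'zero'
  if (n < 20) return ONES[n]
  if (n < 100) return TENS[Math.floor(n / 10)] + (n % 10 ? '-' + ONES[n % 10] : '')
  if (n < 1000) return ONES[Math.floor(n / 100)] + ' hundred' + (n % 100 ? ' ' + intToWords(n % 100) : '')
  return n.toString() // TTS handles larger numbers
}

// "1.5" → "one point five" (each fractional digit spoken individually)
function numToSpoken(n: number): string {
  if (Number.isInteger(n)) return intToWords(n)
  const [whole, frac] = n.toString().split('.')
  return `${intToWords(Number(whole))} point ${[...frac].map(d => intToWords(Number(d))).join(' ')}`
}

function numbersToWords(text: string): string {
  return text
    .replace(/\$(\d+(?:\.\d+)?)\s*M(?![a-z])/g, (_, n) => `${numToSpoken(parseFloat(n))} million dollars`)
    .replace(/\$(\d+(?:\.\d+)?)\s*K(?![a-z])/g, (_, n) => `${numToSpoken(parseFloat(n))} thousand dollars`)
    .replace(/(\d+(?:\.\d+)?)\s*%/g, (_, n) => `${numToSpoken(parseFloat(n))} percent`)
}
```

For example, `numbersToWords('$1.5M')` yields "one point five million dollars" and `numbersToWords('73%')` yields "seventy-three percent".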

Voice Settings

From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:52-61:
const audioStream = await this.client!.textToSpeech.convert(voiceId, {
  text: spokenText,
  model_id: options?.modelId || 'eleven_multilingual_v2',
  voice_settings: {
    stability: 0.25,       // Lower = more expressive/dynamic
    similarity_boost: 0.85,
    style: 0.8,            // Higher = more stylized/exciting
    use_speaker_boost: true,
  },
})
Parameters:
  • stability (0.25): Low for expressive, energetic delivery
  • similarity_boost (0.85): High for voice consistency
  • style (0.8): High for engaging, exciting narration
  • use_speaker_boost: Enhances clarity and presence
These settings optimize for financial briefings where clarity and engagement are critical. Adjust stability higher (0.5-0.75) for more formal, calm delivery.
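That adjustment could be captured as two presets (illustrative values only; the repo defines just the expressive settings shown above):

```typescript
// Illustrative presets, not from the repo. EXPRESSIVE mirrors the values
// above; FORMAL raises stability and lowers style for calmer delivery.
const EXPRESSIVE = { stability: 0.25, similarity_boost: 0.85, style: 0.8, use_speaker_boost: true }
const FORMAL = { ...EXPRESSIVE, stability: 0.6, style: 0.3 }
```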

Audio Generation

Buffered Audio

Generate complete audio buffer for download or playback: From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:32-74:
async generateAudio(
  text: string,
  options?: {
    voice?: VoiceName
    modelId?: string
  }
): Promise<Buffer> {
  if (!this.client) {
    throw new Error('ElevenLabs API key not configured')
  }
  
  const voiceId = options?.voice
    ? VOICE_OPTIONS[options.voice]
    : this.defaultVoice
  
  // Convert numbers to spoken words before sending to TTS
  const spokenText = this.numbersToWords(text)
  
  const audioStream = await this.client!.textToSpeech.convert(voiceId, {
    text: spokenText,
    model_id: options?.modelId || 'eleven_multilingual_v2',
    voice_settings: { /* ... */ },
  })
  
  // Convert the stream to a buffer
  const chunks: Buffer[] = []
  for await (const chunk of audioStream) {
    chunks.push(Buffer.from(chunk))
  }
  
  return Buffer.concat(chunks)
}
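The chunk-collection loop at the end is a generic AsyncIterable-to-Buffer pattern. As a standalone helper (hypothetical name, not in the repo) it looks like:

```typescript
// Collect any async iterable of byte chunks into a single Buffer.
async function streamToBuffer(stream: AsyncIterable<Uint8Array>): Promise<Buffer> {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk))
  }
  return Buffer.concat(chunks)
}
```

This buffers the entire clip in memory, which is fine for short briefings; for long audio, prefer the streaming path below.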

Streaming Audio

Stream audio progressively for instant playback: From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:76-129:
async generateAudioStream(
  text: string,
  options?: {
    voice?: VoiceName
    modelId?: string
  }
): Promise<Readable> {
  // (client check and voiceId resolution omitted here; same as generateAudio)
  const spokenText = this.numbersToWords(text)
  
  const audioStream = await this.client!.textToSpeech.convertAsStream(voiceId, {
    text: spokenText,
    model_id: options?.modelId || 'eleven_multilingual_v2',
    voice_settings: { /* ... */ },
  })
  
  // Convert AsyncIterable to Node.js Readable stream
  const readable = new Readable({
    read() {},
  })
  
  ;(async () => {
    try {
      for await (const chunk of audioStream) {
        readable.push(Buffer.from(chunk))
      }
      readable.push(null) // Signal end of stream
    } catch (error) {
      readable.destroy(error as Error)
    }
  })()
  
  return readable
}
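Node's built-in `Readable.from` (available since Node 12) performs the same AsyncIterable-to-Readable conversion, including propagating iterator errors as stream errors:

```typescript
import { Readable } from 'node:stream'

// Readable.from wraps any (async) iterable, pushing each chunk and ending
// the stream when the iterator completes - equivalent to the manual loop above.
async function* chunks() {
  yield Buffer.from('audio-')
  yield Buffer.from('bytes')
}
const readable = Readable.from(chunks())
```

Errors thrown inside the generator surface as 'error' events on the stream, so the manual try/catch with `readable.destroy` is not needed.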

Speech-to-Text (Transcription)

Transcribe audio input for voice-based goal entry: From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:135-176:
async transcribeAudio(audioBuffer: Buffer): Promise<string> {
  if (!this.client) {
    throw new Error('ElevenLabs API key not configured')
  }
  
  const fs = await import('fs')
  const os = await import('os')
  const path = await import('path')
  
  // Write buffer to temp file
  const tempPath = path.join(os.tmpdir(), `audio-${Date.now()}.webm`)
  
  try {
    fs.writeFileSync(tempPath, audioBuffer)
    const fileStream = fs.createReadStream(tempPath)
    
    const result = await this.client.speechToText.convert({
      file: fileStream,
      model_id: 'scribe_v1',
    })
    
    if (result && result.text) {
      return result.text
    }
    
    throw new Error('No transcript returned')
  } finally {
    // Clean up temp file
    try {
      fs.unlinkSync(tempPath)
    } catch {}
  }
}

Frontend Integration

The Narration component provides a complete audio player UI: From /home/daytona/workspace/source/apps/web/components/Narration.tsx:64-135:
const fetchBriefing = async (autoPlay: boolean = false) => {
  setIsLoading(true)
  setError(null)
  
  try {
    const request: NarrativeRequest = {
      simulationResults: {
        successProbability: simulationResults.successProbability,
        medianOutcome: simulationResults.medianOutcome,
        percentiles: simulationResults.percentiles,
        // ...
      },
      financialProfile,
      goal,
    }
    
    const response: BriefingResponse = await generateBriefing(request)
    
    setNarrative(response.narrative)
    setAudioAvailable(response.audioAvailable)
    
    if (response.audioAvailable && response.audio) {
      setAudioData(response.audio)
      
      // Create audio element
      const audio = new Audio(`data:audio/mpeg;base64,${response.audio}`)
      audioRef.current = audio
      
      audio.onloadedmetadata = () => {
        setDuration(audio.duration)
      }
      
      audio.onended = () => {
        setIsPlaying(false)
        setProgress(0)
      }
      
      // Auto-play if requested
      if (autoPlay) {
        audio.oncanplaythrough = async () => {
          await audio.play()
          setIsPlaying(true)
          progressIntervalRef.current = setInterval(() => {
            if (audioRef.current) {
              setProgress(audioRef.current.currentTime)
            }
          }, 100)
        }
      }
    }
  } catch (err) {
    console.error('Failed to fetch briefing:', err)
    setError('Failed to generate your financial briefing. Please try again.')
  } finally {
    setIsLoading(false)
  }
}
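For longer clips, a hypothetical helper (not in the repo) could hand the element an object URL instead of the `data:` URL above, avoiding a second base64 copy of the whole clip inside the `src` string:

```typescript
// Hypothetical alternative to the data: URL approach above: decode the
// base64 MP3 payload into a Blob and return a short-lived object URL.
function base64ToAudioUrl(base64: string, mime = 'audio/mpeg'): string {
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
  return URL.createObjectURL(new Blob([bytes], { type: mime }))
}
// Usage: audioRef.current = new Audio(base64ToAudioUrl(response.audio))
// Call URL.revokeObjectURL(url) when the player unmounts to free the Blob.
```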

Player Controls

  • Play/Pause: Toggle playback
  • Progress bar: Click to seek to any position
  • Volume control: Mute/unmute
  • Transcript: Full text display below player

Setup

Environment Variables

ELEVENLABS_API_KEY=your_api_key_here
ELEVENLABS_VOICE_ID=josh  # Optional: override default voice
Get your API key from: https://elevenlabs.io/app/settings/api-keys

Configuration Check

From /home/daytona/workspace/source/apps/api/src/services/elevenLabsService.ts:131-133:
isConfigured(): boolean {
  return !!process.env.ELEVENLABS_API_KEY
}
The service gracefully degrades when credentials are missing:
if (!elevenLabsService.isConfigured()) {
  // Return text-only response
  return {
    narrative: generatedText,
    audioAvailable: false
  }
}

// Generate audio
const audioBuffer = await elevenLabsService.generateAudio(generatedText)
return {
  narrative: generatedText,
  audio: audioBuffer.toString('base64'),
  audioAvailable: true
}

Example Narrative

Input (Simulation Results):
{
  "successProbability": 0.73,
  "medianOutcome": 520000,
  "targetAmount": 500000,
  "timelineMonths": 180
}
Generated Text:
Great news! Based on 100,000 simulations of your financial future, 
you have a 73% chance of reaching your $500K retirement goal in 15 years.

Your most likely outcome is $520,000, comfortably above your target. 
In the worst 10% of scenarios, you'd still have $380,000, 
while the best 10% could see you reach $720,000 or more.

To maintain these strong odds, stay consistent with your current savings rate 
and consider small spending adjustments if market conditions change.
Audio Output:
  • Voice: Josh (success ≥75%)
  • Duration: ~25 seconds
  • Format: MP3, base64-encoded
Narrative text is generated by the LLM service (Gemini or GPT) before being converted to speech. See the AI service documentation for customization.

API Routes

// Generate briefing with audio
POST /api/ai/briefing
{
  "simulationResults": { /* SimulationResults */ },
  "financialProfile": { /* FinancialProfile */ },
  "goal": { /* Goal */ }
}

// Response
{
  "narrative": "Great news! Based on 100,000 simulations...",
  "audio": "base64_encoded_mp3_data",
  "audioAvailable": true
}
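A minimal typed client for this route might look like the following sketch (the response shape matches the example above; the helper name is assumed):

```typescript
interface BriefingResponse {
  narrative: string
  audio?: string          // base64-encoded MP3, present when audioAvailable
  audioAvailable: boolean
}

// Sketch of a client for POST /api/ai/briefing (helper name assumed).
async function requestBriefing(body: unknown): Promise<BriefingResponse> {
  const res = await fetch('/api/ai/briefing', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  })
  if (!res.ok) throw new Error(`Briefing request failed: ${res.status}`)
  return res.json() as Promise<BriefingResponse>
}
```

The caller then checks `audioAvailable` and falls back to text-only display when the service is not configured.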

Best Practices

  1. Keep narratives under 60 seconds for better engagement
  2. Use simple language for clearer pronunciation
  3. Round large numbers ($1.2M instead of $1,234,567)
  4. Cache generated audio to reduce API costs
  5. Provide text fallback when audio fails
ElevenLabs charges per character (~$0.30 per 1,000 characters). A typical 250-word briefing costs ~$0.40. Monitor usage at https://elevenlabs.io/app/usage

Next Steps

Simulations

Understand the data behind the narration

Sensitivity Analysis

Narrate what-if scenarios
