SvaraAI leverages Hume AI’s Empathic Voice Interface (EVI) to enable real-time voice conversations with emotional intelligence. This integration provides both voice communication and emotion analysis capabilities.

Overview

The Hume AI integration consists of two main components:
  • Voice interface: Real-time voice conversations using Hume’s EVI
  • Emotion analysis: Prosody and language models that detect emotional states

Prerequisites

Before you begin, ensure you have:
  • A Hume AI account with API access
  • API key and secret key from your Hume AI dashboard
  • An EVI configuration ID

Installation

1. Install required packages

The frontend uses the official Hume AI React SDK:

npm install @humeai/voice-react hume
2. Configure environment variables

Add your Hume AI credentials to your environment:

VITE_HUME_API_KEY=your_api_key_here
VITE_HUME_CONFIG_ID=your_config_id_here
Keep your HUME_SECRET_KEY secure and never expose it in client-side code. It should only be used in backend services.

Frontend implementation

Setting up the voice provider

The VoiceProvider component wraps your chat interface to enable voice functionality:
Frontend/src/components/chat.tsx
import { VoiceProvider } from "@humeai/voice-react";
import Messages from "./message";
import Controls from "./controls";
import StartCall from "./startCall";

export default function ChatInterface() {
  const apiKey = import.meta.env.VITE_HUME_API_KEY || "";
  const configId = import.meta.env.VITE_HUME_CONFIG_ID || "";

  if (!apiKey || !configId) {
    return (
      <div className="flex items-center justify-center h-screen">
        <p className="text-red-500">Missing API credentials. Check your .env file.</p>
      </div>
    );
  }

  return (
    <div className="relative grow flex flex-col mx-auto w-full overflow-hidden h-screen">
      <VoiceProvider>
        <Messages />
        <Controls />
        <StartCall apiKey={apiKey} configId={configId} />
      </VoiceProvider>
    </div>
  );
}

Connecting to the voice interface

Use the useVoice hook to connect to Hume’s EVI:
Frontend/src/components/startCall.tsx
import { useVoice, type ConnectOptions } from "@humeai/voice-react";
import { useState } from "react";

interface StartCallProps {
  apiKey: string;
  configId: string;
}

export default function StartCall({ apiKey, configId }: StartCallProps) {
  const { status, connect } = useVoice();
  const [isConnecting, setIsConnecting] = useState(false);

  const handleConnect = async () => {
    if (status.value === "connected" || status.value === "connecting" || isConnecting) {
      return;
    }

    setIsConnecting(true);

    const connectOptions: ConnectOptions = {
      auth: { type: "apiKey", value: apiKey },
      configId: configId,
    };

    try {
      await connect(connectOptions);
    } catch {
      alert("Unable to connect. Please check microphone permissions and try again.");
    } finally {
      setIsConnecting(false);
    }
  };

  return (
    <button onClick={handleConnect} disabled={isConnecting || status.value === "connecting"}>
      Start call
    </button>
  );
}

Accessing conversation messages

The useVoice hook provides access to messages with emotional data:
Frontend/src/components/message.tsx
import { useVoice } from "@humeai/voice-react";
import Expressions from "./expressions";

export default function Messages() {
  const { messages } = useVoice();

  return (
    <div className="flex flex-col gap-4">
      {messages.map((msg, index) => {
        if (msg.type === "user_message" || msg.type === "assistant_message") {
          const isUser = msg.type === "user_message";

          return (
            <div key={index}>
              <div>
                <span>{msg.message.role}</span>
                <time>{msg.receivedAt.toLocaleTimeString()}</time>
              </div>
              
              <div>{msg.message.content}</div>

              {/* Display emotion scores */}
              {msg.models.prosody?.scores && (
                <Expressions values={{ ...msg.models.prosody.scores }} />
              )}
            </div>
          );
        }
        return null;
      })}
    </div>
  );
}
Each message includes a models property containing prosody scores that represent the emotional state detected in the user’s voice.
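Rather than rendering every dimension, the scores record can be sorted to surface only the strongest emotions. A minimal sketch (topEmotions is an illustrative helper name, not part of the SDK):

```typescript
// Sort a prosody scores record and keep the top-N emotions.
// Hume's prosody model returns ~48 named dimensions; showing only the
// strongest few is usually more readable.
type Scores = Record<string, number>;

function topEmotions(scores: Scores, n = 3): Array<[string, number]> {
  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)
    .slice(0, n);
}
```

Passing the result to the Expressions component (or any chart) keeps the UI focused on the dominant signals.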

Managing the voice connection

Control microphone state and disconnect from the session:
Frontend/src/components/controls.tsx
import { useVoice } from "@humeai/voice-react";

export default function Controls() {
  const { disconnect, isMuted, unmute, mute } = useVoice();

  const handleEndCall = async () => {
    // Process conversation data before disconnecting
    disconnect?.();
  };

  return (
    <div>
      <button onClick={() => (isMuted ? unmute?.() : mute?.())}>
        {isMuted ? "Unmute" : "Mute"}
      </button>
      
      <button onClick={handleEndCall}>
        End call
      </button>
    </div>
  );
}
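useVoice also exposes micFft, an array of frequency-bin magnitudes that can drive a simple input-level indicator. A sketch, where micLevel is an illustrative helper and the 0–1 clamp is an assumption about how you want to scale the meter:

```typescript
// Collapse micFft (an array of frequency-bin magnitudes) into a single
// 0-1 level, e.g. to scale a mic-activity indicator while speaking.
function micLevel(fft: number[]): number {
  if (fft.length === 0) return 0;
  const avg = fft.reduce((sum, v) => sum + v, 0) / fft.length;
  return Math.min(1, Math.max(0, avg)); // clamp in case magnitudes exceed 1
}
```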

Backend implementation

OAuth2 authentication

The backend handles secure token generation using OAuth2 client credentials flow:
Backend/utils/humeClient.ts
export const getHumeAccessToken = async (): Promise<string> => {
  const apiKey = process.env.VITE_HUME_API_KEY;
  const secretKey = process.env.HUME_SECRET_KEY;

  if (!apiKey || !secretKey) {
    throw new Error(
      'Missing required environment variables (VITE_HUME_API_KEY or HUME_SECRET_KEY)'
    );
  }

  const authString = `${apiKey}:${secretKey}`;
  const encoded = Buffer.from(authString).toString('base64');

  const res = await fetch('https://api.hume.ai/oauth2-cc/token', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      Authorization: `Basic ${encoded}`,
    },
    body: new URLSearchParams({ grant_type: 'client_credentials' }).toString(),
    cache: 'no-cache',
  });

  if (!res.ok) {
    throw new Error(`HTTP ${res.status}: ${res.statusText}`);
  }

  const raw = await res.json();
  if (!raw.access_token) {
    throw new Error('Unable to get access token: ' + JSON.stringify(raw));
  }

  return raw.access_token;
};

Batch audio analysis

Analyze pre-recorded audio files for emotional content:
Backend/routes/hume.ts
import express from 'express';
import { getHumeAccessToken } from '../utils/humeClient';

const router = express.Router();

router.post('/', async (req, res): Promise<any> => {
  const { audioUrl } = req.body;
  
  if (!audioUrl) {
    return res.status(400).json({ error: 'audioUrl is required' });
  }

  try {
    new URL(audioUrl);
  } catch {
    return res.status(400).json({ error: 'Invalid URL format' });
  }

  try {
    const token = await getHumeAccessToken();
    
    const response = await fetch('https://api.hume.ai/v0/batch/analyze-url', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url: audioUrl,
        models: {
          language: {},
          prosody: {}
        },
      }),
    });
    
    if (!response.ok) {
      throw new Error(`Hume API request failed: ${response.status}`);
    }
    
    const result = await response.json();
    res.json(result);
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Analysis failed' });
  }
});

export default router;
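From the frontend, the route can be called with a plain fetch. A sketch, assuming the router is mounted at /api/hume (adjust to your Express setup); the injectable fetchFn parameter exists only to make the helper testable:

```typescript
// Send an audio URL to the batch-analysis route and return the parsed
// result. The /api/hume path is an assumption; use whatever prefix the
// router is actually mounted under.
async function analyzeAudio(
  audioUrl: string,
  fetchFn: typeof fetch = fetch
): Promise<unknown> {
  const res = await fetchFn("/api/hume", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ audioUrl }),
  });
  if (!res.ok) {
    throw new Error(`Analysis request failed: ${res.status}`);
  }
  return res.json();
}
```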

Extracting emotion data

Collect and aggregate emotional data from conversation messages:
const validMessages = messages.filter(
  (msg) => msg.type === "user_message" || msg.type === "assistant_message"
);

let emotions: Record<string, number> = {};

const userMessages = validMessages.filter((msg) => msg.type === "user_message");

userMessages.forEach((msg) => {
  if ("models" in msg && msg.models?.prosody?.scores) {
    const scores = msg.models.prosody.scores;
    Object.entries(scores).forEach(([emotion, score]) => {
      emotions[emotion] = (emotions[emotion] || 0) + (score as number);
    });
  }
});

// Calculate average scores
if (userMessages.length > 0) {
  Object.keys(emotions).forEach((key) => {
    emotions[key] = emotions[key] / userMessages.length;
  });
}
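The same aggregation can be packaged as a pure function, which keeps it easy to unit-test. A sketch (averageEmotions is an illustrative name; note that it divides by the number of score sets actually collected, so messages without prosody data do not dilute the average, unlike the inline version above):

```typescript
// Average a list of per-message prosody score records into one record.
type EmotionScores = Record<string, number>;

function averageEmotions(scoreSets: EmotionScores[]): EmotionScores {
  const sums: EmotionScores = {};
  for (const scores of scoreSets) {
    for (const [emotion, score] of Object.entries(scores)) {
      sums[emotion] = (sums[emotion] ?? 0) + score;
    }
  }
  for (const key of Object.keys(sums)) {
    // scoreSets is guaranteed non-empty here, since sums has keys.
    sums[key] /= scoreSets.length;
  }
  return sums;
}
```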

API reference

useVoice hook

The primary hook for interacting with Hume’s voice interface:
| Property | Type | Description |
| --- | --- | --- |
| status | object | Connection status with a value property |
| connect | function | Initiates a connection to EVI |
| disconnect | function | Ends the voice session |
| messages | array | Array of conversation messages |
| isMuted | boolean | Current microphone mute state |
| mute | function | Mutes the microphone |
| unmute | function | Unmutes the microphone |
| micFft | array | Audio frequency data for visualization |

Message structure

interface Message {
  type: "user_message" | "assistant_message";
  message: {
    role: string;
    content: string;
  };
  receivedAt: Date;
  models: {
    prosody?: {
      scores: Record<string, number>;
    };
  };
}
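A type guard mirroring this shape keeps the narrowing logic from the Messages component in one place. A sketch (ChatMessage and isChatMessage are illustrative names; the real union from the SDK includes additional message types such as metadata events):

```typescript
// Narrow the mixed message array from useVoice() down to the two chat
// message types this guide renders. ChatMessage is a reduced mirror of
// the Message interface above.
interface ChatMessage {
  type: "user_message" | "assistant_message";
}

function isChatMessage(msg: { type: string }): msg is ChatMessage {
  return msg.type === "user_message" || msg.type === "assistant_message";
}
```

With this guard, `messages.filter(isChatMessage)` yields a correctly typed array instead of requiring per-item checks inside the render loop.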

Best practices

1. Handle connection errors gracefully

Always wrap connection attempts in try-catch blocks and provide user feedback.

2. Request microphone permissions

Inform users about microphone access requirements before attempting to connect.

3. Secure your credentials

Never expose secret keys in client-side code. Use environment variables and keep them server-side.

4. Process emotions before disconnecting

Extract and save emotional data before calling disconnect() to avoid data loss.
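The fourth practice amounts to an ordering guarantee: persist first, then disconnect. A sketch, where saveEmotions is a hypothetical persistence call you would replace with your own API client:

```typescript
// Persist emotion data first, then tear down the session. The finally
// block guarantees the connection closes even if saving fails.
// saveEmotions is a hypothetical persistence call, not part of the SDK.
async function endCall(
  emotions: Record<string, number>,
  saveEmotions: (e: Record<string, number>) => Promise<void>,
  disconnect: () => void
): Promise<void> {
  try {
    await saveEmotions(emotions);
  } finally {
    disconnect();
  }
}
```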

Troubleshooting

Connection fails immediately

Verify that:
  • Your API key and config ID are correct
  • The user has granted microphone permissions
  • Your network allows WebSocket connections

No emotion scores in messages

Ensure that:
  • Your Hume EVI configuration has prosody models enabled
  • Messages are of type user_message (emotion detection works on user speech)
  • The models.prosody.scores object exists before accessing it

OAuth token errors

Check that:
  • Both VITE_HUME_API_KEY and HUME_SECRET_KEY are set in backend environment
  • Credentials are correctly formatted with no extra whitespace
  • Your API keys have not expired or been revoked
