SvaraAI leverages Hume AI’s Empathic Voice Interface (EVI) to enable real-time voice conversations with emotional intelligence. This integration provides both voice communication and emotion analysis capabilities.

Overview

The Hume AI integration consists of two main components:
  • Voice interface: Real-time voice conversations using Hume’s EVI
  • Emotion analysis: Prosody and language models that detect emotional states

Prerequisites

Before you begin, ensure you have:
  • A Hume AI account with API access
  • API key and secret key from your Hume AI dashboard
  • An EVI configuration ID

Installation

1. Install required packages

The frontend uses the official Hume AI React SDK:

npm install @humeai/voice-react hume
2. Configure environment variables

Add your Hume AI credentials to your environment:

VITE_HUME_API_KEY=your_api_key_here
VITE_HUME_CONFIG_ID=your_config_id_here
Keep your HUME_SECRET_KEY secure and never expose it in client-side code. It should only be used in backend services.

Frontend implementation

Setting up the voice provider

The VoiceProvider component wraps your chat interface to enable voice functionality:
Frontend/src/components/chat.tsx
import { VoiceProvider } from "@humeai/voice-react";
import Messages from "./message";
import Controls from "./controls";
import StartCall from "./startCall";

export default function ChatInterface() {
  const apiKey = import.meta.env.VITE_HUME_API_KEY || "";
  const configId = import.meta.env.VITE_HUME_CONFIG_ID || "";

  if (!apiKey || !configId) {
    return (
      <div className="flex items-center justify-center h-screen">
        <p className="text-red-500">Missing API credentials. Check your .env file.</p>
      </div>
    );
  }

  return (
    <div className="relative grow flex flex-col mx-auto w-full overflow-hidden h-screen">
      <VoiceProvider>
        <Messages />
        <Controls />
        <StartCall apiKey={apiKey} configId={configId} />
      </VoiceProvider>
    </div>
  );
}

Connecting to the voice interface

Use the useVoice hook to connect to Hume’s EVI:
Frontend/src/components/startCall.tsx
import { useVoice, type ConnectOptions } from "@humeai/voice-react";
import { useState } from "react";

interface StartCallProps {
  apiKey: string;
  configId: string;
}

export default function StartCall({ apiKey, configId }: StartCallProps) {
  const { status, connect } = useVoice();
  const [isConnecting, setIsConnecting] = useState(false);

  const handleConnect = async () => {
    if (status.value === "connected" || status.value === "connecting" || isConnecting) {
      return;
    }

    setIsConnecting(true);

    const connectOptions: ConnectOptions = {
      auth: { type: "apiKey", value: apiKey },
      configId: configId,
    };

    try {
      await connect(connectOptions);
    } catch {
      alert("Unable to connect. Please check microphone permissions and try again.");
    } finally {
      setIsConnecting(false);
    }
  };

  return (
    <button onClick={handleConnect} disabled={isConnecting || status.value === "connecting"}>
      Start call
    </button>
  );
}

Accessing conversation messages

The useVoice hook provides access to messages with emotional data:
Frontend/src/components/message.tsx
import { useVoice } from "@humeai/voice-react";
import Expressions from "./expressions";

export default function Messages() {
  const { messages } = useVoice();

  return (
    <div className="flex flex-col gap-4">
      {messages.map((msg, index) => {
        if (msg.type === "user_message" || msg.type === "assistant_message") {
          const isUser = msg.type === "user_message";

          return (
            <div key={index}>
              <div>
                <span>{msg.message.role}</span>
                <time>{msg.receivedAt.toLocaleTimeString()}</time>
              </div>
              
              <div>{msg.message.content}</div>

              {/* Display emotion scores */}
              {msg.models.prosody?.scores && (
                <Expressions values={{ ...msg.models.prosody.scores }} />
              )}
            </div>
          );
        }
        return null;
      })}
    </div>
  );
}
Each message includes a models property containing prosody scores that represent the emotional state detected in the user’s voice.
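Rather than rendering every dimension, the scores record can be sorted to surface only the strongest emotions. A minimal sketch (topEmotions is an illustrative helper name, not part of the SDK):

```typescript
// Sort a prosody scores record and keep the top-N emotions.
// Hume's prosody model returns ~48 named dimensions; showing only the
// strongest few is usually more readable.
type Scores = Record<string, number>;

function topEmotions(scores: Scores, n = 3): Array<[string, number]> {
  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)
    .slice(0, n);
}
```

Passing the result to the Expressions component (or any chart) keeps the UI focused on the dominant signals.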

Managing the voice connection

Control microphone state and disconnect from the session:
Frontend/src/components/controls.tsx
import { useVoice } from "@humeai/voice-react";

export default function Controls() {
  const { disconnect, isMuted, unmute, mute } = useVoice();

  const handleEndCall = async () => {
    // Process conversation data before disconnecting
    disconnect?.();
  };

  return (
    <div>
      <button onClick={() => (isMuted ? unmute?.() : mute?.())}>
        {isMuted ? "Unmute" : "Mute"}
      </button>
      
      <button onClick={handleEndCall}>
        End call
      </button>
    </div>
  );
}
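useVoice also exposes micFft, an array of frequency-bin magnitudes that can drive a simple input-level indicator. A sketch, where micLevel is an illustrative helper and the 0–1 clamp is an assumption about how you want to scale the meter:

```typescript
// Collapse micFft (an array of frequency-bin magnitudes) into a single
// 0-1 level, e.g. to scale a mic-activity indicator while speaking.
function micLevel(fft: number[]): number {
  if (fft.length === 0) return 0;
  const avg = fft.reduce((sum, v) => sum + v, 0) / fft.length;
  return Math.min(1, Math.max(0, avg)); // clamp in case magnitudes exceed 1
}
```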

Backend implementation

OAuth2 authentication

The backend handles secure token generation using OAuth2 client credentials flow:
Backend/utils/humeClient.ts
export const getHumeAccessToken = async (): Promise<string> => {
  const apiKey = process.env.VITE_HUME_API_KEY;
  const secretKey = process.env.HUME_SECRET_KEY;

  if (!apiKey || !secretKey) {
    throw new Error(
      'Missing required environment variables (VITE_HUME_API_KEY or HUME_SECRET_KEY)'
    );
  }

  const authString = `${apiKey}:${secretKey}`;
  const encoded = Buffer.from(authString).toString('base64');

  const res = await fetch('https://api.hume.ai/oauth2-cc/token', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      Authorization: `Basic ${encoded}`,
    },
    body: new URLSearchParams({ grant_type: 'client_credentials' }).toString(),
    cache: 'no-cache',
  });

  if (!res.ok) {
    throw new Error(`HTTP ${res.status}: ${res.statusText}`);
  }

  const raw = await res.json();
  if (!raw.access_token) {
    throw new Error('Unable to get access token: ' + JSON.stringify(raw));
  }

  return raw.access_token;
};

Batch audio analysis

Analyze pre-recorded audio files for emotional content:
Backend/routes/hume.ts
import express from 'express';
import { getHumeAccessToken } from '../utils/humeClient';

const router = express.Router();

router.post('/', async (req, res): Promise<any> => {
  const { audioUrl } = req.body;
  
  if (!audioUrl) {
    return res.status(400).json({ error: 'audioUrl is required' });
  }

  try {
    new URL(audioUrl);
  } catch {
    return res.status(400).json({ error: 'Invalid URL format' });
  }

  try {
    const token = await getHumeAccessToken();
    
    const response = await fetch('https://api.hume.ai/v0/batch/analyze-url', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url: audioUrl,
        models: {
          language: {},
          prosody: {}
        },
      }),
    });
    
    if (!response.ok) {
      throw new Error(`Hume API request failed: ${response.status}`);
    }
    
    const result = await response.json();
    res.json(result);
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Analysis failed' });
  }
});

export default router;
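From the frontend, the route can be called with a plain fetch. A sketch, assuming the router is mounted at /api/hume (adjust to your Express setup); the injectable fetchFn parameter exists only to make the helper testable:

```typescript
// Send an audio URL to the batch-analysis route and return the parsed
// result. The /api/hume path is an assumption; use whatever prefix the
// router is actually mounted under.
async function analyzeAudio(
  audioUrl: string,
  fetchFn: typeof fetch = fetch
): Promise<unknown> {
  const res = await fetchFn("/api/hume", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ audioUrl }),
  });
  if (!res.ok) {
    throw new Error(`Analysis request failed: ${res.status}`);
  }
  return res.json();
}
```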

Extracting emotion data

Collect and aggregate emotional data from conversation messages:
const validMessages = messages.filter(
  (msg) => msg.type === "user_message" || msg.type === "assistant_message"
);

let emotions: Record<string, number> = {};

const userMessages = validMessages.filter((msg) => msg.type === "user_message");

userMessages.forEach((msg) => {
  if ("models" in msg && msg.models?.prosody?.scores) {
    const scores = msg.models.prosody.scores;
    Object.entries(scores).forEach(([emotion, score]) => {
      emotions[emotion] = (emotions[emotion] || 0) + (score as number);
    });
  }
});

// Calculate average scores
if (userMessages.length > 0) {
  Object.keys(emotions).forEach((key) => {
    emotions[key] = emotions[key] / userMessages.length;
  });
}
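The same aggregation can be packaged as a pure function, which keeps it easy to unit-test. A sketch (averageEmotions is an illustrative name; note that it divides by the number of score sets actually collected, so messages without prosody data do not dilute the average, unlike the inline version above):

```typescript
// Average a list of per-message prosody score records into one record.
type EmotionScores = Record<string, number>;

function averageEmotions(scoreSets: EmotionScores[]): EmotionScores {
  const sums: EmotionScores = {};
  for (const scores of scoreSets) {
    for (const [emotion, score] of Object.entries(scores)) {
      sums[emotion] = (sums[emotion] ?? 0) + score;
    }
  }
  for (const key of Object.keys(sums)) {
    // scoreSets is guaranteed non-empty here, since sums has keys.
    sums[key] /= scoreSets.length;
  }
  return sums;
}
```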

API reference

useVoice hook

The primary hook for interacting with Hume’s voice interface:
| Property | Type | Description |
| --- | --- | --- |
| status | object | Connection status with a value property |
| connect | function | Initiates a connection to EVI |
| disconnect | function | Ends the voice session |
| messages | array | Array of conversation messages |
| isMuted | boolean | Current microphone mute state |
| mute | function | Mutes the microphone |
| unmute | function | Unmutes the microphone |
| micFft | array | Audio frequency data for visualization |

Message structure

interface Message {
  type: "user_message" | "assistant_message";
  message: {
    role: string;
    content: string;
  };
  receivedAt: Date;
  models: {
    prosody?: {
      scores: Record<string, number>;
    };
  };
}
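A type guard mirroring this shape keeps the narrowing logic from the Messages component in one place. A sketch (ChatMessage and isChatMessage are illustrative names; the real union from the SDK includes additional message types such as metadata events):

```typescript
// Narrow the mixed message array from useVoice() down to the two chat
// message types this guide renders. ChatMessage is a reduced mirror of
// the Message interface above.
interface ChatMessage {
  type: "user_message" | "assistant_message";
}

function isChatMessage(msg: { type: string }): msg is ChatMessage {
  return msg.type === "user_message" || msg.type === "assistant_message";
}
```

With this guard, `messages.filter(isChatMessage)` yields a correctly typed array instead of requiring per-item checks inside the render loop.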

Best practices

1. Handle connection errors gracefully

Always wrap connection attempts in try-catch blocks and provide user feedback.

2. Request microphone permissions

Inform users about microphone access requirements before attempting to connect.

3. Secure your credentials

Never expose secret keys in client-side code. Use environment variables and keep them server-side.

4. Process emotions before disconnecting

Extract and save emotional data before calling disconnect() to avoid data loss.
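The fourth practice amounts to an ordering guarantee: persist first, then disconnect. A sketch, where saveEmotions is a hypothetical persistence call you would replace with your own API client:

```typescript
// Persist emotion data first, then tear down the session. The finally
// block guarantees the connection closes even if saving fails.
// saveEmotions is a hypothetical persistence call, not part of the SDK.
async function endCall(
  emotions: Record<string, number>,
  saveEmotions: (e: Record<string, number>) => Promise<void>,
  disconnect: () => void
): Promise<void> {
  try {
    await saveEmotions(emotions);
  } finally {
    disconnect();
  }
}
```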

Troubleshooting

Connection fails immediately

Verify that:
  • Your API key and config ID are correct
  • The user has granted microphone permissions
  • Your network allows WebSocket connections

No emotion scores in messages

Ensure that:
  • Your Hume EVI configuration has prosody models enabled
  • Messages are of type user_message (emotion detection works on user speech)
  • The models.prosody.scores object exists before accessing it

OAuth token errors

Check that:
  • Both VITE_HUME_API_KEY and HUME_SECRET_KEY are set in backend environment
  • Credentials are correctly formatted with no extra whitespace
  • Your API keys have not expired or been revoked
