SvaraAI leverages Hume AI’s Empathic Voice Interface (EVI) to enable real-time voice conversations with emotional intelligence. This integration provides both voice communication and emotion analysis capabilities.
Overview
The Hume AI integration consists of two main components:
Voice interface: Real-time voice conversations using Hume's EVI
Emotion analysis: Prosody and language models that detect emotional states
Prerequisites
Before you begin, ensure you have:
A Hume AI account with API access
API key and secret key from your Hume AI dashboard
An EVI configuration ID
Installation
Install required packages
The frontend uses the official Hume AI React SDK:
npm install @humeai/voice-react hume
Configure environment variables
Add your Hume AI credentials to your environment:
Frontend (.env)
VITE_HUME_API_KEY=your_api_key_here
VITE_HUME_CONFIG_ID=your_config_id_here
Backend (.env)
VITE_HUME_API_KEY=your_api_key_here
HUME_SECRET_KEY=your_secret_key_here
Keep your HUME_SECRET_KEY secure and never expose it in client-side code. It should only be used in backend services.
Frontend implementation
Setting up the voice provider
The VoiceProvider component wraps your chat interface to enable voice functionality:
Frontend/src/components/chat.tsx
import { VoiceProvider } from "@humeai/voice-react";
import Messages from "./message";
import Controls from "./controls";
import StartCall from "./startCall";

export default function ChatInterface() {
  const apiKey = import.meta.env.VITE_HUME_API_KEY || "";
  const configId = import.meta.env.VITE_HUME_CONFIG_ID || "";

  if (!apiKey || !configId) {
    return (
      <div className="flex items-center justify-center h-screen">
        <p className="text-red-500">Missing API credentials. Check your .env file.</p>
      </div>
    );
  }

  return (
    <div className="relative grow flex flex-col mx-auto w-full overflow-hidden h-screen">
      <VoiceProvider>
        <Messages />
        <Controls />
        <StartCall apiKey={apiKey} configId={configId} />
      </VoiceProvider>
    </div>
  );
}
Connecting to the voice interface
Use the useVoice hook to connect to Hume’s EVI:
Frontend/src/components/startCall.tsx
import { useVoice, type ConnectOptions } from "@humeai/voice-react";
import { useState } from "react";

interface StartCallProps {
  apiKey: string;
  configId: string;
}

export default function StartCall({ apiKey, configId }: StartCallProps) {
  const { status, connect } = useVoice();
  const [isConnecting, setIsConnecting] = useState(false);

  const handleConnect = async () => {
    if (status.value === "connected" || status.value === "connecting" || isConnecting) {
      return;
    }
    setIsConnecting(true);

    const connectOptions: ConnectOptions = {
      auth: { type: "apiKey", value: apiKey },
      configId,
    };

    try {
      await connect(connectOptions);
    } catch {
      alert("Unable to connect. Please check microphone permissions and try again.");
    } finally {
      setIsConnecting(false);
    }
  };

  return (
    <button onClick={handleConnect} disabled={isConnecting || status.value === "connecting"}>
      Start call
    </button>
  );
}
Accessing conversation messages
The useVoice hook provides access to messages with emotional data:
Frontend/src/components/message.tsx
import { useVoice } from "@humeai/voice-react";
import Expressions from "./expressions";

export default function Messages() {
  const { messages } = useVoice();

  return (
    <div className="flex flex-col gap-4">
      {messages.map((msg, index) => {
        if (msg.type === "user_message" || msg.type === "assistant_message") {
          const isUser = msg.type === "user_message";
          return (
            <div key={index}>
              <div>
                <span>{msg.message.role}</span>
                <time>{msg.receivedAt.toLocaleTimeString()}</time>
              </div>
              <div>{msg.message.content}</div>
              {/* Display emotion scores */}
              {msg.models.prosody?.scores && (
                <Expressions values={{ ...msg.models.prosody.scores }} />
              )}
            </div>
          );
        }
        return null;
      })}
    </div>
  );
}
Each message includes a models property containing prosody scores that represent the emotional state detected in the user’s voice.
Managing the voice connection
Control microphone state and disconnect from the session:
Frontend/src/components/controls.tsx
import { useVoice } from "@humeai/voice-react";

export default function Controls() {
  const { disconnect, isMuted, unmute, mute, micFft } = useVoice();

  const handleEndCall = async () => {
    // Process conversation data before disconnecting
    disconnect?.();
  };

  return (
    <div>
      <button onClick={() => (isMuted ? unmute?.() : mute?.())}>
        {isMuted ? "Unmute" : "Mute"}
      </button>
      <button onClick={handleEndCall}>End call</button>
    </div>
  );
}
Backend implementation
OAuth2 authentication
The backend handles secure token generation using OAuth2 client credentials flow:
Backend/utils/humeClient.ts
export const getHumeAccessToken = async (): Promise<string> => {
  const apiKey = process.env.VITE_HUME_API_KEY;
  const secretKey = process.env.HUME_SECRET_KEY;

  if (!apiKey || !secretKey) {
    throw new Error(
      "Missing required environment variables (VITE_HUME_API_KEY or HUME_SECRET_KEY)"
    );
  }

  const encoded = Buffer.from(`${apiKey}:${secretKey}`).toString("base64");

  const res = await fetch("https://api.hume.ai/oauth2-cc/token", {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Basic ${encoded}`,
    },
    body: new URLSearchParams({ grant_type: "client_credentials" }).toString(),
    cache: "no-cache",
  });

  if (!res.ok) {
    throw new Error(`HTTP ${res.status}: ${res.statusText}`);
  }

  const raw = await res.json();
  if (!raw.access_token) {
    throw new Error(`Unable to get access token: ${JSON.stringify(raw)}`);
  }
  return raw.access_token;
};
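Fetching a fresh token on every request adds a network round trip. A minimal in-memory cache sketch; the `withTokenCache` helper and its 25-minute TTL are our assumptions, not part of Hume's API (prefer the token response's expiry information in production):

```typescript
// Wrap any async token fetcher with a time-based cache.
// Sketch only: the TTL value is an assumption, not a documented Hume value.
type TokenFetcher = () => Promise<string>;

export function withTokenCache(
  fetchToken: TokenFetcher,
  ttlMs: number = 25 * 60 * 1000
): TokenFetcher {
  let cached: string | null = null;
  let fetchedAt = 0;

  return async () => {
    if (cached !== null && Date.now() - fetchedAt < ttlMs) {
      return cached; // still fresh, skip the network round trip
    }
    cached = await fetchToken();
    fetchedAt = Date.now();
    return cached;
  };
}

// Usage with the client above:
// const getToken = withTokenCache(getHumeAccessToken);
```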
Batch audio analysis
Analyze pre-recorded audio files for emotional content:
import express from "express";
import { getHumeAccessToken } from "../utils/humeClient";

const router = express.Router();

router.post("/", async (req, res): Promise<any> => {
  const { audioUrl } = req.body;

  if (!audioUrl) {
    return res.status(400).json({ error: "audioUrl is required" });
  }

  try {
    new URL(audioUrl);
  } catch {
    return res.status(400).json({ error: "Invalid URL format" });
  }

  try {
    const token = await getHumeAccessToken();
    const response = await fetch("https://api.hume.ai/v0/batch/analyze-url", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        url: audioUrl,
        models: {
          language: {},
          prosody: {},
        },
      }),
    });

    if (!response.ok) {
      throw new Error(`Hume API request failed: ${response.status}`);
    }

    const result = await response.json();
    res.json(result);
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: "Analysis failed" });
  }
});

export default router;
Aggregating emotion scores
Collect and average emotion scores across conversation messages:
const validMessages = messages.filter(
  (msg) => msg.type === "user_message" || msg.type === "assistant_message"
);

const emotions: Record<string, number> = {};
const userMessages = validMessages.filter((msg) => msg.type === "user_message");

userMessages.forEach((msg) => {
  if ("models" in msg && msg.models?.prosody?.scores) {
    const scores = msg.models.prosody.scores;
    Object.entries(scores).forEach(([emotion, score]) => {
      emotions[emotion] = (emotions[emotion] || 0) + (score as number);
    });
  }
});

// Calculate average scores
if (userMessages.length > 0) {
  Object.keys(emotions).forEach((key) => {
    emotions[key] = emotions[key] / userMessages.length;
  });
}
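With averaged scores in hand, a typical next step is picking out the dominant emotions. A small helper sketch (`topEmotions` is our name, not part of the SDK):

```typescript
// Return the highest-scoring emotions, strongest first.
function topEmotions(
  emotions: Record<string, number>,
  count = 3
): Array<[string, number]> {
  return Object.entries(emotions)
    .sort(([, a], [, b]) => b - a)
    .slice(0, count);
}

// Example: topEmotions({ joy: 0.8, calmness: 0.5, anger: 0.1 }, 2)
// yields joy and calmness, in that order.
```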
API reference
useVoice hook
The primary hook for interacting with Hume’s voice interface:
Property    Type      Description
status      object    Connection status with a value property
connect     function  Initiates a connection to EVI
disconnect  function  Ends the voice session
messages    array     Array of conversation messages
isMuted     boolean   Current microphone mute state
mute        function  Mutes the microphone
unmute      function  Unmutes the microphone
micFft      array     Audio frequency data for visualization
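As one way to consume micFft, the sketch below maps frequency values to bar heights for a simple level meter. The `fftToBarHeights` helper is ours, and the assumption that values fall roughly in [0, 1] is ours too, so the code clamps out-of-range values regardless:

```typescript
// Map micFft frequency values to pixel heights for a bar visualizer.
// Assumes values roughly in [0, 1] (our assumption); clamps anything outside.
function fftToBarHeights(micFft: number[], maxHeightPx = 32): number[] {
  return micFft.map((v) => {
    const clamped = Math.min(1, Math.max(0, v));
    return Math.round(clamped * maxHeightPx);
  });
}
```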
Message structure
interface Message {
  type: "user_message" | "assistant_message";
  message: {
    role: string;
    content: string;
  };
  receivedAt: Date;
  models: {
    prosody?: {
      scores: Record<string, number>;
    };
  };
}
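Because prosody is optional, reading scores involves optional chaining at every call site. A small accessor sketch keeps that in one place (`prosodyScores` is our name; the interface is repeated so the snippet stands alone):

```typescript
// Message interface as defined above, repeated for a self-contained snippet.
interface Message {
  type: "user_message" | "assistant_message";
  message: { role: string; content: string };
  receivedAt: Date;
  models: { prosody?: { scores: Record<string, number> } };
}

// Safely read prosody scores; returns an empty object when absent.
function prosodyScores(msg: Message): Record<string, number> {
  return msg.models.prosody?.scores ?? {};
}
```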
Best practices
Handle connection errors gracefully
Always wrap connection attempts in try-catch blocks and provide user feedback.
Request microphone permissions
Inform users about microphone access requirements before attempting to connect.
Secure your credentials
Never expose secret keys in client-side code. Use environment variables and keep them server-side.
Process emotions before disconnecting
Extract and save emotional data before calling disconnect() to avoid data loss.
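The last point can be enforced structurally: run the save step first and disconnect in a finally block, so the session always ends even if persistence fails. A sketch (the `endCall` helper and its parameters are ours):

```typescript
// Persist conversation data, then always end the session.
async function endCall(
  saveEmotions: () => Promise<void>,
  disconnect: () => void
): Promise<void> {
  try {
    await saveEmotions(); // e.g. POST averaged scores to your backend
  } finally {
    disconnect(); // runs whether or not saving succeeded
  }
}
```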
Troubleshooting
Connection fails
Verify that:
Your API key and config ID are correct
The user has granted microphone permissions
Your network allows WebSocket connections
No emotion scores in messages
Ensure that:
Your Hume EVI configuration has prosody models enabled
Messages are of type user_message (emotion detection works on user speech)
The models.prosody.scores object exists before accessing it
OAuth token errors
Check that:
Both VITE_HUME_API_KEY and HUME_SECRET_KEY are set in backend environment
Credentials are correctly formatted with no extra whitespace
Your API keys have not expired or been revoked