SvaraAI detects and analyzes emotions from voice conversations in real-time, providing insights into emotional states during therapeutic sessions. The system uses Hume AI’s advanced prosody and language models to capture subtle emotional cues from voice patterns.

How emotion detection works

The emotion detection system processes audio through multiple models to capture both linguistic content and vocal patterns:
1. Audio capture: The system records voice input during conversations through the Hume AI Voice SDK.
2. Model analysis: Audio is analyzed using two complementary models:
   • Prosody model: Analyzes vocal characteristics like pitch, tone, and rhythm
   • Language model: Processes the semantic content and context
3. Emotion scoring: Each model generates confidence scores for detected emotions, ranging from 0 to 1.
4. Real-time display: Top emotions are displayed with visual indicators during the conversation.
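The data these steps produce can be sketched with a few illustrative types; the field names mirror the response format shown in the API reference below, and the helper function is a hypothetical convenience, not part of the SDK:

```typescript
// Illustrative types for the per-message emotion data the pipeline produces.
type EmotionScores = Record<string, number>; // emotion name -> confidence in [0, 1]

interface EmotionAnalysis {
  models: {
    prosody?: { scores: EmotionScores };  // vocal characteristics
    language?: { scores: EmotionScores }; // semantic content
  };
}

// Example helper: pick the single strongest emotion from one model's scores.
function strongestEmotion(scores: EmotionScores): [string, number] {
  return Object.entries(scores).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  );
}
```

The UI described later picks the top three emotions rather than a single winner, but the underlying data shape is the same.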

Backend implementation

The emotion detection API processes audio URLs and returns emotion scores:
import express from 'express';
import { getHumeAccessToken } from '../utils/humeClient';

const router = express.Router();

router.post('/', async (req, res): Promise<any> => {
  const { audioUrl } = req.body;
  
  if (!audioUrl) {
    return res.status(400).json({
      error: 'audioUrl is required'
    });
  }

  try {
    const token = await getHumeAccessToken();
    const response = await fetch('https://api.hume.ai/v0/batch/analyze-url', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url: audioUrl,
        models: {
          language: {},
          prosody: {}
        },
      }),
    });
    
    const result = await response.json();
    res.json(result);
  } catch (err) {
    console.error(err);
    res.status(500).json({
      error: 'analysis failed'
    });
  }
});

export default router;
The system authenticates with Hume AI using the OAuth2 client-credentials flow. Both the VITE_HUME_API_KEY and HUME_SECRET_KEY environment variables must be configured.
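A possible implementation of getHumeAccessToken, assuming Hume's client-credentials token endpoint at https://api.hume.ai/oauth2-cc/token (verify the URL against current Hume documentation). The request-building step is split out so it can be inspected independently of the network call:

```typescript
// Hypothetical implementation of getHumeAccessToken using the OAuth2
// client-credentials flow. The token endpoint URL is an assumption.
const TOKEN_URL = "https://api.hume.ai/oauth2-cc/token";

// Build the token request separately so it can be inspected and tested.
export function buildTokenRequest(apiKey: string, secretKey: string) {
  const credentials = Buffer.from(`${apiKey}:${secretKey}`).toString("base64");
  return {
    url: TOKEN_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Basic ${credentials}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: "grant_type=client_credentials",
    },
  };
}

export async function getHumeAccessToken(): Promise<string> {
  const { url, options } = buildTokenRequest(
    process.env.VITE_HUME_API_KEY!,
    process.env.HUME_SECRET_KEY!
  );
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  const { access_token } = await res.json();
  return access_token;
}
```

Tokens are short-lived; a production version would typically cache the token and refresh it before expiry.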

Frontend visualization

Emotions are displayed in real-time as messages are exchanged:
Frontend/src/components/expressions.tsx
import { motion } from "framer-motion";
import * as R from "remeda";
// Path is illustrative - import the label map from wherever it lives in your project
import { expressionLabels } from "../utils/expressionLabels";

export default function Expressions({ 
  values 
}: { 
  values: Record<string, number> 
}) {
  // Extract top 3 emotions by score
  const top3 = R.pipe(
    values,
    R.entries(),
    R.sortBy(([, score]) => score),
    R.reverse(),
    R.take(3)
  );

  return (
    <div className="grid grid-cols-3 gap-3">
      {top3.map(([key, value]: [string, number]) => (
        <div key={key}>
          <div className="flex justify-between">
            <div className="font-medium">
              {expressionLabels[key] || key}
            </div>
            <div className="opacity-50">
              {value.toFixed(2)}
            </div>
          </div>
          <motion.div
            className="h-1 bg-gradient rounded-full"
            initial={{ width: 0 }}
            animate={{ 
              width: `${Math.min(1, Math.max(0, value)) * 100}%` 
            }}
          />
        </div>
      ))}
    </div>
  );
}

Message integration

Each message displays emotion scores when available:
Frontend/src/components/message.tsx
{msg.models.prosody?.scores && (
  <div className="border-t bg-gradient">
    <Expressions values={{ ...msg.models.prosody.scores }} />
  </div>
)}
The Expressions component uses Framer Motion for smooth animations when emotion scores update, creating a visually engaging experience.
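The snippet above assumes each message carries optional per-model scores. An illustrative shape of that message object (the real type is defined by the Hume Voice SDK):

```typescript
// Illustrative message shape - the actual type comes from the Hume Voice SDK.
interface ChatMessage {
  type: "user_message" | "assistant_message";
  message: { role: string; content: string };
  models: {
    prosody?: { scores: Record<string, number> };
  };
}

// Mirrors the conditional render above: only show Expressions when scores exist.
function hasEmotionScores(msg: ChatMessage): boolean {
  return msg.models.prosody?.scores !== undefined;
}
```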

Detected emotions

SvaraAI tracks a wide range of emotional states, including:

Primary emotions

  • Happy
  • Sad
  • Angry
  • Surprised

Complex states

  • Anxious
  • Calm
  • Confident
  • Confused

Emotion scoring

Emotion scores represent the confidence level for each detected emotion:
Score Range   Interpretation
0.0 - 0.3     Low confidence - emotion likely not present
0.3 - 0.6     Moderate confidence - emotion may be present
0.6 - 0.8     High confidence - emotion is likely present
0.8 - 1.0     Very high confidence - emotion is strongly present
The top 3 emotions by score are displayed for each message to focus on the most prominent emotional states.
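The bucketing above and the top-3 selection can be expressed as small pure helpers; this is a sketch for illustration, not the production code:

```typescript
// Map a confidence score to the interpretation bucket from the table above.
function interpretScore(score: number): string {
  if (score < 0.3) return "Low confidence";
  if (score < 0.6) return "Moderate confidence";
  if (score < 0.8) return "High confidence";
  return "Very high confidence";
}

// Select the top N emotions by score, as the UI does for each message.
function topEmotions(
  scores: Record<string, number>,
  n = 3
): Array<[string, number]> {
  return Object.entries(scores)
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}
```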

Use cases

Emotion detection enables several therapeutic applications:
  • Help users recognize their emotional states during conversations, promoting self-awareness and emotional intelligence.
  • Monitor emotional patterns over time to identify trends and measure therapeutic progress.
  • Identify heightened emotional states that may indicate distress or crisis situations requiring immediate attention.
  • Provide therapists with objective data about emotional states to inform treatment decisions.

API reference

Analyze audio emotion

Parameters:

  • audioUrl (string, required): URL to the audio file to analyze. Must be a valid, accessible URL.

Example request:
curl -X POST https://your-api.com/api/hume \
  -H "Content-Type: application/json" \
  -d '{
    "audioUrl": "https://example.com/audio.wav"
  }'

Example response:
{
  "models": {
    "prosody": {
      "scores": {
        "happy": 0.82,
        "calm": 0.65,
        "confident": 0.58
      }
    },
    "language": {
      "scores": {
        "happy": 0.75,
        "excited": 0.62
      }
    }
  }
}
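A client consuming this response may want a single combined ranking across both models. One possible approach is to average the scores where both models detect the same emotion; this aggregation strategy is an illustrative choice, not something the API performs:

```typescript
type Scores = Record<string, number>;

// Merge prosody and language scores: average when both models scored an
// emotion, otherwise keep the single available score.
function mergeScores(prosody: Scores, language: Scores): Scores {
  const merged: Scores = { ...prosody };
  for (const [emotion, score] of Object.entries(language)) {
    merged[emotion] =
      emotion in merged ? (merged[emotion] + score) / 2 : score;
  }
  return merged;
}
```

Averaging weights both models equally; a merge that favors prosody may suit voice-heavy signals better.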

Best practices

Audio quality

Ensure clear audio with minimal background noise for accurate emotion detection

Context awareness

Consider cultural and individual differences in emotional expression

Privacy protection

Handle emotion data securely and obtain user consent for analysis

Interpretation

Use emotion scores as guidance, not absolute truth - combine with clinical judgment

Limitations

Emotion detection is probabilistic and may not always accurately reflect a person’s true emotional state. Use emotion data as one input among many in therapeutic contexts.
  • Cultural variations: Emotional expression varies across cultures
  • Individual differences: People express emotions differently
  • Context dependency: Same vocal patterns may indicate different emotions in different contexts
  • Model limitations: AI models have inherent biases and limitations

Next steps

Therapeutic feedback

Learn how emotion data powers AI-generated therapeutic insights

Conversation insights

Explore comprehensive analytics from conversation sessions
