SvaraAI detects and analyzes emotions from voice conversations in real-time, providing insights into emotional states during therapeutic sessions. The system uses Hume AI’s advanced prosody and language models to capture subtle emotional cues from voice patterns.

How emotion detection works

The emotion detection system processes audio through multiple models to capture both linguistic content and vocal patterns:
1. Audio capture: The system records voice input during conversations through the Hume AI Voice SDK.
2. Model analysis: Audio is analyzed using two complementary models:
   • Prosody model: Analyzes vocal characteristics like pitch, tone, and rhythm
   • Language model: Processes the semantic content and context
3. Emotion scoring: Each model generates confidence scores for detected emotions, ranging from 0 to 1.
4. Real-time display: Top emotions are displayed with visual indicators during the conversation.
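The data these steps produce can be sketched with a few illustrative types; the field names mirror the response format shown in the API reference below, and the helper function is a hypothetical convenience, not part of the SDK:

```typescript
// Illustrative types for the per-message emotion data the pipeline produces.
type EmotionScores = Record<string, number>; // emotion name -> confidence in [0, 1]

interface EmotionAnalysis {
  models: {
    prosody?: { scores: EmotionScores };  // vocal characteristics
    language?: { scores: EmotionScores }; // semantic content
  };
}

// Example helper: pick the single strongest emotion from one model's scores.
function strongestEmotion(scores: EmotionScores): [string, number] {
  return Object.entries(scores).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  );
}
```

The UI described later picks the top three emotions rather than a single winner, but the underlying data shape is the same.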

Backend implementation

The emotion detection API processes audio URLs and returns emotion scores:
import express from 'express';
import { getHumeAccessToken } from '../utils/humeClient';

const router = express.Router();

router.post('/', async (req, res): Promise<any> => {
  const { audioUrl } = req.body;
  
  if (!audioUrl) {
    return res.status(400).json({
      error: 'audioUrl is required'
    });
  }

  try {
    const token = await getHumeAccessToken();
    const response = await fetch('https://api.hume.ai/v0/batch/analyze-url', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url: audioUrl,
        models: {
          language: {},
          prosody: {}
        },
      }),
    });
    
    const result = await response.json();
    res.json(result);
  } catch (err) {
    console.error(err);
    res.status(500).json({
      error: 'analysis failed'
    });
  }
});

export default router;
The system authenticates with Hume AI using the OAuth2 client-credentials flow. Both the VITE_HUME_API_KEY and HUME_SECRET_KEY environment variables must be configured.
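A possible implementation of getHumeAccessToken, assuming Hume's client-credentials token endpoint at https://api.hume.ai/oauth2-cc/token (verify the URL against current Hume documentation). The request-building step is split out so it can be inspected independently of the network call:

```typescript
// Hypothetical implementation of getHumeAccessToken using the OAuth2
// client-credentials flow. The token endpoint URL is an assumption.
const TOKEN_URL = "https://api.hume.ai/oauth2-cc/token";

// Build the token request separately so it can be inspected and tested.
export function buildTokenRequest(apiKey: string, secretKey: string) {
  const credentials = Buffer.from(`${apiKey}:${secretKey}`).toString("base64");
  return {
    url: TOKEN_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Basic ${credentials}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: "grant_type=client_credentials",
    },
  };
}

export async function getHumeAccessToken(): Promise<string> {
  const { url, options } = buildTokenRequest(
    process.env.VITE_HUME_API_KEY!,
    process.env.HUME_SECRET_KEY!
  );
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  const { access_token } = await res.json();
  return access_token;
}
```

Tokens are short-lived; a production version would typically cache the token and refresh it before expiry.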

Frontend visualization

Emotions are displayed in real-time as messages are exchanged:
Frontend/src/components/expressions.tsx
import { motion } from "framer-motion";
import * as R from "remeda";
// Path is illustrative - import the label map from wherever it lives in your project
import { expressionLabels } from "../utils/expressionLabels";

export default function Expressions({ 
  values 
}: { 
  values: Record<string, number> 
}) {
  // Extract top 3 emotions by score
  const top3 = R.pipe(
    values,
    R.entries(),
    R.sortBy(([, score]) => score),
    R.reverse(),
    R.take(3)
  );

  return (
    <div className="grid grid-cols-3 gap-3">
      {top3.map(([key, value]: [string, number]) => (
        <div key={key}>
          <div className="flex justify-between">
            <div className="font-medium">
              {expressionLabels[key] || key}
            </div>
            <div className="opacity-50">
              {value.toFixed(2)}
            </div>
          </div>
          <motion.div
            className="h-1 bg-gradient rounded-full"
            initial={{ width: 0 }}
            animate={{ 
              width: `${Math.min(1, Math.max(0, value)) * 100}%` 
            }}
          />
        </div>
      ))}
    </div>
  );
}

Message integration

Each message displays emotion scores when available:
Frontend/src/components/message.tsx
{msg.models.prosody?.scores && (
  <div className="border-t bg-gradient">
    <Expressions values={{ ...msg.models.prosody.scores }} />
  </div>
)}
The Expressions component uses Framer Motion for smooth animations when emotion scores update, creating a visually engaging experience.
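The snippet above assumes each message carries optional per-model scores. An illustrative shape of that message object (the real type is defined by the Hume Voice SDK):

```typescript
// Illustrative message shape - the actual type comes from the Hume Voice SDK.
interface ChatMessage {
  type: "user_message" | "assistant_message";
  message: { role: string; content: string };
  models: {
    prosody?: { scores: Record<string, number> };
  };
}

// Mirrors the conditional render above: only show Expressions when scores exist.
function hasEmotionScores(msg: ChatMessage): boolean {
  return msg.models.prosody?.scores !== undefined;
}
```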

Detected emotions

SvaraAI tracks a wide range of emotional states, including:

Primary emotions

  • Happy
  • Sad
  • Angry
  • Surprised

Complex states

  • Anxious
  • Calm
  • Confident
  • Confused

Emotion scoring

Emotion scores represent the confidence level for each detected emotion:
Score Range   Interpretation
0.0 - 0.3     Low confidence - emotion likely not present
0.3 - 0.6     Moderate confidence - emotion may be present
0.6 - 0.8     High confidence - emotion is likely present
0.8 - 1.0     Very high confidence - emotion is strongly present
The top 3 emotions by score are displayed for each message to focus on the most prominent emotional states.
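The bucketing above and the top-3 selection can be expressed as small pure helpers; this is a sketch for illustration, not the production code:

```typescript
// Map a confidence score to the interpretation bucket from the table above.
function interpretScore(score: number): string {
  if (score < 0.3) return "Low confidence";
  if (score < 0.6) return "Moderate confidence";
  if (score < 0.8) return "High confidence";
  return "Very high confidence";
}

// Select the top N emotions by score, as the UI does for each message.
function topEmotions(
  scores: Record<string, number>,
  n = 3
): Array<[string, number]> {
  return Object.entries(scores)
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}
```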

Use cases

Emotion detection enables several therapeutic applications:
  • Help users recognize their emotional states during conversations, promoting self-awareness and emotional intelligence.
  • Monitor emotional patterns over time to identify trends and measure therapeutic progress.
  • Identify heightened emotional states that may indicate distress or crisis situations requiring immediate attention.
  • Provide therapists with objective data about emotional states to inform treatment decisions.

API reference

Analyze audio emotion

Parameters:

  • audioUrl (string, required): URL to the audio file to analyze. Must be a valid, accessible URL.

Example request:
curl -X POST https://your-api.com/api/hume \
  -H "Content-Type: application/json" \
  -d '{
    "audioUrl": "https://example.com/audio.wav"
  }'

Example response:
{
  "models": {
    "prosody": {
      "scores": {
        "happy": 0.82,
        "calm": 0.65,
        "confident": 0.58
      }
    },
    "language": {
      "scores": {
        "happy": 0.75,
        "excited": 0.62
      }
    }
  }
}
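A client consuming this response may want a single combined ranking across both models. One possible approach is to average the scores where both models detect the same emotion; this aggregation strategy is an illustrative choice, not something the API performs:

```typescript
type Scores = Record<string, number>;

// Merge prosody and language scores: average when both models scored an
// emotion, otherwise keep the single available score.
function mergeScores(prosody: Scores, language: Scores): Scores {
  const merged: Scores = { ...prosody };
  for (const [emotion, score] of Object.entries(language)) {
    merged[emotion] =
      emotion in merged ? (merged[emotion] + score) / 2 : score;
  }
  return merged;
}
```

Averaging weights both models equally; a merge that favors prosody may suit voice-heavy signals better.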

Best practices

Audio quality

Ensure clear audio with minimal background noise for accurate emotion detection

Context awareness

Consider cultural and individual differences in emotional expression

Privacy protection

Handle emotion data securely and obtain user consent for analysis

Interpretation

Use emotion scores as guidance, not absolute truth - combine with clinical judgment

Limitations

Emotion detection is probabilistic and may not always accurately reflect a person’s true emotional state. Use emotion data as one input among many in therapeutic contexts.
  • Cultural variations: Emotional expression varies across cultures
  • Individual differences: People express emotions differently
  • Context dependency: Same vocal patterns may indicate different emotions in different contexts
  • Model limitations: AI models have inherent biases and limitations

Next steps

Therapeutic feedback

Learn how emotion data powers AI-generated therapeutic insights

Conversation insights

Explore comprehensive analytics from conversation sessions
