SvaraAI is built as a modern web application with a clear separation between frontend and backend services. The architecture is designed to provide real-time voice conversations with emotional intelligence through seamless integration with Hume AI and Google Gemini.

System components

Frontend

React-based SPA with real-time voice interface using Hume AI’s Voice SDK

Backend

Express.js API server handling AI service orchestration and data persistence

Architecture diagram

Data flow

SvaraAI processes voice conversations through multiple stages:

1. Voice capture: the user speaks into their microphone, and the frontend captures audio using the Hume AI Voice SDK.
2. Real-time analysis: Hume AI processes the audio stream and returns:
   • Transcribed text
   • Emotion scores (prosody analysis)
   • Real-time conversation state
3. Conversation display: messages appear in the chat interface with emotional indicators showing the top detected emotions.
4. Session analysis: when the conversation ends, the backend:
   • Aggregates emotion data across all user messages
   • Sends the transcript and emotions to Google Gemini
   • Generates personalized insights about emotional patterns
5. Insights presentation: the insights page displays:
   • AI-generated emotional analysis
   • Top 5 detected emotions with visual charts
   • Full conversation transcript
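The top-5 chart in the final step reduces to sorting the aggregated scores. A minimal sketch (the function name and data shape are assumptions, not the actual SvaraAI code):

```typescript
// Hypothetical helper for the insights page: pick the N strongest emotions
// from an aggregated { emotionName: averageScore } map.
type EmotionScore = { name: string; score: number };

function topEmotions(aggregated: Record<string, number>, n = 5): EmotionScore[] {
  return Object.entries(aggregated)
    .map(([name, score]) => ({ name, score }))
    .sort((a, b) => b.score - a.score) // strongest first
    .slice(0, n);
}
```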

Key features

SvaraAI uses Hume AI’s Voice SDK to establish a WebSocket connection directly from the browser to Hume’s servers. This enables:
  • Low-latency voice-to-text transcription
  • Real-time emotion detection from vocal prosody
  • Bidirectional conversation with AI assistant
  • Live microphone FFT visualization
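The FFT visualization in the last bullet amounts to normalizing the SDK's magnitude array into bar heights. A sketch under assumed names (this is not Hume's API, just the rendering math):

```typescript
// Illustrative only: map raw FFT magnitudes to percentage bar heights
// for a live microphone visualizer. Normalizing against the current peak
// is an assumption about how the bars are scaled.
function fftToBarHeights(fft: number[], maxHeight = 100): number[] {
  const peak = Math.max(...fft, 1e-6); // floor avoids divide-by-zero on silence
  return fft.map((v) => Math.round((v / peak) * maxHeight));
}
```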

Every message in the conversation is analyzed for emotional content:
  • Prosody analysis: Hume AI detects emotions from voice characteristics (tone, pitch, pace)
  • Top-3 display: Each message shows the three strongest emotions detected
  • Aggregated scoring: User emotions are averaged across the entire conversation
  • Visual feedback: Emotion scores are displayed as percentage bars in the UI
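The aggregated scoring described above can be sketched as a per-emotion average over the session's user messages (types and names here are assumptions about the backend's shape, not its actual code):

```typescript
// Sketch of aggregated scoring: average each emotion's score across
// every message that reported it.
type Message = { emotions: Record<string, number> };

function averageEmotions(messages: Message[]): Record<string, number> {
  const sums: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const msg of messages) {
    for (const [name, score] of Object.entries(msg.emotions)) {
      sums[name] = (sums[name] ?? 0) + score;
      counts[name] = (counts[name] ?? 0) + 1;
    }
  }
  const averages: Record<string, number> = {};
  for (const name of Object.keys(sums)) {
    averages[name] = sums[name] / counts[name];
  }
  return averages;
}
```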

After each conversation, Google Gemini 2.0 Flash generates personalized insights:
  • Receives the full conversation transcript
  • Analyzes the top 3 averaged emotions
  • Generates a concise summary (max 100 tokens)
  • Uses temperature 0.3 for consistent, focused analysis
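A request with those settings might look like the sketch below. The endpoint shape follows Google's public `generateContent` REST API; the prompt wording and function name are invented for illustration:

```typescript
// Sketch of the insight request. generationConfig mirrors the settings
// described above (temperature 0.3, 100-token cap); the prompt text is
// a placeholder, not SvaraAI's actual prompt.
async function generateInsights(
  transcript: string,
  topEmotions: string[],
  apiKey: string
): Promise<string> {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=" +
    apiKey;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            {
              text: `Top emotions: ${topEmotions.join(", ")}\n\nTranscript:\n${transcript}`,
            },
          ],
        },
      ],
      generationConfig: { temperature: 0.3, maxOutputTokens: 100 },
    }),
  });
  const data = await res.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```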

Conversation data is managed efficiently:
  • Session storage: Current insights stored in browser sessionStorage
  • File persistence: Optional saving to entries.json on the backend
  • Deduplication: Duplicate messages are filtered to prevent redundant storage
  • Rotation: Maximum 20 entries stored to manage file size
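The deduplication and rotation rules combine into a small pure function. This is an assumed shape (the real backend's entry format and dedup key may differ):

```typescript
// Sketch of entries.json housekeeping: drop duplicate entries by id,
// then keep only the 20 most recent so the file stays small.
type Entry = { id: string; transcript: string };

function dedupeAndRotate(entries: Entry[], max = 20): Entry[] {
  const seen = new Set<string>();
  const unique: Entry[] = [];
  for (const e of entries) {
    if (!seen.has(e.id)) {
      seen.add(e.id);
      unique.push(e);
    }
  }
  return unique.slice(-max); // newest entries are at the end
}
```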

Technology stack

Frontend

  • Framework: React 19 with TypeScript
  • Build tool: Vite 6
  • Routing: React Router v7
  • Styling: TailwindCSS 4 with custom animations
  • Animation: Framer Motion for smooth transitions
  • Voice SDK: @humeai/voice-react 0.2.11

Backend

  • Runtime: Node.js with TypeScript
  • Framework: Express.js 5
  • Development: ts-node-dev with hot reload
  • HTTP client: Native fetch API

AI services

  • Hume AI: Voice interface and emotion detection
  • Google Gemini: 2.0 Flash model for insight generation

The architecture is designed for easy local development. Both frontend and backend run independently, with the frontend proxying API requests to localhost:5000.
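With Vite, that proxying is a one-line server option. A sketch of the relevant fragment (the `/api` prefix is an assumption about how the backend routes are namespaced):

```typescript
// vite.config.ts (sketch): forward API requests from the dev server
// to the Express backend on port 5000.
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    proxy: {
      "/api": "http://localhost:5000",
    },
  },
});
```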

Deployment considerations

Before deploying to production, ensure you:
  • Set up proper environment variables for all API keys
  • Configure CORS to allow only your frontend domain
  • Implement rate limiting on API endpoints
  • Use a production-grade database instead of file storage
  • Enable HTTPS for secure voice transmission
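In production you would normally reach for vetted middleware such as the `cors` and `express-rate-limit` packages; the checks they perform boil down to logic like the following (names and the fixed-window strategy are illustrative, not a recommendation to hand-roll this):

```typescript
// Illustrative only: the core checks behind CORS allow-listing and
// fixed-window rate limiting.
function isAllowedOrigin(origin: string, allowlist: string[]): boolean {
  return allowlist.includes(origin);
}

// At most `limit` requests per `windowMs` milliseconds per client key
// (e.g. an IP address). Returns true if the request is allowed.
function makeRateLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, { count: number; windowStart: number }>();
  return (key: string, now: number): boolean => {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now }); // new window
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```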

Next steps

Frontend architecture

Explore the React components and voice interface implementation

Backend architecture

Learn about the API routes, controllers, and AI service integration
