System components
Frontend
React-based SPA with real-time voice interface using Hume AI’s Voice SDK
Backend
Express.js API server handling AI service orchestration and data persistence
Architecture diagram
Data flow
SvaraAI processes voice conversations through multiple stages:
Voice capture
User speaks into their microphone, and the frontend captures audio using the Hume AI Voice SDK
Real-time analysis
Hume AI processes the audio stream and returns:
- Transcribed text
- Emotion scores (prosody analysis)
- Real-time conversation state
Conversation display
Messages appear in the chat interface with emotional indicators showing the top detected emotions
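The message shape and the top-emotion pick can be sketched in TypeScript. The field names below are illustrative, not Hume's exact SDK schema:

```typescript
// Illustrative shape of a processed voice message (a sketch, not
// Hume's exact payload format).
interface EmotionScore {
  name: string;  // e.g. "Calmness"
  score: number; // 0..1 confidence from prosody analysis
}

interface VoiceMessage {
  role: "user" | "assistant";
  text: string;             // transcribed speech
  emotions: EmotionScore[]; // prosody-derived emotion scores
}

// Pick the strongest emotions to show next to a message in the chat UI.
function topEmotions(emotions: EmotionScore[], n = 3): EmotionScore[] {
  return [...emotions].sort((a, b) => b.score - a.score).slice(0, n);
}
```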
Session analysis
When the conversation ends, the backend:
- Aggregates emotion data across all user messages
- Sends the transcript and emotions to Google Gemini
- Generates personalized insights about emotional patterns
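The aggregation step can be sketched as a pure function that averages each emotion's score across all user messages. This is a sketch; the backend's actual field names may differ:

```typescript
interface EmotionScore {
  name: string;  // e.g. "Joy"
  score: number; // 0..1 score for one message
}

// Average each emotion's score across every user message in the session.
// Emotions absent from a message simply don't contribute to that average.
function aggregateEmotions(messages: EmotionScore[][]): EmotionScore[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const msg of messages) {
    for (const { name, score } of msg) {
      const t = totals.get(name) ?? { sum: 0, count: 0 };
      t.sum += score;
      t.count += 1;
      totals.set(name, t);
    }
  }
  // Return averages, strongest emotion first.
  return [...totals.entries()]
    .map(([name, { sum, count }]) => ({ name, score: sum / count }))
    .sort((a, b) => b.score - a.score);
}
```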
Key features
Real-time voice processing
SvaraAI uses Hume AI’s Voice SDK to establish a WebSocket connection directly from the browser to Hume’s servers. This enables:
- Low-latency voice-to-text transcription
- Real-time emotion detection from vocal prosody
- Bidirectional conversation with the AI assistant
- Live microphone FFT visualization
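For the FFT visualization, the Web Audio API's AnalyserNode.getByteFrequencyData() fills a Uint8Array with per-bin magnitudes in the range 0..255. A helper along these lines (the helper itself is illustrative, not SvaraAI's exact code) turns those bins into normalized bar heights:

```typescript
// Downsample raw FFT bins (0..255 each, as produced by
// AnalyserNode.getByteFrequencyData) into a fixed number of 0..1
// bar heights for the live microphone visualizer.
function toBarHeights(bins: Uint8Array, barCount: number): number[] {
  const bars: number[] = [];
  const binsPerBar = Math.floor(bins.length / barCount);
  for (let i = 0; i < barCount; i++) {
    let sum = 0;
    for (let j = 0; j < binsPerBar; j++) {
      sum += bins[i * binsPerBar + j];
    }
    // Average magnitude for this bar, scaled into 0..1.
    bars.push(sum / binsPerBar / 255);
  }
  return bars;
}
```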
Emotion intelligence
Every message in the conversation is analyzed for emotional content:
- Prosody analysis: Hume AI detects emotions from voice characteristics (tone, pitch, pace)
- Top-3 display: Each message shows the three strongest emotions detected
- Aggregated scoring: User emotions are averaged across the entire conversation
- Visual feedback: Emotion scores are displayed as percentage bars in the UI
AI-powered insights
After each conversation, Google Gemini 2.0 Flash generates personalized insights:
- Receives the full conversation transcript
- Analyzes the top 3 averaged emotions
- Generates a concise summary (max 100 tokens)
- Uses temperature 0.3 for consistent, focused analysis
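The request body for such a call can be sketched as below. The field names follow Gemini's public generateContent REST API, but the prompt wording is illustrative, not SvaraAI's exact text:

```typescript
interface EmotionScore {
  name: string;
  score: number; // averaged 0..1 score across the conversation
}

// Build a generateContent-style request body for Gemini. Field names
// follow the public REST API; the prompt text is an illustrative sketch.
function buildInsightRequest(transcript: string, topEmotions: EmotionScore[]) {
  const emotionSummary = topEmotions
    .map((e) => `${e.name} (${(e.score * 100).toFixed(0)}%)`)
    .join(", ");
  return {
    contents: [
      {
        parts: [
          {
            text:
              `Conversation transcript:\n${transcript}\n\n` +
              `Dominant emotions: ${emotionSummary}\n` +
              `Give a concise insight about the speaker's emotional patterns.`,
          },
        ],
      },
    ],
    generationConfig: {
      temperature: 0.3,     // low temperature for consistent, focused analysis
      maxOutputTokens: 100, // keep the summary concise
    },
  };
}
```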
Session management
Conversation data is managed efficiently:
- Session storage: Current insights stored in browser sessionStorage
- File persistence: Optional saving to entries.json on the backend
- Deduplication: Duplicate messages are filtered to prevent redundant storage
- Rotation: Maximum 20 entries stored to manage file size
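Deduplication and rotation can be sketched together as a single append step. The dedup key used here (an `id` field) is an assumption for illustration:

```typescript
interface Entry {
  id: string;   // hypothetical dedup key; the real key may differ
  text: string;
}

const MAX_ENTRIES = 20; // rotation limit described above

// Append a new entry, dropping any existing entry with the same id
// and keeping only the newest MAX_ENTRIES entries.
function appendEntry(entries: Entry[], entry: Entry): Entry[] {
  const deduped = entries.filter((e) => e.id !== entry.id);
  const next = [...deduped, entry];
  return next.slice(-MAX_ENTRIES); // oldest entries rotate out first
}
```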
Technology stack
Frontend
- Framework: React 19 with TypeScript
- Build tool: Vite 6
- Routing: React Router v7
- Styling: TailwindCSS 4 with custom animations
- Animation: Framer Motion for smooth transitions
- Voice SDK: @humeai/voice-react 0.2.11
Backend
- Runtime: Node.js with TypeScript
- Framework: Express.js 5
- Development: ts-node-dev with hot reload
- HTTP client: Native fetch API
AI services
- Hume AI: Voice interface and emotion detection
- Google Gemini: 2.0 Flash model for insight generation
The architecture is designed for easy local development. Both frontend and backend run independently, with the frontend proxying API requests to localhost:5000.
Deployment considerations
Next steps
Frontend architecture
Explore the React components and voice interface implementation
Backend architecture
Learn about the API routes, controllers, and AI service integration