SvaraAI uses Google’s Gemini AI to analyze conversation transcripts and emotional data, generating personalized insights and reflections based on the user’s voice interactions.
Overview
The Gemini AI integration processes:
- Conversation transcripts: Full text of user and assistant messages
- Emotional data: Aggregated emotion scores from Hume AI
- Context-aware prompts: Customizable system prompts for tailored responses
Gemini generates concise, actionable insights that help users understand their emotional state and conversation patterns.
Prerequisites
Before you begin, ensure you have:
- A Google Cloud account with Gemini API access
- Gemini API key from Google AI Studio or Cloud Console
- Configured custom prompt template
Installation
Configure environment variables
Add your Gemini credentials to the backend environment:
GEMINI_API_KEY=your_api_key_here
GEMINI_PROMPT="Your custom prompt template with {{transcript}} and {{emoData}} placeholders"
The GEMINI_API_KEY should never be exposed in client-side code. All Gemini requests must go through your backend.
Backend implementation
API endpoint setup
Create an Express route to handle Gemini requests:
import express, { Request, Response } from 'express';
import dotenv from 'dotenv';
import path from 'path';
dotenv.config({ path: path.resolve(__dirname, '../../.env') });
const router = express.Router();
router.post('/', async (req: Request, res: Response): Promise<void> => {
try {
const { transcript, emoData } = req.body;
if (!transcript) {
console.error('No transcript provided');
res.status(400).json({ error: 'Transcript is required' });
return;
}
const emotionData = emoData || {};
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
console.error('GEMINI_API_KEY is missing from environment variables');
res.status(500).json({ error: 'Unable to process your request at this time' });
return;
}
// Process and send to Gemini
// ... (implementation below)
} catch (error: any) {
console.error('[Gemini API Error] An error occurred:', error);
res.status(500).json({
error: 'Unable to process your request at this time'
});
}
});
export default router;
Emotion data formatting
Process raw emotion scores into a human-readable format:
const formattedEmoData = Object.entries(emotionData)
.sort(([, a], [, b]) => (b as number) - (a as number))
.slice(0, 3) // Top 3 emotions
.map(([emotion, score]) => `${emotion}: ${((score as number) * 100).toFixed(1)}%`)
.join('\n');
This sorts emotions by intensity, selects the three most prominent, and formats them as percentages for readability.
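For instance, wrapping the logic above in a small helper (the function name is illustrative) and running it on sample Hume-style scores:

```typescript
// Format the top 3 emotion scores as "name: XX.X%" lines,
// sorted from most to least intense.
function formatTopEmotions(scores: Record<string, number>): string {
  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)
    .slice(0, 3)
    .map(([emotion, score]) => `${emotion}: ${(score * 100).toFixed(1)}%`)
    .join("\n");
}

// formatTopEmotions({ calmness: 0.82, joy: 0.41, amusement: 0.35, confusion: 0.1 })
// → "calmness: 82.0%\njoy: 41.0%\namusement: 35.0%"
```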
Dynamic prompt construction
Replace placeholders in your prompt template with actual data:
const rawPrompt = process.env.GEMINI_PROMPT;
if (!rawPrompt) {
console.error('GEMINI_PROMPT is missing from environment variables');
res.status(500).json({ error: 'Unable to process your request at this time' });
return;
}
const prompt = rawPrompt
.replace('{{transcript}}', transcript)
.replace('{{emoData}}', formattedEmoData);
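One caveat worth knowing: String.prototype.replace with a string pattern substitutes only the first occurrence, and dollar-sign sequences (such as $&) in the replacement value are interpreted as special patterns. If your template may repeat a placeholder, or the transcript may contain $, a split/join helper sidesteps both issues (a sketch; the function name is illustrative):

```typescript
// Fill every {{key}} placeholder in a template. split/join replaces
// all occurrences and never interprets "$" patterns in the value.
function fillTemplate(
  template: string,
  values: Record<string, string>
): string {
  return Object.entries(values).reduce(
    (filled, [key, value]) => filled.split(`{{${key}}}`).join(value),
    template
  );
}
```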
Making the Gemini API request
Send the formatted prompt to Gemini 2.0 Flash:
const response = await fetch(
`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
contents: [{
parts: [{
text: prompt
}]
}],
generationConfig: {
temperature: 0.3,
topK: 20,
topP: 0.8,
maxOutputTokens: 100,
}
})
}
);
const data = await response.json();
if (!response.ok) {
console.error('[Gemini API Error]', data);
res.status(response.status).json({
error: 'Unable to process your request at this time'
});
return;
}
if (data.candidates?.[0]?.content?.parts?.[0]?.text) {
res.json({
response: data.candidates[0].content.parts[0].text,
emotions: emotionData
});
} else {
console.error('Unexpected Gemini response structure:', data);
res.status(500).json({
error: 'Unable to process your request at this time'
});
}
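The nested response-shape check above can be factored into a small helper and unit-tested against both valid and malformed payloads (the helper name is illustrative, not part of the Gemini SDK):

```typescript
// Narrow Gemini's nested response shape down to the generated text,
// returning null for any payload that doesn't match.
function extractGeminiText(data: unknown): string | null {
  const text = (data as any)?.candidates?.[0]?.content?.parts?.[0]?.text;
  return typeof text === "string" ? text : null;
}
```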
Generation configuration
The generation parameters are optimized for concise, focused insights:
| Parameter | Value | Purpose |
|---|---|---|
| temperature | 0.3 | Low randomness for consistent, focused responses |
| topK | 20 | Limits sampling to the 20 most probable tokens |
| topP | 0.8 | Nucleus sampling for balanced creativity |
| maxOutputTokens | 100 | Keeps responses brief and actionable |
These parameters are tuned for generating short insights. Adjust maxOutputTokens if you need longer responses.
Frontend implementation
Sending conversation data
Call the Gemini endpoint when the user ends their call:
Frontend/src/components/controls.tsx
const handleEndCall = async () => {
const validMessages = messages.filter(
(msg) => msg.type === "user_message" || msg.type === "assistant_message"
);
let transcript = "";
let emotions: Record<string, number> = {};
if (validMessages.length > 0) {
// Build transcript
transcript = validMessages
.map((msg) => {
const role = msg.type === "user_message" ? "User" : "Assistant";
const content = "message" in msg ? msg.message?.content || "" : "";
return `${role}: ${content}`;
})
.filter((line) => line.includes(": ") && line.split(": ")[1].trim())
.join("\n");
// Aggregate emotions from user messages
const userMessages = validMessages.filter((msg) => msg.type === "user_message");
userMessages.forEach((msg) => {
if ("models" in msg && msg.models?.prosody?.scores) {
const scores = msg.models.prosody.scores;
Object.entries(scores).forEach(([emotion, score]) => {
emotions[emotion] = (emotions[emotion] || 0) + (score as number);
});
}
});
// Calculate averages
if (userMessages.length > 0) {
Object.keys(emotions).forEach((key) => {
emotions[key] = emotions[key] / userMessages.length;
});
}
}
try {
const res = await fetch("http://localhost:5000/api/gemini", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
transcript: transcript || "No conversation",
emoData: emotions
}),
});
const data = await res.json();
console.log("Gemini response:", data);
// Save to sessionStorage for insights page
sessionStorage.setItem(
"svaraInsights",
JSON.stringify({
transcript: transcript || "No conversation recorded",
emotions,
analysis: data.response || "Analysis unavailable",
timestamp: Date.now(),
})
);
} catch (err) {
console.error("Error calling Gemini:", err);
// Save fallback data
sessionStorage.setItem(
"svaraInsights",
JSON.stringify({
transcript: transcript || "No conversation recorded",
emotions,
analysis: "Could not generate analysis. Please try again.",
timestamp: Date.now(),
})
);
}
disconnect?.();
navigate("/insights");
};
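The emotion-averaging step inside handleEndCall is easier to verify when isolated as a pure helper (the name and shape here are illustrative, not part of the Hume SDK):

```typescript
// Average per-emotion prosody scores across a set of user messages.
// Emotions absent from a message contribute 0 for that message,
// matching the aggregation in handleEndCall above.
function averageEmotions(
  scoreSets: Array<Record<string, number>>
): Record<string, number> {
  const sums: Record<string, number> = {};
  for (const scores of scoreSets) {
    for (const [emotion, score] of Object.entries(scores)) {
      sums[emotion] = (sums[emotion] ?? 0) + score;
    }
  }
  const averages: Record<string, number> = {};
  for (const emotion of Object.keys(sums)) {
    averages[emotion] = sums[emotion] / scoreSets.length;
  }
  return averages;
}
```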
Displaying generated insights
Retrieve and display the Gemini-generated analysis:
Frontend/src/pages/insights.tsx
import { useEffect, useState } from "react";
interface InsightData {
transcript: string;
emotions: Record<string, number>;
analysis: string;
timestamp: number;
}
export default function InsightsPage() {
const [insights, setInsights] = useState<InsightData | null>(null);
useEffect(() => {
const stored = sessionStorage.getItem("svaraInsights");
if (stored) {
setInsights(JSON.parse(stored));
}
}, []);
if (!insights) {
return <div>No insights available</div>;
}
return (
<div>
<h1>Your conversation insights</h1>
<section>
<h2>AI analysis</h2>
<p>{insights.analysis}</p>
</section>
<section>
<h2>Top emotions</h2>
{Object.entries(insights.emotions)
.sort(([, a], [, b]) => b - a)
.slice(0, 3)
.map(([emotion, score]) => (
<div key={emotion}>
{emotion}: {(score * 100).toFixed(1)}%
</div>
))}
</section>
<section>
<h2>Transcript</h2>
<pre>{insights.transcript}</pre>
</section>
</div>
);
}
Crafting effective prompts
Your GEMINI_PROMPT environment variable should use placeholders that get replaced with actual data:
You are an empathetic AI analyzing a mental health conversation. Based on the following:
Conversation transcript:
{{transcript}}
Detected emotions:
{{emoData}}
Provide a brief, supportive insight (2-3 sentences) about the user's emotional state and any patterns you notice.
Use placeholders
Include {{transcript}} and {{emoData}} where you want the actual data inserted.
Set the tone
Define how Gemini should respond (empathetic, clinical, coaching, etc.).
Specify format
Request a specific length or structure for consistent outputs.
Add context
Mention the domain (mental health, coaching, etc.) for relevant insights.
Error handling
Implement comprehensive error handling for production use:
try {
const response = await fetch(geminiEndpoint, options);
const data = await response.json();
if (!response.ok) {
console.error('[Gemini API Error]', data);
// Return user-friendly error
return { error: 'Unable to process your request at this time' };
}
// Validate response structure
if (!data.candidates?.[0]?.content?.parts?.[0]?.text) {
console.error('Unexpected response structure:', data);
return { error: 'Unable to process your request at this time' };
}
return { response: data.candidates[0].content.parts[0].text };
} catch (error) {
console.error('Network or parsing error:', error);
return { error: 'Unable to process your request at this time' };
}
Rate limiting and caching
Implementing cache
For repeated analysis requests, consider caching. This example caches file-backed data with a time-to-live and a file-modification check (the Entry type and ENTRIES_FILE_PATH constant are assumed to be defined elsewhere in your backend):
import { promises as fs } from 'fs';
const CACHE_DURATION = 5 * 60 * 1000; // 5 minutes
interface CacheData {
entries: Entry[];
lastUpdated: number;
lastModified: number;
}
let entriesCache: CacheData | null = null;
async function getEntriesWithCache(): Promise<Entry[]> {
try {
const stats = await fs.stat(ENTRIES_FILE_PATH);
const fileModified = stats.mtimeMs;
if (entriesCache &&
Date.now() - entriesCache.lastUpdated < CACHE_DURATION &&
entriesCache.lastModified === fileModified) {
return entriesCache.entries;
}
const entriesData = await fs.readFile(ENTRIES_FILE_PATH, 'utf-8');
const entries = JSON.parse(entriesData);
entriesCache = {
entries,
lastUpdated: Date.now(),
lastModified: fileModified
};
return entries;
} catch (error) {
console.error('Error reading entries file:', error);
if (entriesCache) {
console.warn('Using cached entries as fallback');
return entriesCache.entries;
}
throw new Error('Unable to read entries data');
}
}
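For Gemini responses specifically, a lighter-weight option is an in-memory cache keyed by a hash of the transcript, so identical conversations are not re-analyzed within the TTL window. A sketch with illustrative names:

```typescript
import { createHash } from "node:crypto";

const TTL_MS = 5 * 60 * 1000; // 5 minutes

// Cached Gemini responses keyed by a hash of the transcript.
const responseCache = new Map<string, { value: string; storedAt: number }>();

function cacheKey(transcript: string): string {
  return createHash("sha256").update(transcript).digest("hex");
}

// Return a cached response, or null if absent or expired.
function getCached(transcript: string, now = Date.now()): string | null {
  const entry = responseCache.get(cacheKey(transcript));
  if (!entry || now - entry.storedAt > TTL_MS) return null;
  return entry.value;
}

function setCached(transcript: string, value: string, now = Date.now()): void {
  responseCache.set(cacheKey(transcript), { value, storedAt: now });
}
```

The now parameter defaults to the current time but is injectable, which keeps the TTL logic testable without real delays.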
API reference
Request body
interface GeminiRequest {
transcript: string; // Required: Full conversation text
emoData?: Record<string, number>; // Optional: Emotion scores
}
Response structure
interface GeminiResponse {
response: string; // Generated insight text
emotions: Record<string, number>; // Echoed emotion data
}
Error responses
interface GeminiError {
error: string; // User-friendly error message
}
Best practices
Validate input data
Always check that transcript exists before sending to Gemini. Empty transcripts waste API quota.
Keep prompts focused
Shorter, specific prompts yield better results than lengthy, vague ones.
Handle rate limits
Implement exponential backoff if you hit Gemini’s rate limits.
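One way to implement this, sketched with illustrative names: retry the request with exponentially increasing, capped delays between attempts.

```typescript
// Delay before retry attempt n (0-indexed): base * 2^n, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Retry an async operation on failure, waiting between attempts.
async function withRetries<T>(
  op: () => Promise<T>,
  maxAttempts = 4
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}
```

In production you would typically retry only on retryable statuses (429 and 5xx) rather than on every failure.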
Log strategically
Log errors for debugging but never log sensitive user conversation data in production.
Set appropriate token limits
Match maxOutputTokens to your UI constraints to avoid truncation issues.
Troubleshooting
API key errors
If you see authentication errors:
- Verify GEMINI_API_KEY is set correctly in your backend .env
- Check that your API key is active in Google AI Studio
- Ensure there are no extra spaces or quotes in the environment variable
Empty or unexpected responses
If Gemini returns no text:
- Check your prompt template includes both placeholders
- Verify the data.candidates[0].content.parts[0].text path exists
- Inspect the full response object for safety ratings or blocks
Response quality issues
If insights are generic or unhelpful:
- Lower temperature for more focused responses
- Add more context to your prompt template
- Increase maxOutputTokens if responses seem cut off
- Include example outputs in your prompt