SvaraAI uses Google’s Gemini AI to analyze conversation transcripts and emotional data, generating personalized insights and reflections based on the user’s voice interactions.

Overview

The Gemini AI integration processes:
  • Conversation transcripts: Full text of user and assistant messages
  • Emotional data: Aggregated emotion scores from Hume AI
  • Context-aware prompts: Customizable system prompts for tailored responses
Gemini generates concise, actionable insights that help users understand their emotional state and conversation patterns.

Prerequisites

Before you begin, ensure you have:
  • A Google Cloud account with Gemini API access
  • Gemini API key from Google AI Studio or Cloud Console
  • Configured custom prompt template

Installation

1. Obtain an API key: get your Gemini API key from Google AI Studio.
2. Configure environment variables: add your Gemini credentials to the backend environment:
Backend/.env
GEMINI_API_KEY=your_api_key_here
GEMINI_PROMPT="Your custom prompt template with {{transcript}} and {{emoData}} placeholders"
The GEMINI_API_KEY should never be exposed in client-side code. All Gemini requests must go through your backend.

Backend implementation

API endpoint setup

Create an Express route to handle Gemini requests:
Backend/routes/gemini.ts
import express, { Request, Response } from 'express';
import dotenv from 'dotenv';
import path from 'path';

// Resolve the .env path relative to this file's location on disk
dotenv.config({ path: path.resolve(__dirname, '../../.env') });

const router = express.Router();

router.post('/', async (req: Request, res: Response): Promise<void> => {
  try {
    const { transcript, emoData } = req.body;

    if (!transcript) {
      console.error('No transcript provided');
      res.status(400).json({ error: 'Transcript is required' });
      return;
    }
    
    const emotionData = emoData || {};

    const apiKey = process.env.GEMINI_API_KEY;
    if (!apiKey) {
      console.error('GEMINI_API_KEY is missing from environment variables');
      res.status(500).json({ error: 'Unable to process your request at this time' });
      return;
    }

    // Process and send to Gemini
    // ... (implementation below)
  } catch (error: any) {
    console.error('[Gemini API Error] An error occurred:', error);
    res.status(500).json({
      error: 'Unable to process your request at this time'
    });
  }
});

export default router;
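To expose this route, mount the router in your Express entry point. The file name, the cors middleware, and the port below are assumptions inferred from the frontend's fetch URL (http://localhost:5000/api/gemini), not confirmed project structure:

```typescript
// Backend/server.ts (assumed entry point) — wiring sketch, not verbatim project code.
import express from 'express';
import cors from 'cors'; // assumed; needed if the frontend runs on another origin
import geminiRouter from './routes/gemini';

const app = express();

app.use(cors());          // allow the frontend dev server to call the API
app.use(express.json());  // parse JSON request bodies

// Matches the URL the frontend fetches: http://localhost:5000/api/gemini
app.use('/api/gemini', geminiRouter);

app.listen(5000, () => console.log('API listening on port 5000'));
```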

Formatting emotion data

Process raw emotion scores into a human-readable format:
Backend/routes/gemini.ts
const formattedEmoData = Object.entries(emotionData)
  .sort(([, a], [, b]) => (b as number) - (a as number))
  .slice(0, 3)  // Top 3 emotions
  .map(([emotion, score]) => `${emotion}: ${((score as number) * 100).toFixed(1)}%`)
  .join('\n');
This sorts emotions by intensity, keeps the three most prominent, and formats each score as a percentage for readability.
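For example, wrapping the same logic in a standalone function (a sketch; the sample scores are illustrative, not real Hume output) produces:

```typescript
// Standalone sketch of the formatting logic above, runnable outside the route.
function formatEmotions(scores: Record<string, number>, topN = 3): string {
  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)        // highest intensity first
    .slice(0, topN)                        // keep the most prominent emotions
    .map(([emotion, score]) => `${emotion}: ${(score * 100).toFixed(1)}%`)
    .join('\n');
}

const sample = { Calmness: 0.62, Joy: 0.41, Anxiety: 0.18, Boredom: 0.05 };
console.log(formatEmotions(sample));
// Calmness: 62.0%
// Joy: 41.0%
// Anxiety: 18.0%
```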

Dynamic prompt construction

Replace placeholders in your prompt template with actual data:
Backend/routes/gemini.ts
const rawPrompt = process.env.GEMINI_PROMPT;
if (!rawPrompt) {
  console.error('GEMINI_PROMPT is missing from environment variables');
  res.status(500).json({ error: 'Unable to process your request at this time' });
  return;
}

const prompt = rawPrompt
  .replace('{{transcript}}', transcript)
  .replace('{{emoData}}', formattedEmoData);
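Note that `String.prototype.replace` with a string pattern substitutes only the first occurrence. If your template repeats a placeholder, a split/join helper (or `replaceAll` on Node 15+) fills every occurrence. A minimal sketch:

```typescript
// Fill every occurrence of each {{key}} placeholder, not just the first.
function fillTemplate(template: string, values: Record<string, string>): string {
  return Object.entries(values).reduce(
    (acc, [key, value]) => acc.split(`{{${key}}}`).join(value),
    template
  );
}

const filled = fillTemplate('Transcript: {{transcript}}\nEmotions: {{emoData}}', {
  transcript: 'User: hello',
  emoData: 'Joy: 80.0%',
});
// "Transcript: User: hello\nEmotions: Joy: 80.0%"
```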

Making the Gemini API request

Send the formatted prompt to Gemini 2.0 Flash:
Backend/routes/gemini.ts
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      contents: [{
        parts: [{
          text: prompt
        }]
      }],
      generationConfig: {
        temperature: 0.3,
        topK: 20,
        topP: 0.8,
        maxOutputTokens: 100,
      }
    })
  }
);

const data = await response.json();

if (!response.ok) {
  console.error('[Gemini API Error]', data);
  res.status(response.status).json({
    error: 'Unable to process your request at this time'
  });
  return;
}

if (data.candidates?.[0]?.content?.parts?.[0]?.text) {
  res.json({
    response: data.candidates[0].content.parts[0].text,
    emotions: emotionData
  });
} else {
  console.error('Unexpected Gemini response structure:', data);
  res.status(500).json({ 
    error: 'Unable to process your request at this time'
  });
}

Generation configuration

The generation parameters are optimized for concise, focused insights:
Parameter | Value | Purpose
temperature | 0.3 | Low randomness for consistent, focused responses
topK | 20 | Limits sampling to the 20 most probable tokens at each step
topP | 0.8 | Nucleus sampling for balanced creativity
maxOutputTokens | 100 | Keeps responses brief and actionable
These parameters are tuned for generating short insights. Adjust maxOutputTokens if you need longer responses.

Frontend implementation

Sending conversation data

Call the Gemini endpoint when the user ends their call:
Frontend/src/components/controls.tsx
const handleEndCall = async () => {
  const validMessages = messages.filter(
    (msg) => msg.type === "user_message" || msg.type === "assistant_message"
  );

  let transcript = "";
  let emotions: Record<string, number> = {};

  if (validMessages.length > 0) {
    // Build transcript
    transcript = validMessages
      .map((msg) => {
        const role = msg.type === "user_message" ? "User" : "Assistant";
        const content = "message" in msg ? msg.message?.content || "" : "";
        return `${role}: ${content}`;
      })
      .filter((line) => line.includes(": ") && line.split(": ")[1].trim())
      .join("\n");

    // Aggregate emotions from user messages
    const userMessages = validMessages.filter((msg) => msg.type === "user_message");
    userMessages.forEach((msg) => {
      if ("models" in msg && msg.models?.prosody?.scores) {
        const scores = msg.models.prosody.scores;
        Object.entries(scores).forEach(([emotion, score]) => {
          emotions[emotion] = (emotions[emotion] || 0) + (score as number);
        });
      }
    });

    // Calculate averages
    if (userMessages.length > 0) {
      Object.keys(emotions).forEach((key) => {
        emotions[key] = emotions[key] / userMessages.length;
      });
    }
  }

  try {
    const res = await fetch("http://localhost:5000/api/gemini", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ 
        transcript: transcript || "No conversation", 
        emoData: emotions 
      }),
    });

    const data = await res.json();
    console.log("Gemini response:", data);

    // Save to sessionStorage for insights page
    sessionStorage.setItem(
      "svaraInsights",
      JSON.stringify({
        transcript: transcript || "No conversation recorded",
        emotions,
        analysis: data.response || "Analysis unavailable",
        timestamp: Date.now(),
      })
    );
  } catch (err) {
    console.error("Error calling Gemini:", err);
    
    // Save fallback data
    sessionStorage.setItem(
      "svaraInsights",
      JSON.stringify({
        transcript: transcript || "No conversation recorded",
        emotions,
        analysis: "Could not generate analysis. Please try again.",
        timestamp: Date.now(),
      })
    );
  }

  disconnect?.();
  navigate("/insights");
};
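The emotion-aggregation step above can be factored into a small pure helper, which makes it easy to unit test. This is a sketch with simplified message shapes (a plain array of score objects), not the exact Hume message type:

```typescript
// Average per-emotion prosody scores across the provided score sets.
// Emotions absent from a message count as 0 for that message.
function averageEmotions(scoreSets: Record<string, number>[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const scores of scoreSets) {
    for (const [emotion, score] of Object.entries(scores)) {
      totals[emotion] = (totals[emotion] ?? 0) + score;
    }
  }
  for (const emotion of Object.keys(totals)) {
    totals[emotion] /= scoreSets.length;
  }
  return totals;
}

const averaged = averageEmotions([{ Joy: 0.4 }, { Joy: 0.6, Calmness: 0.2 }]);
// Joy ≈ 0.5, Calmness ≈ 0.1
```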

Displaying generated insights

Retrieve and display the Gemini-generated analysis:
Frontend/src/pages/insights.tsx
import { useEffect, useState } from "react";

interface InsightData {
  transcript: string;
  emotions: Record<string, number>;
  analysis: string;
  timestamp: number;
}

export default function InsightsPage() {
  const [insights, setInsights] = useState<InsightData | null>(null);

  useEffect(() => {
    const stored = sessionStorage.getItem("svaraInsights");
    if (stored) {
      setInsights(JSON.parse(stored));
    }
  }, []);

  if (!insights) {
    return <div>No insights available</div>;
  }

  return (
    <div>
      <h1>Your conversation insights</h1>
      
      <section>
        <h2>AI analysis</h2>
        <p>{insights.analysis}</p>
      </section>

      <section>
        <h2>Top emotions</h2>
        {Object.entries(insights.emotions)
          .sort(([, a], [, b]) => b - a)
          .slice(0, 3)
          .map(([emotion, score]) => (
            <div key={emotion}>
              {emotion}: {(score * 100).toFixed(1)}%
            </div>
          ))}
      </section>

      <section>
        <h2>Transcript</h2>
        <pre>{insights.transcript}</pre>
      </section>
    </div>
  );
}

Crafting effective prompts

Your GEMINI_PROMPT environment variable should use placeholders that get replaced with actual data:
Example prompt template
You are an empathetic AI analyzing a mental health conversation. Based on the following:

Conversation transcript:
{{transcript}}

Detected emotions:
{{emoData}}

Provide a brief, supportive insight (2-3 sentences) about the user's emotional state and any patterns you notice.
1. Use placeholders: include {{transcript}} and {{emoData}} where you want the actual data inserted.
2. Set the tone: define how Gemini should respond (empathetic, clinical, coaching, etc.).
3. Specify the format: request a specific length or structure for consistent outputs.
4. Add context: mention the domain (mental health, coaching, etc.) for relevant insights.
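A misconfigured template fails silently: the placeholder text is sent to Gemini verbatim. A small startup check (a sketch; the helper name is illustrative) catches this before any request is made:

```typescript
// Return the required placeholders that are missing from a prompt template.
function validatePromptTemplate(template: string): string[] {
  const required = ['{{transcript}}', '{{emoData}}'];
  return required.filter((placeholder) => !template.includes(placeholder));
}

// At startup, fail fast instead of sending Gemini a half-filled prompt.
const missing = validatePromptTemplate(process.env.GEMINI_PROMPT ?? '');
if (missing.length > 0) {
  console.error(`GEMINI_PROMPT is missing placeholders: ${missing.join(', ')}`);
}
```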

Error handling

Implement comprehensive error handling for production use:
try {
  const response = await fetch(geminiEndpoint, options);
  const data = await response.json();

  if (!response.ok) {
    console.error('[Gemini API Error]', data);
    // Return user-friendly error
    return { error: 'Unable to process your request at this time' };
  }

  // Validate response structure
  if (!data.candidates?.[0]?.content?.parts?.[0]?.text) {
    console.error('Unexpected response structure:', data);
    return { error: 'Unable to process your request at this time' };
  }

  return { response: data.candidates[0].content.parts[0].text };
} catch (error) {
  console.error('Network or parsing error:', error);
  return { error: 'Unable to process your request at this time' };
}
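For transient failures such as rate limiting (HTTP 429), a retry with exponential backoff is worth adding around the Gemini call. A minimal generic wrapper; the delay and attempt values are assumptions, not values from the Gemini docs:

```typescript
// Retry an async operation with exponential backoff: wait baseDelayMs,
// then 2x, 4x, ... between attempts; rethrow the last error on exhaustion.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

In the route, wrap only the Gemini fetch in withBackoff, and ideally retry only when the response status is 429 or 5xx; retrying a 400 just wastes quota.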

Rate limiting and caching

Implementing cache

For repeated analysis requests, consider caching generated insights so identical transcripts don't consume API quota twice:
Backend/routes/gemini.ts
import crypto from 'crypto';

const CACHE_DURATION = 5 * 60 * 1000; // 5 minutes

interface CachedInsight {
  response: string;
  createdAt: number;
}

// In-memory cache keyed by a hash of the request contents.
const insightCache = new Map<string, CachedInsight>();

function cacheKey(transcript: string, formattedEmoData: string): string {
  return crypto
    .createHash('sha256')
    .update(`${transcript}\n${formattedEmoData}`)
    .digest('hex');
}

// Inside the route handler, before calling Gemini:
const key = cacheKey(transcript, formattedEmoData);
const cached = insightCache.get(key);
if (cached && Date.now() - cached.createdAt < CACHE_DURATION) {
  res.json({ response: cached.response, emotions: emotionData });
  return;
}

// ...and after a successful Gemini response:
insightCache.set(key, {
  response: data.candidates[0].content.parts[0].text,
  createdAt: Date.now()
});

API reference

Request body

interface GeminiRequest {
  transcript: string;  // Required: Full conversation text
  emoData?: Record<string, number>;  // Optional: Emotion scores
}

Response structure

interface GeminiResponse {
  response: string;  // Generated insight text
  emotions: Record<string, number>;  // Echoed emotion data
}

Error responses

interface GeminiError {
  error: string;  // User-friendly error message
}

Best practices

1. Validate input data: always check that a transcript exists before sending to Gemini. Empty transcripts waste API quota.
2. Keep prompts focused: shorter, specific prompts yield better results than lengthy, vague ones.
3. Handle rate limits: implement exponential backoff if you hit Gemini's rate limits.
4. Log strategically: log errors for debugging, but never log sensitive user conversation data in production.
5. Set appropriate token limits: match maxOutputTokens to your UI constraints to avoid truncation issues.

Troubleshooting

API key errors

If you see authentication errors:
  • Verify GEMINI_API_KEY is set correctly in your backend .env
  • Check that your API key is active in Google AI Studio
  • Ensure there are no extra spaces or quotes in the environment variable

Empty or unexpected responses

If Gemini returns no text:
  • Check your prompt template includes both placeholders
  • Verify data.candidates[0].content.parts[0].text path exists
  • Inspect the full response object for safety ratings or blocks

Response quality issues

If insights are generic or unhelpful:
  • Lower temperature for more focused responses
  • Add more context to your prompt template
  • Increase maxOutputTokens if responses seem cut off
  • Include example outputs in your prompt
