SvaraAI uses Google’s Gemini AI to analyze conversation transcripts and emotional data, generating personalized insights and reflections based on the user’s voice interactions.
Overview
The Gemini AI integration processes:
- Conversation transcripts: Full text of user and assistant messages
- Emotional data: Aggregated emotion scores from Hume AI
- Context-aware prompts: Customizable system prompts for tailored responses
Gemini generates concise, actionable insights that help users understand their emotional state and conversation patterns.
Prerequisites
Before you begin, ensure you have:
- A Google Cloud account with Gemini API access
- Gemini API key from Google AI Studio or Cloud Console
- Configured custom prompt template
Installation
Configure environment variables
Add your Gemini credentials to the backend environment:
GEMINI_API_KEY=your_api_key_here
GEMINI_PROMPT="Your custom prompt template with {{transcript}} and {{emoData}} placeholders"
The GEMINI_API_KEY should never be exposed in client-side code. All Gemini requests must go through your backend.
Backend implementation
API endpoint setup
Create an Express route to handle Gemini requests:
import express, { Request, Response } from 'express';
import dotenv from 'dotenv';
import path from 'path';
dotenv.config({ path: path.resolve(__dirname, '../../.env') });
const router = express.Router();
router.post('/', async (req: Request, res: Response): Promise<void> => {
try {
const { transcript, emoData } = req.body;
if (!transcript) {
console.error('No transcript provided');
res.status(400).json({ error: 'Transcript is required' });
return;
}
const emotionData = emoData || {};
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
console.error('GEMINI_API_KEY is missing from environment variables');
res.status(500).json({ error: 'Unable to process your request at this time' });
return;
}
// Process and send to Gemini
// ... (implementation below)
} catch (error: any) {
console.error('[Gemini API Error] An error occurred:', error);
res.status(500).json({
error: 'Unable to process your request at this time'
});
}
});
export default router;
Emotion data formatting
Process raw emotion scores into a human-readable format:
const formattedEmoData = Object.entries(emotionData)
.sort(([, a], [, b]) => (b as number) - (a as number))
.slice(0, 3) // Top 3 emotions
.map(([emotion, score]) => `${emotion}: ${((score as number) * 100).toFixed(1)}%`)
.join('\n');
This sorts emotions by intensity, selects the three most prominent, and formats them as percentages for readability.
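For instance, wrapping the logic above in a small helper (the function name is illustrative) and running it on sample Hume-style scores:

```typescript
// Format the top 3 emotion scores as "name: XX.X%" lines,
// sorted from most to least intense.
function formatTopEmotions(scores: Record<string, number>): string {
  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)
    .slice(0, 3)
    .map(([emotion, score]) => `${emotion}: ${(score * 100).toFixed(1)}%`)
    .join("\n");
}

// formatTopEmotions({ calmness: 0.82, joy: 0.41, amusement: 0.35, confusion: 0.1 })
// → "calmness: 82.0%\njoy: 41.0%\namusement: 35.0%"
```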
Dynamic prompt construction
Replace placeholders in your prompt template with actual data:
const rawPrompt = process.env.GEMINI_PROMPT;
if (!rawPrompt) {
console.error('GEMINI_PROMPT is missing from environment variables');
res.status(500).json({ error: 'Unable to process your request at this time' });
return;
}
const prompt = rawPrompt
.replace('{{transcript}}', transcript)
.replace('{{emoData}}', formattedEmoData);
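One caveat worth knowing: String.prototype.replace with a string pattern substitutes only the first occurrence, and dollar-sign sequences (such as $&) in the replacement value are interpreted as special patterns. If your template may repeat a placeholder, or the transcript may contain $, a split/join helper sidesteps both issues (a sketch; the function name is illustrative):

```typescript
// Fill every {{key}} placeholder in a template. split/join replaces
// all occurrences and never interprets "$" patterns in the value.
function fillTemplate(
  template: string,
  values: Record<string, string>
): string {
  return Object.entries(values).reduce(
    (filled, [key, value]) => filled.split(`{{${key}}}`).join(value),
    template
  );
}
```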
Making the Gemini API request
Send the formatted prompt to Gemini 2.0 Flash:
const response = await fetch(
`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
contents: [{
parts: [{
text: prompt
}]
}],
generationConfig: {
temperature: 0.3,
topK: 20,
topP: 0.8,
maxOutputTokens: 100,
}
})
}
);
const data = await response.json();
if (!response.ok) {
console.error('[Gemini API Error]', data);
res.status(response.status).json({
error: 'Unable to process your request at this time'
});
return;
}
if (data.candidates?.[0]?.content?.parts?.[0]?.text) {
res.json({
response: data.candidates[0].content.parts[0].text,
emotions: emotionData
});
} else {
console.error('Unexpected Gemini response structure:', data);
res.status(500).json({
error: 'Unable to process your request at this time'
});
}
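The nested response-shape check above can be factored into a small helper and unit-tested against both valid and malformed payloads (the helper name is illustrative, not part of the Gemini SDK):

```typescript
// Narrow Gemini's nested response shape down to the generated text,
// returning null for any payload that doesn't match.
function extractGeminiText(data: unknown): string | null {
  const text = (data as any)?.candidates?.[0]?.content?.parts?.[0]?.text;
  return typeof text === "string" ? text : null;
}
```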
Generation configuration
The generation parameters are optimized for concise, focused insights:
| Parameter | Value | Purpose |
|---|---|---|
| temperature | 0.3 | Low randomness for consistent, focused responses |
| topK | 20 | Limits sampling to the 20 most probable tokens |
| topP | 0.8 | Nucleus sampling for balanced creativity |
| maxOutputTokens | 100 | Keeps responses brief and actionable |
These parameters are tuned for generating short insights. Adjust maxOutputTokens if you need longer responses.
Frontend implementation
Sending conversation data
Call the Gemini endpoint when the user ends their call:
Frontend/src/components/controls.tsx
const handleEndCall = async () => {
const validMessages = messages.filter(
(msg) => msg.type === "user_message" || msg.type === "assistant_message"
);
let transcript = "";
let emotions: Record<string, number> = {};
if (validMessages.length > 0) {
// Build transcript
transcript = validMessages
.map((msg) => {
const role = msg.type === "user_message" ? "User" : "Assistant";
const content = "message" in msg ? msg.message?.content || "" : "";
return `${role}: ${content}`;
})
.filter((line) => line.includes(": ") && line.split(": ")[1].trim())
.join("\n");
// Aggregate emotions from user messages
const userMessages = validMessages.filter((msg) => msg.type === "user_message");
userMessages.forEach((msg) => {
if ("models" in msg && msg.models?.prosody?.scores) {
const scores = msg.models.prosody.scores;
Object.entries(scores).forEach(([emotion, score]) => {
emotions[emotion] = (emotions[emotion] || 0) + (score as number);
});
}
});
// Calculate averages
if (userMessages.length > 0) {
Object.keys(emotions).forEach((key) => {
emotions[key] = emotions[key] / userMessages.length;
});
}
}
try {
const res = await fetch("http://localhost:5000/api/gemini", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
transcript: transcript || "No conversation",
emoData: emotions
}),
});
const data = await res.json();
console.log("Gemini response:", data);
// Save to sessionStorage for insights page
sessionStorage.setItem(
"svaraInsights",
JSON.stringify({
transcript: transcript || "No conversation recorded",
emotions,
analysis: data.response || "Analysis unavailable",
timestamp: Date.now(),
})
);
} catch (err) {
console.error("Error calling Gemini:", err);
// Save fallback data
sessionStorage.setItem(
"svaraInsights",
JSON.stringify({
transcript: transcript || "No conversation recorded",
emotions,
analysis: "Could not generate analysis. Please try again.",
timestamp: Date.now(),
})
);
}
disconnect?.();
navigate("/insights");
};
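The emotion-averaging step inside handleEndCall is easier to verify when isolated as a pure helper (the name and shape here are illustrative, not part of the Hume SDK):

```typescript
// Average per-emotion prosody scores across a set of user messages.
// Emotions absent from a message contribute 0 for that message,
// matching the aggregation in handleEndCall above.
function averageEmotions(
  scoreSets: Array<Record<string, number>>
): Record<string, number> {
  const sums: Record<string, number> = {};
  for (const scores of scoreSets) {
    for (const [emotion, score] of Object.entries(scores)) {
      sums[emotion] = (sums[emotion] ?? 0) + score;
    }
  }
  const averages: Record<string, number> = {};
  for (const emotion of Object.keys(sums)) {
    averages[emotion] = sums[emotion] / scoreSets.length;
  }
  return averages;
}
```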
Displaying generated insights
Retrieve and display the Gemini-generated analysis:
Frontend/src/pages/insights.tsx
import { useEffect, useState } from "react";
interface InsightData {
transcript: string;
emotions: Record<string, number>;
analysis: string;
timestamp: number;
}
export default function InsightsPage() {
const [insights, setInsights] = useState<InsightData | null>(null);
useEffect(() => {
const stored = sessionStorage.getItem("svaraInsights");
if (stored) {
setInsights(JSON.parse(stored));
}
}, []);
if (!insights) {
return <div>No insights available</div>;
}
return (
<div>
<h1>Your conversation insights</h1>
<section>
<h2>AI analysis</h2>
<p>{insights.analysis}</p>
</section>
<section>
<h2>Top emotions</h2>
{Object.entries(insights.emotions)
.sort(([, a], [, b]) => b - a)
.slice(0, 3)
.map(([emotion, score]) => (
<div key={emotion}>
{emotion}: {(score * 100).toFixed(1)}%
</div>
))}
</section>
<section>
<h2>Transcript</h2>
<pre>{insights.transcript}</pre>
</section>
</div>
);
}
Crafting effective prompts
Your GEMINI_PROMPT environment variable should use placeholders that get replaced with actual data:
You are an empathetic AI analyzing a mental health conversation. Based on the following:
Conversation transcript:
{{transcript}}
Detected emotions:
{{emoData}}
Provide a brief, supportive insight (2-3 sentences) about the user's emotional state and any patterns you notice.
Use placeholders
Include {{transcript}} and {{emoData}} where you want the actual data inserted.
Set the tone
Define how Gemini should respond (empathetic, clinical, coaching, etc.).
Specify format
Request a specific length or structure for consistent outputs.
Add context
Mention the domain (mental health, coaching, etc.) for relevant insights.
Error handling
Implement comprehensive error handling for production use:
try {
const response = await fetch(geminiEndpoint, options);
const data = await response.json();
if (!response.ok) {
console.error('[Gemini API Error]', data);
// Return user-friendly error
return { error: 'Unable to process your request at this time' };
}
// Validate response structure
if (!data.candidates?.[0]?.content?.parts?.[0]?.text) {
console.error('Unexpected response structure:', data);
return { error: 'Unable to process your request at this time' };
}
return { response: data.candidates[0].content.parts[0].text };
} catch (error) {
console.error('Network or parsing error:', error);
return { error: 'Unable to process your request at this time' };
}
Rate limiting and caching
Implementing cache
For repeated analysis requests, consider caching. This example caches file-backed data with a time-to-live and a file-modification check (the Entry type and ENTRIES_FILE_PATH constant are assumed to be defined elsewhere in your backend):
import { promises as fs } from 'fs';
const CACHE_DURATION = 5 * 60 * 1000; // 5 minutes
interface CacheData {
entries: Entry[];
lastUpdated: number;
lastModified: number;
}
let entriesCache: CacheData | null = null;
async function getEntriesWithCache(): Promise<Entry[]> {
try {
const stats = await fs.stat(ENTRIES_FILE_PATH);
const fileModified = stats.mtimeMs;
if (entriesCache &&
Date.now() - entriesCache.lastUpdated < CACHE_DURATION &&
entriesCache.lastModified === fileModified) {
return entriesCache.entries;
}
const entriesData = await fs.readFile(ENTRIES_FILE_PATH, 'utf-8');
const entries = JSON.parse(entriesData);
entriesCache = {
entries,
lastUpdated: Date.now(),
lastModified: fileModified
};
return entries;
} catch (error) {
console.error('Error reading entries file:', error);
if (entriesCache) {
console.warn('Using cached entries as fallback');
return entriesCache.entries;
}
throw new Error('Unable to read entries data');
}
}
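For Gemini responses specifically, a lighter-weight option is an in-memory cache keyed by a hash of the transcript, so identical conversations are not re-analyzed within the TTL window. A sketch with illustrative names:

```typescript
import { createHash } from "node:crypto";

const TTL_MS = 5 * 60 * 1000; // 5 minutes

// Cached Gemini responses keyed by a hash of the transcript.
const responseCache = new Map<string, { value: string; storedAt: number }>();

function cacheKey(transcript: string): string {
  return createHash("sha256").update(transcript).digest("hex");
}

// Return a cached response, or null if absent or expired.
function getCached(transcript: string, now = Date.now()): string | null {
  const entry = responseCache.get(cacheKey(transcript));
  if (!entry || now - entry.storedAt > TTL_MS) return null;
  return entry.value;
}

function setCached(transcript: string, value: string, now = Date.now()): void {
  responseCache.set(cacheKey(transcript), { value, storedAt: now });
}
```

The now parameter defaults to the current time but is injectable, which keeps the TTL logic testable without real delays.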
API reference
Request body
interface GeminiRequest {
transcript: string; // Required: Full conversation text
emoData?: Record<string, number>; // Optional: Emotion scores
}
Response structure
interface GeminiResponse {
response: string; // Generated insight text
emotions: Record<string, number>; // Echoed emotion data
}
Error responses
interface GeminiError {
error: string; // User-friendly error message
}
Best practices
Validate input data
Always check that transcript exists before sending to Gemini. Empty transcripts waste API quota.
Keep prompts focused
Shorter, specific prompts yield better results than lengthy, vague ones.
Handle rate limits
Implement exponential backoff if you hit Gemini’s rate limits.
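One way to implement this, sketched with illustrative names: retry the request with exponentially increasing, capped delays between attempts.

```typescript
// Delay before retry attempt n (0-indexed): base * 2^n, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Retry an async operation on failure, waiting between attempts.
async function withRetries<T>(
  op: () => Promise<T>,
  maxAttempts = 4
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}
```

In production you would typically retry only on retryable statuses (429 and 5xx) rather than on every failure.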
Log strategically
Log errors for debugging but never log sensitive user conversation data in production.
Set appropriate token limits
Match maxOutputTokens to your UI constraints to avoid truncation issues.
Troubleshooting
API key errors
If you see authentication errors:
- Verify GEMINI_API_KEY is set correctly in your backend .env
- Check that your API key is active in Google AI Studio
- Ensure there are no extra spaces or quotes in the environment variable
Empty or unexpected responses
If Gemini returns no text:
- Check your prompt template includes both placeholders
- Verify the data.candidates[0].content.parts[0].text path exists
- Inspect the full response object for safety ratings or blocks
Response quality issues
If insights are generic or unhelpful:
- Lower temperature for more focused responses
- Add more context to your prompt template
- Increase maxOutputTokens if responses seem cut off
- Include example outputs in your prompt