Overview

The Gima AI Chatbot provides a comprehensive multimodal chat experience that combines text messaging with voice transcription, image analysis, and PDF processing. The chat component orchestrates multiple AI capabilities into a unified conversational interface.
The chat uses GROQ Llama 3.1 8B for text conversations, Gemini Vision for image analysis, and Gemini Flash Lite for voice transcription.

Key Features

Text Chat

Real-time AI conversations with persistent message history

Voice Input

Dual-mode voice recognition with Gemini AI and Web Speech API fallback

Image Analysis

Automatic analysis of uploaded images using Gemini Vision

PDF Processing

Extract and analyze content from PDF documents

Architecture

The chat system is built with a modular architecture using React hooks for state management:
// Main chat component structure
export function Chat() {
  // State management
  const [input, setInput] = useState('');
  
  // Core integrations
  const {
    messages,
    sendMessage,
    status,
    reload,
    clearHistory,
    setMessages,
    addToolOutput,
  } = usePersistentChat();
  
  // Voice input with dual-mode support
  const {
    isListening,
    isProcessing,
    isSupported,
    mode,
    transcript,
    toggleListening,
    error: voiceError,
  } = useVoiceInput({ onTranscript: updateTextareaValue });
  
  // File submission (Images & PDFs)
  const { handleSubmit, isAnalyzing, analyzingFileType } = useFileSubmission({
    setMessages,
    sendMessage,
    isListening,
    toggleListening,
  });
  
  // ...
}

Message Flow

1. User Input: The user enters text, attaches files, or uses voice input through the chat interface.
2. Input Processing: The system detects the input type and routes it to the appropriate handler:
   • Text messages → GROQ API
   • Images → Gemini Vision analysis
   • PDFs → Gemini Flash document processing
   • Voice → Transcription, then text processing
3. AI Processing: The appropriate AI model processes the input and generates a response.
4. Response Display: Results are added to the message thread with proper formatting and displayed to the user.
5. Persistence: Messages are automatically saved to localStorage for conversation continuity.
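
The routing in step 2 can be sketched as a small pure function. The type names and handler strings below are hypothetical, chosen only to illustrate the mapping, not the project's actual identifiers:

```typescript
// Hypothetical sketch of the input-type routing described in step 2.
type InputKind = 'text' | 'image' | 'pdf' | 'voice';

interface RouteTarget {
  handler: string; // which backend processes this input
}

function routeInput(kind: InputKind): RouteTarget {
  switch (kind) {
    case 'text':
      return { handler: 'groq' };          // Text messages → GROQ API
    case 'image':
      return { handler: 'gemini-vision' }; // Images → Gemini Vision
    case 'pdf':
      return { handler: 'gemini-flash' };  // PDFs → Gemini Flash
    case 'voice':
      // Voice is transcribed first, then re-enters the pipeline as text
      return { handler: 'transcribe-then-groq' };
  }
}
```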

Chat State Management

The chat component uses a computed canSend state to control when messages can be submitted:
const canSend =
  // 'ready' and 'error' are already implied by the two negative checks
  status !== 'streaming' &&
  status !== 'submitted' &&
  !isAnalyzing;
This prevents message submission when:
  • The AI is currently streaming a response
  • A file is being analyzed
  • The system is in an invalid state

Keyboard Shortcuts

The chat includes productivity-enhancing keyboard shortcuts via the useChatKeyboard hook:
Shortcut            Action
Cmd/Ctrl + Enter    Submit message
Escape              Cancel voice recording
Cmd/Ctrl + L        Focus input field
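
A minimal sketch of how useChatKeyboard might map key events to these actions. The types, table, and function below are hypothetical, assuming the hook matches on the key name plus a Cmd/Ctrl modifier:

```typescript
// Hypothetical shortcut table mirroring the list above.
interface KeyCombo {
  key: string;
  meta?: boolean; // Cmd on macOS, Ctrl elsewhere
}

type ShortcutAction = 'submit' | 'cancelVoice' | 'focusInput';

const SHORTCUTS: Array<{ combo: KeyCombo; action: ShortcutAction }> = [
  { combo: { key: 'Enter', meta: true }, action: 'submit' },
  { combo: { key: 'Escape' }, action: 'cancelVoice' },
  { combo: { key: 'l', meta: true }, action: 'focusInput' },
];

// Returns the matching action, or null when the event is not a shortcut.
function matchShortcut(key: string, metaOrCtrl: boolean): ShortcutAction | null {
  const hit = SHORTCUTS.find(
    (s) => s.combo.key === key && Boolean(s.combo.meta) === metaOrCtrl
  );
  return hit ? hit.action : null;
}
```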

Quick Actions

Pre-defined prompts provide fast access to common operations:
// Quick action handler
const handleQuickAction = useCallback(
  (prompt: string) => {
    const action = QUICK_ACTIONS.find((a) => a.prompt === prompt);
    
    if (action?.formFields && action.formFields.length > 0) {
      // Action requires data → show inline form
      setActiveQuickAction(action);
      return;
    }
    
    // Prompts ending in a space are templates: place them in the
    // textarea for the user to complete; otherwise send directly
    if (!prompt.endsWith(' ')) {
      handleSubmit({ text: prompt, files: [] });
    } else {
      updateTextareaValue(prompt);
      textareaRef.current?.focus();
    }
  },
  [handleSubmit, updateTextareaValue]
);
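
For reference, a plausible shape for the QUICK_ACTIONS array the handler searches. The entries below are illustrative only, not the project's actual list:

```typescript
// Hypothetical shape of the QUICK_ACTIONS entries.
interface QuickActionFormField {
  name: string;
  label: string;
}

interface QuickAction {
  label: string;
  prompt: string;
  formFields?: QuickActionFormField[];
}

const QUICK_ACTIONS: QuickAction[] = [
  // Sent immediately: no trailing space, no form fields
  { label: 'Summarize', prompt: 'Summarize our conversation so far' },
  // Placed in the textarea for completion: prompt ends with a space
  { label: 'Translate', prompt: 'Translate the following text: ' },
  // Opens an inline form because formFields is non-empty
  {
    label: 'Draft email',
    prompt: 'Draft an email',
    formFields: [{ name: 'recipient', label: 'Recipient' }],
  },
];
```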

Message Storage

Messages are persisted using the usePersistentChat hook with localStorage:
The system maintains a maximum of 100 messages in localStorage to prevent storage bloat. Older messages are automatically pruned.
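
The pruning behaviour described above might look like the following sketch; the helper and type names are hypothetical:

```typescript
// Keep at most this many messages in localStorage (per the docs above).
const MAX_STORED_MESSAGES = 100;

interface StoredMessage {
  id: string;
  role: 'user' | 'assistant';
  content: string;
}

// Drops the oldest entries so only the newest MAX_STORED_MESSAGES remain.
function pruneMessages(messages: StoredMessage[]): StoredMessage[] {
  if (messages.length <= MAX_STORED_MESSAGES) return messages;
  return messages.slice(messages.length - MAX_STORED_MESSAGES);
}
```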

Error Handling

The chat provides comprehensive error handling with user-friendly messages:
// Voice error display
<ChatStatusIndicators
  voiceError={voiceError ?? undefined}
  isListening={isListening}
  isProcessing={isProcessing}
  isAnalyzingImage={isAnalyzing}
  fileType={analyzingFileType || 'image'}
  chatError={chatError}
  mode={mode}
/>
Errors are categorized by type:
  • Voice errors: Microphone permissions, browser support, API issues
  • Chat errors: Network failures, API rate limits
  • File errors: Size limits, unsupported formats
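
One way this categorization might work is a simple classifier over the raw error message. This is a hypothetical helper, not the project's actual code, and the patterns are illustrative:

```typescript
type ErrorCategory = 'voice' | 'chat' | 'file';

// Hypothetical mapping from raw error text to the categories listed above.
function categorizeError(message: string): ErrorCategory {
  if (/microphone|speech|transcri/i.test(message)) return 'voice';
  if (/file|size|format|pdf|image/i.test(message)) return 'file';
  return 'chat'; // network failures, API rate limits, etc.
}
```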

Configuration

The chat component can be configured through environment variables:
# Required API Keys
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_key
GROQ_API_KEY=your_groq_key

# Optional: Model Selection
DEFAULT_MODEL=llama-3.1-8b-instant

Theme Support

The chat automatically adapts to the application’s theme (light/dark mode) using Tailwind CSS classes:
<div className="bg-blue-50 dark:bg-blue-950/30 rounded-lg border border-blue-200 dark:border-blue-800">
  {/* Content */}
</div>

Best Practices

Message Validation

Always validate user input before submission to prevent empty or invalid messages

Error Recovery

Provide clear error messages and allow users to retry failed operations

State Management

Use computed states to prevent race conditions between different input modes

Accessibility

Ensure keyboard shortcuts and screen reader support for all interactions
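
The message-validation practice above can be sketched as a small guard run before handleSubmit; the helper name and attachment shape are hypothetical:

```typescript
// Hypothetical pre-submit guard: reject whitespace-only messages
// unless the user has attached at least one file.
function isValidMessage(text: string, files: Array<{ name: string }>): boolean {
  return text.trim().length > 0 || files.length > 0;
}
```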

Voice Commands

Learn about voice input capabilities

Image Analysis

Explore image processing features

PDF Processing

Discover PDF analysis capabilities