Overview
Gima AI Chatbot uses a multi-model architecture, leveraging different AI providers for specific tasks:
- GROQ (Llama): Fast text generation, chat, structured data tasks
- Google Gemini: Multimodal tasks (voice, images, PDFs)
Model configuration is centralized in app/constants/ai.ts and app/config/models.ts.
Available Models
Primary Chat Model
The default model for conversational AI is configured in app/config/models.ts:13-18:
- Model: Llama 3.1 8B Instant
- Provider: GROQ
- Purpose: General conversation, equipment queries, maintenance assistance
- Speed: Optimized for low latency
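As a rough illustration, the entry in app/config/models.ts might look something like the following sketch (the field names are hypothetical; only the values above come from the configuration):

```typescript
// app/config/models.ts (hypothetical shape; field names are illustrative)
export const PRIMARY_CHAT_MODEL = {
  id: 'llama-3.1-8b-instant', // Llama 3.1 8B Instant
  provider: 'groq',           // served by GROQ for low latency
  description:
    'General conversation, equipment queries, maintenance assistance',
} as const;
```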
Model Configuration by Task
Different tasks use different models optimized for their specific requirements.
Text Generation Tasks (GROQ)
Chat Conversation
- File: app/api/chat/route.ts
- Model: Llama 3.3 70B Versatile
- Use case: Main chatbot responses, equipment queries, maintenance help
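A minimal sketch of what this route could look like, assuming the project uses the Vercel AI SDK (`ai` plus `@ai-sdk/groq`); the system prompt here is a placeholder for the real one in app/config/server.ts:

```typescript
// app/api/chat/route.ts (simplified sketch, assuming the Vercel AI SDK)
import { streamText } from 'ai';
import { groq } from '@ai-sdk/groq';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: groq('llama-3.3-70b-versatile'),           // main chat model
    system: 'You are Gima, a maintenance assistant.', // placeholder; the real prompt lives in app/config/server.ts
    messages,
  });

  // Streaming options (sources, reasoning, max duration) are covered under
  // Streaming Configuration below.
  return result.toDataStreamResponse();
}
```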
Checklist Generation
- Temperature: 0.3 (focused, consistent output)
- Max tokens: 2000
- Use case: Generating maintenance checklists for equipment
- Source: app/constants/ai.ts:58-63
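Applied to a call, those settings would look roughly like this (a hedged sketch assuming the Vercel AI SDK; the prompt, function name, and model choice are illustrative):

```typescript
import { generateText } from 'ai';
import { groq } from '@ai-sdk/groq';

export async function generateChecklist(equipmentName: string): Promise<string> {
  const { text } = await generateText({
    model: groq('llama-3.3-70b-versatile'), // illustrative model choice
    temperature: 0.3, // focused, consistent output
    maxTokens: 2000,  // renamed maxOutputTokens in AI SDK v5
    prompt: `Generate a preventive maintenance checklist for: ${equipmentName}`,
  });
  return text;
}
```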
Activity Summary
- Temperature: 0.4 (balanced creativity)
- Max tokens: 500
- Use case: Automatic summaries of maintenance activities
- Source: app/constants/ai.ts:71-76
Work Order Closeout
- Temperature: 0.3 (precise documentation)
- Max tokens: 800
- Use case: Generating closeout notes for completed work orders
- Source: app/constants/ai.ts:97-102
Data Transformation
- Temperature: 0.1 (very deterministic)
- Max tokens: 1000
- Use case: Structured data transformations and parsing
- Source: app/constants/ai.ts:84-89
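Taken together, the GROQ task settings above could be grouped in app/constants/ai.ts roughly like this (a hypothetical shape; the key names are illustrative, the values are the documented ones):

```typescript
// app/constants/ai.ts (hypothetical grouping of the documented settings)
export const TEXT_TASK_CONFIG = {
  checklist:         { temperature: 0.3, maxTokens: 2000 },
  activitySummary:   { temperature: 0.4, maxTokens: 500 },
  workOrderCloseout: { temperature: 0.3, maxTokens: 800 },
  dataTransform:     { temperature: 0.1, maxTokens: 1000 },
} as const;
```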
Multimodal Tasks (Gemini)
Voice Transcription
- Model: Gemini 2.5 Flash Lite (lightweight)
- Temperature: 0 (exact transcription)
- File: app/actions/voice.ts:50
- Use case: Converting audio to text
- Source: app/constants/ai.ts:114-119
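A hedged sketch of the transcription call, assuming the Vercel AI SDK with `@ai-sdk/google` and an audio file part (the prompt, function name, and MIME type are illustrative):

```typescript
import { generateText } from 'ai';
import { google } from '@ai-sdk/google';

export async function transcribeAudio(audio: Uint8Array): Promise<string> {
  const { text } = await generateText({
    model: google('gemini-2.5-flash-lite'),
    temperature: 0, // exact, literal transcription
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Transcribe this audio literally, with no interpretation.' },
          { type: 'file', data: audio, mimeType: 'audio/webm' }, // called mediaType in AI SDK v5
        ],
      },
    ],
  });
  return text;
}
```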
Voice Command Parsing
- Model: Gemini 2.5 Flash Lite
- Temperature: 0 (deterministic parsing)
- File: app/actions/voice.ts:148
- Use case: Parsing voice commands to structured JSON
- Source: app/constants/ai.ts:127-132
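Parsing to structured JSON maps naturally to a schema-constrained call; a sketch assuming the Vercel AI SDK's generateObject with Zod (the schema fields and function name are illustrative):

```typescript
import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

// Illustrative command schema; the real shape lives in the app
const voiceCommandSchema = z.object({
  action: z.enum(['create_work_order', 'check_inventory', 'unknown']),
  target: z.string().optional(),
});

export async function parseVoiceCommand(transcript: string) {
  const { object } = await generateObject({
    model: google('gemini-2.5-flash-lite'),
    temperature: 0, // deterministic parsing
    schema: voiceCommandSchema,
    prompt: `Parse this voice command into a structured action: "${transcript}"`,
  });
  return object;
}
```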
Image Analysis
- Model: Gemini 2.5 Flash (full version for vision)
- Temperature: 0.2 (precise analysis)
- File: app/actions/vision.ts:50
- Use case: Analyzing photos of equipment and parts
- Source: app/constants/ai.ts:140-145
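A sketch of the vision call, assuming the Vercel AI SDK's image content parts (the prompt and function name are illustrative):

```typescript
import { generateText } from 'ai';
import { google } from '@ai-sdk/google';

export async function analyzeEquipmentPhoto(image: Uint8Array): Promise<string> {
  const { text } = await generateText({
    model: google('gemini-2.5-flash'), // full Flash model for vision
    temperature: 0.2, // precise analysis
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Identify the equipment or part in this photo and describe its condition.' },
          { type: 'image', image },
        ],
      },
    ],
  });
  return text;
}
```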
PDF Extraction
- Model: Gemini 2.5 Flash
- Temperature: 0.1 (accurate extraction)
- File: app/actions/files.ts:47
- Use case: Extracting content from technical manuals
- Max file size: 10MB
- Source: app/constants/ai.ts:153-158
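A sketch of the extraction call with the 10MB guard, assuming the Vercel AI SDK's file content parts (the prompt, function name, and error message are illustrative):

```typescript
import { generateText } from 'ai';
import { google } from '@ai-sdk/google';

const MAX_PDF_BYTES = 10 * 1024 * 1024; // documented 10MB limit

export async function extractManualContent(pdf: Uint8Array): Promise<string> {
  if (pdf.byteLength > MAX_PDF_BYTES) {
    throw new Error('PDF exceeds the 10MB limit');
  }
  const { text } = await generateText({
    model: google('gemini-2.5-flash'),
    temperature: 0.1, // accurate extraction
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Extract the maintenance-relevant content from this manual.' },
          { type: 'file', data: pdf, mimeType: 'application/pdf' }, // called mediaType in AI SDK v5
        ],
      },
    ],
  });
  return text;
}
```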
Temperature Guide
The temperature parameter controls output randomness:
| Temperature | Behavior | Use Cases |
|---|---|---|
| 0 - 0.1 | Very deterministic | Transcription, data parsing, extraction |
| 0.2 - 0.4 | Focused, consistent | Documentation, summaries, technical writing |
| 0.5 - 0.7 | Balanced | General chat, creative but accurate |
| 0.8 - 1.0 | Creative, varied | Not used in Gima (precision preferred) |
Performance & Timeouts
Different operations have different timeout configurations:
app/constants/ai.ts:193-197
Retry Configuration
For handling transient errors:
- Max retries: 3 attempts
- Backoff strategy: Exponential (1s, 2s, 4s)
- Max backoff: 30 seconds
app/constants/ai.ts:202-206
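The documented policy (3 attempts, 1s/2s/4s exponential backoff, capped at 30s) can be expressed as a small helper; this is a generic sketch, not the project's actual implementation:

```typescript
const MAX_RETRIES = 3;         // attempts
const BASE_DELAY_MS = 1_000;   // 1s, then 2s, then 4s
const MAX_BACKOFF_MS = 30_000; // cap at 30 seconds

export async function withRetry<T>(operation: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < MAX_RETRIES - 1) {
        // Exponential backoff, capped at MAX_BACKOFF_MS
        const delay = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_BACKOFF_MS);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```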
Streaming Configuration
The chat API uses streaming responses:
- Max duration: 30 seconds
- Sources: Not sent (to reduce token usage)
- Reasoning: Not sent (for faster responses)
app/config/server.ts:156-160
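In a Next.js route using the Vercel AI SDK, these settings would typically surface as the route's maxDuration export and the data-stream response options; a hedged sketch (the option names assume the AI SDK's streaming response API):

```typescript
// app/api/chat/route.ts (sketch)
export const maxDuration = 30; // seconds, matching the documented limit

// ...inside POST, after streamText():
return result.toDataStreamResponse({
  sendSources: false,   // sources not sent, to reduce token usage
  sendReasoning: false, // reasoning not sent, for faster responses
});
```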
System Prompts
Gima uses specialized system prompts for different capabilities.
Main Chat Prompt
The primary system prompt includes:
- University context (UNEG)
- Technical terminology glossary
- Tool usage instructions
- Safety guidelines
app/config/server.ts:40-86
Voice Transcription Prompt
- Strict literal transcription rules
- Technical acronym recognition
- No filtering or interpretation
app/config/server.ts:94-108
Inventory Analysis Prompt
- Structured JSON extraction
- Technical specification detection
- Condition assessment
app/config/server.ts:116-149
Technical Acronyms
The system recognizes the technical acronyms defined in app/config/server.ts:9-20.
Model Selection Best Practices
Use Lightweight Models
Use gemini-2.5-flash-lite for simple tasks like transcription and command parsing.
Lower Temperature for Precision
Set temperature to 0-0.3 for documentation, data extraction, and technical writing.
Appropriate Token Limits
Set maxTokens based on expected output length to save costs and improve speed.
Provider by Task
Use GROQ for text, Gemini for multimodal - each excels at different tasks.
Customizing Models
To change the model for a specific task:
- Locate the configuration in app/constants/ai.ts
- Update the model, temperature, or maxTokens
- Test thoroughly before deploying
Next Steps
- Configuration: Set up API keys for GROQ and Gemini
- Backend Integration: Connect to the Laravel backend for real data