Image Analysis

Overview

The Image Analysis feature uses Google Gemini Vision to automatically identify, classify, and assess industrial parts and equipment from uploaded images. The system provides structured analysis including item type, condition, quantity, and maintenance recommendations.

Image analysis uses Gemini 2.5 Flash with structured output generation to ensure consistent, parseable results.

Capabilities

Part Identification

Automatically classify each item as furniture (mobiliario) or equipment (equipo)

Condition Assessment

Evaluate physical condition: new (nuevo), used (usado), damaged (dañado), or needs inspection (requiere_inspeccion)

Quantity Detection

Count multiple items of the same type in a single image

Metadata Extraction

Identify brand, model, and visible identification codes

Analysis Schema

The vision API returns structured data validated with Zod:

import { z } from 'zod';

const partAnalysisSchema = z.object({
  // General classification: furniture or equipment
  tipo_articulo: z.enum(['mobiliario', 'equipo'])
    .describe('Clasificación general del artículo'),
  // Identification code visible on the part
  codigo: z.string().optional()
    .describe('Código identificado visible en la pieza'),
  // Detailed description of the part or equipment
  descripcion: z.string()
    .describe('Descripción detallada de la pieza o equipo'),
  // Manufacturer brand, if visible
  marca: z.string().optional()
    .describe('Marca del fabricante si es visible'),
  // Equipment model, if visible
  modelo: z.string().optional()
    .describe('Modelo del equipo si es visible'),
  // Number of items of this type detected in the photo
  cantidad_detectada: z.number()
    .describe('Cantidad de piezas de este tipo detectadas en la foto'),
  // Visual condition: new, used, damaged, or needs inspection
  estado_fisico: z.enum(['nuevo', 'usado', 'dañado', 'requiere_inspeccion'])
    .describe('Condición visual de la pieza'),
  // Short handling or maintenance recommendation
  recomendacion: z.string()
    .describe('Recomendación breve sobre el manejo o mantenimiento'),
  // Model's confidence in its identification: high, medium, or low
  nivel_confianza: z.enum(['alta', 'media', 'baja'])
    .describe('Confianza de la IA sobre su identificación'),
});

type PartAnalysisResult = z.infer<typeof partAnalysisSchema>;
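
To make the shape concrete, here is a sketch of an object that would satisfy the schema. The type is written out by hand so the snippet runs without Zod; it is meant to mirror what z.infer produces. The sample values and the needsReview helper are illustrative assumptions, not real model output or part of the codebase.

```typescript
// Hand-written equivalent of z.infer<typeof partAnalysisSchema>.
// Assumption: kept in sync with the schema manually, for illustration only.
type PartAnalysisResult = {
  tipo_articulo: 'mobiliario' | 'equipo';
  codigo?: string;
  descripcion: string;
  marca?: string;
  modelo?: string;
  cantidad_detectada: number;
  estado_fisico: 'nuevo' | 'usado' | 'dañado' | 'requiere_inspeccion';
  recomendacion: string;
  nivel_confianza: 'alta' | 'media' | 'baja';
};

// Illustrative sample; optional fields (here `codigo`) can simply be omitted.
const sample: PartAnalysisResult = {
  tipo_articulo: 'equipo',
  descripcion: 'Rotary hammer drill (illustrative description)',
  marca: 'Bosch',
  modelo: 'GBH 2-28',
  cantidad_detectada: 1,
  estado_fisico: 'usado',
  recomendacion: 'Verificar desgaste de brocas y lubricar mecanismo de percusión',
  nivel_confianza: 'alta',
};

// Hypothetical helper: flag results that a person should double-check.
function needsReview(r: PartAnalysisResult): boolean {
  return r.nivel_confianza === 'baja' || r.estado_fisico === 'requiere_inspeccion';
}
```

A check like needsReview is one way to use nivel_confianza downstream, for example to route low-confidence results to manual inspection.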

Server Action

The image analysis is performed via a server action:

import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';

export async function analyzePartImage(
  formData: FormData
): Promise<{ result: PartAnalysisResult | null; success: boolean; error?: string }> {
  try {
    const file = formData.get('file') as File | null;
    const customPrompt = formData.get('prompt') as string | null;

    if (!file) throw new Error('Imagen vacía');

    // Validate file size
    const sizeInBytes = file.size;
    const sizeInMB = bytesToMB(sizeInBytes);

    if (sizeInMB > MAX_IMAGE_SIZE_MB) {
      throw new Error(
        `Imagen demasiado grande (${sizeInMB.toFixed(1)}MB). Máximo: ${MAX_IMAGE_SIZE_MB}MB`
      );
    }

    const buffer = await file.arrayBuffer();
    const base64Content = Buffer.from(buffer).toString('base64');

    // Call Gemini Vision with structured output
    const result = await generateObject({
      model: google('gemini-2.5-flash'),
      temperature: 0.1, // Low temperature for schema adherence
      schema: partAnalysisSchema,
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: customPrompt || INVENTORY_PROMPT },
            { type: 'image', image: base64Content },
          ],
        },
      ],
    });

    return { result: result.object, success: true };
  } catch (err) {
    // Report failures through the declared `error` field instead of throwing,
    // so callers can branch on `success` and show `error` to the user.
    return {
      result: null,
      success: false,
      error: err instanceof Error ? err.message : 'Error desconocido',
    };
  }
}

Integration with Chat

Images are automatically analyzed when uploaded to the chat interface:

Image Upload

User attaches an image file through the chat input or drag-and-drop

Size Validation

System checks if image is under the 5MB limit

const limitBytes = MAX_IMAGE_SIZE_BYTES;
if (fileSize > limitBytes) {
  throw new Error(`El archivo excede el límite de ${bytesToMB(limitBytes)}MB`);
}

Base64 Conversion

The image is fetched from its blob URL and read as a base64 data URL for transmission to the API

const response = await fetch(imageFile.url);
const blob = await response.blob();

const base64Promise = new Promise<string>((resolve, reject) => {
  const reader = new FileReader();
  reader.onload = () => resolve(reader.result as string);
  reader.onerror = reject;
  reader.readAsDataURL(blob);
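});

// Assumed: the awaited data URL is the fileDataUrl used in the next step
const fileDataUrl = await base64Promise;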

User Message Creation

User message with image attachment is added to chat

setMessages((prev) => [
  ...prev,
  {
    id: `user-${Date.now()}`,
    role: 'user',
    content: userText,
    parts: [
      { type: 'text', text: userText },
      { type: 'image', imageUrl: fileDataUrl, mimeType: file.mediaType },
    ],
    createdAt: new Date(),
  },
]);

Vision Analysis

Server action processes the image with Gemini Vision

Result Display

Structured analysis is formatted and added as assistant message
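
A note on the Base64 Conversion step above: FileReader.readAsDataURL resolves to a data URL of the form data:<mime>;base64,<payload>, not to bare base64. If some consumer needs the MIME type and the payload separately, they can be split as sketched below. The parseDataUrl helper is hypothetical and not part of the codebase.

```typescript
// Hypothetical helper: split a base64 data URL into MIME type and payload.
interface ParsedDataUrl {
  mimeType: string;
  base64: string;
}

function parseDataUrl(dataUrl: string): ParsedDataUrl | null {
  // Matches "data:<mime>;base64,<payload>"; any other input returns null.
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) return null;
  return { mimeType: match[1], base64: match[2] };
}

// Example with a short, truncated PNG payload used purely as sample input.
const parsed = parseDataUrl('data:image/png;base64,iVBORw0KGgo=');
```

Returning null rather than throwing keeps the helper easy to use in UI code, where an unexpected URL should usually produce a user-facing message instead of an exception.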

Result Formatting

Analysis results are presented in a user-friendly markdown format:

const formattedText = `📱 **Análisis Visual (IA)**

| Atributo | Detalle |
| :--- | :--- |
| **Tipo** | ${analysisObj.tipo_articulo} |
| **Estado** | ${analysisObj.estado_fisico.replace('_', ' ')} |
| **Confianza** | ${analysisObj.nivel_confianza} |
| **Marca** | ${analysisObj.marca || 'N/A'} |
| **Modelo** | ${analysisObj.modelo || 'N/A'} |
| **Cantidad** | ${analysisObj.cantidad_detectada} |

**Descripción detallada:**
> ${analysisObj.descripcion}

💡 **Recomendación:**  
*${analysisObj.recomendacion}*

---
*Generado automáticamente por IA a partir de la imagen.*`;
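
The template interpolates model-generated strings directly into a markdown table. If a value contains a pipe character or a line break, the row can render incorrectly. The sketch below shows one way to sanitize a cell; toTableCell is a hypothetical helper, and the brand string is made-up sample input.

```typescript
// Hypothetical helper: make an arbitrary string safe for a markdown table cell.
function toTableCell(value: string | undefined, fallback = 'N/A'): string {
  const text = (value ?? '').trim();
  if (text === '') return fallback;
  return text
    .replace(/\|/g, '\\|')   // escape column separators
    .replace(/\r?\n/g, ' '); // keep the row on a single line
}

// Usage inside a table row, mirroring the template above.
const row = `| **Marca** | ${toTableCell('ACME | Tools')} |`;
```

The fallback argument covers the same case as the `|| 'N/A'` expressions in the template, so optional fields such as marca and modelo can go through the same helper.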

Example Output

| Atributo | Detalle |
| :--- | :--- |
| Tipo | equipo |
| Estado | usado |
| Confianza | alta |
| Marca | Bosch |
| Modelo | GBH 2-28 |
| Cantidad | 1 |

Descripción detallada:

💡 Recomendación:

Verificar desgaste de brocas y lubricar mecanismo de percusión

Using the Image Analysis Hook

The useImageAnalysis hook provides a clean interface for image processing:

import { useState } from 'react';
import { useImageAnalysis } from '@/app/components/features/chat/hooks/use-image-analysis';

function ImageUploadComponent() {
  const [messages, setMessages] = useState([]);
  const toast = useToast();
  
  const { analyzeImage } = useImageAnalysis({ setMessages, toast });
  
  const handleImageUpload = async (file: File) => {
    const imageFile = {
      url: URL.createObjectURL(file),
      mediaType: file.type,
      name: file.name,
    };
    
    const success = await analyzeImage(imageFile);
    
    if (success) {
      console.log('Analysis added to chat');
    }
  };
  
  return (
    <input 
      type="file" 
      accept="image/*" 
      onChange={(e) => {
        const file = e.target.files?.[0]; // files can be null or empty
        if (file) void handleImageUpload(file);
      }}
    />
  );
}

File Size Limits

// From app/config/limits.ts
export const MAX_IMAGE_SIZE_MB = 5;
export const MAX_IMAGE_SIZE_BYTES = 5 * 1024 * 1024;

// Helper function
export function bytesToMB(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

Images larger than 5MB will be rejected. Consider implementing client-side image compression for large files.
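
One detail worth noting: bytesToMB rounds to one decimal place, so the MB-based check in the server action and the byte-based check in the chat flow can disagree for files slightly over the limit. The arithmetic below illustrates this with a hypothetical file size; the constants and helper are copied from the snippet above so the example runs on its own.

```typescript
const MAX_IMAGE_SIZE_MB = 5;
const MAX_IMAGE_SIZE_BYTES = 5 * 1024 * 1024; // 5,242,880 bytes

function bytesToMB(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

// Hypothetical file 40,000 bytes over the limit (about 5.04 MB).
const fileSize = MAX_IMAGE_SIZE_BYTES + 40000;

// MB-based check: 5.04 rounds to 5.0, and 5.0 > 5 is false, so the file passes.
const rejectedByMbCheck = bytesToMB(fileSize) > MAX_IMAGE_SIZE_MB;

// Byte-based check: 5,282,880 > 5,242,880, so the file is rejected.
const rejectedByByteCheck = fileSize > MAX_IMAGE_SIZE_BYTES;
```

Comparing raw bytes against MAX_IMAGE_SIZE_BYTES in both places, and using bytesToMB only for display, would keep the two checks consistent.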

Custom Prompts

You can provide custom analysis prompts for specific use cases:

const formData = new FormData();
formData.append('file', imageBlob, 'equipment.jpg');
formData.append('prompt', 'Focus on safety hazards and compliance issues');

const result = await analyzePartImage(formData);
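
Because the server action falls back with customPrompt || INVENTORY_PROMPT, an empty prompt uses the default, but a whitespace-only prompt would be sent to the model as-is. The sketch below builds the payload and drops blank prompts; buildAnalysisFormData is a hypothetical helper, and it assumes a runtime where FormData and Blob are available globally (browsers, recent Node.js).

```typescript
// Hypothetical helper: build the FormData payload that analyzePartImage reads.
function buildAnalysisFormData(image: Blob, fileName: string, prompt?: string): FormData {
  const formData = new FormData();
  formData.append('file', image, fileName);

  // Only send a prompt with real content, so the server-side
  // `customPrompt || INVENTORY_PROMPT` fallback applies otherwise.
  const trimmed = prompt?.trim();
  if (trimmed) formData.append('prompt', trimmed);

  return formData;
}

// Placeholder bytes stand in for a real image in this example.
const payload = buildAnalysisFormData(
  new Blob(['placeholder bytes'], { type: 'image/jpeg' }),
  'equipment.jpg',
  '   ' // whitespace-only prompt is dropped
);
```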

Error Handling

The system provides detailed error feedback:

if (result.success && result.result) {
  // Process successful analysis
  displayAnalysis(result.result);
} else {
  // Handle error
  const errorMsg = `❌ Error al analizar imagen: ${result.error || 'Error desconocido'}`;
  
  setMessages((prev) => [
    ...prev,
    {
      id: `vision-error-${Date.now()}`,
      role: 'assistant',
      content: errorMsg,
      parts: [],
      createdAt: new Date(),
    },
  ]);
  
  toast.error('Error de visión', result.error);
}

Supported Image Formats

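The file input uses accept="image/*", so the browser lets users pick any of the formats below. Acceptance by the browser does not guarantee the model can read the file: Gemini's documentation lists PNG, JPEG, WebP, HEIC, and HEIF as supported image inputs, so GIF, BMP, and SVG uploads may fail at the analysis step.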

Supported Formats

JPEG (.jpg, .jpeg)
PNG (.png)
WebP (.webp)
GIF (.gif)
BMP (.bmp)
SVG (.svg)
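
The accept attribute is only a hint to the file picker, and files that arrive by drag-and-drop are not filtered by it. A minimal sketch of an explicit check against the formats listed above follows; isListedImageType is a hypothetical helper, and the set can be narrowed if some formats turn out not to work with the model.

```typescript
// MIME types corresponding to the formats listed above.
const LISTED_IMAGE_TYPES: ReadonlySet<string> = new Set([
  'image/jpeg',
  'image/png',
  'image/webp',
  'image/gif',
  'image/bmp',
  'image/svg+xml',
]);

// Hypothetical helper: true when the declared MIME type is on the list.
function isListedImageType(mimeType: string): boolean {
  return LISTED_IMAGE_TYPES.has(mimeType.toLowerCase());
}
```

In the upload handler this could run on file.type before the size check, so unsupported files are rejected with a clear message rather than failing later in the server action.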

Performance Optimization

Low Temperature

Uses temperature: 0.1 for consistent, schema-compliant responses

Efficient Model

Gemini 2.5 Flash provides fast responses at low cost

Size Validation

The 5MB limit is checked on the client before analysis and again in the server action

Structured Output

generateObject validates the model response against the Zod schema, so the caller receives either a typed object or an error, never free-form text

Best Practices

Image Quality

Upload clear, well-lit images for best analysis results

Single Item Focus

For detailed analysis, photograph items individually when possible

Visible Details

Ensure brand names, model numbers, and condition indicators are visible

Error Handling

Always handle analysis failures gracefully with user feedback

Configuration

# .env.local
GOOGLE_GENERATIVE_AI_API_KEY=your_api_key_here

# Optional: Custom inventory prompt
INVENTORY_PROMPT="Analyze this industrial equipment..."
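
How the application reads these variables is not shown on this page. As a sketch under that caveat, the optional prompt could be resolved with a fallback and the required key checked at startup. Both helpers and the default prompt text are hypothetical; the environment is passed in as a plain object so the functions are easy to test.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical default; the real INVENTORY_PROMPT text is not shown here.
const DEFAULT_INVENTORY_PROMPT = 'Analyze this industrial equipment...';

// Hypothetical helper: prefer a non-blank env override, else use the default.
function resolveInventoryPrompt(env: Env): string {
  const override = env['INVENTORY_PROMPT']?.trim();
  return override ? override : DEFAULT_INVENTORY_PROMPT;
}

// Hypothetical helper: fail fast when the required API key is missing.
function requireApiKey(env: Env): string {
  const key = env['GOOGLE_GENERATIVE_AI_API_KEY'];
  if (!key) throw new Error('GOOGLE_GENERATIVE_AI_API_KEY is not set');
  return key;
}
```

In a Node.js process these would typically be called with process.env.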

Related Features

Multimodal Chat

Full chat interface with all modalities

Voice Commands

Voice transcription and command parsing

PDF Processing

Document analysis capabilities
