
Overview

The Image Analysis feature uses Google Gemini Vision to automatically identify, classify, and assess industrial parts and equipment from uploaded images. The system provides structured analysis including item type, condition, quantity, and maintenance recommendations.
Image analysis uses Gemini 2.5 Flash with structured output generation to ensure consistent, parseable results.

Capabilities

Part Identification

Automatically classify items as furniture (mobiliario) or equipment (equipo)

Condition Assessment

Evaluate physical condition: new (nuevo), used (usado), damaged (dañado), or requires inspection (requiere_inspeccion)

Quantity Detection

Count multiple items of the same type in a single image

Metadata Extraction

Identify brand, model, and visible identification codes

Analysis Schema

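The vision model's response is validated against this Zod schema:
import { z } from 'zod';

// The .describe() strings become part of the JSON schema sent to the model.
const partAnalysisSchema = z.object({
  // General classification: furniture or equipment
  tipo_articulo: z.enum(['mobiliario', 'equipo'])
    .describe('Clasificación general del artículo'),
  // Identification code visible on the part
  codigo: z.string().optional()
    .describe('Código identificado visible en la pieza'),
  // Detailed description of the part or equipment
  descripcion: z.string()
    .describe('Descripción detallada de la pieza o equipo'),
  // Manufacturer brand, if visible
  marca: z.string().optional()
    .describe('Marca del fabricante si es visible'),
  // Equipment model, if visible
  modelo: z.string().optional()
    .describe('Modelo del equipo si es visible'),
  // Number of items of this type detected in the photo
  cantidad_detectada: z.number()
    .describe('Cantidad de piezas de este tipo detectadas en la foto'),
  // Visual condition: new, used, damaged, requires inspection
  estado_fisico: z.enum(['nuevo', 'usado', 'dañado', 'requiere_inspeccion'])
    .describe('Condición visual de la pieza'),
  // Short handling or maintenance recommendation
  recomendacion: z.string()
    .describe('Recomendación breve sobre el manejo o mantenimiento'),
  // The model's confidence in its identification: high, medium, low
  nivel_confianza: z.enum(['alta', 'media', 'baja'])
    .describe('Confianza de la IA sobre su identificación'),
});

type PartAnalysisResult = z.infer<typeof partAnalysisSchema>;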
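For readers less familiar with Zod, z.infer resolves the schema to roughly the plain TypeScript shape below, where every .optional() field becomes an optional property that callers must handle. The sample value and the brandAndModel helper are hypothetical illustrations, not real model output or project code.

```typescript
// Approximate plain-TypeScript equivalent of PartAnalysisResult.
type TipoArticulo = 'mobiliario' | 'equipo';
type EstadoFisico = 'nuevo' | 'usado' | 'dañado' | 'requiere_inspeccion';
type NivelConfianza = 'alta' | 'media' | 'baja';

interface PartAnalysisResultShape {
  tipo_articulo: TipoArticulo;
  codigo?: string; // optional in the schema
  descripcion: string;
  marca?: string; // optional in the schema
  modelo?: string; // optional in the schema
  cantidad_detectada: number;
  estado_fisico: EstadoFisico;
  recomendacion: string;
  nivel_confianza: NivelConfianza;
}

// Hypothetical sample value for illustration only.
const sample: PartAnalysisResultShape = {
  tipo_articulo: 'equipo',
  descripcion: 'Example description (hypothetical)',
  marca: 'Bosch',
  modelo: 'GBH 2-28',
  cantidad_detectada: 1,
  estado_fisico: 'usado',
  recomendacion: 'Verificar desgaste de brocas y lubricar mecanismo de percusión',
  nivel_confianza: 'alta',
};

// Hypothetical helper: combine the optional brand and model, falling back to 'N/A'.
function brandAndModel(r: PartAnalysisResultShape): string {
  return [r.marca, r.modelo].filter(Boolean).join(' ') || 'N/A';
}
```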

Server Action

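The image analysis is performed by the analyzePartImage server action:
'use server';

import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';
import { MAX_IMAGE_SIZE_BYTES, MAX_IMAGE_SIZE_MB, bytesToMB } from '@/app/config/limits';
// partAnalysisSchema, PartAnalysisResult and INVENTORY_PROMPT are defined elsewhere in the project.

export async function analyzePartImage(
  formData: FormData
): Promise<{ result: PartAnalysisResult | null; success: boolean; error?: string }> {
  try {
    const file = formData.get('file');
    const customPrompt = formData.get('prompt') as string | null;

    if (!(file instanceof File)) throw new Error('Imagen vacía');

    // Compare exact bytes; bytesToMB rounds, so it is used only in the message
    if (file.size > MAX_IMAGE_SIZE_BYTES) {
      throw new Error(
        `Imagen demasiado grande (${bytesToMB(file.size).toFixed(1)}MB). Máximo: ${MAX_IMAGE_SIZE_MB}MB`
      );
    }

    const buffer = await file.arrayBuffer();
    const base64Content = Buffer.from(buffer).toString('base64');

    // Call Gemini Vision; the output is validated against partAnalysisSchema
    const { object } = await generateObject({
      model: google('gemini-2.5-flash'),
      temperature: 0.1, // Low temperature for consistent, schema-compliant output
      schema: partAnalysisSchema,
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: customPrompt || INVENTORY_PROMPT },
            { type: 'image', image: base64Content },
          ],
        },
      ],
    });

    return { result: object, success: true };
  } catch (err) {
    // Return the error instead of throwing so callers can read success and error
    return { result: null, success: false, error: err instanceof Error ? err.message : String(err) };
  }
}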

Integration with Chat

Images are automatically analyzed when uploaded to the chat interface:
1. Image Upload

The user attaches an image file through the chat input or by drag-and-drop.
2. Size Validation

The system checks that the image does not exceed the 5MB limit (MAX_IMAGE_SIZE_BYTES):
const limitBytes = MAX_IMAGE_SIZE_BYTES;
if (fileSize > limitBytes) {
  throw new Error(`El archivo excede el límite de ${bytesToMB(limitBytes)}MB`);
}
3. Base64 Conversion

The image is fetched from its Blob URL and read as a base64 data URL (data:<mime>;base64,...) for transmission:
const response = await fetch(imageFile.url);
const blob = await response.blob();

const base64Promise = new Promise<string>((resolve, reject) => {
  const reader = new FileReader();
  reader.onload = () => resolve(reader.result as string);
  reader.onerror = reject;
  reader.readAsDataURL(blob);
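});

// Resolves to a data URL of the form data:<mime>;base64,<payload>
const fileDataUrl = await base64Promise;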
4. User Message Creation

A user message with the image attachment is added to the chat:
setMessages((prev) => [
  ...prev,
  {
    id: `user-${Date.now()}`,
    role: 'user',
    content: userText,
    parts: [
      { type: 'text', text: userText },
      { type: 'image', imageUrl: fileDataUrl, mimeType: file.mediaType },
    ],
    createdAt: new Date(),
  },
]);
5. Vision Analysis

The analyzePartImage server action processes the image with Gemini Vision.
6. Result Display

The structured analysis is formatted and added to the chat as an assistant message.
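Steps 3 and 5 meet at the server action, which reads an image from a FormData field named 'file' plus an optional 'prompt'. A minimal sketch, assuming the base64 data URL from step 3 is what the client sends; dataUrlToFormData is a hypothetical helper, not part of the documented hook, and it assumes a runtime with global atob, Blob and FormData (modern browsers, Node 18+).

```typescript
// Hypothetical helper: convert a base64 data URL into the FormData
// fields that analyzePartImage reads ('file' and optional 'prompt').
function dataUrlToFormData(dataUrl: string, filename: string, prompt?: string): FormData {
  // A base64 data URL looks like: data:<mime>;base64,<payload>
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error('Expected a base64-encoded data URL');
  }
  const [, mimeType, base64] = match;

  // Decode the base64 payload into raw bytes.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  const formData = new FormData();
  formData.append('file', new Blob([bytes], { type: mimeType }), filename);
  if (prompt) {
    formData.append('prompt', prompt);
  }
  return formData;
}
```

If the original Blob is still in scope, appending it directly to the FormData skips the base64 round trip entirely.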

Result Formatting

Analysis results are presented in a user-friendly markdown format:
const formattedText = `📱 **Análisis Visual (IA)**

| Atributo | Detalle |
| :--- | :--- |
| **Tipo** | ${analysisObj.tipo_articulo} |
| **Estado** | ${analysisObj.estado_fisico.replace('_', ' ')} |
| **Confianza** | ${analysisObj.nivel_confianza} |
| **Marca** | ${analysisObj.marca || 'N/A'} |
| **Modelo** | ${analysisObj.modelo || 'N/A'} |
| **Cantidad** | ${analysisObj.cantidad_detectada} |

**Descripción detallada:**
> ${analysisObj.descripcion}

💡 **Recomendación:**  
*${analysisObj.recomendacion}*

---
*Generado automáticamente por IA a partir de la imagen.*`;
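The template interpolates model-generated text directly into a Markdown table, so a "|" or a line break inside a brand, model, or description would break the row. A minimal hardening sketch; escapeCell, humanizeEnum and tableRow are hypothetical helpers, not part of the original code. With the current enum values (only one underscore each) the original .replace('_', ' ') already works, since a string pattern replaces only the first match; the regex version simply covers every underscore.

```typescript
// Escape characters that would break a Markdown table cell.
function escapeCell(value: string | undefined, fallback = 'N/A'): string {
  if (!value) return fallback;
  return value.replace(/\|/g, '\\|').replace(/\r?\n/g, ' ');
}

// Replace every underscore in an enum value with a space.
function humanizeEnum(value: string): string {
  return value.replace(/_/g, ' ');
}

// Build one "| **Label** | value |" row with the value escaped.
function tableRow(label: string, value: string | undefined): string {
  return `| **${label}** | ${escapeCell(value)} |`;
}
```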

Example Output

| Atributo | Detalle |
| :--- | :--- |
| Tipo | equipo |
| Estado | usado |
| Confianza | alta |
| Marca | Bosch |
| Modelo | GBH 2-28 |
| Cantidad | 1 |

Descripción detallada:

💡 Recomendación:

Verificar desgaste de brocas y lubricar mecanismo de percusión

Using the Image Analysis Hook

The useImageAnalysis hook exposes an analyzeImage function that runs the analysis and appends the result to the chat:
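'use client';

import { useState } from 'react';
import { useImageAnalysis } from '@/app/components/features/chat/hooks/use-image-analysis';
// useToast comes from the project's toast provider (import not shown here)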

function ImageUploadComponent() {
  const [messages, setMessages] = useState([]);
  const toast = useToast();
  
  const { analyzeImage } = useImageAnalysis({ setMessages, toast });
  
  const handleImageUpload = async (file: File) => {
    const imageFile = {
      url: URL.createObjectURL(file),
      mediaType: file.type,
      name: file.name,
    };
    
    const success = await analyzeImage(imageFile);
    
    if (success) {
      console.log('Analysis added to chat');
    }
  };
  
  return (
    <input 
      type="file" 
      accept="image/*" 
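      onChange={(e) => {
        // files is null or empty if the user cancels the file dialog
        const file = e.target.files?.[0];
        if (file) void handleImageUpload(file);
      }}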
    />
  );
}

File Size Limits

// From app/config/limits.ts
export const MAX_IMAGE_SIZE_MB = 5;
export const MAX_IMAGE_SIZE_BYTES = 5 * 1024 * 1024;

// Helper function
export function bytesToMB(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}
Images larger than 5MB will be rejected. Consider implementing client-side image compression for large files.
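Because bytesToMB rounds to one decimal place, comparing the rounded value against MAX_IMAGE_SIZE_MB lets files slightly over the limit through (about 5.04MB rounds to 5.0). A minimal sketch that compares raw bytes and uses bytesToMB only for the message; checkImageSize and the SizeCheck type are hypothetical names.

```typescript
const MAX_IMAGE_SIZE_MB = 5;
const MAX_IMAGE_SIZE_BYTES = MAX_IMAGE_SIZE_MB * 1024 * 1024;

function bytesToMB(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

type SizeCheck = { ok: true } | { ok: false; message: string };

// Compare exact byte counts; round only for the human-readable message.
function checkImageSize(sizeInBytes: number): SizeCheck {
  if (sizeInBytes > MAX_IMAGE_SIZE_BYTES) {
    return {
      ok: false,
      message: `Image too large (${bytesToMB(sizeInBytes).toFixed(1)}MB). Maximum: ${MAX_IMAGE_SIZE_MB}MB`,
    };
  }
  return { ok: true };
}
```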

Custom Prompts

You can provide custom analysis prompts for specific use cases:
const formData = new FormData();
formData.append('file', imageBlob, 'equipment.jpg');
formData.append('prompt', 'Focus on safety hazards and compliance issues');

const result = await analyzePartImage(formData);
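The server action falls back to INVENTORY_PROMPT through customPrompt || INVENTORY_PROMPT, which covers a missing or empty prompt but not a whitespace-only one. A small sketch of a stricter fallback; resolvePrompt is a hypothetical helper, and it accepts unknown because FormData.get can also return a File.

```typescript
// Use the custom prompt only if it is a non-blank string; otherwise use the fallback.
function resolvePrompt(custom: unknown, fallback: string): string {
  if (typeof custom !== 'string') return fallback; // null or a File entry
  const trimmed = custom.trim();
  return trimmed.length > 0 ? trimmed : fallback;
}
```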

Error Handling

When analysis fails, the error is shown both as an assistant message in the chat and as a toast notification:
if (result.success && result.result) {
  // Process successful analysis
  displayAnalysis(result.result);
} else {
  // Handle error
  const errorMsg = `❌ Error al analizar imagen: ${result.error || 'Error desconocido'}`;
  
  setMessages((prev) => [
    ...prev,
    {
      id: `vision-error-${Date.now()}`,
      role: 'assistant',
      content: errorMsg,
      parts: [],
      createdAt: new Date(),
    },
  ]);
  
  toast.error('Error de visión', result.error);
}
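The success and error fields are only useful if the server action returns failures instead of throwing them; Next.js also hides the message of errors thrown from server actions in production builds. A minimal sketch of a generic wrapper that converts exceptions into that result shape; toActionResult and ActionResult are hypothetical names.

```typescript
type ActionResult<T> = { result: T | null; success: boolean; error?: string };

// Run an async operation and turn any thrown error into a failed result.
async function toActionResult<T>(operation: () => Promise<T>): Promise<ActionResult<T>> {
  try {
    return { result: await operation(), success: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { result: null, success: false, error: message };
  }
}
```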

Supported Image Formats

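The upload input accepts any browser-supported image type (accept="image/*"), including:
  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • WebP (.webp)
  • GIF (.gif)
  • BMP (.bmp)
  • SVG (.svg)

Note that the Gemini API documents image input support for PNG, JPEG, WebP, HEIC, and HEIF; GIF, BMP, and SVG files may be rejected by the model even though the browser accepts them.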
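To fail fast on formats the model may not accept, a client-side allowlist check can run before the upload. A minimal sketch; the list reflects the formats in Google's Gemini image-understanding documentation (PNG, JPEG, WebP, HEIC, HEIF) and should be re-checked against the current docs, and isGeminiSupportedImage is a hypothetical helper.

```typescript
// Image MIME types listed in the Gemini image-understanding docs.
const GEMINI_IMAGE_MIME_TYPES = new Set([
  'image/png',
  'image/jpeg',
  'image/webp',
  'image/heic',
  'image/heif',
]);

// Normalize case, drop any parameters, and check against the allowlist.
function isGeminiSupportedImage(mimeType: string): boolean {
  const base = mimeType.split(';')[0].trim().toLowerCase();
  return GEMINI_IMAGE_MIME_TYPES.has(base);
}
```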

Performance Optimization

Low Temperature

Uses temperature: 0.1 for consistent, schema-compliant responses

Efficient Model

Gemini 2.5 Flash provides fast responses at low cost

Size Validation

Client-side validation prevents oversized uploads

Structured Output

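generateObject validates the model output against the Zod schema, so downstream code receives a typed, parseable object or an error, never malformed data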

Best Practices

1. Image Quality

Upload clear, well-lit images for the best analysis results.

2. Single Item Focus

For detailed analysis, photograph items individually when possible.

3. Visible Details

Ensure brand names, model numbers, and condition indicators are visible.

4. Error Handling

Always handle analysis failures gracefully with user feedback.

Configuration

# .env.local
GOOGLE_GENERATIVE_AI_API_KEY=your_api_key_here

# Optional: Custom inventory prompt
INVENTORY_PROMPT="Analyze this industrial equipment..."
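The @ai-sdk/google provider reads GOOGLE_GENERATIVE_AI_API_KEY from the environment by default. Since the server action falls back to INVENTORY_PROMPT, one way to honor the optional variable and fail early on a missing key is a small resolver. A minimal sketch; loadVisionConfig, VisionConfig and the default prompt text are hypothetical, and in the server it would be called as loadVisionConfig(process.env).

```typescript
interface VisionConfig {
  apiKey: string;
  inventoryPrompt: string;
}

// Hypothetical default used when INVENTORY_PROMPT is not set or blank.
const DEFAULT_INVENTORY_PROMPT =
  'Analyze this industrial equipment image and fill in every field of the schema.';

// Read configuration from an env-like object.
function loadVisionConfig(env: Record<string, string | undefined>): VisionConfig {
  const apiKey = env.GOOGLE_GENERATIVE_AI_API_KEY;
  if (!apiKey) {
    throw new Error('GOOGLE_GENERATIVE_AI_API_KEY is not set');
  }
  const prompt = env.INVENTORY_PROMPT?.trim();
  return {
    apiKey,
    inventoryPrompt: prompt ? prompt : DEFAULT_INVENTORY_PROMPT,
  };
}
```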

