
Overview

OmniSearches supports multimodal search, allowing you to upload images alongside your text queries. This feature enables visual search scenarios where images provide crucial context that text alone cannot capture.
Multimodal search is available via the POST /api/search endpoint and through the web interface using the paperclip icon.

How It Works

The multimodal search feature combines image and text inputs to provide more contextual results:
  1. Image Upload: Upload up to 4 images (JPEG, PNG, GIF, or WebP)
  2. Context Integration: Images are encoded and sent to Gemini 2.0 Flash
  3. Combined Analysis: The AI analyzes both visual and textual information
  4. Enhanced Results: Receive answers that consider both image context and your query
Maximum file limit: 4 images per search. Each image is base64-encoded and sent with your query.
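The limits above can be enforced client-side before sending a request. A minimal pre-flight check, assuming the documented constraints (the function and variable names are illustrative, not part of the OmniSearches API):

```typescript
// Allowed MIME types, per the supported-formats list below.
const ALLOWED_MIME_TYPES = new Set([
  'image/jpeg',
  'image/png',
  'image/gif',
  'image/webp',
]);

// Returns an error message if the batch violates the documented limits,
// or null if the batch is valid.
function validateImages(images: { data: string; mimeType: string }[]): string | null {
  if (images.length > 4) {
    return 'Maximum 4 images allowed';
  }
  for (const img of images) {
    if (!ALLOWED_MIME_TYPES.has(img.mimeType)) {
      return `Unsupported format: ${img.mimeType}`;
    }
  }
  return null;
}
```

Running this before encoding saves a round trip when a batch would be rejected anyway.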

Supported Image Formats

  • JPEG (.jpg, .jpeg) - Standard photo format
  • PNG (.png) - Lossless format with transparency
  • GIF (.gif) - Animated or static graphics
  • WebP (.webp) - Modern web format
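If you are building the request outside the browser, you need to derive the MIME type yourself. A sketch mapping the extensions above to their MIME types (the helper name is illustrative):

```typescript
// Mapping from the file extensions listed above to their MIME types.
const EXT_TO_MIME: Record<string, string> = {
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  png: 'image/png',
  gif: 'image/gif',
  webp: 'image/webp',
};

// Returns the MIME type for a filename, or undefined for unsupported formats.
function mimeFromFilename(name: string): string | undefined {
  const ext = name.split('.').pop()?.toLowerCase();
  return ext ? EXT_TO_MIME[ext] : undefined;
}
```

In the browser, `File.type` already provides the MIME type, so a mapping like this is mainly useful in Node.js scripts.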

Usage Examples

Via Web Interface

  1. Navigate to the OmniSearches homepage
  2. Click the paperclip icon (📎) in the search bar
  3. Select up to 4 images from your device
  4. Preview your uploaded images
  5. Enter your text query
  6. Click Search to get results
You can remove individual images from the preview by clicking the X button on each image thumbnail.

Via API

Send images as base64-encoded strings in the user_images array:
POST /api/search
const response = await fetch('/api/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: 'What type of flower is this?',
    mode: 'default',
    user_images: [
      {
        data: 'base64_encoded_image_data_here',
        mimeType: 'image/jpeg'
      }
    ]
  })
});
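The same request body can be assembled with a small helper. A sketch using the endpoint and field names shown above (the helper itself is hypothetical, not part of the API):

```typescript
interface UserImage {
  data: string;     // base64-encoded image bytes, without the data URL prefix
  mimeType: string; // e.g. 'image/jpeg'
}

// Builds the JSON body for POST /api/search as documented above.
function buildSearchBody(query: string, userImages: UserImage[], mode = 'default'): string {
  return JSON.stringify({ query, mode, user_images: userImages });
}
```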

Using Fetch with File Input

Upload and Search
async function searchWithImages(query: string, files: File[]) {
  // Convert files to base64
  const imagePromises = files.map(file => {
    return new Promise<{ data: string; mimeType: string }>((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = (e) => {
        const base64 = e.target?.result as string;
        // Remove data URL prefix
        const data = base64.split(',')[1];
        resolve({
          data,
          mimeType: file.type
        });
      };
      reader.onerror = reject;
      reader.readAsDataURL(file);
    });
  });

  const user_images = await Promise.all(imagePromises);

  const response = await fetch('/api/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query,
      mode: 'default',
      user_images
    })
  });

  return response.json();
}

Use Cases

  • Upload photos of plants, animals, objects, or landmarks to identify them and learn more. Example: “What species of bird is this?” + image of a bird
  • Search for products similar to ones you’ve photographed or saved. Example: “Where can I buy this style of furniture?” + image of furniture
  • Upload screenshots, diagrams, or documents for analysis and explanation. Example: “Explain this architecture diagram” + diagram image
  • Share error messages, UI issues, or hardware problems visually. Example: “How do I fix this error?” + screenshot of error
  • Analyze artwork, design patterns, or creative works for inspiration. Example: “What art movement is this?” + image of painting

Image Processing

When you upload images, the following process occurs:

Client-Side Processing

Images are processed entirely in the browser:
client/src/hooks/useImageUpload.ts
import { useImageStore } from '@/store/imageStore';

export function useImageUpload() {
  const { addImage } = useImageStore();

  const handleImageUpload = async (files: File[]) => {
    // Maximum 4 images
    if (files.length > 4) {
      throw new Error('Maximum 4 images allowed');
    }

    // Convert to base64
    for (const file of files) {
      const reader = new FileReader();
      reader.onload = (e) => {
        const base64 = e.target?.result as string;
        addImage({
          id: Math.random().toString(36),
          data: base64.split(',')[1],
          mimeType: file.type
        });
      };
      reader.readAsDataURL(file);
    }
  };

  return { handleImageUpload };
}
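The `base64.split(',')[1]` step above strips the data URL prefix (`data:<mime>;base64,`) that `FileReader.readAsDataURL` prepends. Pulled out as a standalone helper for clarity (the function name is illustrative):

```typescript
// Removes the 'data:<mime>;base64,' prefix produced by FileReader.readAsDataURL,
// leaving only the raw base64 payload the API expects in `data`.
function stripDataUrlPrefix(dataUrl: string): string {
  const comma = dataUrl.indexOf(',');
  return comma === -1 ? dataUrl : dataUrl.slice(comma + 1);
}
```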

Server-Side Integration

The server includes images in the chat history when creating a Gemini session:
server/routes.ts (Line 398-423)
if (user_images && user_images.length > 0) {
  chat = model.startChat({
    tools: [{ google_search: {} }],
    history: [
      {
        role: "user",
        parts: [
          ...user_images.map((img: UserImage) => ({
            inlineData: {
              data: img.data,
              mimeType: img.mimeType
            }
          })),
          { text: 'Use uploaded images to search for information' }
        ]
      }
    ]
  });
}

Limitations

  • Maximum 4 images per search
  • Each image must be under 10MB
  • Supported formats: JPEG, PNG, GIF, WebP
  • Images are not persisted server-side
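The 10MB per-image ceiling can also be checked client-side before base64 encoding (a minimal sketch; the constant and function names are illustrative):

```typescript
// Per-image size ceiling from the limitations above: 10 MB.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;

// Accepts browser File objects or anything with a numeric `size` in bytes.
function withinSizeLimit(file: { size: number }): boolean {
  return file.size <= MAX_IMAGE_BYTES;
}
```

Note that base64 encoding inflates the payload by roughly a third, so checking the raw file size before encoding is the cheaper place to enforce this.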

Best Practices

Clear Images

Use high-quality, well-lit images for best results

Relevant Context

Ensure images directly relate to your text query

Descriptive Queries

Combine images with clear, descriptive text

Appropriate Size

Resize large images to improve upload speed

Privacy & Security

Images are processed in real-time and are not stored on the server. All image data is:
  • Transmitted securely via HTTPS
  • Processed only for the current search session
  • Discarded after the response is generated
  • Not logged or saved to disk

Troubleshooting

  • Too many images: you can only upload 4 images at a time. Remove some images and try again.
  • Unsupported file type: ensure your files are JPEG, PNG, GIF, or WebP format. Other file types are not supported.
  • Results don’t reflect the image: check that your query references the images. Try rephrasing like “Based on this image…” or “What is shown in this photo?”
  • Slow uploads: large images take longer to upload. Consider resizing images to under 2MB for faster uploads.
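For the resize tip above, the target dimensions can be computed with a small helper before re-encoding on a canvas (a sketch; capping the longest side at around 1600px usually keeps a JPEG well under 2MB, though the exact size depends on content and quality settings):

```typescript
// Computes downscaled dimensions so the longest side is at most maxDim,
// preserving aspect ratio. Never upscales smaller images.
function fitWithin(width: number, height: number, maxDim = 1600): [number, number] {
  const scale = Math.min(1, maxDim / Math.max(width, height));
  return [Math.round(width * scale), Math.round(height * scale)];
}
```

The resulting dimensions can be fed to a canvas `drawImage` call and re-encoded with `canvas.toBlob` before upload.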

Search Modes

Choose the right mode for your multimodal search

API Reference

Complete API documentation for image uploads
