Computer Vision (Image Analysis)
Azure Computer Vision provides AI algorithms for processing images and extracting visual information. The Image Analysis service uses pre-trained models to analyze images and return insights about visual features and characteristics.
Key Capabilities
The Image Analysis API provides comprehensive image understanding features:
OCR (Read Text)
Extract printed and handwritten text from images with high accuracy
Object Detection
Detect and locate objects in images with bounding boxes and confidence scores
Image Captioning
Generate human-readable descriptions of image content in complete sentences
People Detection
Detect people in images and return bounding box coordinates for each person
Visual Tagging
Identify and tag thousands of recognizable objects, living things, scenery, and actions
Smart Cropping
Determine the area of interest in images to create optimal thumbnails
Image Analysis Features
Read Text from Images (OCR)
Version 4.0 offers synchronous OCR capabilities that extract text from images:
- Extract printed and handwritten text
- Support for multiple languages
- High accuracy text recognition
- Faster turnaround than the async v3.2 Read API: results return in a single call, with no polling
- Returns text with bounding box coordinates
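The v4.0 Read result groups text into blocks, lines, and words. The sketch below walks that structure in a raw REST response. It assumes the JSON shape returned by `imageanalysis:analyze` with `features=read` (api-version 2023-10-01), that is `readResult.blocks[].lines[]` with `text` and `boundingPolygon`. The helper name `read_lines` and the sample response are hypothetical, made up for illustration.

```python
from typing import Any


def read_lines(response: dict[str, Any]) -> list[tuple[str, list[tuple[int, int]]]]:
    """Return (text, polygon) for every OCR line in a v4.0 analyze response.

    Assumes readResult -> blocks[] -> lines[] -> {text, boundingPolygon}.
    Returns an empty list when the response has no readResult.
    """
    lines = []
    for block in response.get("readResult", {}).get("blocks", []):
        for line in block.get("lines", []):
            # boundingPolygon is assumed to be a list of {"x": ..., "y": ...} points.
            polygon = [(point["x"], point["y"]) for point in line.get("boundingPolygon", [])]
            lines.append((line["text"], polygon))
    return lines


# Hypothetical response, trimmed to the fields read above.
sample = {
    "readResult": {
        "blocks": [
            {
                "lines": [
                    {
                        "text": "OPEN 24 HOURS",
                        "boundingPolygon": [
                            {"x": 10, "y": 20}, {"x": 120, "y": 20},
                            {"x": 120, "y": 40}, {"x": 10, "y": 40},
                        ],
                    }
                ]
            }
        ]
    }
}

for text, polygon in read_lines(sample):
    print(text, polygon)
```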
Generate Image Captions
Create human-readable descriptions of images:
- Simple captions: One-sentence descriptions of the entire image
- Dense captions: Detailed captions for individual objects with bounding boxes
- Natural-language descriptions written as complete sentences
- A confidence score for each caption
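A minimal sketch of pulling captions out of a v4.0 REST response, assuming `captionResult` holds `{text, confidence}` and `denseCaptionsResult.values[]` holds `{text, confidence, boundingBox}` with `boundingBox` as `{x, y, w, h}`. The function name, the 0.5 threshold, and the sample data are illustrative assumptions. The whole-image caption is a natural candidate for image alt text.

```python
from typing import Any, Optional


def summarize_captions(response: dict[str, Any], min_confidence: float = 0.5) -> dict[str, Any]:
    """Return the whole-image caption and the dense captions at or above min_confidence."""
    caption: Optional[dict[str, Any]] = response.get("captionResult")
    regions = [
        (item["text"], item["boundingBox"])
        for item in response.get("denseCaptionsResult", {}).get("values", [])
        if item["confidence"] >= min_confidence
    ]
    return {"caption": caption["text"] if caption else None, "regions": regions}


# Hypothetical response with one confident and one weak dense caption.
sample = {
    "captionResult": {"text": "a dog sitting on a couch", "confidence": 0.83},
    "denseCaptionsResult": {
        "values": [
            {"text": "a dog sitting on a couch", "confidence": 0.83,
             "boundingBox": {"x": 0, "y": 0, "w": 640, "h": 480}},
            {"text": "a red pillow", "confidence": 0.41,
             "boundingBox": {"x": 400, "y": 120, "w": 90, "h": 70}},
        ]
    },
}

print(summarize_captions(sample))
```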
Detect Objects
Identify objects in images with bounding boxes:
- Detect multiple instances of the same object
- Return pixel coordinates for each object
- Confidence scores for each detection
- Support for thousands of object categories
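To illustrate filtering detections and counting repeated instances, the sketch below assumes each entry in the v4.0 `objectsResult.values[]` has a `boundingBox` (`x`, `y`, `w`, `h` in pixels) and a `tags` list of `{name, confidence}`, and converts each box to corner coordinates. The 0.6 threshold, the helper name, and the sample response are hypothetical.

```python
from collections import Counter
from typing import Any


def detected_objects(
    response: dict[str, Any], min_confidence: float = 0.6
) -> list[tuple[str, float, tuple[int, int, int, int]]]:
    """Return (name, confidence, (left, top, right, bottom)) for confident detections."""
    results = []
    for obj in response.get("objectsResult", {}).get("values", []):
        # Report the highest-confidence tag attached to this box.
        best = max(obj["tags"], key=lambda tag: tag["confidence"])
        if best["confidence"] < min_confidence:
            continue
        box = obj["boundingBox"]
        corners = (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])
        results.append((best["name"], best["confidence"], corners))
    return results


# Hypothetical response: two cups and one low-confidence chair.
sample = {
    "objectsResult": {
        "values": [
            {"boundingBox": {"x": 10, "y": 10, "w": 50, "h": 60},
             "tags": [{"name": "cup", "confidence": 0.9}]},
            {"boundingBox": {"x": 100, "y": 12, "w": 48, "h": 58},
             "tags": [{"name": "cup", "confidence": 0.85}]},
            {"boundingBox": {"x": 200, "y": 0, "w": 120, "h": 200},
             "tags": [{"name": "chair", "confidence": 0.4}]},
        ]
    }
}

found = detected_objects(sample)
# Count instances per label; here the two cups are counted separately.
print(Counter(name for name, _, _ in found))
```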
People Detection
Detect people appearing in images (v4.0 only):
- Returns bounding box coordinates for each person
- Confidence scores for each detection
- Works with single or multiple people
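People results may include boxes with very low confidence, so they are usually filtered before use. A minimal sketch, assuming `peopleResult.values[]` entries of the form `{boundingBox, confidence}`; the helper name and the 0.5 cutoff are arbitrary choices, and the sample data is made up.

```python
from typing import Any


def confident_people(response: dict[str, Any], min_confidence: float = 0.5) -> list[dict[str, int]]:
    """Return the bounding boxes of people detected at or above min_confidence."""
    return [
        person["boundingBox"]
        for person in response.get("peopleResult", {}).get("values", [])
        if person["confidence"] >= min_confidence
    ]


# Hypothetical response: two clear detections and one near-zero one.
sample = {
    "peopleResult": {
        "values": [
            {"boundingBox": {"x": 30, "y": 40, "w": 110, "h": 300}, "confidence": 0.94},
            {"boundingBox": {"x": 200, "y": 35, "w": 105, "h": 310}, "confidence": 0.88},
            {"boundingBox": {"x": 500, "y": 5, "w": 20, "h": 30}, "confidence": 0.002},
        ]
    }
}

print(f"{len(confident_people(sample))} people detected")
```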
Tag Visual Features
Identify and tag visual content:
- Thousands of recognizable objects, living things, scenery, and actions
- Tags with confidence scores
- Context hints for ambiguous tags
- Includes both main subjects and background elements
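A sketch of keeping only the most reliable tags, assuming the v4.0 `tagsResult.values[]` entries carry `name` and `confidence`. The threshold, the `limit` parameter, the helper name, and the sample tags are illustrative, not service defaults.

```python
from typing import Any


def top_tags(response: dict[str, Any], min_confidence: float = 0.7, limit: int = 5) -> list[str]:
    """Return up to `limit` tag names at or above min_confidence, most confident first."""
    tags = response.get("tagsResult", {}).get("values", [])
    confident = [tag for tag in tags if tag["confidence"] >= min_confidence]
    confident.sort(key=lambda tag: tag["confidence"], reverse=True)
    return [tag["name"] for tag in confident[:limit]]


# Hypothetical tags for an outdoor photo.
sample = {
    "tagsResult": {
        "values": [
            {"name": "grass", "confidence": 0.98},
            {"name": "outdoor", "confidence": 0.99},
            {"name": "sky", "confidence": 0.95},
            {"name": "tree", "confidence": 0.6},
        ]
    }
}

print(top_tags(sample))  # ['outdoor', 'grass', 'sky']
```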
Smart Cropping (Area of Interest)
Find the optimal region of interest for thumbnails:
- Analyzes image content to determine focus area
- Returns bounding box coordinates
- Supports custom aspect ratios
- Preserves the most important visual elements
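Smart cropping is driven by the aspect ratios (width divided by height) passed in the request, for example through the v4.0 `smartcrops-aspect-ratios` query parameter, and each entry in `smartCropsResult.values[]` is assumed to pair an `aspectRatio` with a `boundingBox`. The sketch below formats that parameter and picks the returned crop closest to a target ratio. The 0.75 to 1.8 bounds are an assumption taken from the v4.0 documentation and worth confirming against the current API reference; the helper names and sample data are hypothetical.

```python
from typing import Any, Optional, Sequence

# Assumed valid range for width / height ratios; confirm against the API reference.
MIN_RATIO, MAX_RATIO = 0.75, 1.8


def smart_crops_param(ratios: Sequence[float]) -> str:
    """Format ratios as a comma-separated query value, rejecting out-of-range ones."""
    for ratio in ratios:
        if not MIN_RATIO <= ratio <= MAX_RATIO:
            raise ValueError(f"aspect ratio {ratio} is outside {MIN_RATIO}-{MAX_RATIO}")
    return ",".join(f"{ratio:g}" for ratio in ratios)


def crop_box(response: dict[str, Any], target_ratio: float) -> Optional[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the returned crop closest to target_ratio."""
    crops = response.get("smartCropsResult", {}).get("values", [])
    if not crops:
        return None
    best = min(crops, key=lambda crop: abs(crop["aspectRatio"] - target_ratio))
    box = best["boundingBox"]
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


# Hypothetical response for a 640 x 480 image.
sample = {
    "smartCropsResult": {
        "values": [
            {"aspectRatio": 0.9, "boundingBox": {"x": 120, "y": 0, "w": 432, "h": 480}},
            {"aspectRatio": 1.33, "boundingBox": {"x": 0, "y": 0, "w": 640, "h": 480}},
        ]
    }
}

print(smart_crops_param([0.9, 1.33]))  # 0.9,1.33
print(crop_box(sample, 0.9))  # (120, 0, 552, 480)
```

The (left, top, right, bottom) tuple is the form Pillow's `Image.crop` accepts, so the result can be passed straight to it when generating thumbnails.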
Additional Features (v3.2)
Version 3.2 includes these additional capabilities:
- Brand Detection: Identify commercial brands and logos
- Face Detection: Detect faces and estimate age and gender
- Image Type Detection: Determine if image is clip art or line drawing
- Color Scheme Detection: Identify dominant and accent colors
- Adult Content Detection: Detect adult, racy, or gory content
- Domain-Specific Models: Detect celebrities and landmarks
- Image Categorization: Categorize images using a taxonomy
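In v3.2, the features above are selected with the `visualFeatures` query parameter, and the celebrity and landmark models with `details`, on the `/vision/v3.2/analyze` endpoint. A minimal sketch of building that URL, assuming the feature names listed in the code; the endpoint is a placeholder and the helper is hypothetical. The request itself would carry the key in an `Ocp-Apim-Subscription-Key` header and the image as a JSON `{"url": ...}` body or as binary data.

```python
from typing import Sequence
from urllib.parse import urlencode

# v3.2 visualFeatures and details values (assumed from the v3.2 reference).
V32_FEATURES = {"Categories", "Tags", "Description", "Faces", "ImageType",
                "Color", "Adult", "Objects", "Brands"}
V32_DETAILS = {"Celebrities", "Landmarks"}


def v32_analyze_url(endpoint: str, features: Sequence[str], details: Sequence[str] = ()) -> str:
    """Build a v3.2 analyze URL, rejecting names that v3.2 does not accept."""
    unknown = (set(features) - V32_FEATURES) | (set(details) - V32_DETAILS)
    if unknown:
        raise ValueError(f"unsupported v3.2 names: {sorted(unknown)}")
    params = {"visualFeatures": ",".join(features)}
    if details:
        params["details"] = ",".join(details)
    # safe="," keeps the comma-separated lists readable instead of encoding them as %2C.
    return f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{urlencode(params, safe=',')}"


print(v32_analyze_url("https://my-resource.cognitiveservices.azure.com/",
                      ["Brands", "Color", "Adult"], ["Landmarks"]))
```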
Multimodal Embeddings
Convert images and text to vector representations for semantic search:
- Vectorize images for similarity search
- Convert text queries to vectors
- Match images to text based on semantic meaning
- Support for 102 languages (multilingual model)
- Build image search applications
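Once images and text are vectorized (for example through the v4.0 `retrieval:vectorizeImage` and `retrieval:vectorizeText` operations, assumed here to return a `vector` array), search reduces to comparing vectors. The sketch below ranks images by cosine similarity to a query vector. The three-dimensional vectors and file names are toy stand-ins for real embeddings, which are much longer; the helper names are hypothetical.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def rank_images(query: Sequence[float], images: Mapping[str, Sequence[float]]) -> list[str]:
    """Return image names ordered from most to least similar to the query vector."""
    scores = {name: cosine_similarity(query, vector) for name, vector in images.items()}
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Toy vectors standing in for a vectorized text query and two vectorized images.
text_query = [1.0, 0.0, 0.0]
image_vectors = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.1, 0.9, 0.2],
}
print(rank_images(text_query, image_vectors))  # ['beach.jpg', 'city.jpg']
```

Because both vectors come from the same multimodal model, a text vector can be compared directly with image vectors; for large collections a vector index would replace the linear scan shown here.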
API Versions
Version 4.0 (Recommended)
- Synchronous OCR (Read)
- People detection
- Dense captions
- Enhanced image captioning
- Improved smart cropping
- Multimodal embeddings
Version 3.2
- Async OCR (Read API)
- Brand detection
- Face detection
- Celebrity and landmark detection
- All other v3.2 features
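Version 4.0 uses a different endpoint and parameter style from v3.2: camelCase feature names are requested through `features` on `/computervision/imageanalysis:analyze`, together with an `api-version`. A minimal sketch of building such a URL, assuming api-version `2023-10-01` and the feature names listed in the code; rejecting v3.2-only names such as `Brands` makes version mismatches fail early. The endpoint and helper are hypothetical.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# v4.0 feature names for api-version 2023-10-01, as assumed here.
V40_FEATURES = {"tags", "caption", "denseCaptions", "objects", "read", "smartCrops", "people"}


def v40_analyze_url(endpoint: str, features: Sequence[str],
                    extra: Optional[dict[str, str]] = None,
                    api_version: str = "2023-10-01") -> str:
    """Build a v4.0 analyze URL; v3.2-only names such as "Brands" raise ValueError."""
    unknown = set(features) - V40_FEATURES
    if unknown:
        raise ValueError(f"not v4.0 features: {sorted(unknown)}")
    params = {"api-version": api_version, "features": ",".join(features)}
    # Extra query parameters (e.g. gender-neutral-caption) are appended as given.
    params.update(extra or {})
    query = urlencode(params, safe=",")
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


print(v40_analyze_url("https://my-resource.cognitiveservices.azure.com/",
                      ["caption", "read"], {"gender-neutral-caption": "true"}))
```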
Input Requirements
Version 4.0:
- Supported formats: JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, MPO
- File size: Less than 20 MB
- Dimensions: 50 x 50 to 16,000 x 16,000 pixels
Version 3.2:
- Supported formats: JPEG, PNG, GIF, BMP
- File size: Less than 4 MB
- Dimensions: 50 x 50 to 16,000 x 16,000 pixels
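The limits above can be checked locally before uploading, which catches problems without a network round trip. A sketch under stated assumptions: 1 MB is taken as 1024 * 1024 bytes, the pixel bounds are treated as inclusive, and the caller supplies the detected format name (for example from Pillow's `Image.format`). The `check_image` helper and `InputLimits` type are hypothetical.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the documented "MB" means 2**20 bytes


@dataclass(frozen=True)
class InputLimits:
    formats: frozenset[str]
    max_bytes: int  # files must be strictly smaller than this
    min_side: int
    max_side: int


LIMITS = {
    "4.0": InputLimits(frozenset({"JPEG", "PNG", "GIF", "BMP", "WEBP", "ICO", "TIFF", "MPO"}),
                       20 * MB, 50, 16_000),
    "3.2": InputLimits(frozenset({"JPEG", "PNG", "GIF", "BMP"}), 4 * MB, 50, 16_000),
}


def check_image(version: str, fmt: str, size_bytes: int, width: int, height: int) -> list[str]:
    """Return a list of problems; an empty list means the image fits the limits."""
    limits = LIMITS[version]
    problems = []
    if fmt.upper() not in limits.formats:
        problems.append(f"format {fmt} not supported")
    if size_bytes >= limits.max_bytes:
        problems.append("file too large")
    # Pixel bounds treated as inclusive on both ends (an assumption).
    if not all(limits.min_side <= side <= limits.max_side for side in (width, height)):
        problems.append("dimensions out of range")
    return problems


print(check_image("3.2", "webp", 5 * MB, 800, 600))
```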
Region Availability
| Region | Analyze Image | Captions (v4.0) | Embeddings |
|---|---|---|---|
| East US | ✓ | ✓ | ✓ |
| West US | ✓ | ✓ | ✓ |
| West US 2 | ✓ | ✓ | |
| North Europe | ✓ | ✓ | ✓ |
| West Europe | ✓ | ✓ | ✓ |
| Southeast Asia | ✓ | ✓ | ✓ |
Use Cases
Content Moderation
- Detect inappropriate images
- Filter adult content
- Ensure brand safety
- Monitor user-generated content
E-commerce
- Generate product descriptions
- Tag products automatically
- Create smart thumbnails
- Enable visual search
Accessibility
- Generate alt text for images
- Read text aloud from images
- Describe visual content
- Support screen readers
Document Processing
- Extract text from scanned documents
- Digitize printed materials
- Process forms and receipts
- Archive historical documents
Getting Started
SDK Support
Python
C#
Java
JavaScript
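A minimal Python sketch using the `azure-ai-vision-imageanalysis` package (installed with `pip install azure-ai-vision-imageanalysis`), which targets the v4.0 API. It requests a caption and OCR for an image URL. The function name is hypothetical, and the endpoint, key, and URL are placeholders the caller supplies. The SDK imports sit inside the function so the module loads even where the package is absent.

```python
def describe_image(endpoint: str, key: str, image_url: str) -> None:
    """Print the caption and OCR lines for a publicly reachable image URL."""
    # Imported lazily so this module can be loaded without the Azure SDK installed.
    from azure.ai.vision.imageanalysis import ImageAnalysisClient
    from azure.ai.vision.imageanalysis.models import VisualFeatures
    from azure.core.credentials import AzureKeyCredential

    client = ImageAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    result = client.analyze_from_url(
        image_url=image_url,
        visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
        gender_neutral_caption=True,
    )
    if result.caption is not None:
        print(f"Caption: {result.caption.text} ({result.caption.confidence:.2f})")
    if result.read is not None:
        for block in result.read.blocks:
            for line in block.lines:
                print(f"Line: {line.text}")
```

A call might look like `describe_image(os.environ["VISION_ENDPOINT"], os.environ["VISION_KEY"], image_url)`, where the environment variable names are only a convention. The same package also provides an `analyze` method for image bytes.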
Pricing
- Free Tier (F0): 5,000 transactions per month
- Standard Tier (S1): Pay per 1,000 transactions
- Different pricing for v3.2 and v4.0 features
- Additional costs for custom models