
Computer Vision (Image Analysis)

Azure Computer Vision provides AI algorithms for processing images and extracting visual information. The Image Analysis service uses pre-trained models to analyze images and return insights about visual features and characteristics.

Key Capabilities

The Image Analysis API covers the following features:

OCR (Read Text)

Extract printed and handwritten text from images with high accuracy

Object Detection

Detect and locate objects in images with bounding boxes and confidence scores

Image Captioning

Generate human-readable descriptions of image content in complete sentences

People Detection

Detect people in images and return bounding box coordinates for each person

Visual Tagging

Identify and tag thousands of recognizable objects, living things, scenery, and actions

Smart Cropping

Determine the area of interest in images to create optimal thumbnails

Image Analysis Features

Read Text from Images (OCR)

Version 4.0 offers synchronous OCR capabilities that extract text from images:
  • Extract printed and handwritten text
  • Support for multiple languages
  • High accuracy text recognition
  • Faster than the asynchronous Read API (v3.2)
  • Returns text with bounding box coordinates
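
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

# Create the client once and reuse it for every request.
client = ImageAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Request only the READ (OCR) feature for a publicly reachable image URL.
result = client.analyze_from_url(
    image_url="https://example.com/image.jpg",
    visual_features=[VisualFeatures.READ]
)

# result.read is None when the response carries no text result.
if result.read is not None:
    for block in result.read.blocks:
        for line in block.lines:
            print(f"Text: {line.text}")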
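
The bullet about bounding box coordinates can be made concrete with a small helper. This is a minimal sketch, assuming each recognized line exposes its outline as a list of points with `x` and `y` pixel values; the `Point` tuple and the `polygon_to_box` function are hypothetical stand-ins for illustration, not SDK types. It reduces a four-corner polygon to an axis-aligned box:

```python
from typing import Iterable, NamedTuple, Tuple


class Point(NamedTuple):
    """Hypothetical stand-in for a polygon vertex in pixel coordinates."""
    x: int
    y: int


def polygon_to_box(points: Iterable[Point]) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of the smallest box enclosing the points."""
    pts = list(points)
    if not pts:
        raise ValueError("polygon has no points")
    xs = [p.x for p in pts]
    ys = [p.y for p in pts]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


# A slightly rotated text line: four corners in clockwise order.
corners = [Point(10, 20), Point(110, 24), Point(109, 44), Point(9, 40)]
box = polygon_to_box(corners)
print(box)  # (9, 20, 101, 24)
```

An axis-aligned box is convenient for cropping or highlighting, at the cost of losing the rotation that the original polygon encodes.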

Generate Image Captions

Create human-readable descriptions of images:
  • Simple captions: One-sentence descriptions of the entire image
  • Dense captions: Detailed captions for individual objects with bounding boxes
  • Natural language descriptions
  • High accuracy based on image content
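
from azure.ai.vision.imageanalysis.models import VisualFeatures

# Reuses the client created in the OCR example.
result = client.analyze_from_url(
    image_url="https://example.com/image.jpg",
    visual_features=[VisualFeatures.CAPTION]
)

# result.caption is None when no caption was returned.
if result.caption is not None:
    print(f"Caption: {result.caption.text}")
    print(f"Confidence: {result.caption.confidence}")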
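
Because each caption comes with a confidence score, a caller can decide when the caption is good enough to use, for instance as alt text. The following sketch assumes a threshold of 0.5 and a generic fallback string; both values and the `choose_alt_text` helper are illustrative assumptions, not service recommendations:

```python
from typing import Optional


def choose_alt_text(caption: Optional[str], confidence: float,
                    min_confidence: float = 0.5,
                    fallback: str = "Image") -> str:
    """Use the generated caption as alt text only when it is confident enough."""
    text = (caption or "").strip()
    if text and confidence >= min_confidence:
        # Capitalize the first letter so the caption reads as a sentence.
        return text[0].upper() + text[1:]
    return fallback


print(choose_alt_text("a dog running on grass", 0.82))  # A dog running on grass
print(choose_alt_text("a blurry shape", 0.20))          # Image
```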

Detect Objects

Identify objects in images with bounding boxes:
  • Detect multiple instances of the same object
  • Return pixel coordinates for each object
  • Confidence scores for each detection
  • Support for thousands of object categories
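
from azure.ai.vision.imageanalysis.models import VisualFeatures

result = client.analyze_from_url(
    image_url="https://example.com/image.jpg",
    visual_features=[VisualFeatures.OBJECTS]
)

# Detections are in result.objects.list; each has tags and a bounding box.
if result.objects is not None:
    for obj in result.objects.list:
        print(f"Object: {obj.tags[0].name}")
        print(f"Confidence: {obj.tags[0].confidence}")
        print(f"Bounding box: {obj.bounding_box}")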
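
Since the same object can be detected several times in one image, a common follow-up step is to count instances per label while ignoring weak detections. This is a sketch on plain `(name, confidence)` pairs such as those printed above; the `count_objects` helper and the 0.5 threshold are assumptions chosen for illustration:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_objects(detections: Iterable[Tuple[str, float]],
                  min_confidence: float = 0.5) -> Dict[str, int]:
    """Count detections per label, skipping those below the confidence threshold."""
    counts = Counter(name for name, confidence in detections
                     if confidence >= min_confidence)
    return dict(counts)


# Hypothetical detections: three dogs, one of them too uncertain to keep.
detections = [("dog", 0.91), ("dog", 0.77), ("ball", 0.64), ("dog", 0.32)]
print(count_objects(detections))  # {'dog': 2, 'ball': 1}
```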

People Detection

Detect people appearing in images (v4.0 only):
  • Returns bounding box coordinates for each person
  • Confidence scores for each detection
  • Works with single or multiple people
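
from azure.ai.vision.imageanalysis.models import VisualFeatures

result = client.analyze_from_url(
    image_url="https://example.com/image.jpg",
    visual_features=[VisualFeatures.PEOPLE]
)

# Detections are in result.people.list.
if result.people is not None:
    for person in result.people.list:
        print(f"Person detected at: {person.bounding_box}")
        print(f"Confidence: {person.confidence}")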
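
Bounding boxes are reported in pixels, so drawing them on a resized copy of the image requires relative coordinates. The sketch below assumes the box is available as `x`, `y`, `width`, `height` in pixels and that the original image dimensions are known (for example from the analysis result's metadata); `normalize_box` is a hypothetical helper:

```python
from typing import Tuple


def normalize_box(x: int, y: int, width: int, height: int,
                  image_width: int, image_height: int) -> Tuple[float, float, float, float]:
    """Convert a pixel box to fractions of the image size (values between 0 and 1)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return (x / image_width, y / image_height,
            width / image_width, height / image_height)


# A person box in an 800 x 400 image.
print(normalize_box(200, 100, 400, 200, 800, 400))  # (0.25, 0.25, 0.5, 0.5)
```

Multiplying the fractions by the size of the displayed image gives the overlay position at any scale.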

Tag Visual Features

Identify and tag visual content:
  • Thousands of recognizable objects, living things, scenery, and actions
  • Tags with confidence scores
  • Context hints for ambiguous tags
  • Includes both main subjects and background elements
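
from azure.ai.vision.imageanalysis.models import VisualFeatures

result = client.analyze_from_url(
    image_url="https://example.com/image.jpg",
    visual_features=[VisualFeatures.TAGS]
)

# Tags are in result.tags.list.
if result.tags is not None:
    for tag in result.tags.list:
        print(f"{tag.name}: {tag.confidence}")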
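
Tags become useful for search once they are collected across many images. As a sketch, assuming the tags of each image have been gathered as `(name, confidence)` pairs, the hypothetical `build_tag_index` helper below builds a simple inverted index from tag to image names; the 0.6 threshold is an arbitrary illustrative choice:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_tag_index(
    tagged_images: Dict[str, Iterable[Tuple[str, float]]],
    min_confidence: float = 0.6,
) -> Dict[str, List[str]]:
    """Map each sufficiently confident tag to the sorted images that carry it."""
    index = defaultdict(set)
    for image_name, tags in tagged_images.items():
        for tag_name, confidence in tags:
            if confidence >= min_confidence:
                index[tag_name.lower()].add(image_name)
    return {tag: sorted(images) for tag, images in index.items()}


# Hypothetical tagging results for two images.
tagged = {
    "beach.jpg": [("outdoor", 0.99), ("water", 0.95), ("dog", 0.41)],
    "park.jpg": [("outdoor", 0.98), ("dog", 0.93), ("grass", 0.88)],
}
index = build_tag_index(tagged)
print(index["outdoor"])  # ['beach.jpg', 'park.jpg']
print(index["dog"])      # ['park.jpg']
```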

Smart Cropping (Area of Interest)

Find the optimal region of interest for thumbnails:
  • Analyzes image content to determine focus area
  • Returns bounding box coordinates
  • Supports custom aspect ratios
  • Preserves the most important visual elements
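
from azure.ai.vision.imageanalysis.models import VisualFeatures

result = client.analyze_from_url(
    image_url="https://example.com/image.jpg",
    visual_features=[VisualFeatures.SMART_CROPS],
    smart_crops_aspect_ratios=[0.9, 1.33]
)

# One crop region is returned per requested aspect ratio, in result.smart_crops.list.
if result.smart_crops is not None:
    for crop in result.smart_crops.list:
        print(f"Crop box: {crop.bounding_box}")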
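
The values passed as aspect ratios are width divided by height of the desired thumbnail. The sketch below derives the ratio from target dimensions and checks it before a request is made. The accepted range of 0.75 to 1.8 is an assumption taken from my reading of the service documentation and should be verified; `crop_aspect_ratio` is a hypothetical helper:

```python
MIN_RATIO = 0.75  # assumed lower limit; verify against the service documentation
MAX_RATIO = 1.8   # assumed upper limit; verify against the service documentation


def crop_aspect_ratio(width: int, height: int) -> float:
    """Return width / height rounded to two decimals, or raise if out of range."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = round(width / height, 2)
    if not MIN_RATIO <= ratio <= MAX_RATIO:
        raise ValueError(f"aspect ratio {ratio} is outside {MIN_RATIO} to {MAX_RATIO}")
    return ratio


print(crop_aspect_ratio(400, 300))    # 1.33
print(crop_aspect_ratio(1920, 1080))  # 1.78
```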

Additional Features (v3.2)

Version 3.2 includes these additional capabilities:
  • Brand Detection: Identify commercial brands and logos
  • Face Detection: Detect faces and estimate age and gender
  • Image Type Detection: Determine if image is clip art or line drawing
  • Color Scheme Detection: Identify dominant and accent colors
  • Adult Content Detection: Detect adult, racy, or gory content
  • Domain-Specific Models: Detect celebrities and landmarks
  • Image Categorization: Categorize images using a taxonomy

Multimodal Embeddings

Convert images and text to vector representations for semantic search:
  • Vectorize images for similarity search
  • Convert text queries to vectors
  • Match images to text based on semantic meaning
  • Support for 102 languages (multilingual model)
  • Build image search applications
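
import requests

# The Python package may not expose a vectorize client, so this sketch calls
# the REST endpoints directly. The api-version and model-version values are
# assumptions; check them against your resource.
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
params = {"api-version": "2024-02-01", "model-version": "2023-04-15"}
headers = {"Ocp-Apim-Subscription-Key": "<your-key>"}

# Vectorize an image
response = requests.post(
    f"{endpoint}/computervision/retrieval:vectorizeImage",
    params=params, headers=headers,
    json={"url": "https://example.com/image.jpg"},
)
response.raise_for_status()
image_vector = response.json()["vector"]

# Vectorize a text query
response = requests.post(
    f"{endpoint}/computervision/retrieval:vectorizeText",
    params=params, headers=headers,
    json={"text": "a dog playing in the park"},
)
response.raise_for_status()
text_vector = response.json()["vector"]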
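
Matching images to a text query comes down to comparing vectors. Cosine similarity is a common choice for this; the sketch below uses it to rank a few images against a query. The three-dimensional vectors are toy values for illustration (real embeddings are much longer), and `cosine_similarity` and `rank_images` are hypothetical helpers:

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_images(query: Sequence[float],
                image_vectors: Dict[str, Sequence[float]]) -> List[str]:
    """Return image names ordered from most to least similar to the query."""
    return sorted(image_vectors,
                  key=lambda name: cosine_similarity(query, image_vectors[name]),
                  reverse=True)


# Toy vectors standing in for a text embedding and three image embeddings.
query_vector = [1.0, 0.0, 0.0]
images = {
    "car.jpg": [0.0, 1.0, 0.0],
    "dog.jpg": [0.9, 0.1, 0.0],
    "park.jpg": [0.5, 0.5, 0.0],
}
print(rank_images(query_vector, images))  # ['dog.jpg', 'park.jpg', 'car.jpg']
```

For more than a few thousand images, a vector database or approximate nearest-neighbor index would normally replace the linear scan shown here.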

API Versions

Version 4.0

  • Synchronous OCR (Read)
  • People detection
  • Dense captions
  • Enhanced image captioning
  • Improved smart cropping
  • Multimodal embeddings

Version 3.2

  • Async OCR (Read API)
  • Brand detection
  • Face detection
  • Celebrity and landmark detection
  • All other v3.2 features

Input Requirements

Version 4.0:
  • Supported formats: JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, MPO
  • File size: Less than 20 MB
  • Dimensions: 50 x 50 to 16,000 x 16,000 pixels
Version 3.2:
  • Supported formats: JPEG, PNG, GIF, BMP
  • File size: Less than 4 MB
  • Dimensions: 50 x 50 to 16,000 x 16,000 pixels
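
The limits above can be checked locally before an image is uploaded. This is a minimal sketch built only from the figures listed here. It assumes that "MB" means 1024 * 1024 bytes, that the dimension range is inclusive, and that the file extension reflects the real format; `check_image` is a hypothetical helper:

```python
from pathlib import PurePath
from typing import List

MIB = 1024 * 1024  # assumption: "MB" is read as 1024 * 1024 bytes

LIMITS = {
    "4.0": {
        "extensions": {".jpg", ".jpeg", ".png", ".gif", ".bmp",
                       ".webp", ".ico", ".tif", ".tiff", ".mpo"},
        "max_bytes": 20 * MIB,
    },
    "3.2": {
        "extensions": {".jpg", ".jpeg", ".png", ".gif", ".bmp"},
        "max_bytes": 4 * MIB,
    },
}
MIN_SIDE, MAX_SIDE = 50, 16_000  # pixels, same for both versions


def check_image(filename: str, size_bytes: int, width: int, height: int,
                version: str = "4.0") -> List[str]:
    """Return a list of problems; an empty list means the image passes the checks."""
    limits = LIMITS[version]
    problems = []
    if PurePath(filename).suffix.lower() not in limits["extensions"]:
        problems.append("unsupported format")
    if size_bytes >= limits["max_bytes"]:
        problems.append("file too large")
    if not (MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE):
        problems.append("dimensions out of range")
    return problems


print(check_image("photo.jpg", 1_000_000, 1920, 1080))                    # []
print(check_image("scan.tiff", 5 * MIB, 1000, 1000, version="3.2"))
# ['unsupported format', 'file too large']
```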

Region Availability

Region          Analyze Image   Captions (v4.0)   Embeddings
East US
West US
West US 2
North Europe
West Europe
Southeast Asia

Use Cases

Content moderation
  • Detect inappropriate images
  • Filter adult content
  • Ensure brand safety
  • Monitor user-generated content

E-commerce
  • Generate product descriptions
  • Tag products automatically
  • Create smart thumbnails
  • Enable visual search

Accessibility
  • Generate alt text for images
  • Read text aloud from images
  • Describe visual content
  • Support screen readers

Document processing
  • Extract text from scanned documents
  • Digitize printed materials
  • Process forms and receipts
  • Archive historical documents

Getting Started

1. Create Computer Vision Resource

Create an Azure Computer Vision resource in the Azure portal.

2. Get Credentials

Retrieve your endpoint URL and API key from the resource.

3. Install SDK

Install the Image Analysis SDK for your preferred language. For Python:
pip install azure-ai-vision-imageanalysis

4. Analyze Images

Use the SDK to analyze images and extract features.
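
Rather than pasting the endpoint and key from the Get Credentials step into source code, they can be read from environment variables. The sketch below assumes the variable names `VISION_ENDPOINT` and `VISION_KEY`; these names and the `load_vision_settings` helper are illustrative choices, not requirements of the SDK:

```python
import os
from typing import Mapping, Tuple


def load_vision_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the endpoint and key from environment variables (names are assumptions)."""
    missing = [name for name in ("VISION_ENDPOINT", "VISION_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["VISION_ENDPOINT"], env["VISION_KEY"]


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {
    "VISION_ENDPOINT": "https://<your-resource>.cognitiveservices.azure.com/",
    "VISION_KEY": "<your-key>",
}
endpoint, key = load_vision_settings(demo_env)
print(endpoint)
```

The returned pair can then be passed to `ImageAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))` as in the examples earlier on this page.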

SDK Support

Python

pip install azure-ai-vision-imageanalysis

C#

dotnet add package Azure.AI.Vision.ImageAnalysis

Java

<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-ai-vision-imageanalysis</artifactId>
</dependency>

JavaScript

npm install @azure-rest/ai-vision-image-analysis

Pricing

  • Free Tier (F0): 5,000 transactions per month
  • Standard Tier (S1): Pay per 1,000 transactions
  • Different pricing for v3.2 and v4.0 features
  • Additional costs for custom models
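
A rough volume estimate helps decide whether the free tier is enough. The sketch below assumes that usage is billed in proportion to the number of transactions and that the number of transactions per image is known for your workload. The price in the example is a placeholder, not a real rate; take actual prices from the Azure pricing page:

```python
FREE_TIER_MONTHLY_TRANSACTIONS = 5_000  # F0 quota from the list above


def monthly_transactions(images_per_day: int, transactions_per_image: int = 1,
                         days: int = 30) -> int:
    """Estimate the number of transactions in a month."""
    return images_per_day * transactions_per_image * days


def estimate_s1_cost(transactions: int, price_per_1000: float) -> float:
    """Proportional cost estimate for a pay-per-1,000-transactions tier."""
    return transactions / 1000 * price_per_1000


volume = monthly_transactions(images_per_day=100, transactions_per_image=2)
print(volume)                                    # 6000
print(volume <= FREE_TIER_MONTHLY_TRANSACTIONS)  # False
# 1.5 is a placeholder price per 1,000 transactions, not an actual rate.
print(estimate_s1_cost(volume, price_per_1000=1.5))  # 9.0
```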
