Hugging Face Integration

Overview

The Hugging Face integration enables CVAT users to leverage state-of-the-art computer vision models from the Hugging Face Hub for automatic annotation. This integration provides access to thousands of pre-trained models for object detection, segmentation, and other computer vision tasks.

The Hugging Face integration is available for CVAT Cloud users and Enterprise self-hosted installations. It is not available for community self-hosted deployments.

What is Hugging Face?

Hugging Face is a widely used platform for sharing machine learning models and datasets. It offers:

Hugging Face Hub: Repository of 100,000+ models
Transformers Library: State-of-the-art NLP and CV models
Inference API: Hosted inference for models on the Hub, without deploying them yourself
Model fine-tuning: Tools for training custom models

Prerequisites

CVAT Cloud account or Enterprise self-hosted installation
Hugging Face account (free or paid)
Hugging Face API token (for private models)
A CVAT task or project with labels defined

Supported Model Architectures

The integration supports various computer vision model architectures from Hugging Face:

Object Detection Models

DETR (DEtection TRansformer): Facebook’s transformer-based detector
YOLOv5/YOLOv8: Fast and accurate object detection
Faster R-CNN: Region-based convolutional neural networks
RetinaNet: Single-stage detector with focal loss

Segmentation Models

Mask R-CNN: Instance segmentation
Segment Anything (SAM): Universal segmentation model
Semantic Segmentation Models: SegFormer, UperNet, etc.

Transformers-Based Models

Vision Transformers (ViT): Image classification
CLIP: Vision-language models
DINOv2: Self-supervised vision features

Adding a Hugging Face Model

Follow these steps to integrate a Hugging Face model into CVAT:

Step 1: Find a Model on Hugging Face Hub

Visit the Hugging Face Model Hub (https://huggingface.co/models)
Filter by task type:
- Object Detection
- Image Segmentation
- Image Classification
Select a model that matches your annotation needs
Note the model ID shown at the top of the model page (e.g., facebook/detr-resnet-50)
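Model IDs used in this guide take the form `namespace/model-name`. A minimal sketch for catching copy-paste mistakes (such as pasting the full page URL) before entering an ID in CVAT; the helper `looks_like_model_id` is hypothetical, and its character set is a simplifying assumption rather than the Hub's official naming rules.

```python
import re

# Simplified pattern (an assumption, not the Hub's official rule): a namespace
# and a model name separated by a single slash, each made of letters, digits,
# '.', '_' or '-'.
_MODEL_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def looks_like_model_id(model_id: str) -> bool:
    """Return True if model_id has the 'namespace/model-name' shape."""
    return _MODEL_ID.fullmatch(model_id.strip()) is not None
```

For example, `looks_like_model_id("facebook/detr-resnet-50")` is true, while the page URL `https://huggingface.co/facebook/detr-resnet-50` is rejected.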

Step 2: Get Your API Token

Go to Access Tokens in your Hugging Face settings (https://huggingface.co/settings/tokens)
Click New token
Give it a name and select permissions
Copy the generated token

Keep your API token secure. Never share it publicly or commit it to version control.
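One common way to keep the token out of scripts and version control is to read it from an environment variable. A minimal sketch, assuming the token is stored in `HF_TOKEN` (a variable the `huggingface_hub` library also reads); both helper functions are hypothetical, and `mask_token` only exists so the token can be logged safely.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face token from the environment instead of source code."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Hugging Face token")
    return token


def mask_token(token: str, visible: int = 4) -> str:
    """Return a log-safe version that shows only the last `visible` characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```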

Step 3: Add the Model in CVAT

In CVAT, navigate to the Models page
Click Add model
Select Hugging Face as the model source
Enter the following information:
- Model name: Descriptive name for your reference
- Model ID: The Hugging Face model identifier (e.g., facebook/detr-resnet-50)
- API token: Your Hugging Face API token (for private models)
Click Add to save the model

Using Hugging Face Models for Automatic Annotation

Once configured, you can use Hugging Face models for automatic annotation:

Running Automatic Annotation

Open your task in CVAT
Click Actions > Automatic annotation
Select your Hugging Face model from the dropdown
Configure settings:
- Threshold: Minimum confidence score (0.0-1.0) a prediction needs in order to be kept
- Clean old annotations: Remove existing annotations before adding new ones
- Return masks as polygons: Convert masks to polygons
Map model labels to task labels
Click Annotate

The annotation process will:

Send images to Hugging Face Inference API
Process predictions
Create annotations in your task
Show progress in real-time
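The "process predictions" and "create annotations" steps above can be pictured with a short sketch that filters detections by the confidence threshold and converts them to rectangles. It assumes predictions follow the output format of the `transformers` object-detection pipeline (a list of dicts with `score`, `label`, and a `box` holding `xmin`, `ymin`, `xmax`, `ymax`) and the CVAT convention of storing a rectangle as top-left and bottom-right corners. The function is hypothetical and its output dict is a simplified stand-in for a CVAT shape (it carries the label name rather than a label ID); it is not CVAT's internal code.

```python
from typing import Any


def predictions_to_shapes(
    predictions: list[dict[str, Any]], frame: int, threshold: float
) -> list[dict[str, Any]]:
    """Keep predictions scoring at least `threshold` and convert them to rectangles."""
    shapes = []
    for pred in predictions:
        if pred["score"] < threshold:
            continue  # drop low-confidence detections
        box = pred["box"]
        shapes.append(
            {
                "type": "rectangle",
                "frame": frame,
                "label": pred["label"],
                # top-left corner, then bottom-right corner
                "points": [box["xmin"], box["ymin"], box["xmax"], box["ymax"]],
            }
        )
    return shapes
```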

Example: Using DETR for Object Detection

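The following configuration and label mapping illustrate a typical setup for the DETR model. The mapping translates model labels (keys) into task labels (values), so several model classes can be merged into one task label: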

```python
# Example configuration for DETR model
model_config = {
    "name": "DETR Object Detection",
    "model_id": "facebook/detr-resnet-50",
    "task": "object-detection",
    "threshold": 0.7
}

# Label mapping
label_mapping = {
    "person": "person",
    "car": "vehicle",
    "truck": "vehicle",
    "bicycle": "bike",
    "motorcycle": "bike"
}
```
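A mapping like the one above can be applied to a list of predictions as follows. This is a sketch assuming predictions are dicts with a `label` key; the helper `apply_label_mapping` is hypothetical, and in this sketch model labels that the mapping does not cover are dropped.

```python
from typing import Any

# Same mapping as in the configuration above: model label -> task label.
label_mapping = {
    "person": "person",
    "car": "vehicle",
    "truck": "vehicle",
    "bicycle": "bike",
    "motorcycle": "bike",
}


def apply_label_mapping(
    predictions: list[dict[str, Any]], mapping: dict[str, str]
) -> list[dict[str, Any]]:
    """Rename model labels to task labels; drop labels the mapping does not cover."""
    mapped = []
    for pred in predictions:
        task_label = mapping.get(pred["label"])
        if task_label is None:
            continue  # e.g. 'dog' is not in the mapping, so it is skipped
        mapped.append({**pred, "label": task_label})  # copy, do not mutate input
    return mapped
```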

Using the CVAT Python SDK

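```python
import os

from cvat_sdk import make_client

# Model label -> task label (same mapping as in the DETR example above)
label_mapping = {
    "person": "person",
    "car": "vehicle",
    "truck": "vehicle",
    "bicycle": "bike",
    "motorcycle": "bike",
}

# Connect to CVAT. Credentials are read from CVAT_USER / CVAT_PASSWORD
# (example variable names) instead of being hardcoded.
with make_client(
    host="https://app.cvat.ai",
    credentials=(os.environ["CVAT_USER"], os.environ["CVAT_PASSWORD"]),
) as client:
    # Get the task to annotate (123 is a placeholder task ID)
    task = client.tasks.retrieve(123)

    # Run automatic annotation with the Hugging Face model added in CVAT.
    # The annotate() call and its arguments follow this guide; check the
    # API reference for your cvat_sdk version.
    task.annotate(
        model_name="detr-resnet-50",
        mapping=label_mapping,
        threshold=0.7,
        clear_existing=False,
    )

    print(f"Annotation complete for task {task.id}")
```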

Model Performance Optimization

Choosing the Right Model

Consider these factors when selecting a model:

Accuracy vs Speed: Models with larger backbones (e.g., ResNet-101 instead of ResNet-50) tend to be more accurate but slower
Domain Similarity: Choose models trained on similar data
Label Coverage: Ensure the model supports your required labels
Model Size: Larger models increase API latency per image

Optimizing Threshold

Adjust the confidence threshold based on your needs:

High Precision (0.8-0.95): Fewer false positives, may miss objects
Balanced (0.5-0.7): Good trade-off between precision and recall
High Recall (0.3-0.5): Catch more objects, more false positives

Run test batches with different thresholds to find the optimal value.
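One way to compare thresholds on a test batch is to review the detections, mark each as correct or not, and compute precision and recall at each candidate threshold. A minimal sketch, assuming each correct detection matches a distinct ground-truth object and that you know how many objects the batch contains; the function name is hypothetical.

```python
def sweep_thresholds(
    detections: list[tuple[float, bool]],
    num_ground_truth: int,
    thresholds: list[float],
) -> dict[float, tuple[float, float]]:
    """Return {threshold: (precision, recall)} for a reviewed test batch.

    detections: (confidence, is_correct) pairs, where is_correct means the
    detection matched a distinct ground-truth object during review.
    """
    results = {}
    for t in thresholds:
        kept = [ok for score, ok in detections if score >= t]
        true_pos = sum(kept)
        # Convention: precision is 1.0 when nothing is kept
        precision = true_pos / len(kept) if kept else 1.0
        recall = true_pos / num_ground_truth if num_ground_truth else 0.0
        results[t] = (precision, recall)
    return results
```

Raising the threshold typically raises precision and lowers recall; the sweep makes that trade-off visible for your own data.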

Handling Large Datasets

For large annotation tasks:

Process in batches to avoid timeouts
Use lower resolution images if possible
Consider using multiple models for different object types
Monitor API rate limits
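Processing in batches can be as simple as splitting the frame range into fixed-size chunks and submitting one chunk at a time. A minimal sketch; the batch size is an arbitrary example, and `submit_batch` in the usage comment is a hypothetical placeholder for whatever starts annotation in your setup.

```python
from typing import Iterator


def frame_batches(num_frames: int, batch_size: int) -> Iterator[range]:
    """Yield consecutive frame ranges of at most batch_size frames."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_frames, batch_size):
        yield range(start, min(start + batch_size, num_frames))


# Usage (submit_batch is hypothetical):
# for frames in frame_batches(num_frames=1050, batch_size=100):
#     submit_batch(frames.start, frames.stop - 1)
```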

Advanced Configuration

Custom Model Parameters

Some models support additional parameters:

```json
{
  "model_id": "facebook/detr-resnet-50",
  "parameters": {
    "threshold": 0.7,
    "max_detections": 100,
    "nms_threshold": 0.5
  }
}
```
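In detectors that use non-maximum suppression (NMS), a parameter like `nms_threshold` is usually the intersection-over-union (IoU) above which a lower-scoring box overlapping a higher-scoring one is discarded, and `max_detections` caps how many boxes are kept. The sketch below is a plain, class-agnostic greedy NMS meant only to illustrate those two parameters; whether and how a given model or CVAT applies them is an assumption here, and transformer detectors such as DETR are generally designed to work without NMS.

```python
Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: list[Box], scores: list[float], nms_threshold: float, max_detections: int
) -> list[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: list[int] = []
    for i in order:
        if len(keep) >= max_detections:
            break
        # Keep box i only if it does not overlap any kept box too much
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep
```

A per-class variant would run `greedy_nms` separately for each label.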

Using Fine-Tuned Models

You can use your own fine-tuned models from Hugging Face:

Train and upload your model to Hugging Face Hub
Make the model public or use your API token
Add the model to CVAT using your model ID
Configure label mappings for your custom classes
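Models fine-tuned with `transformers` usually list their classes in the `id2label` field of the model's `config.json`, keyed by class index as a string. A sketch for drafting a label mapping by matching those names to task labels case-insensitively, assuming you have already loaded `id2label` as a dict; the helper name is hypothetical, and unmatched names are returned so you can map them by hand.

```python
def draft_label_mapping(
    id2label: dict[str, str], task_labels: list[str]
) -> tuple[dict[str, str], list[str]]:
    """Match model class names to task labels, ignoring case and surrounding spaces.

    Returns (mapping, unmatched_model_labels).
    """
    by_key = {name.strip().lower(): name for name in task_labels}
    mapping: dict[str, str] = {}
    unmatched: list[str] = []
    # Walk classes in index order so the result is deterministic
    for _, model_label in sorted(id2label.items(), key=lambda kv: int(kv[0])):
        task_label = by_key.get(model_label.strip().lower())
        if task_label is None:
            unmatched.append(model_label)
        else:
            mapping[model_label] = task_label
    return mapping, unmatched
```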

Troubleshooting

Model Not Loading

Issue: Model fails to load in CVAT.
Solutions:

Verify the model ID is correct
Check that your API token has proper permissions
Ensure the model supports the required task type
Try using a different model version

Slow Inference

Issue: Automatic annotation is taking too long.
Solutions:

Use a smaller/faster model architecture
Reduce image resolution if possible
Process fewer images at a time
Check Hugging Face API status

Incorrect Predictions

Issue: Model predictions are inaccurate.
Solutions:

Adjust the confidence threshold
Try a model trained on more similar data
Consider fine-tuning the model on your data
Review and manually correct predictions

API Rate Limits

Hugging Face enforces API rate limits:

Free tier: Limited requests per hour
PRO tier: Higher limits and faster inference
Enterprise: Unlimited with dedicated infrastructure

If you hit rate limits:

Wait for the limit to reset
Upgrade your Hugging Face plan
Use batch processing with delays
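Adding delays is often implemented as retry with exponential backoff when the API answers HTTP 429 (Too Many Requests). A minimal sketch, assuming a caller-supplied `send` function that raises a hypothetical `RateLimited` exception on a 429 response; the delay values are illustrative defaults, and `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by `send` when the API returns HTTP 429."""


def with_backoff(
    send: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on RateLimited, wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the last retry
            delay = min(max_delay, base_delay * 2**attempt)
            sleep(delay + random.uniform(0, delay * 0.1))  # small random jitter
    raise AssertionError("unreachable")
```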

Model Recommendations by Use Case

| Use Case | Recommended Models | Notes |
|---|---|---|
| General Object Detection | `facebook/detr-resnet-50` | Good balance of speed and accuracy |
| High-Accuracy Detection | `facebook/detr-resnet-101` | Slower but more accurate |
| Fast Detection | `hustvl/yolos-tiny` | Lower accuracy, very fast |
| Instance Segmentation | `facebook/mask2former-swin-base` | High-quality masks |
| Semantic Segmentation | `nvidia/segformer-b5-finetuned-ade` | Dense pixel-level labeling |
| Face Detection | `Bingsu/RetinaFace` | Specialized for faces |