Overview
The Hugging Face integration enables CVAT users to leverage state-of-the-art computer vision models from the Hugging Face Hub for automatic annotation. This integration provides access to thousands of pre-trained models for object detection, segmentation, and other computer vision tasks.
The Hugging Face integration is available for CVAT Cloud users and Enterprise self-hosted installations. It is not available for community self-hosted deployments.
What is Hugging Face?
Hugging Face is a widely used platform for machine learning models and datasets. It offers:
- Hugging Face Hub: Repository of 100,000+ models
- Transformers Library: State-of-the-art NLP and CV models
- Inference API: Easy model deployment
- Model fine-tuning: Tools for training custom models
Prerequisites
- CVAT Cloud account or Enterprise self-hosted installation
- Hugging Face account (free or paid)
- Hugging Face API token (for private models)
- A CVAT task or project with labels defined
Supported Model Architectures
The integration supports various computer vision model architectures from Hugging Face.
Object Detection Models
- DETR (DEtection TRansformer): Facebook’s transformer-based detector
- YOLOv5/YOLOv8: Fast and accurate object detection
- Faster R-CNN: Region-based convolutional neural networks
- RetinaNet: Single-stage detector with focal loss
Segmentation Models
- Mask R-CNN: Instance segmentation
- Segment Anything (SAM): Universal segmentation model
- Semantic Segmentation Models: SegFormer, UperNet, etc.
Transformers-Based Models
- Vision Transformers (ViT): Image classification
- CLIP: Vision-language models
- DINOv2: Self-supervised vision features
Adding a Hugging Face Model
Follow these steps to integrate a Hugging Face model into CVAT:
Step 1: Find a Model on Hugging Face Hub
- Visit Hugging Face Model Hub
- Filter by task type:
  - Object Detection
  - Image Segmentation
  - Image Classification
- Select a model that matches your annotation needs
- Note the model ID (e.g., facebook/detr-resnet-50)
Step 2: Get Your API Token
- Go to Hugging Face Settings
- Click New token
- Give it a name and select permissions
- Copy the generated token
Step 3: Add the Model in CVAT
- In CVAT, navigate to the Models page
- Click Add model
- Select Hugging Face as the model source
- Enter the following information:
  - Model name: Descriptive name for your reference
  - Model ID: The Hugging Face model identifier (e.g., facebook/detr-resnet-50)
  - API token: Your Hugging Face API token (for private models)
- Click Add to save the model
Using Hugging Face Models for Automatic Annotation
Once configured, you can use Hugging Face models for automatic annotation.
Running Automatic Annotation
- Open your task in CVAT
- Click Actions > Automatic annotation
- Select your Hugging Face model from the dropdown
- Configure settings:
  - Threshold: Confidence threshold (0.0-1.0)
  - Clean old annotations: Remove existing annotations
  - Return masks as polygons: Convert masks to polygons
- Map model labels to task labels
- Click Annotate
CVAT will then:
- Send images to the Hugging Face Inference API
- Process the predictions
- Create annotations in your task
- Show progress in real time
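The threshold and label-mapping settings above can be illustrated with a minimal sketch. It assumes detections arrive as dictionaries with `score`, `label`, and `box` keys (the layout used by Hugging Face object-detection pipelines); the function name `predictions_to_shapes` and the output dictionary are hypothetical simplifications for illustration, not CVAT's internal implementation or schema.

```python
from typing import Dict, List


def predictions_to_shapes(
    predictions: List[dict],
    label_map: Dict[str, str],
    threshold: float = 0.5,
) -> List[dict]:
    """Filter detections by confidence and map model labels to task labels."""
    shapes = []
    for pred in predictions:
        if pred["score"] < threshold:
            continue  # below the confidence threshold
        task_label = label_map.get(pred["label"])
        if task_label is None:
            continue  # model label is not mapped to any task label
        box = pred["box"]
        shapes.append(
            {
                "type": "rectangle",
                "label": task_label,
                # Corner order: top-left x, top-left y, bottom-right x, bottom-right y
                "points": [box["xmin"], box["ymin"], box["xmax"], box["ymax"]],
                "score": pred["score"],
            }
        )
    return shapes


# Hypothetical sample detections for a single image.
predictions = [
    {"score": 0.97, "label": "car", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 80}},
    {"score": 0.42, "label": "car", "box": {"xmin": 200, "ymin": 40, "xmax": 260, "ymax": 90}},
    {"score": 0.91, "label": "bench", "box": {"xmin": 5, "ymin": 150, "xmax": 95, "ymax": 190}},
]
# Model label -> task label; "bench" is intentionally left unmapped.
label_map = {"car": "vehicle", "person": "pedestrian"}

shapes = predictions_to_shapes(predictions, label_map, threshold=0.5)
print(shapes)
```

In this sample only the first detection survives: the second falls below the threshold and the third has no mapped task label.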
Example: Using DETR for Object Detection
Here’s an example workflow using the DETR model:
Using the CVAT Python SDK
Model Performance Optimization
Choosing the Right Model
Consider these factors when selecting a model:
- Accuracy vs Speed: Models with larger backbones (e.g., ResNet-101) are typically more accurate but slower
- Domain Similarity: Choose models trained on similar data
- Label Coverage: Ensure the model supports your required labels
- Model Size: Consider API latency for large models
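One way to check label coverage before adding a model is to compare your task labels against the model's label list. The sketch below assumes you have copied the `id2label` mapping from the model's `config.json` on the Hub; the helper name `label_coverage` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def label_coverage(
    task_labels: List[str], id2label: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split task labels into those the model can predict and those it cannot."""
    model_labels = {name.lower() for name in id2label.values()}
    covered = [label for label in task_labels if label.lower() in model_labels]
    missing = [label for label in task_labels if label.lower() not in model_labels]
    return covered, missing


# Hypothetical excerpt of an id2label mapping (keys are strings in config.json).
id2label = {"1": "person", "2": "bicycle", "3": "car"}
task_labels = ["Person", "car", "forklift"]

covered, missing = label_coverage(task_labels, id2label)
print("covered:", covered)
print("missing:", missing)
```

Labels reported as missing would need a different model, a fine-tuned model, or manual annotation. The comparison here is a case-insensitive exact match, so synonyms such as "automobile" and "car" still have to be mapped by hand.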
Optimizing Threshold
Adjust the confidence threshold based on your needs:
- High Precision (0.8-0.95): Fewer false positives, may miss objects
- Balanced (0.5-0.7): Good trade-off between precision and recall
- High Recall (0.3-0.5): Catch more objects, more false positives
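To pick a threshold empirically, you can review a small sample of predictions by hand and see how precision and recall change across candidate thresholds. The following is a minimal sketch under that assumption; the function name `sweep_thresholds` and the sample data are hypothetical.

```python
from typing import List, Tuple


def sweep_thresholds(
    predictions: List[Tuple[float, bool]],
    total_objects: int,
    thresholds: List[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each candidate threshold.

    predictions: (confidence, is_correct) pairs from a manually reviewed sample.
    total_objects: number of ground-truth objects in that sample.
    """
    rows = []
    for threshold in thresholds:
        kept = [correct for score, correct in predictions if score >= threshold]
        true_positives = sum(kept)
        # With nothing kept there are no false positives; report 1.0 by convention.
        precision = true_positives / len(kept) if kept else 1.0
        recall = true_positives / total_objects if total_objects else 0.0
        rows.append((threshold, precision, recall))
    return rows


# Hypothetical reviewed sample: 8 predictions, 6 real objects in the images.
reviewed = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False),
    (0.60, True), (0.45, False), (0.40, True), (0.35, False),
]
rows = sweep_thresholds(reviewed, total_objects=6, thresholds=[0.8, 0.5, 0.3])
for threshold, precision, recall in rows:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

On this sample, lowering the threshold from 0.8 to 0.3 raises recall while precision drops, which is the trade-off described in the list above.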
Handling Large Datasets
For large annotation tasks:
- Process in batches to avoid timeouts
- Use lower resolution images if possible
- Consider using multiple models for different object types
- Monitor API rate limits
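Batching with a pause between batches can be sketched as below. This is an illustration only: `annotate_batch` is a hypothetical callable standing in for whatever actually submits frames for annotation, and the batch size and delay are assumptions to tune against your own limits.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def annotate_in_batches(
    frame_ids: Sequence[int],
    annotate_batch: Callable[[Sequence[int]], List[str]],
    batch_size: int = 50,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Process frames batch by batch, pausing between batches."""
    results: List[str] = []
    batches = list(batched(frame_ids, batch_size))
    for index, batch in enumerate(batches):
        results.extend(annotate_batch(batch))
        if index < len(batches) - 1:
            sleep(delay_seconds)  # no pause needed after the final batch
    return results


# Demo with stand-ins so the sketch runs without any network access.
seen_batches: List[List[int]] = []
pauses: List[float] = []


def fake_annotate(batch: Sequence[int]) -> List[str]:
    seen_batches.append(list(batch))
    return [f"frame-{i}" for i in batch]


results = annotate_in_batches(
    list(range(7)), fake_annotate, batch_size=3, delay_seconds=0.5, sleep=pauses.append
)
print(seen_batches, pauses)
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies.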
Advanced Configuration
Custom Model Parameters
Some models support additional parameters:
Using Fine-Tuned Models
You can use your own fine-tuned models from Hugging Face:
- Train and upload your model to Hugging Face Hub
- Make the model public or use your API token
- Add the model to CVAT using your model ID
- Configure label mappings for your custom classes
Troubleshooting
Model Not Loading
Issue: Model fails to load in CVAT
Solutions:
- Verify the model ID is correct
- Check that your API token has proper permissions
- Ensure the model supports the required task type
- Try using a different model version
Slow Inference
Issue: Automatic annotation is taking too long
Solutions:
- Use a smaller/faster model architecture
- Reduce image resolution if possible
- Process fewer images at a time
- Check Hugging Face API status
Incorrect Predictions
Issue: Model predictions are inaccurate
Solutions:
- Adjust the confidence threshold
- Try a model trained on more similar data
- Consider fine-tuning the model on your data
- Review and manually correct predictions
API Rate Limits
Hugging Face enforces API rate limits:
- Free tier: Limited requests per hour
- PRO tier: Higher limits and faster inference
- Enterprise: Unlimited with dedicated infrastructure
If you hit a rate limit:
- Wait for the limit to reset
- Upgrade your Hugging Face plan
- Use batch processing with delays
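Waiting for a limit to reset is commonly automated with exponential backoff. The sketch below is a generic pattern, not part of CVAT or the Hugging Face client: `RateLimitError` is a hypothetical exception standing in for whatever signals a rate-limit response in your code (for example, an HTTP 429 status), and `request` is any callable you supply.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` on RateLimitError, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")


# Demo: a stand-in request that is rate limited twice, then succeeds.
attempts = {"count": 0}
waits: List[float] = []


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("rate limit hit")
    return "ok"


result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```

In the demo the waits are recorded instead of slept, so the sketch runs instantly; with the default `time.sleep` the same call would pause between attempts.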
Model Recommendations by Use Case
| Use Case | Recommended Models | Notes |
|---|---|---|
| General Object Detection | facebook/detr-resnet-50 | Good balance of speed and accuracy |
| High-Accuracy Detection | facebook/detr-resnet-101 | Slower but more accurate |
| Fast Detection | hustvl/yolos-tiny | Lower accuracy, very fast |
| Instance Segmentation | facebook/mask2former-swin-base | High-quality masks |
| Semantic Segmentation | nvidia/segformer-b5-finetuned-ade | Dense pixel-level labeling |
| Face Detection | Bingsu/RetinaFace | Specialized for faces |