
Overview

FiftyOne is an open-source tool for dataset visualization, exploration, and curation that integrates with CVAT. The integration lets you select and analyze samples in FiftyOne, annotate them in CVAT, and load the results back into FiftyOne.
The FiftyOne integration is available for both CVAT Cloud and self-hosted installations.

What is FiftyOne?

FiftyOne is an open-source dataset curation and model analysis tool that provides:
  • Visual dataset exploration: Interactive browser-based dataset visualization
  • Dataset quality analysis: Identify issues, outliers, and edge cases
  • Model evaluation: Analyze model predictions and errors
  • Label refinement: Send samples to CVAT for annotation or correction
  • Embeddings visualization: Understand dataset structure and diversity

Prerequisites

  • Python 3.7 or higher
  • FiftyOne installed (pip install fiftyone)
  • CVAT account (Cloud or self-hosted)
  • CVAT API credentials

Installation

Install FiftyOne with CVAT integration support:
# Install FiftyOne
pip install fiftyone

# Install CVAT SDK (required for integration)
pip install cvat-sdk
Verify the installation:
import fiftyone as fo
import fiftyone.zoo as foz

print(fo.__version__)

Connecting FiftyOne to CVAT

Configure FiftyOne to connect to your CVAT instance:

For CVAT Cloud

import fiftyone as fo

# Connection settings for CVAT Cloud. FiftyOne accepts these as keyword
# arguments to annotate() and load_annotations(), e.g. annotate(..., **config)
config = {
    "url": "https://app.cvat.ai",
    "username": "your-username",
    "password": "your-password",
}

For Self-Hosted CVAT

config = {
    "url": "https://your-cvat-instance.com",
    "username": "your-username",
    "password": "your-password",
}
Never hardcode credentials in your scripts. Use environment variables or a secure configuration file.

Using Environment Variables

export FIFTYONE_CVAT_URL="https://app.cvat.ai"
export FIFTYONE_CVAT_USERNAME="your-username"
export FIFTYONE_CVAT_PASSWORD="your-password"
FiftyOne reads these variables automatically whenever it connects to CVAT, so the credentials do not need to appear in your code. To pass them explicitly instead:
import os

config = {
    "url": os.getenv("FIFTYONE_CVAT_URL"),
    "username": os.getenv("FIFTYONE_CVAT_USERNAME"),
    "password": os.getenv("FIFTYONE_CVAT_PASSWORD"),
}
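Note that `os.getenv` returns `None` for an unset variable, which tends to surface only later as a failed login. A minimal sketch of a stricter loader follows, assuming the three variable names exported above. The helper name `load_cvat_settings` is hypothetical and not part of FiftyOne or CVAT.

```python
import os

# Mapping from setting name to the environment variable that holds it
REQUIRED_VARS = {
    "url": "FIFTYONE_CVAT_URL",
    "username": "FIFTYONE_CVAT_USERNAME",
    "password": "FIFTYONE_CVAT_PASSWORD",
}


def load_cvat_settings(environ=None):
    """Return CVAT connection settings, failing early if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {key: environ[var] for key, var in REQUIRED_VARS.items()}
```

Calling `load_cvat_settings()` at the top of a script turns a missing variable into an immediate, readable error.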

Workflow: FiftyOne to CVAT

1. Load and Explore Dataset in FiftyOne

Start by loading a dataset into FiftyOne:
import fiftyone as fo
import fiftyone.zoo as foz

# Load a dataset (example using COCO)
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=100
)

# Launch FiftyOne App to explore
session = fo.launch_app(dataset)

2. Select Samples for Annotation

Use FiftyOne’s query capabilities to select samples:
# Select samples that need annotation
from fiftyone import ViewField as F

# Example: Select images without annotations
view = dataset.match(F("ground_truth.detections").length() == 0)

# Example: Select images with low confidence predictions
view = dataset.match(
    F("predictions.detections.confidence").max() < 0.7
)

# Example: Random sample for quality control
view = dataset.take(50)

3. Send Samples to CVAT

Export selected samples to CVAT for annotation:
import fiftyone.utils.cvat as fouc

# Define label schema
label_schema = {
    "ground_truth": {
        "type": "detections",
        "classes": ["person", "car", "bicycle", "dog", "cat"]
    }
}

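# Unique key that identifies this annotation run on the dataset
anno_key = "batch_1"

# Upload to CVAT. Connection settings are read from the FIFTYONE_CVAT_*
# environment variables (or pass url=, username=, password= here)
results = view.annotate(
    anno_key,
    backend="cvat",
    label_schema=label_schema,  # the schema already names the label field
    task_name="Dataset Annotation - Batch 1",
    task_size=10,  # Maximum samples per task
    segment_size=1,  # Maximum images per job
)

print(f"Created CVAT tasks: {results.task_ids}")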
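To see how `task_size` and `segment_size` shape an upload, the arithmetic can be sketched in plain Python. This assumes `task_size` caps the samples per task and `segment_size` caps the images per job, as the comments above describe. `plan_upload` is a hypothetical helper, not a FiftyOne function.

```python
import math


def plan_upload(num_samples, task_size, segment_size):
    """Estimate how many CVAT tasks and jobs an upload would create."""
    if min(num_samples, task_size, segment_size) < 1:
        raise ValueError("all arguments must be positive integers")

    num_tasks = math.ceil(num_samples / task_size)
    num_jobs = 0
    remaining = num_samples
    for _ in range(num_tasks):
        in_task = min(task_size, remaining)  # the last task may be smaller
        num_jobs += math.ceil(in_task / segment_size)
        remaining -= in_task
    return num_tasks, num_jobs


tasks, jobs = plan_upload(num_samples=50, task_size=10, segment_size=1)
print(f"{tasks} tasks, {jobs} jobs")
```

Under these assumptions, 50 samples with `task_size=10` and `segment_size=1` come out as 5 tasks and 50 single-image jobs. Raising `segment_size` reduces the number of jobs annotators have to open.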

4. Annotate in CVAT

Annotators can now work on the task in CVAT using all available features:
  • Manual annotation tools
  • Automatic annotation with AI models
  • Quality control and review
  • Collaborative annotation

5. Import Annotations Back to FiftyOne

Once annotation is complete, import the results:
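# Load annotations from CVAT. load_annotations() is a dataset method and takes
# the annotation key, i.e. the first argument that was passed to annotate()
anno_key = "batch_1"
dataset.load_annotations(anno_key)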

print(f"Loaded {len(view)} annotated samples")

# Refresh the FiftyOne App to see updates
session.refresh()

Workflow: CVAT to FiftyOne

You can also import existing CVAT projects into FiftyOne:

Import CVAT Project

import fiftyone as fo
from fiftyone.utils.cvat import import_annotations

# Create an empty dataset to receive the media and labels from CVAT
dataset = fo.Dataset("my_cvat_dataset")

# ID of the existing CVAT task to import
task_id = 12345

import_annotations(
    dataset,
    task_ids=[task_id],
    data_path="/path/to/images",  # local directory for the task's media
    download_media=True,  # fetch media that is not already in data_path
    url="https://app.cvat.ai",
    username="your-username",
    password="your-password",
)

print(dataset)

Download CVAT Annotations

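# Download annotations for offline analysis
import zipfile

import fiftyone as fo
from cvat_sdk import make_client

client = make_client(
    host="https://app.cvat.ai",
    credentials=("username", "password")
)

# Download task annotations (labels only) as a zip archive
task = client.tasks.retrieve(12345)
task.export_dataset("COCO 1.0", "annotations.zip", include_images=False)

# Extract the archive; the COCO labels are stored under annotations/
with zipfile.ZipFile("annotations.zip") as archive:
    archive.extractall("cvat_export")

# Load into FiftyOne (adjust labels_path if your export uses another file name)
dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path="images/",
    labels_path="cvat_export/annotations/instances_default.json"
)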
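Before loading an export, it can help to check what it contains. The sketch below counts annotations per category in a COCO-format JSON file using only the standard library. It assumes the standard COCO top-level keys (`images`, `annotations`, `categories`), and `summarize_coco` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_coco(labels_path):
    """Return image, annotation, and per-category counts for a COCO file."""
    with open(labels_path, encoding="utf-8") as f:
        coco = json.load(f)

    # Map numeric category IDs to readable names
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    counts = Counter(
        names.get(ann["category_id"], "unknown")
        for ann in coco.get("annotations", [])
    )
    return {
        "num_images": len(coco.get("images", [])),
        "num_annotations": sum(counts.values()),
        "per_category": dict(counts),
    }
```

A category with zero or unexpectedly few annotations is often the first sign of a label schema problem.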

Advanced Use Cases

Dataset Quality Control

Use FiftyOne to identify annotation quality issues:
import fiftyone as fo
import fiftyone.brain as fob

# Load annotated dataset
dataset = fo.load_dataset("my_cvat_dataset")

# Compute uniqueness (find duplicates)
fob.compute_uniqueness(dataset)

# Find potential duplicates
duplicates_view = dataset.sort_by("uniqueness").limit(100)

# Visualize
session = fo.launch_app(duplicates_view)

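# Send duplicates back to CVAT for review
# (connection settings are read from the FIFTYONE_CVAT_* environment variables)
duplicates_view.annotate(
    "qc_duplicates",  # annotation key for this run
    backend="cvat",
    label_field="ground_truth",
    task_name="Quality Control - Duplicates",
)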

Active Learning Pipeline

Implement an active learning workflow:
import fiftyone as fo
import fiftyone.brain as fob

# 1. Train model on initial dataset
# (model training code here)

# 2. Run inference on unlabeled data (hardness requires stored logits)
dataset.apply_model(model, label_field="predictions", store_logits=True)

# 3. Compute hardness scores
fob.compute_hardness(dataset, "predictions")

# 4. Select hard examples for annotation
hard_samples = dataset.sort_by("hardness", reverse=True).limit(100)

# 5. Send to CVAT for labeling
# (connection settings are read from the FIFTYONE_CVAT_* environment variables)
hard_samples.annotate(
    "active_learning_round_1",  # annotation key for this run
    backend="cvat",
    label_field="ground_truth",
    task_name="Active Learning - Round 1",
)

# 6. Import labels and retrain
# (repeat the cycle)
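
Hardness-style scores reward samples the model is unsure about. As an intuition only, and not FiftyOne's implementation, the sketch below ranks samples by the entropy of their softmax probabilities, a common uncertainty proxy. The sample IDs and logits are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


def rank_by_uncertainty(logits_by_id, k):
    """Return the k sample IDs with the highest predictive entropy."""
    ranked = sorted(
        logits_by_id,
        key=lambda sample_id: entropy(logits_by_id[sample_id]),
        reverse=True,
    )
    return ranked[:k]


example = {
    "sample_a": [4.0, 0.1, 0.1],  # confident prediction
    "sample_b": [1.0, 1.0, 1.0],  # maximally uncertain
    "sample_c": [2.0, 1.5, 0.2],  # somewhat uncertain
}
print(rank_by_uncertainty(example, k=2))
```

The flat logits of `sample_b` rank first, which matches the idea of sending the least certain samples to annotators before the confident ones.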

Model Evaluation with CVAT Refinement

import fiftyone as fo
import fiftyone.brain as fob
from fiftyone import ViewField as F

# Load predictions and ground truth
dataset = fo.load_dataset("model_evaluation")

# Compute evaluation metrics
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval"
)

# Find false positives
fp_view = dataset.match(
    F("eval_fp") > 0
)

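# Send false positives to CVAT for label verification
# (connection settings are read from the FIFTYONE_CVAT_* environment variables)
fp_view.annotate(
    "false_positive_review",  # annotation key for this run
    backend="cvat",
    label_field="ground_truth",
    task_name="False Positive Review",
)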

print(f"Sent {len(fp_view)} false positives for review")
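
Whether a prediction counts as a false positive depends on how much it overlaps the ground truth. The following is a minimal sketch of intersection over union (IoU) for boxes in `[x, y, width, height]` form. It is deliberately simplified: it ignores class labels and one-to-one matching, and the 0.5 threshold is an assumption for illustration, so check the value your evaluation actually uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_false_positive(pred_box, gt_boxes, threshold=0.5):
    """True if the prediction overlaps no ground truth box enough."""
    return all(iou(pred_box, gt) < threshold for gt in gt_boxes)
```

Predictions that sit just under the threshold are good candidates for review in CVAT, since either the prediction or the ground truth box may be the one that is off.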

Best Practices

  1. Explore first: Use FiftyOne to understand your data before annotating
  2. Strategic sampling: Annotate the most valuable samples first
  3. Batch processing: Break large datasets into manageable CVAT tasks
  4. Regular syncing: Import annotations frequently to track progress

  • Task size: 50-200 images per task works well
  • Job segments: 10-30 images per job for efficient annotation
  • Label consistency: Use the same label schema across all tasks
  • Clear naming: Use descriptive task names with dates/batches

  • Use FiftyOne to visualize annotations after import
  • Compare multiple annotator outputs
  • Identify and resolve label inconsistencies
  • Track annotation progress with metadata
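
The naming and sizing advice above can be encoded in a small helper. This is a sketch: the name format, the range check, and the function `make_task_name` are illustrative choices, not a CVAT or FiftyOne convention.

```python
import warnings
from datetime import date


def make_task_name(project, batch, num_images, day=None):
    """Build a descriptive task name that includes the batch and date."""
    day = day or date.today()
    # Warn (but do not fail) when the batch falls outside the suggested range
    if not 50 <= num_images <= 200:
        warnings.warn(
            f"{num_images} images is outside the suggested 50-200 range"
        )
    return f"{project} - Batch {batch:03d} - {day.isoformat()}"


print(make_task_name("Street Scenes", 1, 120, date(2024, 1, 15)))
```

Zero-padding the batch number keeps task names sorted correctly in the CVAT task list.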

Troubleshooting

Connection Issues

Problem: Cannot connect to CVAT from FiftyOne
Solution: Test the connection directly with the CVAT SDK:
# Test connection
from cvat_sdk import make_client

client = make_client(
    host="https://app.cvat.ai",
    credentials=("username", "password")
)
print(client.api_client.configuration.host)

Label Schema Mismatch

Problem: Labels don’t match between FiftyOne and CVAT
Solution: Explicitly define label mappings:
label_mapping = {
    "fiftyone_label": "cvat_label"
}
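
One way to use such a mapping is to translate FiftyOne class names before upload and flag any class that the CVAT task does not define. The sketch below is plain Python with made-up class names. `map_labels` is a hypothetical helper, not part of either tool.

```python
def map_labels(fiftyone_classes, label_mapping, cvat_classes):
    """Translate class names and report those missing from CVAT."""
    # Names without a mapping entry are passed through unchanged
    mapped = [label_mapping.get(name, name) for name in fiftyone_classes]
    unknown = sorted(set(mapped) - set(cvat_classes))
    return mapped, unknown


mapped, unknown = map_labels(
    fiftyone_classes=["person", "automobile", "bike"],
    label_mapping={"automobile": "car", "bike": "bicycle"},
    cvat_classes=["person", "car", "dog"],
)
print(mapped)
print(unknown)
```

Here `bicycle` is reported as unknown because the example CVAT class list does not include it. Resolving such names before upload avoids a mismatch after annotation.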

Large Dataset Performance

For large datasets:
  • Use dataset views to work with subsets
  • Enable sample caching in FiftyOne
  • Break into multiple smaller CVAT tasks
