Overview
FiftyOne is an open-source tool for dataset visualization, exploration, and curation that integrates with CVAT. With the integration you can select samples in FiftyOne, send them to CVAT for annotation, and load the finished labels back into FiftyOne for analysis.
The FiftyOne integration is available for both CVAT Cloud and self-hosted installations.
What is FiftyOne?
FiftyOne is an open-source dataset curation and model analysis tool that provides:
Visual dataset exploration: Interactive browser-based dataset visualization
Dataset quality analysis: Identify issues, outliers, and edge cases
Model evaluation: Analyze model predictions and errors
Label refinement: Send samples to CVAT for annotation or correction
Embeddings visualization: Understand dataset structure and diversity
Prerequisites
Python 3.7 or higher
FiftyOne installed (pip install fiftyone)
CVAT account (Cloud or self-hosted)
CVAT API credentials
Installation
Install FiftyOne with CVAT integration support:
# Install FiftyOne
pip install fiftyone
# Install the CVAT SDK (used by the examples that call cvat_sdk directly)
pip install cvat-sdk
Verify the installation:
import fiftyone as fo
import fiftyone.zoo as foz
print(fo.__version__)
Connecting FiftyOne to CVAT
Configure FiftyOne to connect to your CVAT instance:
For CVAT Cloud
import fiftyone as fo
# Connection settings for the CVAT backend. FiftyOne accepts url, username,
# and password as keyword arguments to annotate() and load_annotations().
config = dict(
    url="https://app.cvat.ai",
    username="your-username",
    password="your-password",
)
For Self-Hosted CVAT
config = dict(
    url="https://your-cvat-instance.com",
    username="your-username",
    password="your-password",
)
Never hardcode credentials in your scripts. Use environment variables or a secure configuration file.
Using Environment Variables
export FIFTYONE_CVAT_URL="https://app.cvat.ai"
export FIFTYONE_CVAT_USERNAME="your-username"
export FIFTYONE_CVAT_PASSWORD="your-password"
Then in Python:
import os
import fiftyone as fo
# FiftyOne reads the FIFTYONE_CVAT_* variables on its own; build the settings
# explicitly only if you need to pass them around yourself.
config = dict(
    url=os.getenv("FIFTYONE_CVAT_URL"),
    username=os.getenv("FIFTYONE_CVAT_USERNAME"),
    password=os.getenv("FIFTYONE_CVAT_PASSWORD"),
)
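Because os.getenv returns None for an unset variable, a missing credential can surface later as a confusing authentication failure. The following is a minimal sketch of failing early instead, assuming the three variable names shown above. The helper load_cvat_settings is hypothetical and is not part of FiftyOne or the CVAT SDK.

```python
import os

# Maps each connection setting to the environment variable that holds it.
REQUIRED_VARS = {
    "url": "FIFTYONE_CVAT_URL",
    "username": "FIFTYONE_CVAT_USERNAME",
    "password": "FIFTYONE_CVAT_PASSWORD",
}


def load_cvat_settings(environ=os.environ):
    """Return CVAT connection settings, raising if any variable is unset or empty."""
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in REQUIRED_VARS.items()}
```

Passing a plain dictionary as environ makes the helper easy to exercise without touching the real environment.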
Workflow: FiftyOne to CVAT
1. Load and Explore Dataset in FiftyOne
Start by loading a dataset into FiftyOne:
import fiftyone as fo
import fiftyone.zoo as foz
# Load a dataset (example using COCO)
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=100,
)
# Launch FiftyOne App to explore
session = fo.launch_app(dataset)
2. Select Samples for Annotation
Use FiftyOne’s query capabilities to select samples:
# Select samples that need annotation
from fiftyone import ViewField as F
# Example: Select images without annotations
view = dataset.match(F("ground_truth.detections").length() == 0)
# Example: Select images with low-confidence predictions
# (assumes the dataset has a "predictions" field)
view = dataset.match(
    F("predictions.detections.confidence").max() < 0.7
)
# Example: Random sample for quality control
view = dataset.take(50)
3. Send Samples to CVAT
Export selected samples to CVAT for annotation:
# Define label schema
label_schema = {
    "ground_truth": {
        "type": "detections",
        "classes": ["person", "car", "bicycle", "dog", "cat"],
    }
}
# Unique key that identifies this annotation run in FiftyOne
anno_key = "batch_1"
# Upload to CVAT. Credentials are read from the FIFTYONE_CVAT_* environment
# variables; url, username, and password can also be passed as keyword arguments.
results = view.annotate(
    anno_key,
    backend="cvat",
    label_schema=label_schema,
    task_name="Dataset Annotation - Batch 1",
    task_size=10,  # Maximum samples per task
    segment_size=1,  # Maximum images per job
)
print(f"Created CVAT tasks: {results.task_ids}")
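To see how the two size settings interact, the sketch below estimates how many CVAT tasks and jobs an upload produces. It assumes, as the comments in the upload call state, that task_size caps the samples in each task and segment_size caps the images in each job. The function count_tasks_and_jobs is a hypothetical helper for illustration only.

```python
import math


def count_tasks_and_jobs(num_samples, task_size, segment_size):
    """Return (tasks, jobs) for an upload split by task_size and segment_size."""
    tasks = math.ceil(num_samples / task_size)
    jobs = 0
    remaining = num_samples
    for _ in range(tasks):
        # The last task may hold fewer samples than task_size.
        in_task = min(task_size, remaining)
        jobs += math.ceil(in_task / segment_size)
        remaining -= in_task
    return tasks, jobs


# 50 samples, 10 per task, 1 image per job
print(count_tasks_and_jobs(50, task_size=10, segment_size=1))  # (5, 50)
```

A segment_size of 1 therefore creates one job per image, which is rarely what annotators want for larger batches.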
4. Annotate in CVAT
Annotators can now work on the task in CVAT using all available features:
Manual annotation tools
Automatic annotation with AI models
Quality control and review
Collaborative annotation
5. Import Annotations Back to FiftyOne
Once annotation is complete, import the results:
# Load annotations from CVAT, using the same key that was passed as the
# first argument to annotate()
dataset.load_annotations(anno_key)
print(f"Loaded annotations for {len(view)} samples")
# Refresh the FiftyOne App to see updates
session.refresh()
Workflow: CVAT to FiftyOne
You can also import existing CVAT projects into FiftyOne:
Import CVAT Project
import fiftyone as fo
import fiftyone.utils.cvat as fouc
# Create an empty FiftyOne dataset to receive the CVAT task
dataset = fo.Dataset()
task_id = 12345
# Import the task's annotations and download its media.
# Credentials are read from the FIFTYONE_CVAT_* environment variables;
# url, username, and password can also be passed as keyword arguments.
fouc.import_annotations(
    dataset,
    task_ids=[task_id],
    data_path="/path/to/images",  # Directory to download the media into
    download_media=True,
)
print(dataset)
Download CVAT Annotations
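# Download annotations for offline analysis
import zipfile
import fiftyone as fo
from cvat_sdk import make_client
client = make_client(
    host="https://app.cvat.ai",
    credentials=("username", "password"),
)
# Download task annotations (the export is a zip archive)
task = client.tasks.retrieve(12345)
task.export_dataset("COCO 1.0", "annotations.zip")
# Extract the archive. The JSON path used below assumes CVAT's usual
# COCO 1.0 layout; check the extracted files and adjust if it differs.
with zipfile.ZipFile("annotations.zip") as zf:
    zf.extractall("cvat_export")
# Load into FiftyOne
dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path="images/",
    labels_path="cvat_export/annotations/instances_default.json",
)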
Advanced Use Cases
Dataset Quality Control
Use FiftyOne to identify annotation quality issues:
import fiftyone as fo
import fiftyone.brain as fob
# Load annotated dataset
dataset = fo.load_dataset("my_cvat_dataset")
# Compute uniqueness (find duplicates)
fob.compute_uniqueness(dataset)
# Find potential duplicates (lowest uniqueness scores first)
duplicates_view = dataset.sort_by("uniqueness").limit(100)
# Visualize
session = fo.launch_app(duplicates_view)
# Send duplicates back to CVAT for review. The first argument is the
# annotation run key; credentials come from the FIFTYONE_CVAT_* variables.
duplicates_view.annotate(
    "qc_duplicates",
    backend="cvat",
    label_field="ground_truth",
    task_name="Quality Control - Duplicates",
)
Active Learning Pipeline
Implement an active learning workflow:
import fiftyone as fo
import fiftyone.brain as fob
# 1. Train model on initial dataset
# (model training code here)
# 2. Run inference on unlabeled data
dataset.apply_model(model, label_field="predictions", store_logits=True)
# 3. Compute hardness scores (requires classification predictions with logits)
fob.compute_hardness(dataset, "predictions")
# 4. Select hard examples for annotation
hard_samples = dataset.sort_by("hardness", reverse=True).limit(100)
# 5. Send to CVAT for labeling. The first argument is the annotation run
# key; credentials come from the FIFTYONE_CVAT_* variables.
hard_samples.annotate(
    "active_learning_round_1",
    backend="cvat",
    label_field="ground_truth",
    task_name="Active Learning - Round 1",
)
# 6. Import labels and retrain
# (repeat the cycle)
Model Evaluation with CVAT Refinement
import fiftyone as fo
import fiftyone.brain as fob
from fiftyone import ViewField as F
# Load predictions and ground truth
dataset = fo.load_dataset("model_evaluation")
# Compute evaluation metrics
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)
# Find samples with at least one false positive
fp_view = dataset.match(F("eval_fp") > 0)
# Send those samples to CVAT for label verification. The first argument is
# the annotation run key; credentials come from the FIFTYONE_CVAT_* variables.
fp_view.annotate(
    "false_positive_review",
    backend="cvat",
    label_field="ground_truth",
    task_name="False Positive Review",
)
print(f"Sent {len(fp_view)} samples with false positives for review")
Best Practices
Explore first: Use FiftyOne to understand your data before annotating
Strategic sampling: Annotate the most valuable samples first
Batch processing: Break large datasets into manageable CVAT tasks
Regular syncing: Import annotations frequently to track progress
Task size: 50-200 images per task works well
Job segments: 10-30 images per job for efficient annotation
Label consistency: Use the same label schema across all tasks
Clear naming: Use descriptive task names with dates/batches
Use FiftyOne to visualize annotations after import
Compare multiple annotator outputs
Identify and resolve label inconsistencies
Track annotation progress with metadata
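The batching and naming advice above can be sketched in plain Python. This is only an illustration under stated assumptions: plan_batches is a hypothetical helper, and the naming pattern is one possible convention, not something CVAT or FiftyOne requires.

```python
from datetime import date


def plan_batches(sample_ids, batch_size, prefix, day=None):
    """Split IDs into consecutive batches, each paired with a dated task name."""
    day = day or date.today()
    batches = []
    for start in range(0, len(sample_ids), batch_size):
        number = start // batch_size + 1
        name = f"{prefix} - {day.isoformat()} - Batch {number}"
        batches.append((name, sample_ids[start:start + batch_size]))
    return batches


# Example with placeholder IDs
for name, ids in plan_batches(["a", "b", "c", "d", "e"], 2, "QC", date(2024, 1, 15)):
    print(name, ids)
```

In FiftyOne, the IDs would typically come from dataset.values("id"), and each batch could be turned back into a view with dataset.select(ids) before calling annotate() with the generated task name.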
Troubleshooting
Connection Issues
Problem: Cannot connect to CVAT from FiftyOne
Solution: Confirm that the URL and credentials work by connecting with the CVAT SDK directly:
# Test connection
from cvat_sdk import make_client
client = make_client(
    host="https://app.cvat.ai",
    credentials=("username", "password"),
)
print(client.api_client.configuration.host)
Label Schema Mismatch
Problem: Labels don’t match between FiftyOne and CVAT
Solution: Explicitly define label mappings:
label_mapping = {
    "fiftyone_label": "cvat_label",
}
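Before applying a mapping, it helps to see exactly which class names disagree. The sketch below compares two class lists after applying a mapping like the one above. The helper find_label_mismatches and the example class names are hypothetical and exist only for illustration.

```python
def find_label_mismatches(fiftyone_classes, cvat_classes, label_mapping=None):
    """Return (missing_in_cvat, unused_in_cvat) after applying label_mapping."""
    label_mapping = label_mapping or {}
    # Rename FiftyOne classes that have a mapping; leave the rest unchanged.
    mapped = {label_mapping.get(name, name) for name in fiftyone_classes}
    cvat = set(cvat_classes)
    return sorted(mapped - cvat), sorted(cvat - mapped)


missing, unused = find_label_mismatches(
    ["person", "automobile", "dog"],
    ["person", "car", "cat"],
    {"automobile": "car"},
)
print(missing)  # ['dog']
print(unused)   # ['cat']
```

Once the mapping is complete, FiftyOne's map_labels(field, mapping) method on a dataset or view can apply it to a label field; verify its behavior against the FiftyOne version you have installed.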
For large datasets:
Use dataset views to work with subsets
Enable sample caching in FiftyOne
Break into multiple smaller CVAT tasks
Additional Resources