Deploy CVAT on Amazon Web Services (AWS) to leverage cloud infrastructure, GPU instances for serverless auto-annotation, and managed services for production deployments.

Deployment Options

CVAT can be deployed on AWS in multiple ways:
  1. EC2 with Docker Compose: Simple deployment on a single instance
  2. EC2 with GPU (P3 instances): For auto-annotation with TensorFlow models
  3. Amazon EKS: Production Kubernetes deployment
  4. Hybrid: EKS with managed AWS services (RDS, ElastiCache, EFS)

EC2 Deployment

Prerequisites

  • AWS account with appropriate permissions
  • AWS CLI installed and configured
  • SSH key pair created in your AWS region

1. Launch EC2 Instance

Instance Types:
Use Case                     Instance Type   vCPU   RAM     Storage
Development/Testing          t3.large        2      8GB     50GB
Small Production             t3.xlarge       4      16GB    100GB
Medium Production            m5.2xlarge      8      32GB    200GB
With GPU (Auto-annotation)   p3.2xlarge      8      61GB    200GB
Launch using AWS CLI:
# Set variables
REGION=us-east-1
INSTANCE_TYPE=t3.xlarge
KEY_NAME=your-key-pair
SECURITY_GROUP=sg-xxxxxxxxx
SUBNET=subnet-xxxxxxxxx

# Launch instance
aws ec2 run-instances \
  --image-id $(aws ec2 describe-images \
    --owners 099720109477 \
    --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text \
    --region $REGION) \
  --instance-type $INSTANCE_TYPE \
  --key-name $KEY_NAME \
  --security-group-ids $SECURITY_GROUP \
  --subnet-id $SUBNET \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=100,VolumeType=gp3}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=cvat-server}]' \
  --region $REGION
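
Later steps need the instance ID and a running instance. A minimal sketch, assuming REGION is set as above and the instance carries the Name tag cvat-server; find_instance_id is a hypothetical helper name, not part of the AWS CLI:

```shell
# Look up the instance launched above by its Name tag.
find_instance_id() {
  # $1 = value of the Name tag, e.g. cvat-server
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$1" "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[0].Instances[0].InstanceId' \
    --output text \
    --region "$REGION"
}

# Usage:
#   INSTANCE_ID=$(find_instance_id cvat-server)
#   aws ec2 wait instance-running --instance-ids "$INSTANCE_ID" --region "$REGION"
#   aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID" --region "$REGION"
```

The two wait commands block until the instance is running and has passed its status checks, which avoids SSH connection failures right after launch.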

2. Configure Security Group

Allow incoming traffic on required ports:
# Create security group (do this before launching the instance and pass the ID as SECURITY_GROUP)
SG_ID=$(aws ec2 create-security-group \
  --group-name cvat-sg \
  --description "CVAT security group" \
  --region $REGION \
  --query 'GroupId' \
  --output text)

# Allow SSH (0.0.0.0/0 exposes port 22 to the internet; prefer your own IP as a /32 CIDR)
aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp \
  --port 22 \
  --cidr 0.0.0.0/0 \
  --region $REGION

# Allow HTTP
aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp \
  --port 80 \
  --cidr 0.0.0.0/0 \
  --region $REGION

# Allow HTTPS
aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp \
  --port 443 \
  --cidr 0.0.0.0/0 \
  --region $REGION

# Allow CVAT port (if not using 80/443)
aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp \
  --port 8080 \
  --cidr 0.0.0.0/0 \
  --region $REGION
Using AWS Console:
  1. Navigate to EC2 → Security Groups
  2. Create security group with inbound rules:
    • SSH (22): Your IP or 0.0.0.0/0
    • HTTP (80): 0.0.0.0/0
    • HTTPS (443): 0.0.0.0/0
    • Custom TCP (8080): 0.0.0.0/0 (if needed)
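
To limit SSH to "Your IP" rather than 0.0.0.0/0, the CIDR can be derived on the machine running the AWS CLI. A sketch, assuming outbound access to checkip.amazonaws.com; my_ip_cidr is a hypothetical helper name:

```shell
# Print the caller's public IPv4 address as a /32 CIDR.
my_ip_cidr() {
  ip=$(curl -s https://checkip.amazonaws.com) || return 1
  [ -n "$ip" ] || return 1
  printf '%s/32\n' "$ip"
}

# Usage:
#   aws ec2 authorize-security-group-ingress \
#     --group-id "$SG_ID" --protocol tcp --port 22 \
#     --cidr "$(my_ip_cidr)" --region "$REGION"
```

If your address changes (for example on a home connection), the rule has to be updated before SSH works again.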

3. Connect and Install Docker

# Get instance public IP
INSTANCE_IP=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=cvat-server" "Name=instance-state-name,Values=running" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' \
  --output text \
  --region $REGION)

# SSH into instance
ssh -i ~/.ssh/${KEY_NAME}.pem ubuntu@${INSTANCE_IP}
Once connected, install Docker:
# Update packages
sudo apt-get update
sudo apt-get upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Add user to docker group
sudo usermod -aG docker ubuntu
newgrp docker

# Verify installation
docker --version
docker compose version

4. Deploy CVAT

# Clone repository
git clone https://github.com/cvat-ai/cvat
cd cvat

# Get public hostname
CVAT_HOST=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)
# Or use public IP
# CVAT_HOST=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)

# Export hostname
export CVAT_HOST=${CVAT_HOST}

# Start CVAT
docker compose pull
docker compose up -d

# Create superuser
docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'

5. Access CVAT

# Get your access URL
echo "Access CVAT at: http://${CVAT_HOST}:8080"
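
The containers can take a while to become responsive after docker compose up. A small polling sketch, assuming CVAT serves its UI on port 8080 as in the URL above; wait_for_http is a hypothetical helper name:

```shell
# Poll a URL until it answers with a successful HTTP status.
wait_for_http() {
  # $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 10)
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_for_http "http://${CVAT_HOST}:8080/"
```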

6. Optional: Configure HTTPS

For production with SSL:
# Install Certbot (outside Docker) or use docker-compose.https.yml
export ACME_EMAIL=admin@example.com
export CVAT_HOST=cvat.yourdomain.com

# Point your domain to instance IP first
# Then deploy with HTTPS
docker compose -f docker-compose.yml -f docker-compose.https.yml up -d

GPU Instance Deployment (P3)

For Auto-Annotation with TensorFlow

P3 instances provide NVIDIA GPUs for running deep learning models.

1. Launch P3 Instance

# P3 instances available:
# p3.2xlarge:  1 GPU (V100), 8 vCPUs, 61GB RAM
# p3.8xlarge:  4 GPUs (V100), 32 vCPUs, 244GB RAM
# p3.16xlarge: 8 GPUs (V100), 64 vCPUs, 488GB RAM

aws ec2 run-instances \
  --image-id ami-xxxxxxxxx \
  --instance-type p3.2xlarge \
  --key-name $KEY_NAME \
  --security-group-ids $SG_ID \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=200,VolumeType=gp3}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=cvat-gpu}]' \
  --region $REGION

2. Install NVIDIA Drivers

# SSH into P3 instance (INSTANCE_IP must be the public IP of the cvat-gpu instance, not the earlier cvat-server one)
ssh -i ~/.ssh/${KEY_NAME}.pem ubuntu@${INSTANCE_IP}

# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install NVIDIA drivers
sudo apt-get install -y linux-headers-$(uname -r)
sudo apt-get install -y nvidia-driver-535

# Reboot
sudo reboot
After reboot, reconnect and verify:
# Check NVIDIA driver
nvidia-smi

3. Install NVIDIA Container Toolkit

# Install Docker (if not already installed)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker

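# Install NVIDIA Container Toolkit
# (uses NVIDIA's libnvidia-container repository; the older nvidia-docker repository and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker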

# Test GPU in Docker
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

4. Deploy CVAT with Serverless Functions

git clone https://github.com/cvat-ai/cvat
cd cvat

export CVAT_HOST=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)

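# Start CVAT with the Nuclio serverless component enabled
docker compose -f docker-compose.yml -f components/serverless/docker-compose.serverless.yml up -d

# Deploy serverless functions
# Requires the nuctl CLI; follow the serverless deployment guide
# for the functions to deploy
cd serverless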

EKS Deployment

Prerequisites

  • eksctl installed
  • kubectl installed
  • AWS CLI configured

1. Create EKS Cluster

# Create cluster configuration
cat > cvat-cluster.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cvat-cluster
  region: us-east-1
  version: "1.28"

managedNodeGroups:
  - name: cvat-nodes
    instanceType: m5.xlarge
    desiredCapacity: 3
    minSize: 2
    maxSize: 5
    volumeSize: 100
    ssh:
      allow: true
      publicKeyName: your-key-pair
    tags:
      nodegroup-role: worker
    iam:
      withAddonPolicies:
        ebs: true
        efs: true
        albIngress: true
EOF

# Create cluster
eksctl create cluster -f cvat-cluster.yaml
Cluster creation typically takes 15-20 minutes.

2. Configure kubectl

# Update kubeconfig
aws eks update-kubeconfig --region us-east-1 --name cvat-cluster

# Verify
kubectl get nodes

3. Install Storage Driver

# Install EBS CSI driver
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.25"

# Create storage class
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cvat-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

4. Install CVAT with Helm

# Add Helm repo
helm repo add cvat https://cvat-ai.github.io/cvat/
helm repo update

# Create namespace
kubectl create namespace cvat

# Create values file
cat > cvat-eks-values.yaml <<EOF
cvat:
  backend:
    defaultStorage:
      storageClassName: cvat-storage
      size: 100Gi
  kvrocks:
    defaultStorage:
      storageClassName: cvat-storage
      size: 200Gi

postgresql:
  primary:
    persistence:
      storageClass: cvat-storage
      size: 50Gi

ingress:
  enabled: true
  hostname: cvat.example.com
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
EOF

# Install
helm install cvat cvat/cvat -n cvat -f cvat-eks-values.yaml

5. Configure Load Balancer

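# Install AWS Load Balancer Controller (required for the alb ingress class used above;
# the controller's service account also needs IAM permissions to manage load balancers)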
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=cvat-cluster

# Get load balancer URL
kubectl get ingress -n cvat

AWS Managed Services Integration

Using Amazon RDS for PostgreSQL

1. Create RDS Instance:
aws rds create-db-instance \
  --db-instance-identifier cvat-db \
  --db-instance-class db.t3.medium \
  --engine postgres \
  --engine-version 15.4 \
  --db-name cvat \
  --master-username cvat \
  --master-user-password YourSecurePassword \
  --allocated-storage 100 \
  --storage-type gp3 \
  --vpc-security-group-ids $SG_ID \
  --backup-retention-period 7 \
  --region $REGION
2. Configure CVAT:
# Docker Compose
services:
  cvat_server:
    environment:
      CVAT_POSTGRES_HOST: cvat-db.xxxx.us-east-1.rds.amazonaws.com
      CVAT_POSTGRES_PORT: 5432
      CVAT_POSTGRES_USER: cvat
      CVAT_POSTGRES_PASSWORD: YourSecurePassword
      CVAT_POSTGRES_DBNAME: cvat
# Kubernetes Helm values
postgresql:
  enabled: false
  external:
    host: cvat-db.xxxx.us-east-1.rds.amazonaws.com
    port: 5432
  auth:
    username: cvat
    database: cvat
    password: YourSecurePassword
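
The host value in the configuration above is a placeholder. A sketch for looking up the real endpoint once the instance is available, assuming REGION is set and the identifier is cvat-db as in step 1; rds_endpoint is a hypothetical helper name:

```shell
# Print the DNS endpoint of an RDS instance.
rds_endpoint() {
  # $1 = DB instance identifier
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text \
    --region "$REGION"
}

# Usage:
#   aws rds wait db-instance-available --db-instance-identifier cvat-db --region "$REGION"
#   CVAT_POSTGRES_HOST=$(rds_endpoint cvat-db)
```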

Using Amazon ElastiCache for Redis

1. Create ElastiCache Cluster:
aws elasticache create-cache-cluster \
  --cache-cluster-id cvat-redis \
  --engine redis \
  --engine-version 7.0 \
  --cache-node-type cache.t3.medium \
  --num-cache-nodes 1 \
  --security-group-ids $SG_ID \
  --region $REGION
2. Configure CVAT:
redis:
  enabled: false
  external:
    host: cvat-redis.xxxx.cache.amazonaws.com
  auth:
    password: ""  # Configure if AUTH enabled
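
The Redis host above is likewise a placeholder. For a single-node cluster, the node endpoint is only returned when --show-cache-node-info is passed. A sketch, assuming REGION is set and the cluster ID is cvat-redis; redis_endpoint is a hypothetical helper name:

```shell
# Print the endpoint address of the first node in an ElastiCache cluster.
redis_endpoint() {
  # $1 = cache cluster ID
  aws elasticache describe-cache-clusters \
    --cache-cluster-id "$1" \
    --show-cache-node-info \
    --query 'CacheClusters[0].CacheNodes[0].Endpoint.Address' \
    --output text \
    --region "$REGION"
}

# Usage:
#   aws elasticache wait cache-cluster-available --cache-cluster-id cvat-redis --region "$REGION"
#   REDIS_HOST=$(redis_endpoint cvat-redis)
```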

Using Amazon EFS for Shared Storage

1. Create EFS:
EFS_ID=$(aws efs create-file-system \
  --region $REGION \
  --performance-mode generalPurpose \
  --throughput-mode bursting \
  --encrypted \
  --tags Key=Name,Value=cvat-efs \
  --query 'FileSystemId' \
  --output text)

# Create mount targets
for subnet in $SUBNET1 $SUBNET2 $SUBNET3; do
  aws efs create-mount-target \
    --file-system-id $EFS_ID \
    --subnet-id $subnet \
    --security-groups $SG_ID \
    --region $REGION
done
2. Install EFS CSI Driver in EKS:
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.7"
3. Create StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxxxxxx
  directoryPerms: "700"
4. Use in CVAT:
cvat:
  backend:
    defaultStorage:
      storageClassName: efs-sc
      accessModes:
        - ReadWriteMany

Using Amazon S3 for Storage

Configure CVAT to use S3 for dataset storage:
cvat:
  backend:
    additionalEnv:
    - name: AWS_S3_BUCKET_NAME
      value: cvat-datasets
    - name: AWS_S3_REGION
      value: us-east-1
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: access-key-id
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: secret-access-key
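
The values above read from a Kubernetes secret named aws-credentials, which has to exist before the pods start. A sketch for creating it, assuming the cvat namespace from the Helm step; create_aws_secret is a hypothetical helper name:

```shell
# Create the secret referenced by the secretKeyRef entries above.
create_aws_secret() {
  # $1 = access key ID, $2 = secret access key, $3 = namespace (default: cvat)
  kubectl create secret generic aws-credentials \
    --namespace "${3:-cvat}" \
    --from-literal=access-key-id="$1" \
    --from-literal=secret-access-key="$2"
}

# Usage:
#   create_aws_secret "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
```

Passing the keys from environment variables keeps them out of files, though they may still end up in shell history if typed literally.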

Cost Optimization

1. Use Spot Instances

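For EKS worker nodes that run interruptible workloads (Spot capacity can be reclaimed at short notice, so keep the database and other stateful components on On-Demand nodes):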
managedNodeGroups:
  - name: cvat-spot
    instanceTypes:
      - m5.xlarge
      - m5a.xlarge
      - m5n.xlarge
    spot: true
    desiredCapacity: 3

2. Auto-Scaling

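# Enable cluster autoscaler
# (the example manifest contains a <YOUR CLUSTER NAME> placeholder; download it and set cvat-cluster before applying)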
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

3. Use Reserved Instances

For workloads that run continuously, Reserved Instances or Savings Plans cost less than On-Demand pricing in exchange for a one- or three-year commitment.

4. S3 Lifecycle Policies

aws s3api put-bucket-lifecycle-configuration \
  --bucket cvat-datasets \
  --lifecycle-configuration file://lifecycle.json
Contents of lifecycle.json:
{
  "Rules": [
    {
      "Id": "MoveToIA",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ]
    }
  ]
}

Handling Instance Restarts

Problem

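An EC2 instance without an Elastic IP gets a new public IP address and hostname each time it is stopped and started (a reboot keeps them), so a CVAT_HOST value based on the old address stops working.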

Solutions

1. Use Elastic IP:
# Allocate Elastic IP
ALLOCATION_ID=$(aws ec2 allocate-address --region $REGION --query 'AllocationId' --output text)

# Associate with instance
aws ec2 associate-address \
  --instance-id $INSTANCE_ID \
  --allocation-id $ALLOCATION_ID \
  --region $REGION

# Use in CVAT
export CVAT_HOST=$(aws ec2 describe-addresses --allocation-ids $ALLOCATION_ID --region $REGION --query 'Addresses[0].PublicIp' --output text)
2. Use Route 53 DNS:
# Create hosted zone
HOSTED_ZONE_ID=$(aws route53 create-hosted-zone \
  --name example.com \
  --caller-reference $(date +%s) \
  --query 'HostedZone.Id' \
  --output text)

# Create A record
cat > change-batch.json <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "cvat.example.com",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [{"Value": "${INSTANCE_IP}"}]
    }
  }]
}
EOF

aws route53 change-resource-record-sets \
  --hosted-zone-id $HOSTED_ZONE_ID \
  --change-batch file://change-batch.json

# Use in CVAT
export CVAT_HOST=cvat.example.com
3. Avoid Spot Instances: Don’t use Spot instances for stateful CVAT deployments. Use On-Demand or Reserved Instances.
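
With the Route 53 option and no Elastic IP, the A record has to be rewritten whenever the address changes. A sketch that regenerates the change batch from a record name and IP so it can run from a boot-time script; make_change_batch is a hypothetical helper name, and the instance is assumed to have permission to change the hosted zone:

```shell
# Print a Route 53 change batch that upserts an A record.
make_change_batch() {
  # $1 = record name, $2 = IPv4 address
  cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$1",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [{"Value": "$2"}]
    }
  }]
}
EOF
}

# Usage:
#   make_change_batch cvat.example.com "$INSTANCE_IP" > change-batch.json
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id "$HOSTED_ZONE_ID" \
#     --change-batch file://change-batch.json
```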

Monitoring and Logging

CloudWatch Integration

1. Install CloudWatch Agent:
# On EC2
wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i amazon-cloudwatch-agent.deb

# Configure
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
2. For EKS:
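# Install Fluent Bit
# (the quickstart manifest contains template placeholders such as the cluster name and region; substitute them before applying)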
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml

Application Monitoring

CVAT includes Grafana for analytics:
# Access Grafana
echo "http://${CVAT_HOST}:8080/analytics"

Backup and Disaster Recovery

Automated Backups

RDS: Automated backups are controlled by --backup-retention-period (7 days in the example above; RDS allows up to 35 days).
EBS Snapshots:
# Create snapshot
aws ec2 create-snapshot \
  --volume-id $VOLUME_ID \
  --description "CVAT data backup $(date +%Y%m%d)" \
  --region $REGION

# Automate with AWS Backup
aws backup create-backup-plan --cli-input-json file://backup-plan.json
S3 Replication: Enable cross-region replication for S3 buckets.
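
The backup-plan.json file passed to aws backup create-backup-plan is not shown above. A minimal sketch of what it could contain; the plan name, rule name, schedule, 30-day retention, and the use of the vault named Default are all illustrative assumptions, and write_backup_plan is a hypothetical helper name:

```shell
# Write an example AWS Backup plan definition to a file.
write_backup_plan() {
  # $1 = output path (default: backup-plan.json)
  cat > "${1:-backup-plan.json}" <<'EOF'
{
  "BackupPlan": {
    "BackupPlanName": "cvat-daily",
    "Rules": [
      {
        "RuleName": "daily-0500-utc",
        "TargetBackupVaultName": "Default",
        "ScheduleExpression": "cron(0 5 ? * * *)",
        "Lifecycle": { "DeleteAfterDays": 30 }
      }
    ]
  }
}
EOF
}

# Usage:
#   write_backup_plan backup-plan.json
#   aws backup create-backup-plan --cli-input-json file://backup-plan.json
```

A plan on its own backs up nothing; resources such as the CVAT EBS volume still have to be assigned to it with a backup selection.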

Troubleshooting

Instance Metadata Issues

# If the public-hostname lookup returns nothing (for example, DNS hostnames are disabled in the VPC), try the public IP
CVAT_HOST=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
# Or manually set
export CVAT_HOST=YOUR_PUBLIC_IP
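
If plain curl requests to 169.254.169.254 are rejected, the instance may require IMDSv2, which needs a session token on every request. A sketch of the token flow; imds_get is a hypothetical helper name and the 300-second token lifetime is an arbitrary choice:

```shell
# Fetch an instance metadata value using an IMDSv2 session token.
imds_get() {
  # $1 = metadata path, e.g. public-ipv4 or public-hostname
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# Usage:
#   export CVAT_HOST=$(imds_get public-ipv4)
```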

Security Group Misconfig

# Verify security group
aws ec2 describe-security-groups --group-ids $SG_ID --region $REGION

Storage Full

# Resize EBS volume
aws ec2 modify-volume --volume-id $VOLUME_ID --size 200 --region $REGION

# Extend filesystem (device names vary by instance type; check with lsblk first)
sudo growpart /dev/nvme0n1 1
sudo resize2fs /dev/nvme0n1p1
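
Before and after resizing, it helps to confirm how full the volume actually is. A small sketch using df; disk_usage_pct is a hypothetical helper name and the 80% threshold in the usage line is arbitrary:

```shell
# Print the used-space percentage (without the % sign) for a mount point.
disk_usage_pct() {
  # $1 = mount point or path (default: /)
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Usage:
#   [ "$(disk_usage_pct /)" -gt 80 ] && echo "root volume above 80% full"
#   docker system df   # shows how much of that space Docker images and volumes use
```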

Security Best Practices

  1. Use IAM roles instead of access keys
  2. Place the database and cache in private VPC subnets for isolation
  3. Use Secrets Manager for credentials
  4. Enable AWS WAF for ingress protection
  5. Apply security patches regularly with Systems Manager
  6. Enable CloudTrail for audit logging
  7. Use private subnets for EKS worker nodes
  8. Encrypt EBS volumes and S3 buckets
