Deployment Options
Docker Compose
Best for: Small to medium teams, development, testing, and quick deployments
Pros:
- Simple setup with minimal configuration
- Easy to run on a single machine
- Quick deployment (5-10 minutes)
- Ideal for evaluation and proof-of-concept
- Lower resource requirements
Cons:
- Limited scalability
- No built-in high availability
- Manual scaling of components
Requirements:
- Docker Engine 20.10+
- Docker Compose 2.x
- 4GB RAM minimum (8GB recommended)
- 2 CPU cores minimum (4 recommended)
- 50GB disk space
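The hardware minimums above can be checked before installation. The following is a minimal sketch, not part of CVAT: `check_host` and `detect_host` are hypothetical helper names, the thresholds are copied from the list above, and RAM detection relies on `os.sysconf`, which is assumed to be available only on Linux-like systems.

```python
import os
import shutil

# Minimums copied from the requirements list above.
MIN_CPU_CORES = 2
MIN_RAM_GB = 4
MIN_DISK_GB = 50


def check_host(cpu_cores, ram_gb, disk_gb):
    """Return a list of shortfalls; an empty list means the minimums are met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU cores: {cpu_cores} < {MIN_CPU_CORES}")
    # ram_gb may be None when it could not be detected on this platform.
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Free disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def detect_host(path="."):
    """Best-effort detection of CPU cores, physical RAM, and free disk space."""
    cpu_cores = os.cpu_count() or 0
    disk_gb = shutil.disk_usage(path).free / 1024**3
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gb = None  # not detectable here; the RAM check is skipped
    return cpu_cores, ram_gb, disk_gb
```

Calling `check_host(*detect_host())` returns the list of shortfalls for the current machine. A host sold as "4GB" often reports slightly less usable memory, so treat a marginal RAM failure as a warning rather than a hard stop.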
Kubernetes with Helm
Best for: Production environments, enterprise deployments, auto-scaling needs
Pros:
- Horizontal scaling and load balancing
- High availability and fault tolerance
- Resource management and optimization
- Rolling updates with zero downtime
- Cloud-native deployment
Cons:
- More complex setup and maintenance
- Requires Kubernetes expertise
- Higher infrastructure overhead
Requirements:
- Kubernetes 1.23+
- Helm 3.x
- kubectl configured
- Persistent storage provider
- Ingress controller (optional but recommended)
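Version floors such as "Kubernetes 1.23+" can be verified in a preflight script once the version string has been obtained from the relevant tool. The sketch below only does the comparison; `parse_version` and `meets_minimum` are hypothetical helper names, and how the version string is collected is left to the reader.

```python
import re


def parse_version(text):
    """Extract leading numeric components, e.g. 'v1.23.4-gke.1' -> (1, 23, 4)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, `meets_minimum("v1.22.9", "1.23")` is false while `meets_minimum("1.23.0", "1.23")` is true. A requirement written as "3.x" names a major version rather than a floor, so it needs an additional check that the first component equals 3.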
AWS Deployment
Best for: Cloud deployments with GPU support for auto-annotation
Pros:
- Elastic infrastructure
- GPU instance support (P3 instances)
- Integration with AWS services (S3, RDS, EFS)
- Managed database options
Cons:
- Cloud costs
- AWS-specific configuration
Architecture Overview
CVAT’s self-hosted architecture consists of several key components:
Core Services
CVAT Server (cvat/server)
- Django-based API server
- Handles authentication, authorization, and business logic
- Runs on port 8080
- Manages task creation, annotation, and exports
CVAT UI (cvat/ui)
- React-based frontend application
- Serves the web interface on port 8000
- Provides annotation tools and project management UI
Traefik
- Routes traffic to the appropriate services
- Handles SSL/TLS termination
- Provides load balancing and service discovery
- Default ports: 8080 (HTTP), 443 (HTTPS)
Data Layer
PostgreSQL (postgres:15-alpine)
- Primary relational database
- Stores users, projects, tasks, and metadata
- Port: 5432
Redis
- In-memory cache for session management
- Job queue for background tasks
- Port: 6379
Kvrocks
- Redis-compatible on-disk storage
- Caches media chunks and temporary data
- Port: 6666
ClickHouse
- Analytics and event logging
- Stores user actions and system events
- Port: 8123
Worker Services
CVAT uses multiple worker processes to handle background tasks asynchronously:
- cvat_worker_import: Handles dataset and annotation imports (2 processes)
- cvat_worker_export: Processes annotation exports (2 processes)
- cvat_worker_annotation: Manages annotation operations (1 process)
- cvat_worker_webhooks: Sends webhook notifications (1 process)
- cvat_worker_quality_reports: Generates quality analytics (1 process)
- cvat_worker_chunks: Processes video/image chunks (2 processes)
- cvat_worker_consensus: Handles consensus annotation calculations (1 process)
- cvat_worker_utils: Notifications and cleaning tasks (1 process)
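For capacity planning it can help to hold the worker layout as data. The sketch below copies the process counts from the list above into a dictionary; the name `WORKER_PROCESSES` and the helper `total_worker_processes` are illustrative and do not correspond to anything in CVAT's configuration.

```python
# Process counts as listed above; the dictionary itself is only an illustration.
WORKER_PROCESSES = {
    "cvat_worker_import": 2,
    "cvat_worker_export": 2,
    "cvat_worker_annotation": 1,
    "cvat_worker_webhooks": 1,
    "cvat_worker_quality_reports": 1,
    "cvat_worker_chunks": 2,
    "cvat_worker_consensus": 1,
    "cvat_worker_utils": 1,
}


def total_worker_processes(overrides=None):
    """Sum worker processes, optionally replacing some counts with planned values."""
    counts = dict(WORKER_PROCESSES)
    for name, count in (overrides or {}).items():
        if name not in counts:
            raise KeyError(f"unknown worker: {name}")
        counts[name] = count
    return sum(counts.values())
```

With the defaults this gives 11 background processes across eight worker services; passing `{"cvat_worker_export": 4}` shows the effect of a planned scale-up before any deployment file is touched.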
Analytics Stack
Vector (timberio/vector:0.26.0-alpine)
- Log aggregation and forwarding
- Sends events to ClickHouse
- Port: 8282
Grafana
- Analytics dashboard
- Visualizes user activity and system metrics
- Accessible at /analytics
Security & Policy
Open Policy Agent (openpolicyagent/opa:1.12.2)
- Policy-based access control
- Dynamic permission evaluation
- Port: 8181
Data Flow
- User Request: Browser → Traefik → CVAT UI/Server
- Authentication: Server → PostgreSQL (user validation)
- Session Management: Server → Redis (session cache)
- Data Processing: Server → Workers (background jobs)
- Media Storage: Server → Shared volumes → Kvrocks cache
- Analytics: Server → Vector → ClickHouse → Grafana
- Authorization: Server → OPA → Decision
Storage Requirements
Docker Volumes
- cvat_db: PostgreSQL data (~1-5GB depending on usage)
- cvat_data: Media files and datasets (varies greatly, plan for 100GB+)
- cvat_keys: Authentication keys and secrets (~1MB)
- cvat_logs: Application logs (~1-10GB)
- cvat_inmem_db: Redis persistence (~100MB-1GB)
- cvat_cache_db: Kvrocks cache (10-50GB)
- cvat_events_db: ClickHouse analytics (~5-20GB)
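The per-volume figures above can be combined into a rough total. This is a sketch under stated assumptions: the ranges are transcribed from the list, `cvat_data` is treated as a caller-supplied value because the source only gives an open-ended "100GB+", and `estimate_total_gb` is a hypothetical helper name.

```python
# (low, high) estimates in GB, transcribed from the list above.
# cvat_data is open-ended in the source, so it is passed in by the caller.
VOLUME_ESTIMATES_GB = {
    "cvat_db": (1, 5),
    "cvat_keys": (0.001, 0.001),
    "cvat_logs": (1, 10),
    "cvat_inmem_db": (0.1, 1),
    "cvat_cache_db": (10, 50),
    "cvat_events_db": (5, 20),
}


def estimate_total_gb(data_gb=100):
    """Return (low, high) total disk estimates in GB for a planned media size."""
    low = data_gb + sum(lo for lo, _ in VOLUME_ESTIMATES_GB.values())
    high = data_gb + sum(hi for _, hi in VOLUME_ESTIMATES_GB.values())
    return low, high
```

With 100 GB of media this yields roughly 117 GB to 186 GB in total, so the media volume and the Kvrocks cache dominate the sizing decision.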
Kubernetes Storage
- Backend PVC: 20Gi (default, configurable)
- Kvrocks PVC: 100Gi (default, configurable)
- PostgreSQL PVC: Based on Bitnami chart settings
Network Ports
| Service | Internal Port | External Port | Protocol |
|---|---|---|---|
| Traefik | 8080 | 8080 | HTTP |
| Traefik (HTTPS) | 8443 | 443 | HTTPS |
| CVAT Server | 8080 | - | HTTP |
| CVAT UI | 8000 | - | HTTP |
| PostgreSQL | 5432 | - | TCP |
| Redis | 6379 | - | TCP |
| Kvrocks | 6666 | - | TCP |
| ClickHouse | 8123 | - | HTTP |
| OPA | 8181 | - | HTTP |
| Vector | 8282 | - | HTTP |
| Grafana | 3000 | - | HTTP |
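Only the Traefik entries in the table have an external port, so those are the ones worth probing from outside the host. The sketch below is a plain TCP reachability check using the standard library; `is_port_open` and `EXTERNAL_PORTS` are hypothetical names, and a successful connection shows only that something is listening, not that CVAT is healthy.

```python
import socket

# External ports from the table above; all other services are internal only.
EXTERNAL_PORTS = {"Traefik (HTTP)": 8080, "Traefik (HTTPS)": 443}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe(host):
    """Map each externally exposed service name to its reachability."""
    return {name: is_port_open(host, port) for name, port in EXTERNAL_PORTS.items()}
```

Running `probe("localhost")` on the deployment host gives a quick view of which entry points respond.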
Environment Variables
Key environment variables for configuration:
- CVAT_HOST: Hostname for accessing CVAT (default: localhost)
- CVAT_VERSION: Docker image version (default: dev)
- CVAT_POSTGRES_HOST: PostgreSQL hostname
- CVAT_REDIS_INMEM_HOST: Redis hostname
- CVAT_REDIS_ONDISK_HOST: Kvrocks hostname
- ALLOWED_HOSTS: Django allowed hosts
- CVAT_ANALYTICS: Enable analytics (0 or 1)
- CVAT_BASE_URL: Base URL for the application
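A deployment script can read these variables with the documented defaults. The sketch below is illustrative only: `CvatEnv` and `load_env` are hypothetical names, and treating an unset CVAT_ANALYTICS as disabled is an assumption, since the source states no default for it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CvatEnv:
    host: str
    version: str
    analytics_enabled: bool


def load_env(environ=None):
    """Build a CvatEnv from a mapping, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return CvatEnv(
        host=env.get("CVAT_HOST", "localhost"),
        version=env.get("CVAT_VERSION", "dev"),
        # Assumption: an unset CVAT_ANALYTICS is treated as "0" (disabled).
        analytics_enabled=env.get("CVAT_ANALYTICS", "0") == "1",
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests, for example `load_env({"CVAT_HOST": "cvat.example.com"})`.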