Prerequisites
Kubernetes Cluster
- Kubernetes 1.23.0 or higher
- kubectl configured and connected to your cluster
- Cluster with at least:
  - 3 nodes (for high availability)
  - 8 CPU cores total
  - 16GB RAM total
  - 200GB storage
Required Tools
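This guide relies on kubectl and Helm throughout. A minimal sketch for confirming that a tool is on the PATH before continuing; the function name `require_tool` is hypothetical and not part of either CLI.

```shell
# Print "found: <name>" if the command exists on PATH,
# otherwise print a message to stderr and return non-zero.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

For example, `require_tool kubectl && require_tool helm` stops at the first missing tool.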
Storage Provider
The Kubernetes cluster must have a default StorageClass, or you must configure one. The following access modes are used:
- ReadWriteMany (RWX): For shared backend storage
- ReadWriteOnce (RWO): For databases (PostgreSQL, ClickHouse, Kvrocks)
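To see which StorageClasses exist and which one is the default, something like the following can help. The function names are hypothetical; the patch uses the standard `storageclass.kubernetes.io/is-default-class` annotation. Whether a given class supports RWX depends on its provisioner, which this sketch does not check.

```shell
# List StorageClasses; the default is marked "(default)" next to its name.
list_storage_classes() {
  kubectl get storageclass
}

# Mark an existing StorageClass as the cluster default.
# $1: StorageClass name (required).
set_default_storage_class() {
  kubectl patch storageclass "$1" \
    -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
}
```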
Ingress Controller (Optional)
For external access, install an ingress controller:
Installation
1. Add Helm Repository
Add the CVAT Helm chart repository:
2. Create Namespace
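A minimal sketch for creating the namespace. The name `cvat` is an assumption used for illustration; substitute whatever namespace you intend to deploy into.

```shell
# Assumed namespace name; override by exporting CVAT_NAMESPACE.
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"

# Create the namespace that will hold the CVAT release.
create_cvat_namespace() {
  kubectl create namespace "$CVAT_NAMESPACE"
}
```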
3. Basic Installation
Install CVAT with default configuration. This deploys:
- CVAT backend deployment (server + workers)
- CVAT frontend deployment
- PostgreSQL StatefulSet
- Redis StatefulSet
- Kvrocks StatefulSet
- ClickHouse StatefulSet
- Open Policy Agent deployment
- Vector for log collection
- Grafana for analytics
- Required services and PVCs
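The components listed above come from a single `helm install`. A sketch under stated assumptions: the release name and namespace (`cvat`) are illustrative, and `REPO_ALIAS/CHART_NAME` is a placeholder for the chart reference from the repository step, which is not reproduced here.

```shell
# Assumed names; override via environment variables.
CVAT_RELEASE="${CVAT_RELEASE:-cvat}"
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"
# Placeholder: replace with the real <repo alias>/<chart name>.
CVAT_CHART="${CVAT_CHART:-REPO_ALIAS/CHART_NAME}"

# Install the chart with its default values.
install_cvat() {
  helm install "$CVAT_RELEASE" "$CVAT_CHART" --namespace "$CVAT_NAMESPACE"
}
```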
4. Wait for Pods to Start
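One way to wait is to watch the pods, or to block until every Deployment reports the Available condition. This sketch assumes the `cvat` namespace. It waits on Deployments rather than pods because pods belonging to completed Jobs (such as the backend initializer) never become Ready; StatefulSets are not covered by the wait and should be checked in the pod list.

```shell
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace

# Stream pod status changes until interrupted with Ctrl-C.
watch_cvat_pods() {
  kubectl get pods --namespace "$CVAT_NAMESPACE" --watch
}

# Block until all Deployments are Available.
# $1: optional timeout (default 600s).
wait_for_cvat_deployments() {
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$CVAT_NAMESPACE" --timeout="${1:-600s}"
}
```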
5. Create Superuser
After all pods are running:
6. Access CVAT
Port Forward (Testing):
Configuration
Custom Values File
Create cvat-values.yaml to customize your deployment:
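Rather than guessing key names, one possible starting point is to dump the chart's own defaults with `helm show values` and trim the file down to the settings you want to change. `REPO_ALIAS/CHART_NAME` is a placeholder for the real chart reference, and the function name is hypothetical.

```shell
# Placeholder: replace with the real <repo alias>/<chart name>.
CVAT_CHART="${CVAT_CHART:-REPO_ALIAS/CHART_NAME}"

# Write the chart's default values to a file.
# $1: optional output path (default cvat-values.yaml).
dump_default_values() {
  helm show values "$CVAT_CHART" > "${1:-cvat-values.yaml}"
}
```

Keeping only the overridden keys in the file makes later upgrades easier to review.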
Ingress Configuration
Using Nginx Ingress
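If no Nginx ingress controller is present yet, the upstream ingress-nginx project documents a one-line Helm install, wrapped here in a hypothetical function. The chart values that point CVAT at this controller are chart-specific and are not shown.

```shell
# Install (or upgrade) the ingress-nginx controller into its own namespace,
# using the upstream project's chart repository.
install_ingress_nginx() {
  helm upgrade --install ingress-nginx ingress-nginx \
    --repo https://kubernetes.github.io/ingress-nginx \
    --namespace ingress-nginx --create-namespace
}
```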
Using Embedded Traefik
External Database
Use an external PostgreSQL database:
External Redis
Scaling Workers
Adjust worker replicas based on load:
High Availability
For production HA setup:
Chart Structure
The CVAT Helm chart (v2.58.1) includes:
Dependencies
Automatically installed:
- postgresql (v12.1.x): Primary database
- redis (v19.6.4): Caching layer
- clickhouse (v4.1.x): Analytics database
- vector (v0.19.x): Log aggregation
- grafana (v6.60.x): Analytics UI
- traefik (v37.3.x): Optional ingress
- nuclio (v0.21.x): Optional serverless functions
Templates
Key Kubernetes resources created:
- Deployments: cvat-backend-server, cvat-frontend, cvat-opa
- StatefulSets: PostgreSQL, Redis, Kvrocks, ClickHouse
- Deployments (Workers): Export, Import, Annotation, Webhooks, Quality Reports, Chunks, Consensus, Utils
- Services: Frontend, Backend, OPA, Databases
- PersistentVolumeClaims: Backend storage, Kvrocks cache, database storage
- ConfigMaps: Application config, Vector config, Grafana dashboards
- Secrets: Database credentials, Redis passwords, ClickHouse auth
- Jobs: Backend initializer (runs migrations)
- Ingress: Optional external access
Operations
Upgrade CVAT
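An upgrade can be sketched as a `helm upgrade` against the existing release, reusing the custom values file. The release name, namespace, and chart reference below are assumptions or placeholders.

```shell
CVAT_RELEASE="${CVAT_RELEASE:-cvat}"      # assumed release name
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace
# Placeholder: replace with the real <repo alias>/<chart name>.
CVAT_CHART="${CVAT_CHART:-REPO_ALIAS/CHART_NAME}"

# Upgrade the release in place.
# $1: optional values file (default cvat-values.yaml).
upgrade_cvat() {
  helm upgrade "$CVAT_RELEASE" "$CVAT_CHART" \
    --namespace "$CVAT_NAMESPACE" --values "${1:-cvat-values.yaml}"
}
```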
Rollback
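A rollback usually starts by listing the release history to pick a revision. This sketch assumes a release named `cvat` in the `cvat` namespace. Helm reverts Kubernetes manifests only; database contents changed by migrations are not rolled back.

```shell
CVAT_RELEASE="${CVAT_RELEASE:-cvat}"      # assumed release name
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace

# Show the revisions recorded for the release.
cvat_history() {
  helm history "$CVAT_RELEASE" --namespace "$CVAT_NAMESPACE"
}

# Roll back to a specific revision.
# $1: revision number from cvat_history (required).
rollback_cvat() {
  helm rollback "$CVAT_RELEASE" "${1:?revision number required}" \
    --namespace "$CVAT_NAMESPACE"
}
```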
Uninstall
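Removing the release can be sketched as below, again assuming the `cvat` release and namespace. PersistentVolumeClaims created from StatefulSet volume claim templates are typically left behind by `helm uninstall`, so it is worth listing them afterwards and deleting them only when the data is no longer needed.

```shell
CVAT_RELEASE="${CVAT_RELEASE:-cvat}"      # assumed release name
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace

# Remove the Helm release and the resources it manages.
uninstall_cvat() {
  helm uninstall "$CVAT_RELEASE" --namespace "$CVAT_NAMESPACE"
}

# List PVCs that remain in the namespace after the uninstall.
list_leftover_pvcs() {
  kubectl get pvc --namespace "$CVAT_NAMESPACE"
}
```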
Backup and Restore
Backup PostgreSQL:
View Logs
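For viewing logs, a sketch that tails the backend server. The target `deployment/cvat-backend-server` follows the resource name listed under Templates, but the actual name may carry a release prefix, so treat it as an assumption and confirm it with `kubectl get deployments`.

```shell
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace
# Assumed resource name; verify with: kubectl get deployments -n <namespace>
CVAT_BACKEND="${CVAT_BACKEND:-deployment/cvat-backend-server}"

# Follow the backend server logs.
# $1: optional number of trailing lines to start from (default 100).
tail_backend_logs() {
  kubectl logs "$CVAT_BACKEND" --namespace "$CVAT_NAMESPACE" \
    --tail="${1:-100}" --follow
}
```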
Exec into Pods
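A small wrapper around `kubectl exec` keeps the namespace and target in one place. The deployment name is an assumption based on the Templates list, and the wrapper name `backend_exec` is hypothetical.

```shell
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace
# Assumed resource name; verify with: kubectl get deployments -n <namespace>
CVAT_BACKEND="${CVAT_BACKEND:-deployment/cvat-backend-server}"

# Run a command interactively inside a backend server pod.
# All arguments are passed through as the command to execute.
backend_exec() {
  kubectl exec -it "$CVAT_BACKEND" --namespace "$CVAT_NAMESPACE" -- "$@"
}
```

For example, `backend_exec /bin/bash` opens a shell, and, since CVAT is a Django application, `backend_exec python manage.py createsuperuser` is one plausible way to run the superuser step from the installation section; check the path to `manage.py` in your image first.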
Monitoring
Pod Status:
Troubleshooting
Pods Not Starting
Check pod status. Common states and what to check:
- ImagePullBackOff: Check image name and registry access
- CrashLoopBackOff: Check logs for application errors
- Pending: Check storage class and resource availability
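Each of the states above can usually be narrowed down with three standard kubectl commands, sketched here as hypothetical helpers that assume the `cvat` namespace. `describe` shows image pull and scheduling events, `--previous` shows the logs of the last crashed container, and the event list surfaces storage and resource problems.

```shell
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace

# Show events and container state for one pod. $1: pod name.
describe_pod() {
  kubectl describe pod "$1" --namespace "$CVAT_NAMESPACE"
}

# Show logs from the previous (crashed) container instance. $1: pod name.
previous_logs() {
  kubectl logs "$1" --namespace "$CVAT_NAMESPACE" --previous
}

# List namespace events, oldest first.
recent_events() {
  kubectl get events --namespace "$CVAT_NAMESPACE" --sort-by=.lastTimestamp
}
```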
Database Connection Issues
Storage Issues
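Storage problems most often show up as PersistentVolumeClaims stuck in Pending. A sketch for inspecting them, assuming the `cvat` namespace; the helper names are hypothetical.

```shell
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace

# List PVCs with their status, capacity, access modes, and StorageClass.
list_pvcs() {
  kubectl get pvc --namespace "$CVAT_NAMESPACE"
}

# Show provisioning events for one PVC. $1: PVC name.
describe_pvc() {
  kubectl describe pvc "$1" --namespace "$CVAT_NAMESPACE"
}
```

A Pending RWX claim often means the selected StorageClass cannot provision ReadWriteMany volumes.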
Worker Not Processing Jobs
Ingress Not Working
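A first check is whether the Ingress resource exists, has an address assigned, and routes to the expected backend services. A sketch assuming the `cvat` namespace, with hypothetical helper names.

```shell
CVAT_NAMESPACE="${CVAT_NAMESPACE:-cvat}"  # assumed namespace

# List Ingress resources with their hosts and assigned addresses.
list_ingresses() {
  kubectl get ingress --namespace "$CVAT_NAMESPACE"
}

# Show rules, backends, and events for one Ingress. $1: Ingress name.
describe_ingress() {
  kubectl describe ingress "$1" --namespace "$CVAT_NAMESPACE"
}
```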
Advanced Configuration
Custom Storage Classes
Node Affinity and Tolerations
Additional Environment Variables
Custom Volumes
Production Best Practices
- Use specific image tags: Don’t use dev or latest in production
- Enable resource limits: Prevent resource exhaustion
- Configure HPA: Auto-scale based on CPU/memory
- Use external databases: For better reliability and backups
- Enable monitoring: Use Prometheus/Grafana for metrics
- Regular backups: Automate database and volume backups
- TLS everywhere: Use cert-manager for automatic certificates
- Network policies: Restrict pod-to-pod communication
- Secrets management: Use external secret managers (Vault, AWS Secrets Manager)
- Multi-zone deployment: Spread pods across availability zones