Kubernetes Deployment with Helm

Deploy ZenML server on Kubernetes for production-grade, scalable MLOps infrastructure. ZenML provides official Helm charts that simplify deployment and management on Kubernetes clusters.

Prerequisites

Before deploying ZenML on Kubernetes, ensure you have:

Kubernetes Cluster

A running cluster (v1.19+) with kubectl access

Helm

Helm 3.x installed and configured

Ingress Controller

Nginx, Traefik, or similar (optional but recommended)

Storage Class

Default StorageClass for persistent volumes
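
The command-line prerequisites above can be checked before starting. This is a minimal sketch, not part of the ZenML tooling: the function name `require_tools` is hypothetical, and it only confirms that a command is on the PATH, not that its version is recent enough.

```shell
# require_tools TOOL...
# Reports each listed command-line tool as found or missing and
# returns non-zero if any of them is not on the PATH.
require_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_tools kubectl helm` before the Quick Start covers the two tools this guide relies on.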

Quick Start

Install ZenML Helm Chart

Deploy ZenML server with default configuration:
# Add ZenML Helm repository
helm repo add zenml https://zenml-io.github.io/zenml
helm repo update

# Install ZenML server
helm install zenml-server zenml/zenml \
  --namespace zenml \
  --create-namespace
This deploys:
  • ZenML server on port 80
  • SQLite database (for testing)
  • No ingress (ClusterIP service)
  • No authentication

Verify Installation

Check deployment status:
# Check pods
kubectl get pods -n zenml

# Check service
kubectl get svc -n zenml

# View logs
kubectl logs -n zenml -l app.kubernetes.io/name=zenml -f
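
The checks above are one-shot. In a script it can be more convenient to block until the pods report Ready. The sketch below wraps `kubectl wait` and assumes the pods carry the `app.kubernetes.io/name=zenml` label used in the log command; the function name, default namespace, and default timeout are illustrative.

```shell
# wait_for_zenml_pods [NAMESPACE] [TIMEOUT]
# Blocks until every pod matching the ZenML label is Ready, or fails
# when the timeout expires. Defaults are illustrative, not chart values.
wait_for_zenml_pods() {
  namespace="${1:-zenml}"
  timeout="${2:-300s}"
  kubectl wait pod \
    --namespace "$namespace" \
    --selector app.kubernetes.io/name=zenml \
    --for=condition=Ready \
    --timeout="$timeout"
}
```

Note that `kubectl wait` fails immediately if no matching pods exist yet, so it is best run after `helm install` has returned.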

Access the Server

Port-forward to access locally:
kubectl port-forward -n zenml svc/zenml-server 8080:80
Access at http://localhost:8080
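
With the port-forward running, a quick way to confirm the server responds is to poll its health endpoint. This sketch assumes the `/health` path that the liveness probe in the Health Checks section uses; the function name, retry count, and two-second delay are hypothetical choices.

```shell
# wait_for_zenml URL [ATTEMPTS]
# Polls URL/health until it returns an HTTP success status, retrying
# every two seconds. Returns non-zero if all attempts fail.
wait_for_zenml() {
  url="$1"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$url/health"; then
      echo "ZenML server is healthy at $url"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "ZenML server did not become healthy at $url" >&2
  return 1
}
```

For the port-forward above, the call would be `wait_for_zenml http://localhost:8080`.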

Production Deployment

For production environments, create a custom values.yaml file:
# values.yaml
zenml:
  # Server configuration
  replicaCount: 3
  
  image:
    repository: zenmldocker/zenml-server
    tag: "0.94.0"
    pullPolicy: IfNotPresent
  
  # Server URL (required for production)
  serverURL: https://zenml.example.com
  
  # Authentication
  auth:
    authType: OAUTH2_PASSWORD_BEARER
    jwtSecretKey: "<generate-with-openssl-rand-hex-32>"
    jwtTokenExpireMinutes: 60
    corsAllowOrigins:
      - "https://zenml.example.com"
  
  # External MySQL database
  database:
    url: "mysql://zenml:password@mysql-host:3306/zenml"
    # Store password in Kubernetes secret
    passwordSecretRef:
      name: zenml-db-secret
      key: password
    
    # Connection pool settings
    poolSize: 20
    maxOverflow: 20
    
    # SSL configuration
    ssl: true
    sslVerifyServerCert: true
    
    # Backup strategy
    backupStrategy: database
    backupDatabase: zenml_backup
  
  # Secrets store (use cloud provider)
  secretsStore:
    enabled: true
    type: aws  # or gcp, azure, hashicorp
    aws:
      authMethod: iam-role
      authConfig:
        region: us-east-1
        role_arn: arn:aws:iam::ACCOUNT:role/zenml-secrets-role
  
  # Performance tuning
  threadPoolSize: 40
  authThreadPoolSize: 5
  requestTimeout: 20
  requestCacheTimeout: 300
  
  # Ingress configuration
  ingress:
    enabled: true
    className: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
    host: zenml.example.com
    path: /
    tls:
      enabled: true
      secretName: zenml-tls-cert
  
  # Resource limits
  resources:
    requests:
      cpu: 2000m
      memory: 4Gi
    limits:
      cpu: 4000m
      memory: 8Gi

# Autoscaling
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

# Service account
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/zenml-server-role

# Security context
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: false
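
The `jwtSecretKey` placeholder in the values file points at `openssl rand -hex 32`. A minimal sketch of generating the key follows; the variable name is arbitrary, and the `--set` path in the trailing comment is inferred from the structure of the values file above rather than confirmed against the chart.

```shell
# Generate 32 random bytes, printed as 64 hexadecimal characters.
JWT_SECRET_KEY="$(openssl rand -hex 32)"

# Print only the length so the key itself does not end up in logs.
echo "generated a key of ${#JWT_SECRET_KEY} characters"

# One way to supply it without writing it into values.yaml (not run here;
# the zenml.auth.jwtSecretKey path is an assumption based on the file above):
#   helm install zenml-server zenml/zenml --namespace zenml --create-namespace \
#     --values values.yaml --set zenml.auth.jwtSecretKey="$JWT_SECRET_KEY"
```

Keeping the key out of a committed values file avoids storing it in version control; the same key must be reused on later upgrades so that existing tokens stay valid.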

Deploy with Custom Values

# Create database secret
kubectl create secret generic zenml-db-secret \
  --from-literal=password='your-secure-password' \
  -n zenml

# Install with custom values
helm install zenml-server zenml/zenml \
  --namespace zenml \
  --create-namespace \
  --values values.yaml

Database Configuration

Using External MySQL

An external MySQL database is recommended for production. Managed services include:

AWS RDS

Managed MySQL on AWS

Google Cloud SQL

Managed MySQL on GCP

Azure Database

Managed MySQL on Azure

AWS RDS Example

zenml:
  database:
    url: "mysql://zenml:<password>@<rds-endpoint>:3306/zenml"
    ssl: true
    sslCa:
      value: |
        -----BEGIN CERTIFICATE-----
        <AWS RDS CA certificate>
        -----END CERTIFICATE-----

Google Cloud SQL Example

zenml:
  database:
    url: "mysql://zenml:<password>@<cloud-sql-host>:3306/zenml"

# Use Cloud SQL proxy sidecar
podAnnotations:
  cloud.google.com/sql-proxy-connection-name: "project:region:instance"

Database Persistence (SQLite)

For development/testing only:
zenml:
  database:
    persistence:
      enabled: true
      size: 10Gi
      storageClassName: standard

Secrets Management

AWS Secrets Manager

zenml:
  secretsStore:
    enabled: true
    type: aws
    aws:
      authMethod: iam-role  # or secret-key
      authConfig:
        region: us-east-1
        role_arn: arn:aws:iam::123456789:role/zenml-secrets

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/zenml-secrets

GCP Secret Manager

zenml:
  secretsStore:
    enabled: true
    type: gcp
    gcp:
      authMethod: service-account
      authConfig:
        project_id: my-gcp-project
        service_account_json: |
          {
            "type": "service_account",
            "project_id": "my-gcp-project",
            ...
          }

Azure Key Vault

zenml:
  secretsStore:
    enabled: true
    type: azure
    azure:
      authMethod: service-principal
      authConfig:
        client_id: "<client-id>"
        client_secret: "<client-secret>"
        tenant_id: "<tenant-id>"
      key_vault_name: zenml-keyvault

HashiCorp Vault

zenml:
  secretsStore:
    enabled: true
    type: hashicorp
    hashicorp:
      authMethod: token  # or app_role, aws
      authConfig:
        vault_addr: https://vault.example.com:8200
        vault_token: "<vault-token>"
        mount_point: secret

Ingress Configuration

Nginx Ingress

zenml:
  ingress:
    enabled: true
    className: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/proxy-body-size: "100m"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    host: zenml.example.com
    path: /
    tls:
      enabled: true
      secretName: zenml-tls-cert

Traefik Ingress

zenml:
  ingress:
    enabled: true
    className: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
      traefik.ingress.kubernetes.io/router.tls: "true"
    host: zenml.example.com
    path: /
    tls:
      enabled: true
      secretName: zenml-tls-cert

Custom Path (Behind Proxy)

zenml:
  rootUrlPath: /zenml
  ingress:
    enabled: true
    className: nginx
    annotations:
      nginx.ingress.kubernetes.io/rewrite-target: /$1
    host: example.com
    path: /zenml/?(.*)

SSL/TLS Configuration

Using cert-manager

Install cert-manager:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
Create ClusterIssuer:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your-email-address>
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
Apply issuer:
kubectl apply -f cluster-issuer.yaml
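
After applying the issuer, it is worth confirming that cert-manager has accepted it before relying on it from an ingress. The sketch below reads the issuer's `Ready` condition with a JSONPath query; the function name is hypothetical and the default issuer name is the one from the manifest above.

```shell
# issuer_ready [NAME]
# Prints the status of the ClusterIssuer's Ready condition and
# succeeds only when that status is "True".
issuer_ready() {
  name="${1:-letsencrypt-prod}"
  status="$(kubectl get clusterissuer "$name" \
    --output jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  echo "ClusterIssuer $name Ready=$status"
  [ "$status" = "True" ]
}
```

If the status stays `False`, `kubectl describe clusterissuer letsencrypt-prod` usually shows the reason in its conditions and events.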

Self-Signed Certificates

zenml:
  ingress:
    enabled: true
    tls:
      enabled: true
      generateCerts: true  # Generate self-signed certs
      secretName: zenml-tls-certs

Custom CA Certificates

Add custom CA certificates for internal services:
zenml:
  certificates:
    customCAs:
      - name: "corporate-ca"
        certificate: |
          -----BEGIN CERTIFICATE-----
          MIIDXTCCAkWgAwIBAgIJAJC1HiIAZAiIMA0GCSqGSIb3DQEBCwUAMEUxCzAJBgNV
          ...
          -----END CERTIFICATE-----
    
    # Or reference existing secrets
    secretRefs:
      - name: "ca-bundle-secret"
        key: "ca.crt"

High Availability Setup

Multiple Replicas

zenml:
  replicaCount: 3
  
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - zenml
          topologyKey: kubernetes.io/hostname

Pod Disruption Budget

Create a PodDisruptionBudget (the policy/v1 API used here requires Kubernetes 1.21 or later):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zenml-server-pdb
  namespace: zenml
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: zenml

Horizontal Pod Autoscaling

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
  
  # Custom metrics (optional)
  customMetrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

Monitoring and Logging

Prometheus Metrics

Enable Prometheus monitoring:
serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
  labels:
    release: prometheus

Logging Configuration

zenml:
  debug: false
  environment:
    ZENML_LOGGING_VERBOSITY: INFO
    ZENML_ANALYTICS_OPT_IN: "true"

Health Checks

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 15
  periodSeconds: 15
  timeoutSeconds: 10
  failureThreshold: 5

readinessProbe:
  httpGet:
    path: /ready
    port: http
  initialDelaySeconds: 8
  periodSeconds: 15
  timeoutSeconds: 10
  failureThreshold: 5

Upgrade and Rollback

Upgrade ZenML

# Update Helm repository
helm repo update

# Upgrade to latest version
helm upgrade zenml-server zenml/zenml \
  --namespace zenml \
  --values values.yaml

# Upgrade to specific version
helm upgrade zenml-server zenml/zenml \
  --namespace zenml \
  --version 0.94.0 \
  --values values.yaml
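
For unattended upgrades, Helm 3's `--atomic` flag waits for the release to become ready and rolls it back automatically if the upgrade fails. The wrapper below is a sketch: the function name and the 10-minute timeout are illustrative, while the release name, chart, namespace, and values file are those used above.

```shell
# upgrade_zenml [CHART_VERSION]
# Upgrades the release and lets Helm roll back on failure. With no
# argument the latest chart version in the local repo cache is used.
upgrade_zenml() {
  chart_version="${1:-}"
  set -- upgrade zenml-server zenml/zenml \
    --namespace zenml --values values.yaml \
    --atomic --timeout 10m
  if [ -n "$chart_version" ]; then
    set -- "$@" --version "$chart_version"
  fi
  helm "$@"
}
```

For example, `upgrade_zenml 0.94.0` pins the chart version, and `upgrade_zenml` alone tracks the latest.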

Rollback Deployment

# View deployment history
helm history zenml-server -n zenml

# Rollback to previous version
helm rollback zenml-server -n zenml

# Rollback to specific revision
helm rollback zenml-server 3 -n zenml

Backup and Recovery

Database Backup

Configure automatic backups:
zenml:
  database:
    # Backup before migrations
    backupStrategy: database  # or dump-file, mydumper
    backupDatabase: zenml_backup

Manual Backup

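# Backup MySQL database
# Assumes MySQL runs in-cluster in a pod named mysql-pod and that its container
# has MYSQL_ROOT_PASSWORD set. An interactive -p prompt cannot be answered here
# because kubectl exec attaches no stdin by default.
kubectl exec -n zenml mysql-pod -- \
  sh -c 'mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" zenml' \
  > zenml-backup-$(date +%Y%m%d).sql

# List persistent volumes and claims to back up (PersistentVolumes are cluster-scoped)
kubectl get pv
kubectl get pvc -n zenml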
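
A dump that was cut short (for example by a dropped connection) can still leave a non-empty file, so a quick sanity check before relying on a backup is useful. By default mysqldump ends its output with a `-- Dump completed` comment line; the sketch below looks for it. The function name is hypothetical, and the check does not apply to dumps taken with `--skip-comments` or `--compact`, which suppress that line.

```shell
# dump_is_complete FILE
# Succeeds if FILE is non-empty and its last line is the completion
# marker that mysqldump writes by default.
dump_is_complete() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "backup $file is missing or empty" >&2
    return 1
  fi
  if tail -n 1 "$file" | grep -q '^-- Dump completed'; then
    echo "backup $file looks complete"
  else
    echo "backup $file has no completion marker" >&2
    return 1
  fi
}
```

This only detects truncation; a test restore into a scratch database remains the more reliable verification.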

Restore from Backup

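# Restore MySQL database. Assumes an in-cluster pod named mysql-pod whose
# container has MYSQL_ROOT_PASSWORD set; stdin carries the dump, so the
# password cannot be typed at a -p prompt.
kubectl exec -i -n zenml mysql-pod -- \
  sh -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD" zenml' \
  < zenml-backup-20240309.sql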

Troubleshooting

Pod Not Starting

Check pod status and events:
kubectl describe pod -n zenml -l app.kubernetes.io/name=zenml
kubectl get events -n zenml --sort-by='.lastTimestamp'

Database Connection Issues

Test database connectivity:
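# Port-forward to MySQL (applies only to an in-cluster MySQL service named mysql;
# the command blocks, so run it in a separate terminal)
kubectl port-forward -n zenml svc/mysql 3306:3306

# Test connection
mysql -h 127.0.0.1 -P 3306 -u zenml -p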

Ingress Not Working

Check ingress configuration:
kubectl describe ingress -n zenml
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

View Server Logs

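# View recent logs from all replicas (with a label selector, kubectl shows
# only the last 10 lines per pod unless --tail is set)
kubectl logs -n zenml -l app.kubernetes.io/name=zenml --all-containers=true

# Follow logs
kubectl logs -n zenml -l app.kubernetes.io/name=zenml -f

# View logs from a specific pod (take the name from `kubectl get pods -n zenml`)
kubectl logs -n zenml <pod-name>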

Performance Tuning

Resource Optimization

resources:
  requests:
    cpu: 2000m
    memory: 4Gi
  limits:
    cpu: 4000m
    memory: 8Gi

# Schedule pods only on nodes of a specific instance type (node selector)
nodeSelector:
  node.kubernetes.io/instance-type: c5.xlarge

Database Connection Pooling

zenml:
  database:
    poolSize: 20
    maxOverflow: 20
  
  # Size thread pools alongside the connection pool: here threadPoolSize (40) equals poolSize + maxOverflow (20 + 20)
  threadPoolSize: 40
  authThreadPoolSize: 5
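
The values above pair a 40-thread pool with 20 + 20 database connections. One plausible rule of thumb, stated here as an assumption rather than documented ZenML behavior, is to keep the thread pool no larger than `poolSize + maxOverflow` so that a busy worker thread does not wait for a connection. The sketch below checks that arithmetic; the function name is hypothetical.

```shell
# check_pool_sizing THREAD_POOL_SIZE POOL_SIZE MAX_OVERFLOW
# Compares the thread pool against the maximum number of database
# connections (POOL_SIZE + MAX_OVERFLOW) and warns when it is larger.
check_pool_sizing() {
  threads="$1"
  pool="$2"
  overflow="$3"
  capacity=$((pool + overflow))
  if [ "$threads" -le "$capacity" ]; then
    echo "ok: $threads threads fit within $capacity connections"
  else
    echo "warning: $threads threads exceed $capacity connections" >&2
    return 1
  fi
}

# The values from this section: 40 threads, 20 pooled plus 20 overflow connections.
check_pool_sizing 40 20 20
```

With the numbers from this section the check passes; raising `threadPoolSize` alone would trip the warning. The database's own connection limit, multiplied across replicas, is a separate ceiling to keep in mind.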

Request Handling

zenml:
  requestTimeout: 20
  requestDeduplication: true
  requestCacheTimeout: 300

Security Best Practices

Use RBAC

Enable Kubernetes RBAC for service account permissions

Network Policies

Restrict pod-to-pod communication with NetworkPolicies

Secret Encryption

Enable encryption at rest for Kubernetes Secrets

Pod Security

Use Pod Security Standards (restricted profile)

Image Scanning

Scan container images for vulnerabilities

TLS Everywhere

Use TLS for all network communications

Next Steps

Docker Deployment

Alternative Docker-based deployment

Configuration Guide

Advanced server configuration

Security Setup

Secure your deployment
