
Overview

Kubernetes provides an excellent platform for running Cadence in production, offering automated deployment, scaling, and management of containerized applications. This guide covers best practices for deploying Cadence on Kubernetes.

Architecture Overview

A production Cadence deployment on Kubernetes typically includes:
  • StatefulSets: For Cadence services requiring stable network identities
  • Deployments: For stateless Cadence services
  • Services: For service discovery and load balancing
  • ConfigMaps: For configuration management
  • Secrets: For sensitive credentials
  • PersistentVolumes: For database persistence (if running databases in-cluster)

Prerequisites

  • Kubernetes cluster (1.20+)
  • kubectl configured to access your cluster
  • Database (Cassandra, MySQL, or PostgreSQL) - managed or self-hosted
  • Optional: Helm 3.x for package management
  • Optional: ElasticSearch/OpenSearch for advanced visibility

Deployment Strategy

Service Separation

Deploy each Cadence service type separately for independent scaling:
  • Frontend: User-facing API endpoints
  • History: Workflow execution engines
  • Matching: Task list management
  • Worker: System workflows and replication

Namespace Design

Organize resources using Kubernetes namespaces:
kubectl create namespace cadence
kubectl create namespace cadence-system  # For monitoring, operators
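If you use namespace selectors later (for example in NetworkPolicies), the namespaces being selected must actually carry the labels you match on; a `name` label is a common convention, but it is not added automatically:

```shell
# Label namespaces so namespaceSelector-based policies can match them
kubectl label namespace cadence name=cadence
kubectl label namespace ingress-nginx name=ingress-nginx
```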

Configuration with ConfigMaps

Store Cadence configuration in ConfigMaps:
configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cadence-config
  namespace: cadence
data:
  config.yaml: |
    log:
      stdout: true
      level: info
    
    persistence:
      defaultStore: cass-default
      visibilityStore: cass-visibility
      numHistoryShards: 4096
      datastores:
        cass-default:
          nosql:
            pluginName: "cassandra"
            hosts: "cassandra.cadence.svc.cluster.local"
            keyspace: "cadence"
            consistency: LOCAL_QUORUM
        cass-visibility:
          nosql:
            pluginName: "cassandra"
            hosts: "cassandra.cadence.svc.cluster.local"
            keyspace: "cadence_visibility"
    
    ringpop:
      name: cadence
      bootstrapMode: dns
      bootstrapHosts:
        - "cadence-frontend-headless.cadence.svc.cluster.local:7933"
        - "cadence-history-headless.cadence.svc.cluster.local:7934"
        - "cadence-matching-headless.cadence.svc.cluster.local:7935"
      maxJoinDuration: 30s
    
    services:
      frontend:
        rpc:
          port: 7933
          grpcPort: 7833
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8000"
      
      history:
        rpc:
          port: 7934
          grpcPort: 7834
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8001"
      
      matching:
        rpc:
          port: 7935
          grpcPort: 7835
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8002"
      
      worker:
        rpc:
          port: 7939
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8003"
Apply the ConfigMap:
kubectl apply -f configmap.yaml

Secrets Management

Store sensitive credentials in Kubernetes Secrets:
secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cadence-secrets
  namespace: cadence
type: Opaque
stringData:
  cassandra-password: "your-secure-password"
  mysql-password: "your-secure-password"
  postgres-password: "your-secure-password"
Apply the Secret:
kubectl apply -f secrets.yaml
Never commit secrets to version control. Use tools like sealed-secrets, external-secrets, or cloud-native secret managers (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault).
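As a sketch of the external-secrets approach: assuming the External Secrets Operator is installed and a `SecretStore` named `aws-secrets-manager` already exists (both are assumptions here, not part of this guide), an `ExternalSecret` can materialize the same Kubernetes Secret without the password ever touching Git:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: cadence-secrets
  namespace: cadence
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # hypothetical SecretStore
    kind: SecretStore
  target:
    name: cadence-secrets       # Kubernetes Secret to create/sync
  data:
  - secretKey: cassandra-password
    remoteRef:
      key: prod/cadence/cassandra   # hypothetical path in the backing store
```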

Frontend Deployment

Deploy the Frontend service with Deployment and Service:
frontend-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cadence-frontend
  namespace: cadence
  labels:
    app: cadence-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cadence-frontend
  template:
    metadata:
      labels:
        app: cadence-frontend
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: cadence-frontend
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7933
          name: tchannel
          protocol: TCP
        - containerPort: 7833
          name: grpc
          protocol: TCP
        - containerPort: 8000
          name: metrics
          protocol: TCP
        env:
        - name: SERVICES
          value: "frontend"
        - name: LOG_LEVEL
          value: "info"
        - name: CASSANDRA_SEEDS
          value: "cassandra.cadence.svc.cluster.local"
        - name: CASSANDRA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: cadence-secrets
              key: cassandra-password
        - name: NUM_HISTORY_SHARDS
          value: "4096"
        - name: PROMETHEUS_ENDPOINT_0
          value: "0.0.0.0:8000"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
          readOnly: true
        livenessProbe:
          httpGet:
            path: /metrics
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /metrics
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-frontend
  namespace: cadence
  labels:
    app: cadence-frontend
spec:
  type: ClusterIP
  ports:
  - port: 7933
    targetPort: 7933
    protocol: TCP
    name: tchannel
  - port: 7833
    targetPort: 7833
    protocol: TCP
    name: grpc
  selector:
    app: cadence-frontend
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-frontend-headless
  namespace: cadence
  labels:
    app: cadence-frontend
spec:
  clusterIP: None
  ports:
  - port: 7933
    name: tchannel
  selector:
    app: cadence-frontend

History Deployment

Deploy History service as StatefulSet for stable network identities:
history-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cadence-history
  namespace: cadence
spec:
  serviceName: cadence-history-headless
  replicas: 6
  selector:
    matchLabels:
      app: cadence-history
  template:
    metadata:
      labels:
        app: cadence-history
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8001"
    spec:
      containers:
      - name: cadence-history
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7934
          name: tchannel
        - containerPort: 7834
          name: grpc
        - containerPort: 8001
          name: metrics
        env:
        - name: SERVICES
          value: "history"
        - name: LOG_LEVEL
          value: "info"
        - name: CASSANDRA_SEEDS
          value: "cassandra.cadence.svc.cluster.local"
        - name: NUM_HISTORY_SHARDS
          value: "4096"
        - name: PROMETHEUS_ENDPOINT_2
          value: "0.0.0.0:8001"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 8Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-history-headless
  namespace: cadence
spec:
  clusterIP: None
  ports:
  - port: 7934
    name: tchannel
  - port: 7834
    name: grpc
  selector:
    app: cadence-history

Matching Deployment

matching-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cadence-matching
  namespace: cadence
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cadence-matching
  template:
    metadata:
      labels:
        app: cadence-matching
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8002"
    spec:
      containers:
      - name: cadence-matching
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7935
          name: tchannel
        - containerPort: 7835
          name: grpc
        - containerPort: 8002
          name: metrics
        env:
        - name: SERVICES
          value: "matching"
        - name: LOG_LEVEL
          value: "info"
        - name: PROMETHEUS_ENDPOINT_1
          value: "0.0.0.0:8002"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-matching-headless
  namespace: cadence
spec:
  clusterIP: None
  ports:
  - port: 7935
    name: tchannel
  selector:
    app: cadence-matching

Worker Deployment

worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cadence-worker
  namespace: cadence
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cadence-worker
  template:
    metadata:
      labels:
        app: cadence-worker
    spec:
      containers:
      - name: cadence-worker
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7939
          name: tchannel
        - containerPort: 8003
          name: metrics
        env:
        - name: SERVICES
          value: "worker"
        - name: LOG_LEVEL
          value: "info"
        - name: PROMETHEUS_ENDPOINT_3
          value: "0.0.0.0:8003"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 2Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config

Ingress Configuration

Expose Frontend service via Ingress:
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cadence-frontend-ingress
  namespace: cadence
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - cadence.example.com
    secretName: cadence-tls
  rules:
  - host: cadence.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: cadence-frontend
            port:
              number: 7833
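To sanity-check the gRPC path end to end, a tool like grpcurl can list the services the Frontend exposes through the Ingress. This is a quick smoke test, not part of the deployment itself; the hostname is the example one above, and the exact service names listed depend on your Cadence version:

```shell
# List gRPC services exposed via the Ingress (TLS terminated by nginx)
grpcurl cadence.example.com:443 list
```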

Horizontal Pod Autoscaling

Autoscale services based on CPU/memory:
hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cadence-frontend-hpa
  namespace: cadence
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cadence-frontend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cadence-matching-hpa
  namespace: cadence
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cadence-matching
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
The History service runs as a StatefulSet and requires careful scaling: changing its replica count redistributes history shards across pods. Scale it in small steps and let shard rebalancing settle before scaling again, rather than attaching an HPA.
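When History does need to scale, a cautious manual approach looks like the sketch below: change replicas one step at a time and confirm the new pod is healthy before the next step. The commands are standard kubectl; how you observe shard rebalancing (logs vs. metrics) varies by setup:

```shell
# Scale the History StatefulSet by one replica
kubectl scale statefulset cadence-history -n cadence --replicas=7

# Wait for the new pod to become Ready
kubectl rollout status statefulset/cadence-history -n cadence

# Watch the new pod (ordinal 6) acquire shards before scaling further
kubectl logs -n cadence cadence-history-6 | grep -i shard
```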

Monitoring with Prometheus

Deploy ServiceMonitor for Prometheus Operator:
servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadence-metrics
  namespace: cadence
  labels:
    app: cadence
spec:
  selector:
    matchLabels:
      app: cadence-frontend
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadence-history-metrics
  namespace: cadence
spec:
  selector:
    matchLabels:
      app: cadence-history
  endpoints:
  - port: metrics
    interval: 30s
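With metrics flowing, alerting rules can ride the same Prometheus Operator via a `PrometheusRule`. A minimal example; the expression below only checks scrape-target liveness, and any Cadence-specific metric names you alert on should be verified against what your Cadence version actually emits:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cadence-alerts
  namespace: cadence
spec:
  groups:
  - name: cadence
    rules:
    - alert: CadenceTargetDown
      expr: up{namespace="cadence"} == 0   # a scrape target stopped responding
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "A Cadence pod has stopped reporting metrics"
```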

Production Best Practices

1. Use pod anti-affinity

Spread replicas across nodes:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cadence-frontend
        topologyKey: kubernetes.io/hostname
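On Kubernetes 1.19+, topology spread constraints are an alternative (or complement) to anti-affinity, and also handle zone-level spreading:

```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: cadence-frontend
```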
2. Configure resource quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cadence-quota
  namespace: cadence
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
3. Enable pod disruption budgets

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cadence-frontend-pdb
  namespace: cadence
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: cadence-frontend
4. Use network policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cadence-network-policy
  namespace: cadence
spec:
  podSelector:
    matchLabels:
      app: cadence-frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
    ports:
    - protocol: TCP
      port: 7833
  # Declaring Egress with no egress rules would deny ALL outbound traffic,
  # including DNS and database access, so allow what the Frontend needs.
  egress:
  - ports:
    - protocol: UDP
      port: 53        # DNS
    - protocol: TCP
      port: 9042      # Cassandra
    - protocol: TCP
      port: 7934      # History service RPC
    - protocol: TCP
      port: 7935      # Matching service RPC

Deployment Workflow

1. Deploy database (if self-hosted)

Use operators like:
  • K8ssandra for Cassandra
  • MySQL Operator
  • PostgreSQL Operator (Zalando, Crunchy)
2. Initialize database schemas

Run schema initialization as a Job. The baseline `setup-schema --version 0.0` must be followed by `update-schema` to apply the versioned migrations, and the visibility keyspace needs the same treatment (the schema paths below assume the layout shipped in the ubercadence/server image):
apiVersion: batch/v1
kind: Job
metadata:
  name: cadence-schema-init
  namespace: cadence
spec:
  template:
    spec:
      containers:
      - name: schema-init
        image: ubercadence/server:1.2.7
        command:
        - /bin/sh
        - -c
        - |
          # Default keyspace: create, set baseline, apply versioned migrations
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            create --keyspace cadence --rf 3
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence \
            setup-schema --version 0.0
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence \
            update-schema --schema-dir /etc/cadence/schema/cassandra/cadence/versioned
          # Visibility keyspace
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            create --keyspace cadence_visibility --rf 3
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence_visibility \
            setup-schema --version 0.0
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence_visibility \
            update-schema --schema-dir /etc/cadence/schema/cassandra/visibility/versioned
      restartPolicy: OnFailure
3. Deploy ConfigMap and Secrets

kubectl apply -f configmap.yaml
kubectl apply -f secrets.yaml
4. Deploy Cadence services

kubectl apply -f frontend-deployment.yaml
kubectl apply -f history-statefulset.yaml
kubectl apply -f matching-deployment.yaml
kubectl apply -f worker-deployment.yaml
5. Configure monitoring

kubectl apply -f servicemonitor.yaml
kubectl apply -f hpa.yaml
6. Verify deployment

kubectl get pods -n cadence
kubectl logs -n cadence -l app=cadence-frontend
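Beyond pod status, the Cadence CLI can confirm the Frontend is actually serving requests. A sketch using the ubercadence/cli image; subcommand availability varies by CLI version, so adjust to what your version supports:

```shell
# Run the Cadence CLI in a throwaway pod and check Frontend health
kubectl run cadence-cli -it --rm --restart=Never -n cadence \
  --image=ubercadence/cli:master -- \
  --address cadence-frontend.cadence.svc.cluster.local:7933 \
  cluster health
```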

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n cadence

# Describe pod for events
kubectl describe pod <pod-name> -n cadence

# Check logs
kubectl logs <pod-name> -n cadence

# Check previous logs if pod crashed
kubectl logs <pod-name> -n cadence --previous

Service Discovery Issues

# Test DNS resolution
kubectl run -it --rm debug --image=busybox --restart=Never -- \
  nslookup cadence-frontend.cadence.svc.cluster.local

# Check service endpoints
kubectl get endpoints -n cadence

Database Connectivity

# Test from a throwaway debug pod (the server image may not ship telnet)
kubectl run -it --rm dbcheck --image=busybox --restart=Never -n cadence -- \
  nc -zv cassandra.cadence.svc.cluster.local 9042

Helm Chart (Community)

While there’s no official Helm chart, community charts are available. Basic structure:
values.yaml
image:
  repository: ubercadence/server
  tag: 1.2.7
  pullPolicy: IfNotPresent

frontend:
  replicaCount: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi

history:
  replicaCount: 6
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi

matching:
  replicaCount: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi

worker:
  replicaCount: 2

cassandra:
  enabled: false
  external:
    hosts:
      - cassandra.cadence.svc.cluster.local
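With a chart in hand, installation follows the usual Helm flow. The chart reference below is a placeholder for whichever community chart (or local chart directory) you choose:

```shell
helm install cadence <chart-reference> \
  --namespace cadence \
  --values values.yaml

# Upgrade later with the same values file
helm upgrade cadence <chart-reference> -n cadence -f values.yaml
```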

Next Steps

  • Configuration: Fine-tune your Cadence configuration
  • Server Setup: Learn about Cadence architecture
