
Overview

Kubernetes provides an excellent platform for running Cadence in production, offering automated deployment, scaling, and management of containerized applications. This guide covers best practices for deploying Cadence on Kubernetes.

Architecture Overview

A production Cadence deployment on Kubernetes typically includes:
  • StatefulSets: For Cadence services requiring stable network identities
  • Deployments: For stateless Cadence services
  • Services: For service discovery and load balancing
  • ConfigMaps: For configuration management
  • Secrets: For sensitive credentials
  • PersistentVolumes: For database persistence (if running databases in-cluster)

Prerequisites

  • Kubernetes cluster (1.20+)
  • kubectl configured to access your cluster
  • Database (Cassandra, MySQL, or PostgreSQL) - managed or self-hosted
  • Optional: Helm 3.x for package management
  • Optional: ElasticSearch/OpenSearch for advanced visibility

Deployment Strategy

Service Separation

Deploy each Cadence service type separately for independent scaling:
  • Frontend: User-facing API endpoints
  • History: Workflow execution engines
  • Matching: Task list management
  • Worker: System workflows and replication

Namespace Design

Organize resources using Kubernetes namespaces:
kubectl create namespace cadence
kubectl create namespace cadence-system  # For monitoring, operators
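If you use namespace selectors later (for example in NetworkPolicies), the namespaces being selected must actually carry the labels you match on; a `name` label is a common convention, but it is not added automatically:

```shell
# Label namespaces so namespaceSelector-based policies can match them
kubectl label namespace cadence name=cadence
kubectl label namespace ingress-nginx name=ingress-nginx
```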

Configuration with ConfigMaps

Store Cadence configuration in ConfigMaps:
configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cadence-config
  namespace: cadence
data:
  config.yaml: |
    log:
      stdout: true
      level: info
    
    persistence:
      defaultStore: cass-default
      visibilityStore: cass-visibility
      numHistoryShards: 4096
      datastores:
        cass-default:
          nosql:
            pluginName: "cassandra"
            hosts: "cassandra.cadence.svc.cluster.local"
            keyspace: "cadence"
            consistency: LOCAL_QUORUM
        cass-visibility:
          nosql:
            pluginName: "cassandra"
            hosts: "cassandra.cadence.svc.cluster.local"
            keyspace: "cadence_visibility"
    
    ringpop:
      name: cadence
      bootstrapMode: dns
      bootstrapHosts:
        - "cadence-frontend-headless.cadence.svc.cluster.local:7933"
        - "cadence-history-headless.cadence.svc.cluster.local:7934"
        - "cadence-matching-headless.cadence.svc.cluster.local:7935"
      maxJoinDuration: 30s
    
    services:
      frontend:
        rpc:
          port: 7933
          grpcPort: 7833
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8000"
      
      history:
        rpc:
          port: 7934
          grpcPort: 7834
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8001"
      
      matching:
        rpc:
          port: 7935
          grpcPort: 7835
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8002"
      
      worker:
        rpc:
          port: 7939
          bindOnLocalHost: false
        metrics:
          prometheus:
            timerType: "histogram"
            listenAddress: "0.0.0.0:8003"
Apply the ConfigMap:
kubectl apply -f configmap.yaml

Secrets Management

Store sensitive credentials in Kubernetes Secrets:
secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cadence-secrets
  namespace: cadence
type: Opaque
stringData:
  cassandra-password: "your-secure-password"
  mysql-password: "your-secure-password"
  postgres-password: "your-secure-password"
Apply the Secret:
kubectl apply -f secrets.yaml
Never commit secrets to version control. Use tools like sealed-secrets, external-secrets, or cloud-native secret managers (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault).
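As a sketch of the external-secrets approach: assuming the External Secrets Operator is installed and a `SecretStore` named `aws-secrets-manager` already exists (both are assumptions here, not part of this guide), an `ExternalSecret` can materialize the same Kubernetes Secret without the password ever touching Git:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: cadence-secrets
  namespace: cadence
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # hypothetical SecretStore
    kind: SecretStore
  target:
    name: cadence-secrets       # Kubernetes Secret to create/sync
  data:
  - secretKey: cassandra-password
    remoteRef:
      key: prod/cadence/cassandra   # hypothetical path in the backing store
```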

Frontend Deployment

Deploy the Frontend service with Deployment and Service:
frontend-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cadence-frontend
  namespace: cadence
  labels:
    app: cadence-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cadence-frontend
  template:
    metadata:
      labels:
        app: cadence-frontend
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: cadence-frontend
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7933
          name: tchannel
          protocol: TCP
        - containerPort: 7833
          name: grpc
          protocol: TCP
        - containerPort: 8000
          name: metrics
          protocol: TCP
        env:
        - name: SERVICES
          value: "frontend"
        - name: LOG_LEVEL
          value: "info"
        - name: CASSANDRA_SEEDS
          value: "cassandra.cadence.svc.cluster.local"
        - name: CASSANDRA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: cadence-secrets
              key: cassandra-password
        - name: NUM_HISTORY_SHARDS
          value: "4096"
        - name: PROMETHEUS_ENDPOINT_0
          value: "0.0.0.0:8000"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
          readOnly: true
        livenessProbe:
          httpGet:
            path: /metrics
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /metrics
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-frontend
  namespace: cadence
  labels:
    app: cadence-frontend
spec:
  type: ClusterIP
  ports:
  - port: 7933
    targetPort: 7933
    protocol: TCP
    name: tchannel
  - port: 7833
    targetPort: 7833
    protocol: TCP
    name: grpc
  selector:
    app: cadence-frontend
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-frontend-headless
  namespace: cadence
  labels:
    app: cadence-frontend
spec:
  clusterIP: None
  ports:
  - port: 7933
    name: tchannel
  selector:
    app: cadence-frontend

History Deployment

Deploy History service as StatefulSet for stable network identities:
history-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cadence-history
  namespace: cadence
spec:
  serviceName: cadence-history-headless
  replicas: 6
  selector:
    matchLabels:
      app: cadence-history
  template:
    metadata:
      labels:
        app: cadence-history
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8001"
    spec:
      containers:
      - name: cadence-history
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7934
          name: tchannel
        - containerPort: 7834
          name: grpc
        - containerPort: 8001
          name: metrics
        env:
        - name: SERVICES
          value: "history"
        - name: LOG_LEVEL
          value: "info"
        - name: CASSANDRA_SEEDS
          value: "cassandra.cadence.svc.cluster.local"
        - name: NUM_HISTORY_SHARDS
          value: "4096"
        - name: PROMETHEUS_ENDPOINT_2
          value: "0.0.0.0:8001"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 8Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-history-headless
  namespace: cadence
spec:
  clusterIP: None
  ports:
  - port: 7934
    name: tchannel
  - port: 7834
    name: grpc
  selector:
    app: cadence-history

Matching Deployment

matching-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cadence-matching
  namespace: cadence
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cadence-matching
  template:
    metadata:
      labels:
        app: cadence-matching
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8002"
    spec:
      containers:
      - name: cadence-matching
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7935
          name: tchannel
        - containerPort: 7835
          name: grpc
        - containerPort: 8002
          name: metrics
        env:
        - name: SERVICES
          value: "matching"
        - name: LOG_LEVEL
          value: "info"
        - name: PROMETHEUS_ENDPOINT_1
          value: "0.0.0.0:8002"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config
---
apiVersion: v1
kind: Service
metadata:
  name: cadence-matching-headless
  namespace: cadence
spec:
  clusterIP: None
  ports:
  - port: 7935
    name: tchannel
  selector:
    app: cadence-matching

Worker Deployment

worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cadence-worker
  namespace: cadence
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cadence-worker
  template:
    metadata:
      labels:
        app: cadence-worker
    spec:
      containers:
      - name: cadence-worker
        image: ubercadence/server:1.2.7
        ports:
        - containerPort: 7939
          name: tchannel
        - containerPort: 8003
          name: metrics
        env:
        - name: SERVICES
          value: "worker"
        - name: LOG_LEVEL
          value: "info"
        - name: PROMETHEUS_ENDPOINT_3
          value: "0.0.0.0:8003"
        volumeMounts:
        - name: config
          mountPath: /etc/cadence/config
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 2Gi
      volumes:
      - name: config
        configMap:
          name: cadence-config

Ingress Configuration

Expose Frontend service via Ingress:
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cadence-frontend-ingress
  namespace: cadence
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - cadence.example.com
    secretName: cadence-tls
  rules:
  - host: cadence.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: cadence-frontend
            port:
              number: 7833
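To sanity-check the gRPC path end to end, a tool like grpcurl can list the services the Frontend exposes through the Ingress. This is a quick smoke test, not part of the deployment itself; the hostname is the example one above, and the exact service names listed depend on your Cadence version:

```shell
# List gRPC services exposed via the Ingress (TLS terminated by nginx)
grpcurl cadence.example.com:443 list
```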

Horizontal Pod Autoscaling

Autoscale services based on CPU/memory:
hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cadence-frontend-hpa
  namespace: cadence
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cadence-frontend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cadence-matching-hpa
  namespace: cadence
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cadence-matching
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
The History service runs as a StatefulSet and requires careful scaling: changing its replica count redistributes history shards across pods. Scale it in small steps and let shard rebalancing settle before scaling again, rather than attaching an HPA.
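When History does need to scale, a cautious manual approach looks like the sketch below: change replicas one step at a time and confirm the new pod is healthy before the next step. The commands are standard kubectl; how you observe shard rebalancing (logs vs. metrics) varies by setup:

```shell
# Scale the History StatefulSet by one replica
kubectl scale statefulset cadence-history -n cadence --replicas=7

# Wait for the new pod to become Ready
kubectl rollout status statefulset/cadence-history -n cadence

# Watch the new pod (ordinal 6) acquire shards before scaling further
kubectl logs -n cadence cadence-history-6 | grep -i shard
```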

Monitoring with Prometheus

Deploy ServiceMonitor for Prometheus Operator:
servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadence-metrics
  namespace: cadence
  labels:
    app: cadence
spec:
  selector:
    matchLabels:
      app: cadence-frontend
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadence-history-metrics
  namespace: cadence
spec:
  selector:
    matchLabels:
      app: cadence-history
  endpoints:
  - port: metrics
    interval: 30s
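With metrics flowing, alerting rules can ride the same Prometheus Operator via a `PrometheusRule`. A minimal example; the expression below only checks scrape-target liveness, and any Cadence-specific metric names you alert on should be verified against what your Cadence version actually emits:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cadence-alerts
  namespace: cadence
spec:
  groups:
  - name: cadence
    rules:
    - alert: CadenceTargetDown
      expr: up{namespace="cadence"} == 0   # a scrape target stopped responding
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "A Cadence pod has stopped reporting metrics"
```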

Production Best Practices

1. Use pod anti-affinity

Spread replicas across nodes:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cadence-frontend
        topologyKey: kubernetes.io/hostname
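On Kubernetes 1.19+, topology spread constraints are an alternative (or complement) to anti-affinity, and also handle zone-level spreading:

```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: cadence-frontend
```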
2. Configure resource quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cadence-quota
  namespace: cadence
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
3. Enable pod disruption budgets

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cadence-frontend-pdb
  namespace: cadence
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: cadence-frontend
4. Use network policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cadence-network-policy
  namespace: cadence
spec:
  podSelector:
    matchLabels:
      app: cadence-frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
    ports:
    - protocol: TCP
      port: 7833
  # Declaring Egress with no egress rules would deny ALL outbound traffic,
  # including DNS and database access, so allow what the Frontend needs.
  egress:
  - ports:
    - protocol: UDP
      port: 53        # DNS
    - protocol: TCP
      port: 9042      # Cassandra
    - protocol: TCP
      port: 7934      # History service RPC
    - protocol: TCP
      port: 7935      # Matching service RPC

Deployment Workflow

1. Deploy database (if self-hosted)

Use operators like:
  • K8ssandra for Cassandra
  • MySQL Operator
  • PostgreSQL Operator (Zalando, Crunchy)
2. Initialize database schemas

Run schema initialization as a Job. The baseline `setup-schema --version 0.0` must be followed by `update-schema` to apply the versioned migrations, and the visibility keyspace needs the same treatment (the schema paths below assume the layout shipped in the ubercadence/server image):
apiVersion: batch/v1
kind: Job
metadata:
  name: cadence-schema-init
  namespace: cadence
spec:
  template:
    spec:
      containers:
      - name: schema-init
        image: ubercadence/server:1.2.7
        command:
        - /bin/sh
        - -c
        - |
          # Default keyspace: create, set baseline, apply versioned migrations
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            create --keyspace cadence --rf 3
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence \
            setup-schema --version 0.0
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence \
            update-schema --schema-dir /etc/cadence/schema/cassandra/cadence/versioned
          # Visibility keyspace
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            create --keyspace cadence_visibility --rf 3
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence_visibility \
            setup-schema --version 0.0
          cadence-cassandra-tool \
            --ep cassandra.cadence.svc.cluster.local \
            --keyspace cadence_visibility \
            update-schema --schema-dir /etc/cadence/schema/cassandra/visibility/versioned
      restartPolicy: OnFailure
3. Deploy ConfigMap and Secrets

kubectl apply -f configmap.yaml
kubectl apply -f secrets.yaml
4. Deploy Cadence services

kubectl apply -f frontend-deployment.yaml
kubectl apply -f history-statefulset.yaml
kubectl apply -f matching-deployment.yaml
kubectl apply -f worker-deployment.yaml
5. Configure monitoring

kubectl apply -f servicemonitor.yaml
kubectl apply -f hpa.yaml
6. Verify deployment

kubectl get pods -n cadence
kubectl logs -n cadence -l app=cadence-frontend
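Beyond pod status, the Cadence CLI can confirm the Frontend is actually serving requests. A sketch using the ubercadence/cli image; subcommand availability varies by CLI version, so adjust to what your version supports:

```shell
# Run the Cadence CLI in a throwaway pod and check Frontend health
kubectl run cadence-cli -it --rm --restart=Never -n cadence \
  --image=ubercadence/cli:master -- \
  --address cadence-frontend.cadence.svc.cluster.local:7933 \
  cluster health
```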

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n cadence

# Describe pod for events
kubectl describe pod <pod-name> -n cadence

# Check logs
kubectl logs <pod-name> -n cadence

# Check previous logs if pod crashed
kubectl logs <pod-name> -n cadence --previous

Service Discovery Issues

# Test DNS resolution
kubectl run -it --rm debug --image=busybox --restart=Never -- \
  nslookup cadence-frontend.cadence.svc.cluster.local

# Check service endpoints
kubectl get endpoints -n cadence

Database Connectivity

# Test from a throwaway debug pod (the server image may not ship telnet)
kubectl run -it --rm dbcheck --image=busybox --restart=Never -n cadence -- \
  nc -zv cassandra.cadence.svc.cluster.local 9042

Helm Chart (Community)

While there’s no official Helm chart, community charts are available. Basic structure:
values.yaml
image:
  repository: ubercadence/server
  tag: 1.2.7
  pullPolicy: IfNotPresent

frontend:
  replicaCount: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi

history:
  replicaCount: 6
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi

matching:
  replicaCount: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi

worker:
  replicaCount: 2

cassandra:
  enabled: false
  external:
    hosts:
      - cassandra.cadence.svc.cluster.local
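With a chart in hand, installation follows the usual Helm flow. The chart reference below is a placeholder for whichever community chart (or local chart directory) you choose:

```shell
helm install cadence <chart-reference> \
  --namespace cadence \
  --values values.yaml

# Upgrade later with the same values file
helm upgrade cadence <chart-reference> -n cadence -f values.yaml
```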

Next Steps

  • Configuration: Fine-tune your Cadence configuration
  • Server Setup: Learn about Cadence architecture
