Kubernetes Cluster Setup

Datum Cloud runs on Kubernetes and requires a properly configured cluster. This guide covers cluster requirements and setup instructions for various environments.

Requirements

Minimum Requirements

Kubernetes Version

v1.28 or later. Datum uses features from recent Kubernetes releases.

Cluster Resources

Minimum:
  • 2 CPU cores
  • 4 GB RAM
  • 20 GB disk
Recommended:
  • 4 CPU cores
  • 8 GB RAM
  • 50 GB disk

Node Count

Minimum: 1 node. Recommended: 3+ nodes for high availability (HA).

Network

  • Pod network (CNI plugin)
  • LoadBalancer support (for Gateways)
  • Network policy support (optional)

Required Features

  • Custom Resource Definitions (CRDs): Datum extends Kubernetes with custom resources
  • RBAC: Role-based access control must be enabled
  • Admission Webhooks: For validation and mutation (optional but recommended)
  • Persistent Storage: For stateful workloads (optional)

Supported Environments

Datum can run in various Kubernetes environments:

Google Kubernetes Engine (GKE)

# Create GKE cluster
gcloud container clusters create datum-cluster \
  --zone us-central1-a \
  --machine-type n2-standard-4 \
  --num-nodes 3 \
  --enable-autoscaling \
  --min-nodes 3 \
  --max-nodes 10 \
  --enable-network-policy \
  --release-channel regular

# Get credentials
gcloud container clusters get-credentials datum-cluster --zone us-central1-a

Amazon EKS

# Create EKS cluster using eksctl
eksctl create cluster \
  --name datum-cluster \
  --version 1.28 \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.large \
  --nodes 3 \
  --nodes-min 3 \
  --nodes-max 10 \
  --managed

# Update kubeconfig
aws eks update-kubeconfig --name datum-cluster --region us-west-2

Azure AKS

# Create resource group
az group create --name datum-rg --location eastus

# Create AKS cluster
az aks create \
  --resource-group datum-rg \
  --name datum-cluster \
  --node-count 3 \
  --node-vm-size Standard_DS3_v2 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10 \
  --network-plugin azure \
  --generate-ssh-keys

# Get credentials
az aks get-credentials --resource-group datum-rg --name datum-cluster

Cluster Verification

Verify your cluster is ready for Datum:

Check Cluster Info

kubectl cluster-info
Expected output:
Kubernetes control plane is running at https://...
CoreDNS is running at https://.../api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

Check Node Status

kubectl get nodes
All nodes should be Ready:
NAME         STATUS   ROLES           AGE   VERSION
node-1       Ready    control-plane   10m   v1.28.0
node-2       Ready    <none>          9m    v1.28.0
node-3       Ready    <none>          9m    v1.28.0

Check System Pods

kubectl get pods -n kube-system
All pods should be Running:
NAME                              READY   STATUS    RESTARTS   AGE
coredns-...                       1/1     Running   0          10m
etcd-...                          1/1     Running   0          10m
kube-apiserver-...                1/1     Running   0          10m
kube-controller-manager-...       1/1     Running   0          10m
kube-proxy-...                    1/1     Running   0          10m
kube-scheduler-...                1/1     Running   0          10m

Verify RBAC

kubectl auth can-i create deployments --all-namespaces
kubectl auth can-i create customresourcedefinitions
Both should return yes.

Check API Server Version

kubectl version
Ensure the server version is v1.28 or later. (The --short flag was removed in recent kubectl releases; kubectl version now prints the concise form by default.)
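If you want to script this check, sort -V can compare version strings. A minimal sketch; SERVER is hard-coded here for illustration, whereas in practice it would come from `kubectl version -o json` (for example via `jq -r .serverVersion.gitVersion`):

```shell
#!/bin/sh
# Minimal version gate for the v1.28 requirement.
# SERVER is hard-coded for illustration; in a real script it would come from:
#   kubectl version -o json | jq -r .serverVersion.gitVersion
REQUIRED="v1.28.0"
SERVER="v1.29.2"

# sort -V orders version strings numerically; if REQUIRED sorts first
# (or is equal), the server meets the minimum.
lowest=$(printf '%s\n%s\n' "$REQUIRED" "$SERVER" | sort -V | head -n1)
if [ "$lowest" = "$REQUIRED" ]; then
  echo "server version $SERVER OK (>= $REQUIRED)"
else
  echo "server version $SERVER is below required $REQUIRED" >&2
  exit 1
fi
```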

Post-Installation Configuration

Install LoadBalancer (if not available)

For local clusters without native LoadBalancer support:
# Install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.0/config/manifests/metallb-native.yaml

# Configure IP address pool
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
EOF

Install Metrics Server (optional)

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Enable Network Policies (optional)

Network policies provide additional security. If your CNI supports them:
# Verify support
kubectl get networkpolicies -A

# A successful (possibly empty) listing only confirms the API is available;
# NetworkPolicy objects are enforced only if your CNI plugin supports them
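To confirm enforcement end to end, apply a policy and then test connectivity from another pod. A minimal default-deny-ingress policy for illustration; the `default` namespace is a placeholder:

```yaml
# Illustrative default-deny-ingress policy; "default" namespace is a placeholder.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}     # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress         # no ingress rules listed, so all ingress is denied
```

If traffic to pods in the namespace still succeeds after applying this, your CNI is not enforcing policies.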

Resource Limits

Configure resource limits for the cluster:

Controller Manager Resources

From config/manager/manager.yaml:130:
resources:
  limits:
    cpu: 500m
    memory: 128Mi
  requests:
    cpu: 10m
    memory: 64Mi
Adjust based on workload:
kubectl edit deployment datum-controller-manager -n datum-system
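If you manage the deployment with kustomize, the same change can be kept in source control as a patch rather than edited live. A sketch; the overlay layout and resource path are assumptions based on the config/manager paths quoted above:

```yaml
# kustomization.yaml (sketch): patch the manager's resource limits in place.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - config/default
patches:
  - target:
      kind: Deployment
      name: datum-controller-manager
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits
        value:
          cpu: "1"
          memory: 256Mi
```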

High Availability Setup

For production deployments, configure HA:

Multiple Control Plane Nodes

# GKE (automatic)
gcloud container clusters create datum-cluster \
  --num-nodes 3 \
  --enable-autoscaling

# EKS (automatic with managed node groups)
eksctl create cluster --name datum-cluster --nodes 3

# kubeadm (manual)
# Use --control-plane-endpoint and join additional control plane nodes

etcd Backup

# Snapshot etcd
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Multiple Controller Replicas

Datum uses leader election (configured in config/manager/manager.yaml:76), so you can safely run multiple replicas:
kubectl scale deployment datum-controller-manager --replicas=3 -n datum-system
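Leader election means only one replica actively reconciles while the others stand by, ready to take over. In controller-runtime based managers this is typically switched on by a container flag; the fragment below is a sketch, with the flag name assumed from kubebuilder conventions rather than taken from Datum's manifest:

```yaml
# Fragment of the manager Deployment enabling leader election (illustrative).
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: manager
          args:
            - --leader-elect   # only the elected leader runs reconcile loops
```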

Security Hardening

Pod Security Standards

From config/manager/manager.yaml:43:
securityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault

Network Policies

# Apply network policies
kubectl apply -k config/network-policy

RBAC

Datum follows least-privilege RBAC:
# Review RBAC
kubectl get clusterroles | grep datum
kubectl describe clusterrole datum-manager-role

Troubleshooting

Nodes not Ready

# Check node status
kubectl describe node <node-name>

# Check kubelet logs
sudo journalctl -u kubelet -n 100

# Check CNI plugin pods (label varies by CNI; Calico shown as an example)
kubectl get pods -n kube-system -l k8s-app=calico-node

DNS not working

# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Test DNS
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default

Insufficient resources

# Check node resources
kubectl describe nodes

# Check pod resource requests
kubectl get pods -A -o custom-columns=NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory

Next Steps

Installation

Install Datum components

Configuration

Configure Datum for your environment

Security

Security best practices

Monitoring

Set up monitoring and observability