This guide covers deploying k8s-scheduler to a production Kubernetes cluster using Helm or raw manifests.

Overview

k8s-scheduler consists of two main components:
  • Server: Go backend serving REST API and React SPA
  • Operator: Kubernetes operator managing UserDeployment custom resources

Deployment Flow

Infrastructure Layers

| Layer | Components |
| --- | --- |
| 1-infrastructure/ | VPC, EKS, RDS PostgreSQL, Vault, Tailscale |
| 2-platform/ | ALB Controller, cert-manager, External Secrets Operator |
| 3-apps/ | Traefik, external-dns, ArgoCD, Vault secrets |
See the opsnorth/infra repository for full Terraform configuration.

Prerequisites

Before deploying k8s-scheduler, ensure the following components are installed:
  • PostgreSQL - Database for users, orgs, teams, deployments
  • AWS ALB Controller - Creates ALB from Ingress resources (on AWS)
  • Traefik - Routes wildcard traffic to user deployments
  • External DNS - Auto-creates DNS records from ingress
  • cert-manager - TLS certificates for server and deployments
  • Cloudflare - DNS provider for External DNS and cert-manager
  • Stripe - Subscription billing
  • SendGrid/SMTP - Team invitation emails
Verify all components are running:
# AWS Load Balancer Controller
kubectl get pods -n kube-system | grep aws-load-balancer

# External Secrets Operator
kubectl get pods -n external-secrets

# Vault + Agent Injector
kubectl get pods -n vault

# Traefik
kubectl get pods -n traefik

# cert-manager
kubectl get pods -n cert-manager

# external-dns
kubectl get pods -n external-dns

Step 1: Configure Terraform

1. Clone infrastructure repository

git clone https://github.com/opsnorth/infra.git
cd infra
2. Copy environment template

cp .env.example .env
The .env file contains all secrets and credentials (gitignored).
3. Configure secrets

Edit .env with your credentials:
.env
# Tailscale VPN
export TF_VAR_tailscale_auth_key="tskey-auth-..."

# Cloudflare DNS
export TF_VAR_cloudflare_api_token="..."

# GitHub App (for ArgoCD)
export TF_VAR_github_app_id="..."
export TF_VAR_github_app_installation_id="..."
export TF_VAR_github_app_private_key_file="~/.github/github-app.pem"

# Google OAuth
export TF_VAR_google_client_id="your-client-id.apps.googleusercontent.com"
export TF_VAR_google_client_secret="your-client-secret"
4. Deploy infrastructure

./deploy.sh
This script sources .env and applies Terraform for all three layers.

Step 2: Setup Vault Policy

This is a one-time setup and must be completed before deploying the application.
The setup script creates:
  • Vault policy k8s-scheduler - grants access to user secret paths
  • Kubernetes auth role k8s-scheduler - binds policy to service account
./scripts/setup-vault.sh
Prerequisite: Create ~/.vault-secrets/vault.env with your Vault token:
~/.vault-secrets/vault.env
VAULT_TOKEN=hvs.your-vault-root-token

Vault Secret Paths

Terraform writes these paths automatically:
| Path | Keys | Required? |
| --- | --- | --- |
| secret/k8s-scheduler/database | connection_string | Yes |
| secret/k8s-scheduler/google | client_id, client_secret | Yes (unless DEV_MODE) |
| secret/k8s-scheduler/email | provider, smtp_host, smtp_port, smtp_user, smtp_password, smtp_from | No |
| secret/k8s-scheduler/ai | anthropic_api_key | No |
| secret/k8s-scheduler/stripe | api_key, webhook_secret | No |
| secret/k8s-scheduler/secrets | keycloak_admin_password, grafana_cloud_prometheus_password, grafana_cloud_loki_password | No |
All paths must exist in Vault even if empty; the Vault Agent template fails to render if any path is missing. Seed unused paths with empty values:
vault kv put secret/k8s-scheduler/ai anthropic_api_key=""
vault kv put secret/k8s-scheduler/stripe api_key="" webhook_secret=""
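The remaining optional paths can be seeded the same way, using the keys from the table above:

```shell
# Seed the remaining optional paths with empty values so the
# Vault Agent template can render (keys taken from the table above)
vault kv put secret/k8s-scheduler/email \
  provider="" smtp_host="" smtp_port="" smtp_user="" smtp_password="" smtp_from=""
vault kv put secret/k8s-scheduler/secrets \
  keycloak_admin_password="" grafana_cloud_prometheus_password="" grafana_cloud_loki_password=""
```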

Step 3: Deploy with Helm

Install

helm install k8s-scheduler ./charts/k8s-scheduler \
  -n scheduler-system --create-namespace \
  --set domain=yourdomain.com
The Helm chart deploys:
  • Custom Resource Definitions (UserDeployment, AgentTask, Workflow)
  • RBAC (ServiceAccounts, ClusterRole, ClusterRoleBinding)
  • Server deployment with Vault Agent sidecar
  • Operator deployment
  • ClusterSecretStore for External Secrets Operator
  • Ingress (ALB) for the server
  • ConfigMaps for templates
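As a quick sanity check after install, you can list the resource kinds the release actually rendered:

```shell
# Count each resource kind in the installed release manifest
helm get manifest k8s-scheduler -n scheduler-system | grep '^kind:' | sort | uniq -c
```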

Helm Values

View all available configuration options:
helm show values ./charts/k8s-scheduler

Key Helm Values

| Value | Description | Default |
| --- | --- | --- |
| domain | Required. Base domain for the app | example.com |
| image.server.repository | Server container image | ghcr.io/opsnorth/k8s-scheduler-server |
| image.operator.repository | Operator container image | ghcr.io/opsnorth/k8s-scheduler-operator |
| image.server.tag | Server image tag | latest |
| server.replicas | Server pod count | 1 |
| operator.replicas | Operator pod count | 1 |
| operator.leaderElect | Enable leader election for HA | true |
| ingress.enabled | Create ALB Ingress | true |
| ingress.className | Ingress class | alb |
| vault.agentInject | Enable Vault Agent sidecar | true |
| vault.address | Vault server URL | http://vault.vault.svc.cluster.local:8200 |
| secretStore.enabled | Create ClusterSecretStore | true |
| secretStore.name | ClusterSecretStore name | vault-backend |
| session.backend | Session storage backend | postgres |
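For anything beyond a couple of overrides, a values file is easier to review than --set flags. A sketch using the values above (the image tag shown is a placeholder, not a published release):

```yaml
# my-values.yaml - example overrides; domain is required
domain: yourdomain.com
server:
  replicas: 2
image:
  server:
    tag: v1.2.3   # pin a tag instead of relying on latest
```

Apply it with:
helm upgrade --install k8s-scheduler ./charts/k8s-scheduler -n scheduler-system -f my-values.yaml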

Step 4: Deploy with Raw Manifests

Helm is the recommended deployment method. Use raw manifests only for advanced customization.
kubectl apply -k manifests/
Manifests are organized by component:
manifests/
├── namespace.yaml
├── crds/
│   ├── userdeployment-crd.yaml
│   ├── agenttask-crd.yaml
│   └── workflow-crd.yaml
├── rbac/
│   ├── service-account.yaml
│   ├── server-service-account.yaml
│   ├── cluster-role.yaml
│   └── cluster-role-binding.yaml
├── configmaps/
│   ├── server-config.yaml
│   └── deployment-templates.yaml
├── secrets/
│   └── cluster-secret-store.yaml
├── deployments/
│   ├── server.yaml
│   └── operator.yaml
└── kustomization.yaml
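Before applying, you can render the kustomization locally and diff it against the live cluster:

```shell
# Render the kustomization without applying it
kubectl kustomize manifests/

# Diff the rendered output against what's currently in the cluster
kubectl diff -k manifests/
```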

Step 5: Verify Deployment

1. Check pod status

kubectl get pods -n scheduler-system
Expected output:
NAME                                      READY   STATUS    RESTARTS   AGE
k8s-scheduler-operator-xxxxx-xxxxx        1/1     Running   0          30s
k8s-scheduler-server-xxxxx-xxxxx          2/2     Running   0          30s
The server pod shows 2/2 because Vault Agent runs as a sidecar.
2. Check server logs

kubectl logs -n scheduler-system -l app=k8s-scheduler-server -c server --tail=50
3. Check operator logs

kubectl logs -n scheduler-system -l app=k8s-scheduler-operator --tail=50
4. Access the application

The application is available at:
https://app.<your-domain>
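A quick smoke test from outside the cluster (this assumes the SPA responds at the root path; adjust if your build serves a dedicated health endpoint):

```shell
# Expect HTTP 200 once DNS, TLS, and the ALB are all in place
curl -fsS -o /dev/null -w '%{http_code}\n' "https://app.yourdomain.com/"
```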

Testing

Create a test deployment to verify the operator:
test-deployment.yaml
apiVersion: scheduler.opsnorth.io/v1alpha1
kind: UserDeployment
metadata:
  name: test-deployment
  namespace: scheduler-system
spec:
  userId: "test-user"
  template: "nginx"
  tier: "free"
  desiredState: "running"
# Create
kubectl apply -f test-deployment.yaml

# Watch status
kubectl get userdeployment test-deployment -n scheduler-system -w

# Cleanup
kubectl delete userdeployment test-deployment -n scheduler-system
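If you script this check, kubectl wait avoids polling by hand. The Ready condition name is an assumption about what the operator writes to status; confirm it with kubectl describe first:

```shell
# Block until the operator reports the deployment ready
# ("Ready" condition name is an assumption about this CRD's status)
kubectl wait userdeployment/test-deployment -n scheduler-system \
  --for=condition=Ready --timeout=120s
```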

Lifecycle Management

Upgrade

helm upgrade k8s-scheduler ./charts/k8s-scheduler \
  -n scheduler-system \
  --set domain=yourdomain.com

Rollback

# List revisions
helm history k8s-scheduler -n scheduler-system

# Rollback to specific revision
helm rollback k8s-scheduler <revision> -n scheduler-system

Uninstall

# Remove Helm release + CRDs + namespace
./scripts/uninstall.sh

# Full teardown including Vault policy
./scripts/uninstall.sh --all

Production Considerations

High Availability

Increase replicas for the server and operator:
helm upgrade k8s-scheduler ./charts/k8s-scheduler \
  -n scheduler-system \
  --set server.replicas=3 \
  --set operator.replicas=2 \
  --set operator.leaderElect=true

Resource Limits

Adjust based on load:
values.yaml
server:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

Database

Use managed PostgreSQL:
  • AWS RDS
  • Google Cloud SQL
  • Azure Database for PostgreSQL
Enable SSL connections in production.
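A sketch of a connection string with TLS enforced (host and credentials are placeholders):

```shell
# sslmode=require refuses plaintext connections; sslmode=verify-full
# additionally checks the server certificate against the hostname
DATABASE_DSN="postgres://scheduler:s3cret@mydb.example.com:5432/scheduler?sslmode=require"
echo "$DATABASE_DSN"
```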

Session Store

Use persistent session backend:
--set session.backend=postgres
# or
--set session.backend=redis
Avoid memory backend in production.

TLS Certificates

cert-manager auto-provisions TLS:
  • Server ingress
  • User deployment ingresses
Configure DNS01 challenge with Cloudflare.
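A minimal ClusterIssuer sketch for the DNS01 challenge, assuming a Cloudflare API token stored in a Secret named cloudflare-api-token in the cert-manager namespace (issuer name and email are examples):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-dns-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token  # Secret in the cert-manager namespace
              key: api-token
```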

Monitoring

Enable Prometheus metrics:
--set monitoring.enabled=true
Requires Prometheus Operator.

Troubleshooting

Server pod not starting

Symptoms: Server pod stuck in Init:0/1 or CrashLoopBackOff
Causes:
  1. Vault secrets missing or incorrect
  2. Database connection failed
  3. Vault Agent can’t authenticate
Solutions:
# Check Vault Agent logs
kubectl logs -n scheduler-system -l app=k8s-scheduler-server -c vault-agent

# Verify Vault secrets exist
vault kv get secret/k8s-scheduler/database
vault kv get secret/k8s-scheduler/google

# Check database connectivity
kubectl run -it --rm psql --image=postgres:15 --restart=Never -- psql "$DATABASE_DSN"
Operator not reconciling

Symptoms: UserDeployment created but no pods/services appear
Causes:
  1. RBAC permissions missing
  2. Operator not running
  3. CRD not installed
Solutions:
# Check operator logs
kubectl logs -n scheduler-system -l app=k8s-scheduler-operator

# Verify CRD exists
kubectl get crd userdeployments.scheduler.opsnorth.io

# Check ClusterRole
kubectl get clusterrole k8s-scheduler-operator
ALB not provisioned

Symptoms: Ingress created but no ALB provisioned
Causes:
  1. AWS Load Balancer Controller not running
  2. Incorrect ingress annotations
  3. IAM permissions missing
Solutions:
# Check ALB controller logs
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller

# Verify ingress
kubectl describe ingress -n scheduler-system k8s-scheduler-server

# Check ALB controller IAM role
kubectl get sa -n kube-system aws-load-balancer-controller -o yaml
ExternalSecret not syncing

Symptoms: ExternalSecret shows SecretSyncedError
Causes:
  1. ClusterSecretStore misconfigured
  2. Vault auth role missing
  3. Secret path doesn’t exist in Vault
Solutions:
# Check ClusterSecretStore
kubectl get clustersecretstore vault-backend -o yaml

# Verify Vault auth role
vault read auth/kubernetes/role/k8s-scheduler

# Check ESO logs
kubectl logs -n external-secrets -l app.kubernetes.io/name=external-secrets

Next Steps

  • Configuration - Configure environment variables and settings
  • Dependencies - Learn about platform dependencies
