Overview
The K8s Scheduler operator is a Kubernetes controller that reconciles custom resources, defined by CustomResourceDefinitions (CRDs), for user deployments, ephemeral agent tasks, and multi-step workflows. It is built on controller-runtime and follows Kubernetes operator best practices.
Architecture
Operator Entry Point
The operator is initialized in cmd/operator/main.go:
cmd/operator/main.go:33-155
Custom Resource Definitions
The operator manages three CRDs defined in internal/operator/apis/scheduler/v1alpha1/types.go:
UserDeployment CRD
Represents a user’s long-lived deployment (web app, API, database, etc.).
internal/operator/apis/scheduler/v1alpha1/types.go:16-23
Spec Fields
internal/operator/apis/scheduler/v1alpha1/types.go:25-74
Status Fields
internal/operator/apis/scheduler/v1alpha1/types.go:119-135
Phase Constants
internal/operator/apis/scheduler/v1alpha1/types.go:152-159
AgentTask CRD
Represents an ephemeral agent execution (short-lived task).
internal/operator/apis/scheduler/v1alpha1/types.go:190-197
AgentTask Spec
internal/operator/apis/scheduler/v1alpha1/types.go:199-227
Workflow CRD
Orchestrates sequential multi-step agent executions.
internal/operator/apis/scheduler/v1alpha1/types.go:550-556
UserDeployment Controller
The UserDeployment controller reconciles user deployments by creating and updating Kubernetes resources.
Controller Structure
internal/operator/controller/userdeployment_controller.go:64-74
Reconciliation Loop
internal/operator/controller/userdeployment_controller.go:281-310
Reconciliation Steps
- Fetch UserDeployment CR from Kubernetes API
- Check the desired state:
  - If deleted: Add finalizer and clean up resources
  - If running: Provision/update resources
- Load template from ConfigMap
- Create namespace (if needed)
- Create ConfigMaps for user configuration
- Create ExternalSecrets for secrets injection
- Create Deployments for each service
- Create Services for networking
- Create Ingresses (Traefik IngressRoute)
- Create NetworkPolicies for isolation
- Update status with phase and ingress URLs
Resource Creation
Deployment Creation
The controller creates Kubernetes Deployments for each service in the template.
Service Creation
Kubernetes Services are created for each service.
Ingress Creation
Traefik IngressRoutes are created for public and internal access.
Secrets Integration
The controller creates ExternalSecret resources for Vault/AWS secrets.
Finalizer Pattern
The controller uses finalizers for cleanup.
AgentTask Controller
The AgentTask controller manages ephemeral agent executions by creating Kubernetes Jobs.
Job Creation
The controller creates a Job for each AgentTask.
Status Syncing
The controller watches Job/Pod status and updates the AgentTask CR:
- Pending: Job created, waiting for Pod
- Running: Pod is running
- Succeeded: Pod completed with exit code 0
- Failed: Pod completed with non-zero exit code
- Timeout: Execution exceeded timeout limit
Workflow Controller
The Workflow controller orchestrates sequential AgentTask executions.
Step Execution
The controller:
- Creates an AgentTask CR for the current step
- Waits for step completion
- Captures step output
- Substitutes output into the next step’s input (supports ${steps.N.output.KEY} syntax)
- Advances to the next step
- Repeats until all steps complete or one fails
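The output substitution and sequential loop can be sketched with a regular expression over the ${steps.N.output.KEY} syntax. This is a minimal sketch under stated assumptions: N is treated as a 0-based step index, KEY is limited to letters, digits, underscores, and hyphens, each step's output is a flat map[string]string, and unresolved references are reported as errors rather than left in place. The runSteps helper and its exec callback are hypothetical; the real controller creates an AgentTask CR and waits for it across successive reconciles rather than calling a function synchronously.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// refPattern matches ${steps.N.output.KEY}. The KEY character set is an assumption.
var refPattern = regexp.MustCompile(`\$\{steps\.(\d+)\.output\.([A-Za-z0-9_-]+)\}`)

// substitute replaces each reference with outputs[N][KEY], treating N as a
// 0-based step index (an assumption). Unresolvable references return an error.
func substitute(input string, outputs []map[string]string) (string, error) {
	var firstErr error
	result := refPattern.ReplaceAllStringFunc(input, func(ref string) string {
		m := refPattern.FindStringSubmatch(ref)
		idx, err := strconv.Atoi(m[1])
		if err != nil || idx >= len(outputs) {
			if firstErr == nil {
				firstErr = fmt.Errorf("%s: step %s has no output yet", ref, m[1])
			}
			return ref
		}
		val, ok := outputs[idx][m[2]]
		if !ok {
			if firstErr == nil {
				firstErr = fmt.Errorf("%s: key %q not found", ref, m[2])
			}
			return ref
		}
		return val
	})
	if firstErr != nil {
		return "", firstErr
	}
	return result, nil
}

// runSteps runs each step in order, feeding earlier outputs into later
// inputs, and stops at the first failure. exec is a hypothetical stand-in
// for "create an AgentTask and wait for it".
func runSteps(inputs []string, exec func(string) (map[string]string, error)) ([]map[string]string, error) {
	var outputs []map[string]string
	for i, in := range inputs {
		resolved, err := substitute(in, outputs)
		if err != nil {
			return outputs, fmt.Errorf("step %d: %w", i, err)
		}
		out, err := exec(resolved)
		if err != nil {
			return outputs, fmt.Errorf("step %d failed: %w", i, err)
		}
		outputs = append(outputs, out)
	}
	return outputs, nil
}

func main() {
	echo := func(in string) (map[string]string, error) {
		return map[string]string{"result": "processed " + in}, nil
	}
	outs, err := runSteps([]string{"fetch data", "summarize ${steps.0.output.result}"}, echo)
	fmt.Println(outs[1]["result"], err)
}
```

Failing fast on an unresolved reference keeps a typo in a step definition from silently passing the literal ${...} text to the next agent.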
Workflow Status
internal/operator/apis/scheduler/v1alpha1/types.go:592-611
RBAC Permissions
The operator requires these Kubernetes permissions:
internal/operator/controller/userdeployment_controller.go:267-279
Health Checks
The operator exposes health endpoints:
- GET :8081/healthz (liveness probe)
- GET :8081/readyz (readiness probe)
Metrics
Prometheus metrics are exposed on the configured metrics port (disabled by default).
Leader Election
The operator supports leader election for high availability.
Configuration
Command-Line Flags
cmd/operator/main.go:34-51
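The flag definitions themselves are at main.go:34-51 and are not reproduced here. The sketch below assumes the conventional kubebuilder flag names (--metrics-bind-address, --health-probe-bind-address, --leader-elect) and takes its defaults from this page: probes on :8081 and metrics disabled, expressed as the address "0". The operatorFlags struct and parseFlags function are hypothetical; treat the names and defaults as assumptions until checked against main.go.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// operatorFlags holds the parsed command-line options (hypothetical struct).
type operatorFlags struct {
	MetricsAddr string
	ProbeAddr   string
	LeaderElect bool
}

// parseFlags uses conventional kubebuilder flag names; verify against main.go:34-51.
func parseFlags(args []string) (operatorFlags, error) {
	var f operatorFlags
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.StringVar(&f.MetricsAddr, "metrics-bind-address", "0", `Metrics address; "0" disables metrics.`)
	fs.StringVar(&f.ProbeAddr, "health-probe-bind-address", ":8081", "Address for /healthz and /readyz.")
	fs.BoolVar(&f.LeaderElect, "leader-elect", false, "Enable leader election for high availability.")
	if err := fs.Parse(args); err != nil {
		return operatorFlags{}, err
	}
	return f, nil
}

func main() {
	f, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", f)
}
```

Using a dedicated FlagSet instead of the global flag package keeps parsing testable with arbitrary argument slices.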
Environment Variables
- DEPLOYMENT_DOMAIN: Domain for deployment ingresses
- CLUSTER_SECRET_STORE: External Secrets ClusterSecretStore name
- TASK_NAMESPACE: Namespace for ephemeral tasks
- DISABLE_NETWORK_POLICIES: Skip network policy creation (dev mode)
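A minimal sketch of reading these variables, assuming DISABLE_NETWORK_POLICIES is parsed with strconv.ParseBool and that no defaults are applied (an unset variable yields an empty string). The operatorEnv struct and loadEnv function are hypothetical; getenv is injected so the function can be tested without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// operatorEnv groups the environment settings listed above (hypothetical struct).
type operatorEnv struct {
	DeploymentDomain       string
	ClusterSecretStore     string
	TaskNamespace          string
	DisableNetworkPolicies bool
}

// loadEnv reads the variables through getenv; unset values stay empty.
// Boolean parsing via strconv.ParseBool is an assumption.
func loadEnv(getenv func(string) string) (operatorEnv, error) {
	cfg := operatorEnv{
		DeploymentDomain:   getenv("DEPLOYMENT_DOMAIN"),
		ClusterSecretStore: getenv("CLUSTER_SECRET_STORE"),
		TaskNamespace:      getenv("TASK_NAMESPACE"),
	}
	if v := getenv("DISABLE_NETWORK_POLICIES"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return operatorEnv{}, fmt.Errorf("DISABLE_NETWORK_POLICIES: %w", err)
		}
		cfg.DisableNetworkPolicies = b
	}
	return cfg, nil
}

func main() {
	cfg, err := loadEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Rejecting an unparseable boolean at startup is safer than silently treating it as false, since this flag controls whether isolation policies are created.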
Deployment
The operator is deployed as a Kubernetes Deployment with:
- Service account with RBAC permissions
- Leader election for HA
- Health probes for liveness/readiness
- Metrics endpoint (optional)
Related Documentation
- Frontend Architecture: React 19 frontend with TypeScript and TanStack Query
- Server Architecture: Go backend with HTTP handlers and middleware