Building Custom Infrastructure Providers

Overview

Datum’s plugin architecture allows you to extend the platform by building custom infrastructure provider plugins. Provider plugins are Kubernetes operators that reconcile Datum resources against any infrastructure platform or cloud provider.

Custom providers follow the same patterns as the official GCP provider, which serves as a reference implementation.

Use Cases for Custom Providers

Build custom provider plugins to:

Cloud Providers

Integrate with AWS, Azure, DigitalOcean, Vultr, NetActuate, or other cloud platforms

On-Premises Infrastructure

Manage bare metal servers, VMware vSphere, Proxmox, or OpenStack environments

Specialty Providers

Integrate with edge computing platforms, IoT device management, or GPU clusters

Internal Systems

Connect to internal provisioning systems, CMDBs, or legacy infrastructure

Architecture Overview

A Datum infrastructure provider plugin is a Kubernetes controller that:

Watches Datum resources like Workload, Network, and WorkloadDeployment
Interprets placement rules and resource specifications
Provisions infrastructure resources in the target platform
Reconciles the actual infrastructure state to match the desired state
Reports status back to Datum resources
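
To make the reconcile step concrete, here is a minimal, standard-library-only sketch of comparing desired and actual state. The `plan` function and the instance names are hypothetical and not part of any Datum API; a real provider would list actual instances through its infrastructure client.

```go
package main

import (
	"fmt"
	"sort"
)

// plan compares the instance names that should exist with the ones that
// actually exist and returns what must be created and what must be deleted.
func plan(desired, actual []string) (toCreate, toDelete []string) {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	have := make(map[string]bool, len(actual))
	for _, name := range actual {
		have[name] = true
		if !want[name] {
			toDelete = append(toDelete, name)
		}
	}
	for _, name := range desired {
		if !have[name] {
			toCreate = append(toCreate, name)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	desired := []string{"web-0", "web-1", "web-2"}
	actual := []string{"web-1", "web-3"}
	toCreate, toDelete := plan(desired, actual)
	fmt.Println("create:", toCreate)
	fmt.Println("delete:", toDelete)
}
```

Running the same plan again after applying it yields two empty lists, which is the property a reconcile loop relies on.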

Getting Started

Set up development environment

Install the required tools:

# Verify that Go 1.21 or later is installed
go version

# Install Kubebuilder for scaffolding
curl -L -o kubebuilder https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/

# Install kubectl and kind for local testing
# Follow https://kubernetes.io/docs/tasks/tools/

Scaffold the provider project

Create a new Kubernetes operator project:

mkdir infra-provider-mycloud
cd infra-provider-mycloud

# Initialize a new Kubebuilder project
kubebuilder init \
  --domain datum.net \
  --repo github.com/myorg/infra-provider-mycloud

Define custom resources

Create API types for provider-specific configuration:

# Create ProviderConfig CRD
kubebuilder create api \
  --group mycloud \
  --version v1alpha1 \
  --kind ProviderConfig \
  --resource --controller

Implement reconciliation logic

Add controllers that watch Datum resources and reconcile infrastructure. See the Implementation Guide below for details.

Implementation Guide

1. Watch Datum Resources

Your provider needs to watch for Datum’s core resources:

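package controller

import (
    networkv1alpha1 "github.com/datum-cloud/network-services-operator/api/v1alpha1"
    workloadv1alpha1 "github.com/datum-cloud/workload-operator/api/v1alpha1"
    ctrl "sigs.k8s.io/controller-runtime"
)

// SetupWithManager registers the workload controller. It reconciles
// WorkloadDeployment objects and is re-triggered when an Instance it owns changes.
func (r *WorkloadReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&workloadv1alpha1.WorkloadDeployment{}).
        Owns(&workloadv1alpha1.Instance{}).
        Complete(r)
}

// SetupWithManager registers the network controller for Network objects.
func (r *NetworkReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&networkv1alpha1.Network{}).
        Complete(r)
}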

2. Implement Workload Reconciliation

Handle workload provisioning in your target infrastructure:

// Requires "context", ctrl "sigs.k8s.io/controller-runtime", and
// "sigs.k8s.io/controller-runtime/pkg/client" in addition to the API imports above.
func (r *WorkloadReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the WorkloadDeployment; a missing object means it was deleted
    var deployment workloadv1alpha1.WorkloadDeployment
    if err := r.Get(ctx, req.NamespacedName, &deployment); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Get provider configuration
    providerConfig, err := r.getProviderConfig(ctx, &deployment)
    if err != nil {
        return ctrl.Result{}, err
    }

    // Initialize infrastructure client
    infraClient, err := r.newInfraClient(providerConfig)
    if err != nil {
        return ctrl.Result{}, err
    }

    // Reconcile instances: container-based when containers are specified, VM-based otherwise
    if len(deployment.Spec.Template.Spec.Containers) > 0 {
        return r.reconcileContainerInstances(ctx, &deployment, infraClient)
    }
    return r.reconcileVMInstances(ctx, &deployment, infraClient)
}
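
The snippet above passes an infrastructure client to the helper functions, but the `InfraClient` type itself is not shown. The following is a minimal sketch of one possible shape, with an in-memory implementation and an idempotent "ensure" helper. Every name here (`InfraClient`, `ErrNotFound`, `memoryClient`, `ensureInstance`) and the simplified string-based signatures are assumptions for illustration, not a Datum interface.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel a backend returns for missing instances.
var ErrNotFound = errors.New("instance not found")

// InfraClient is an assumed, minimal shape for a provider's infrastructure client.
type InfraClient interface {
	GetInstance(ctx context.Context, name string) (image string, err error)
	CreateInstance(ctx context.Context, name, image string) error
	UpdateInstance(ctx context.Context, name, image string) error
}

// memoryClient is an in-memory InfraClient, useful for local runs and tests.
type memoryClient struct {
	instances map[string]string // instance name -> image
	creates   int               // number of CreateInstance calls
}

func (m *memoryClient) GetInstance(_ context.Context, name string) (string, error) {
	image, ok := m.instances[name]
	if !ok {
		return "", ErrNotFound
	}
	return image, nil
}

func (m *memoryClient) CreateInstance(_ context.Context, name, image string) error {
	m.instances[name] = image
	m.creates++
	return nil
}

func (m *memoryClient) UpdateInstance(_ context.Context, name, image string) error {
	if _, ok := m.instances[name]; !ok {
		return ErrNotFound
	}
	m.instances[name] = image
	return nil
}

// ensureInstance creates the instance if it is missing, updates it if the
// image drifted, and does nothing otherwise. Unexpected errors are returned.
func ensureInstance(ctx context.Context, c InfraClient, name, image string) error {
	current, err := c.GetInstance(ctx, name)
	switch {
	case errors.Is(err, ErrNotFound):
		return c.CreateInstance(ctx, name, image)
	case err != nil:
		return err
	case current != image:
		return c.UpdateInstance(ctx, name, image)
	}
	return nil
}

func main() {
	ctx := context.Background()
	c := &memoryClient{instances: map[string]string{}}
	for i := 0; i < 2; i++ { // the second pass must not create anything new
		if err := ensureInstance(ctx, c, "web-0", "nginx:1.27"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("creates:", c.creates)
}
```

Keeping the client behind an interface lets the reconciler be exercised against the in-memory version without touching a real platform.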

3. Handle Container Instances

Provision container-based workloads:

// Requires "errors" and apierrors "k8s.io/apimachinery/pkg/api/errors".
func (r *WorkloadReconciler) reconcileContainerInstances(
    ctx context.Context,
    deployment *workloadv1alpha1.WorkloadDeployment,
    infra InfraClient, // not named "client", which would shadow the controller-runtime package
) (ctrl.Result, error) {

    // Extract container specs
    containers := deployment.Spec.Template.Spec.Containers

    // Create or update one instance per replica
    for i := 0; i < int(deployment.Spec.Replicas); i++ {
        instanceName := fmt.Sprintf("%s-%d", deployment.Name, i)

        // Check if instance already exists. ErrInstanceNotFound is a
        // hypothetical sentinel error that your InfraClient returns.
        existing, err := infra.GetInstance(ctx, instanceName)
        switch {
        case err == nil:
            // Update existing instance if needed
            if needsUpdate(existing, containers) {
                err = infra.UpdateInstance(ctx, instanceName, containers)
            }
        case errors.Is(err, ErrInstanceNotFound):
            // Create new instance
            err = infra.CreateContainerInstance(ctx, instanceName, containers)
        }
        // Any other lookup error falls through and is returned for a retry
        if err != nil {
            return ctrl.Result{}, err
        }

        // Create Datum Instance resource
        instance := &workloadv1alpha1.Instance{
            ObjectMeta: metav1.ObjectMeta{
                Name:      instanceName,
                Namespace: deployment.Namespace,
            },
            Spec: workloadv1alpha1.InstanceSpec{
                WorkloadRef: deployment.Name,
                Provider:    "mycloud",
            },
        }

        // An Instance left over from an earlier reconcile is not an error
        if err := r.Create(ctx, instance); err != nil && !apierrors.IsAlreadyExists(err) {
            return ctrl.Result{}, err
        }
    }

    return ctrl.Result{}, nil
}

4. Handle VM Instances

Provision virtual machine instances:

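// Requires apierrors "k8s.io/apimachinery/pkg/api/errors".
func (r *WorkloadReconciler) reconcileVMInstances(
    ctx context.Context,
    deployment *workloadv1alpha1.WorkloadDeployment,
    infra InfraClient, // not named "client", which would shadow the controller-runtime package
) (ctrl.Result, error) {

    // Extract VM specs
    image := deployment.Spec.Template.Spec.Image
    machineType := deployment.Spec.Template.Spec.MachineType

    // Create VM instances
    for i := 0; i < int(deployment.Spec.Replicas); i++ {
        instanceName := fmt.Sprintf("%s-%d", deployment.Name, i)

        // Skip replicas that already have a Datum Instance, so repeated
        // reconciles do not provision duplicate VMs
        var existing workloadv1alpha1.Instance
        key := client.ObjectKey{Namespace: deployment.Namespace, Name: instanceName}
        err := r.Get(ctx, key, &existing)
        if err == nil {
            continue
        }
        if !apierrors.IsNotFound(err) {
            return ctrl.Result{}, err
        }

        vmSpec := &VMSpec{
            Name:        instanceName,
            Image:       image,
            MachineType: machineType,
            Networks:    deployment.Spec.Template.Spec.Networks,
        }

        // Provision VM in infrastructure
        vmID, err := infra.CreateVM(ctx, vmSpec)
        if err != nil {
            return ctrl.Result{}, err
        }

        // Create Datum Instance resource
        instance := &workloadv1alpha1.Instance{
            ObjectMeta: metav1.ObjectMeta{
                Name:      instanceName,
                Namespace: deployment.Namespace,
            },
            Spec: workloadv1alpha1.InstanceSpec{
                WorkloadRef: deployment.Name,
                Provider:    "mycloud",
            },
        }
        if err := r.Create(ctx, instance); err != nil {
            return ctrl.Result{}, err
        }

        // Record the provider-specific ID. When the CRD enables the status
        // subresource, status set on Create is dropped, so write it separately.
        instance.Status = workloadv1alpha1.InstanceStatus{
            ProviderID: vmID,
            Phase:      "Running",
        }
        if err := r.Status().Update(ctx, instance); err != nil {
            return ctrl.Result{}, err
        }
    }

    return ctrl.Result{}, nil
}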

5. Implement Network Reconciliation

Handle VPC/network creation:

func (r *NetworkReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var network networkv1alpha1.Network
    if err := r.Get(ctx, req.NamespacedName, &network); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Skip networks that were already provisioned, so repeated reconciles
    // do not create duplicate VPCs. Updating an existing VPC is not shown here.
    if network.Status.ProviderID != "" {
        return ctrl.Result{}, nil
    }

    // Create VPC in infrastructure
    vpcID, err := r.infraClient.CreateVPC(ctx, &VPCSpec{
        Name: network.Name,
        CIDR: network.Spec.CIDR,
    })
    if err != nil {
        return ctrl.Result{}, err
    }

    // Create subnets
    for _, subnet := range network.Spec.Subnets {
        _, err := r.infraClient.CreateSubnet(ctx, &SubnetSpec{
            Name:   subnet.Name,
            VPCID:  vpcID,
            CIDR:   subnet.CIDR,
            Region: subnet.Region,
        })
        if err != nil {
            return ctrl.Result{}, err
        }
    }

    // Update network status
    network.Status.ProviderID = vpcID
    network.Status.Phase = "Ready"
    return ctrl.Result{}, r.Status().Update(ctx, &network)
}
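
Before calling the platform API, it can help to reject subnets whose CIDR does not fit the network CIDR. The sketch below uses only the standard `net/netip` package; `subnetWithin` and `validateSubnets` are hypothetical helpers, and whether Datum already validates these fields upstream is not covered here.

```go
package main

import (
	"fmt"
	"net/netip"
)

// subnetWithin reports whether subnet is fully contained in vpc.
func subnetWithin(vpc, subnet netip.Prefix) bool {
	vpc, subnet = vpc.Masked(), subnet.Masked()
	return subnet.Bits() >= vpc.Bits() && vpc.Contains(subnet.Addr())
}

// validateSubnets parses the CIDRs and returns an error for the first subnet
// that is malformed, outside the VPC range, or overlapping an earlier subnet.
func validateSubnets(vpcCIDR string, subnetCIDRs []string) error {
	vpc, err := netip.ParsePrefix(vpcCIDR)
	if err != nil {
		return fmt.Errorf("invalid VPC CIDR %q: %w", vpcCIDR, err)
	}
	var seen []netip.Prefix
	for _, cidr := range subnetCIDRs {
		subnet, err := netip.ParsePrefix(cidr)
		if err != nil {
			return fmt.Errorf("invalid subnet CIDR %q: %w", cidr, err)
		}
		if !subnetWithin(vpc, subnet) {
			return fmt.Errorf("subnet %s is outside VPC range %s", subnet, vpc)
		}
		for _, prev := range seen {
			if prev.Overlaps(subnet) {
				return fmt.Errorf("subnet %s overlaps %s", subnet, prev)
			}
		}
		seen = append(seen, subnet)
	}
	return nil
}

func main() {
	// Two non-overlapping subnets inside the VPC range: no error expected.
	fmt.Println(validateSubnets("10.0.0.0/16", []string{"10.0.1.0/24", "10.0.2.0/24"}))
	// A subnet outside the VPC range: an error is returned.
	fmt.Println(validateSubnets("10.0.0.0/16", []string{"10.1.0.0/24"}))
}
```

Failing early like this keeps a half-created VPC from being left behind when a later subnet turns out to be invalid.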

Provider Configuration

Define a ProviderConfig custom resource for provider-specific settings:

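type ProviderConfigSpec struct {
    // Provider-specific credentials
    Credentials CredentialsSpec `json:"credentials"`

    // Default region
    Region string `json:"region,omitempty"`

    // Provider-specific configuration
    // For AWS: accountID, vpcID
    // For Azure: subscriptionID, resourceGroup
    Config map[string]string `json:"config,omitempty"`
}

type CredentialsSpec struct {
    // Source is either "Secret" or "InjectedIdentity"
    Source string `json:"source"`

    // SecretRef points at the credentials Secret when Source is "Secret"
    SecretRef *SecretReference `json:"secretRef,omitempty"`
}

// SecretReference identifies one key in a Secret. The fields mirror the
// secretRef block in the example manifest.
type SecretReference struct {
    Name      string `json:"name"`
    Namespace string `json:"namespace"`
    Key       string `json:"key"`
}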

Users configure the provider with:

apiVersion: mycloud.datum.net/v1alpha1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      name: mycloud-credentials
      namespace: datum-system
      key: credentials.json
  region: us-west-2
  config:
    accountID: "123456789012"
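
A provider will usually want to check a configuration like this before using it. The following sketch validates the credentials block with plain Go; the types are trimmed copies of the ones above, `validateCredentials` is a hypothetical helper, and the rule that `InjectedIdentity` must not carry a `secretRef` is an assumption rather than documented Datum behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretReference and CredentialsSpec are trimmed copies of the API types,
// without JSON tags, so that this sketch is self-contained.
type SecretReference struct {
	Name, Namespace, Key string
}

type CredentialsSpec struct {
	Source    string // "Secret" or "InjectedIdentity"
	SecretRef *SecretReference
}

// validateCredentials checks that the credential source is known and that
// the fields it depends on are present.
func validateCredentials(c CredentialsSpec) error {
	switch c.Source {
	case "Secret":
		if c.SecretRef == nil {
			return errors.New(`credentials.secretRef is required when source is "Secret"`)
		}
		if c.SecretRef.Name == "" || c.SecretRef.Key == "" {
			return errors.New("credentials.secretRef needs both name and key")
		}
		return nil
	case "InjectedIdentity":
		// Assumption: an injected identity never needs a Secret reference.
		if c.SecretRef != nil {
			return errors.New(`credentials.secretRef must be empty when source is "InjectedIdentity"`)
		}
		return nil
	default:
		return fmt.Errorf("unsupported credentials source %q", c.Source)
	}
}

func main() {
	cfg := CredentialsSpec{
		Source: "Secret",
		SecretRef: &SecretReference{
			Name:      "mycloud-credentials",
			Namespace: "datum-system",
			Key:       "credentials.json",
		},
	}
	fmt.Println("valid config error:", validateCredentials(cfg))
	fmt.Println("missing ref error:", validateCredentials(CredentialsSpec{Source: "Secret"}))
}
```

The same checks could instead live in a validating webhook or in Kubebuilder validation markers on the CRD types.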

Testing Your Provider

Unit tests

Write unit tests for reconciliation logic:

func TestWorkloadReconcile(t *testing.T) {
    // Use controller-runtime's envtest
    // Test workload creation, updates, and deletion
}

Integration tests

Test against a real Kubernetes cluster:

# Create a kind cluster
kind create cluster

# Install CRDs
make install

# Run the controller locally
make run

# In another terminal, apply test resources
kubectl apply -f examples/workload.yaml

End-to-end tests

Verify full integration with your infrastructure platform:

# Deploy provider to cluster
make deploy

# Apply test workloads and verify infrastructure is created
kubectl apply -f examples/

# Check instance status
kubectl get instances -A

Deployment

Package your provider as a Kubernetes Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mycloud-provider
  namespace: datum-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mycloud-provider
  template:
    metadata:
      labels:
        app: mycloud-provider
    spec:
      serviceAccountName: mycloud-provider
      containers:
        - name: manager
          image: myorg/infra-provider-mycloud:v1.0.0
          command:
            - /manager
          args:
            - --leader-elect
          env:
            - name: PROVIDER_NAME
              value: mycloud

Best Practices

Handle errors gracefully

Return errors to trigger reconciliation retries
Use exponential backoff for transient failures
Update resource status with error messages
Log detailed error information for debugging
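
controller-runtime already requeues a request with rate-limited backoff when `Reconcile` returns an error. When you want an explicit delay instead, for example to set `RequeueAfter` on the returned `ctrl.Result`, a capped exponential delay can be computed as in this sketch. The `backoff` helper and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns base doubled once per attempt, capped at max.
// attempt starts at 0, so the first retry waits for base.
func backoff(attempt int, base, max time.Duration) time.Duration {
	delay := base
	for i := 0; i < attempt; i++ {
		if delay > max/2 {
			return max // doubling again would exceed the cap
		}
		delay *= 2
	}
	if delay > max {
		return max
	}
	return delay
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, backoff(attempt, time.Second, 20*time.Second))
	}
}
```

With a one-second base and a twenty-second cap, the delays grow as 1s, 2s, 4s, 8s, 16s and then stay at 20s.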

Implement idempotent operations

Check if resources already exist before creating
Support updates to existing infrastructure
Handle partial failures gracefully
Use provider-specific IDs to track resources

Report accurate status

Update Datum resource status with provider IDs
Report instance health and readiness
Include IP addresses and network information
Set appropriate conditions for observability
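
Conditions are usually kept as a list with one entry per condition type, and the transition time changes only when the status flips. The sketch below shows that update rule with a local `Condition` type so it runs without Kubernetes dependencies; in a real controller you would more likely use `metav1.Condition` together with `meta.SetStatusCondition` from `k8s.io/apimachinery`. The condition type and reasons used here are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors the commonly used fields of a Kubernetes status condition.
type Condition struct {
	Type, Status, Reason, Message string
	LastTransitionTime            time.Time
}

// setCondition inserts next, or updates the existing condition of the same
// Type. LastTransitionTime moves only when Status changes. It returns the
// updated list and whether anything changed, so callers can skip no-op writes.
func setCondition(conds []Condition, next Condition, now time.Time) ([]Condition, bool) {
	for i := range conds {
		if conds[i].Type != next.Type {
			continue
		}
		changed := false
		if conds[i].Status != next.Status {
			conds[i].Status = next.Status
			conds[i].LastTransitionTime = now
			changed = true
		}
		if conds[i].Reason != next.Reason || conds[i].Message != next.Message {
			conds[i].Reason, conds[i].Message = next.Reason, next.Message
			changed = true
		}
		return conds, changed
	}
	next.LastTransitionTime = now
	return append(conds, next), true
}

func main() {
	var conds []Condition
	conds, _ = setCondition(conds, Condition{Type: "Ready", Status: "False", Reason: "Provisioning"}, time.Now())
	conds, changed := setCondition(conds, Condition{Type: "Ready", Status: "True", Reason: "Provisioned"}, time.Now())
	fmt.Println(len(conds), conds[0].Status, changed)
}
```

Returning the `changed` flag makes it easy to avoid status updates that would only trigger another reconcile.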

Support cleanup

Implement finalizers to clean up infrastructure
Delete provider resources when Datum resources are deleted
Handle cascading deletions correctly
Prevent orphaned resources in the infrastructure
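
The finalizer flow behind these points can be sketched without any Kubernetes dependencies. The `object` type stands in for resource metadata, and `finalizerName` and `reconcileDeletion` are hypothetical; in a real controller the equivalent helpers are `ContainsFinalizer`, `AddFinalizer`, and `RemoveFinalizer` from controller-runtime's `controllerutil` package, each followed by an update of the object.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizerName is a hypothetical finalizer key for this provider.
const finalizerName = "mycloud.datum.net/cleanup"

// object holds the two metadata fields that drive finalizer handling.
type object struct {
	Deleting   bool // stands in for a non-zero metadata.deletionTimestamp
	Finalizers []string
}

func hasFinalizer(o *object) bool {
	for _, f := range o.Finalizers {
		if f == finalizerName {
			return true
		}
	}
	return false
}

func removeFinalizer(o *object) {
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != finalizerName {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
}

// reconcileDeletion returns done=true when the object is being deleted and
// normal reconciliation should stop. cleanup deletes the external resources.
func reconcileDeletion(o *object, cleanup func() error) (done bool, err error) {
	if !o.Deleting {
		// Live object: make sure the finalizer is in place before provisioning.
		if !hasFinalizer(o) {
			o.Finalizers = append(o.Finalizers, finalizerName)
		}
		return false, nil
	}
	if !hasFinalizer(o) {
		return true, nil // cleanup already completed
	}
	if err := cleanup(); err != nil {
		return true, err // keep the finalizer so cleanup is retried
	}
	removeFinalizer(o)
	return true, nil
}

func main() {
	o := &object{}
	reconcileDeletion(o, nil) // cleanup is not called for a live object
	o.Deleting = true
	_, err := reconcileDeletion(o, func() error { return errors.New("platform API unavailable") })
	fmt.Println("after failed cleanup:", o.Finalizers, err)
	_, err = reconcileDeletion(o, func() error { return nil })
	fmt.Println("after successful cleanup:", o.Finalizers, err)
}
```

Because the finalizer is removed only after cleanup succeeds, a failed deletion on the platform side blocks removal of the Datum resource instead of orphaning infrastructure.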

Reference Implementation

The official GCP provider serves as a complete reference implementation. Study its code to understand:

Project structure and organization
API definitions and CRDs
Controller reconciliation patterns
Status reporting and error handling
Testing strategies
Deployment configurations

Next Steps

GCP Provider Source

Study the official GCP provider implementation

Workload Operator

Understand the Workload API your provider implements

Network Services Operator

Learn about Network resource management

Enhancements

Propose new provider features or integrations
