The Kingdom runs several CronJob deployments that perform scheduled maintenance and cleanup operations. These background jobs help maintain system health and enforce data retention policies.

Overview

Kingdom daemons are implemented as Kubernetes CronJobs, each running on its own schedule. They call the Kingdom Data Server to perform database operations.

Completed Measurements Deletion

Removes old completed measurements based on TTL

Pending Measurements Cancellation

Cancels stale pending measurements

Exchanges Deletion

Cleans up old panel exchange data

Completed Measurements Deletion

Image: kingdom/completed-measurements-deletion
Schedule: 15 * * * * (Hourly, 15 minutes past the hour)
CronJob Name: completed-measurements-deletion

Purpose

This job periodically deletes completed measurements that have exceeded their time-to-live (TTL) threshold. This helps:
  • Reduce database storage costs
  • Maintain query performance
  • Comply with data retention policies
  • Remove unnecessary historical data

Configuration Parameters

Flag: --time-to-live=180d
Default: 180 days
Description: How long to retain completed measurements before deletion. Measurements in terminal states (SUCCEEDED, FAILED, CANCELLED) older than this threshold are eligible for deletion.

Flag: --max-to-delete-per-rpc=25
Default: 25
Description: Maximum number of measurements to delete in a single RPC call. This prevents overwhelming the database with large batch deletes and provides rate limiting.

Flag: --dry-run=false
Default: false
Description: When enabled, logs which measurements would be deleted without actually deleting them. Useful for testing retention policies before applying them.
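
To make the TTL flag concrete, here is a minimal Python sketch of turning a value such as 180d into a duration. The helper name parse_ttl and the set of accepted suffixes are assumptions made for illustration; the source only shows the d suffix, and the daemons' own parsing may differ.

```python
import re
from datetime import timedelta

# Assumed suffix table for this sketch; only "d" appears in the documentation.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_ttl(value: str) -> timedelta:
    """Convert a string such as '180d' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([dhms])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Rejecting anything that does not match the pattern means a mistyped flag fails loudly instead of silently shortening the retention window.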

Operation

  1. Query: Identifies completed measurements older than TTL
  2. Batch: Groups deletions into batches of max-to-delete-per-rpc
  3. Delete: Removes measurements via the Kingdom Data Server API
  4. Log: Records deletion operations for audit purposes
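
The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the daemon's actual code: the Measurement record, the choice of update_time as the timestamp compared against the TTL, and the function names are all hypothetical, and the delete RPC is replaced by a log line.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterator, List

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


@dataclass
class Measurement:  # hypothetical record, for illustration only
    measurement_id: str
    state: str
    update_time: datetime  # assumption: TTL is measured from the last update


def select_expired(measurements: List[Measurement], ttl: timedelta,
                   now: datetime) -> List[Measurement]:
    """Step 1 (Query): terminal-state measurements older than the TTL."""
    cutoff = now - ttl
    return [m for m in measurements
            if m.state in TERMINAL_STATES and m.update_time < cutoff]


def batches(items: List[Measurement], size: int) -> Iterator[List[Measurement]]:
    """Step 2 (Batch): split into groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run(measurements: List[Measurement], ttl: timedelta, max_per_rpc: int,
        dry_run: bool, now: datetime) -> List[str]:
    """Steps 3 and 4 (Delete, Log): return the ids that were deleted."""
    deleted: List[str] = []
    for batch in batches(select_expired(measurements, ttl, now), max_per_rpc):
        ids = [m.measurement_id for m in batch]
        if dry_run:
            print(f"dry run: would delete {len(ids)} measurements: {ids}")
            continue
        # A real job would issue one delete RPC per batch at this point.
        print(f"deleted {len(ids)} measurements: {ids}")
        deleted.extend(ids)
    return deleted
```

With 60 expired measurements and the default batch size of 25, this produces three batches of 25, 25, and 10.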

Example Configuration

schedule: "15 * * * *"  # Every hour at :15
args:
  - --internal-api-target=gcp-kingdom-data-server:8443
  - --internal-api-cert-host=localhost
  - --tls-cert-file=/var/run/secrets/files/kingdom_tls.pem
  - --tls-key-file=/var/run/secrets/files/kingdom_tls.key
  - --cert-collection-file=/var/run/secrets/files/all_root_certs.pem
  - --time-to-live=180d
  - --max-to-delete-per-rpc=25
  - --dry-run=false
Running hourly keeps each cleanup pass small, so no single run places a heavy load on the database. The :15 offset staggers this job against the other daemons (pending-measurements cancellation runs at :45) to spread that load.
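
As a quick check that the three documented schedules never fire in the same minute, the Python sketch below expands the minute and hour fields of a cron expression into wall-clock times. It is deliberately not a general cron parser: it only understands a plain number or * in those two fields, which is all the schedules on this page use. The function name fire_times is an assumption of the sketch.

```python
from typing import List


def fire_times(schedule: str) -> List[str]:
    """Expand the minute and hour fields of a cron schedule into HH:MM strings.

    Only a plain number or '*' is supported in each field; the day, month,
    and weekday fields are ignored in this sketch.
    """
    minute, hour = schedule.split()[:2]
    hours = range(24) if hour == "*" else [int(hour)]
    minutes = range(60) if minute == "*" else [int(minute)]
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


# Schedules as documented for the three Kingdom daemons.
SCHEDULES = {
    "completed-measurements-deletion": "15 * * * *",
    "pending-measurements-cancellation": "45 * * * *",
    "exchanges-deletion": "40 6 * * *",
}
```

Comparing the sets returned for each entry in SCHEDULES shows no shared start minute, which is the staggering the offsets are meant to achieve.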

Pending Measurements Cancellation

Image: kingdom/pending-measurements-cancellation
Schedule: 45 * * * * (Hourly, 45 minutes past the hour)
CronJob Name: pending-measurements-cancellation

Purpose

This job automatically cancels measurements that have been stuck in pending states for too long. This prevents:
  • Resource leaks from abandoned measurements
  • Confusion from stale pending measurements
  • Indefinite waiting on failed or unresponsive participants

Configuration Parameters

Flag: --time-to-live=15d
Default: 15 days
Description: How long to keep measurements in pending states before cancellation. Measurements that have been pending longer than this duration are automatically cancelled.

Flag: --dry-run=false
Default: false
Description: When enabled, logs which measurements would be cancelled without actually cancelling them.

Pending States

The job targets measurements in non-terminal states such as:
  • PENDING_REQUISITION_PARAMS: Waiting for requisition parameters
  • PENDING_REQUISITION_FULFILLMENT: Waiting for EDP data
  • PENDING_PARTICIPANT_CONFIRMATION: Waiting for duchy confirmation
  • PENDING_COMPUTATION: Queued but not yet computing

Operation

  1. Identify: Finds measurements in pending states older than TTL
  2. Validate: Confirms measurements are truly stale (not just slow)
  3. Cancel: Transitions measurements to CANCELLED state
  4. Notify: May trigger notifications to measurement requestors
  5. Log: Records cancellation for audit and debugging
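
A rough Python sketch of the identify and cancel steps is shown below. The dictionary layout, the use of create_time as the age reference, and the function name cancel_stale are assumptions for illustration; the validation and notification steps are not modelled because the source does not describe how they work.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

PENDING_STATES = {
    "PENDING_REQUISITION_PARAMS",
    "PENDING_REQUISITION_FULFILLMENT",
    "PENDING_PARTICIPANT_CONFIRMATION",
    "PENDING_COMPUTATION",
}


def cancel_stale(measurements: List[Dict], ttl: timedelta, now: datetime,
                 dry_run: bool = False) -> List[str]:
    """Return ids of pending measurements older than the TTL.

    Unless dry_run is set, each matching measurement is moved to CANCELLED.
    Assumption: age is measured from a 'create_time' field.
    """
    cutoff = now - ttl
    stale_ids: List[str] = []
    for m in measurements:
        if m["state"] not in PENDING_STATES:
            continue  # terminal or otherwise non-pending: leave untouched
        if m["create_time"] >= cutoff:
            continue  # still within the TTL
        if not dry_run:
            m["state"] = "CANCELLED"
        stale_ids.append(m["id"])
    return stale_ids
```

In dry-run mode the function reports the same ids but leaves every state unchanged, which mirrors how the --dry-run flag is described above.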

Example Configuration

schedule: "45 * * * *"  # Every hour at :45
args:
  - --internal-api-target=gcp-kingdom-data-server:8443
  - --internal-api-cert-host=localhost
  - --tls-cert-file=/var/run/secrets/files/kingdom_tls.pem
  - --tls-key-file=/var/run/secrets/files/kingdom_tls.key
  - --cert-collection-file=/var/run/secrets/files/all_root_certs.pem
  - --time-to-live=15d
  - --dry-run=false
The 15-day TTL is shorter than the completed measurements TTL (180d) because pending measurements consume active resources and should be resolved or cancelled more quickly.

Exchanges Deletion

Image: kingdom/exchanges-deletion
Schedule: 40 6 * * * (Daily at 06:40, in the kube-controller-manager's time zone unless the CronJob sets spec.timeZone)
CronJob Name: exchanges-deletion

Purpose

This job cleans up old panel exchange data to:
  • Remove completed exchange workflows
  • Free up storage from exchange intermediate data
  • Maintain manageable exchange history
  • Comply with data retention requirements

Configuration Parameters

Flag: --days-to-live=100
Default: 100 days
Description: Number of days to retain exchange data. Panel exchanges older than this are eligible for deletion.

Flag: --dry-run=false
Default: false
Description: When enabled, logs which exchanges would be deleted without actually deleting them.

Exchange Data

Panel exchanges involve:
  • Exchange workflow definitions
  • Exchange steps and their execution history
  • Exchange step attempts and retry information
  • Intermediate computation results
  • Metadata and checkpoints

Operation

  1. Query: Identifies exchanges older than the retention period
  2. Cascade: Deletes related exchange steps, attempts, and metadata
  3. Clean: Removes associated blob storage (if applicable)
  4. Log: Records deletion operations
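
The query and cascade steps can be illustrated with a small in-memory Python model. Everything about the data layout here is hypothetical: exchanges are keyed by date, each holds a list of steps, and each step holds a list of attempts. Whether an exchange exactly at the cutoff is kept or deleted is also an assumption (this sketch keeps it), and blob storage cleanup is not modelled.

```python
from datetime import date, timedelta
from typing import Dict, List


def cutoff_date(today: date, days_to_live: int) -> date:
    """Exchanges dated strictly before this date count as expired."""
    return today - timedelta(days=days_to_live)


def delete_expired_exchanges(exchanges: Dict[date, List[dict]], today: date,
                             days_to_live: int,
                             dry_run: bool = False) -> Dict[str, int]:
    """Remove expired exchanges together with their steps and attempts.

    `exchanges` maps an exchange date to its steps; each step is a dict with
    an "attempts" list. Returns how many records of each kind were covered.
    """
    cutoff = cutoff_date(today, days_to_live)
    removed = {"exchanges": 0, "steps": 0, "attempts": 0}
    for exchange_date in sorted(exchanges):  # sorted() copies the keys
        if exchange_date >= cutoff:
            continue
        steps = exchanges[exchange_date]
        removed["exchanges"] += 1
        removed["steps"] += len(steps)
        removed["attempts"] += sum(len(s["attempts"]) for s in steps)
        if not dry_run:
            del exchanges[exchange_date]  # children go with the parent
    return removed
```

Returning per-kind counts gives the job something concrete to log, and the same counts are available in dry-run mode for review before any data is removed.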

Example Configuration

schedule: "40 6 * * *"  # Daily at 6:40 AM
args:
  - --internal-api-target=gcp-kingdom-data-server:8443
  - --internal-api-cert-host=localhost
  - --tls-cert-file=/var/run/secrets/files/kingdom_tls.pem
  - --tls-key-file=/var/run/secrets/files/kingdom_tls.key
  - --cert-collection-file=/var/run/secrets/files/all_root_certs.pem
  - --days-to-live=100
  - --dry-run=false
The daily schedule (rather than hourly) reflects that panel exchanges are longer-running workflows that don’t require frequent cleanup.

Common Configuration

All Kingdom daemons share common configuration patterns:

Authentication

--internal-api-target=gcp-kingdom-data-server:8443
--internal-api-cert-host=localhost
--tls-cert-file=/var/run/secrets/files/kingdom_tls.pem
--tls-key-file=/var/run/secrets/files/kingdom_tls.key
--cert-collection-file=/var/run/secrets/files/all_root_certs.pem
All jobs authenticate to the Kingdom Data Server using mutual TLS.

Logging

--debug-verbose-grpc-client-logging=[true|false]
Enables detailed gRPC logging for debugging.

Kubernetes Configuration

All CronJobs are configured with:
  • Secrets: Access to kingdom TLS certificates and keys
  • Network Policies: Egress restricted to the Kingdom Data Server
  • Resource Limits: CPU and memory constraints
  • Concurrency Policy: Typically Forbid to prevent overlapping runs
  • Success/Failure History: Limited retention of job history
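
To show where these settings live in a CronJob object, the Python sketch below builds a manifest as a plain dictionary that can be dumped to JSON. The field names are standard Kubernetes batch/v1 CronJob fields, but every value (resource limits, history limits, container and volume names, restart policy) is an illustrative assumption rather than the Kingdom's actual configuration, and the function cronjob_manifest is hypothetical.

```python
import json
from typing import Dict, List


def cronjob_manifest(name: str, image: str, schedule: str, args: List[str],
                     secret_name: str) -> Dict:
    """Build a batch/v1 CronJob manifest; all values are illustrative."""
    container = {
        "name": f"{name}-container",  # assumed naming
        "image": image,
        "args": args,
        # Resource Limits: placeholder numbers, not the real settings.
        "resources": {"limits": {"cpu": "500m", "memory": "512Mi"}},
        # Secrets: mounted where the TLS flags expect to find the files.
        "volumeMounts": [{"name": "secret-files",
                          "mountPath": "/var/run/secrets/files",
                          "readOnly": True}],
    }
    pod_spec = {
        "restartPolicy": "OnFailure",  # assumption
        "containers": [container],
        "volumes": [{"name": "secret-files",
                     "secret": {"secretName": secret_name}}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": "Forbid",    # no overlapping runs
            "successfulJobsHistoryLimit": 3,  # Kubernetes default
            "failedJobsHistoryLimit": 1,      # Kubernetes default
            "jobTemplate": {"spec": {"template": {"spec": pod_spec}}},
        },
    }


def to_json(manifest: Dict) -> str:
    """Serialize for review or for kubectl, which also accepts JSON."""
    return json.dumps(manifest, indent=2)
```

Network policies are separate objects and are not part of the CronJob manifest itself.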

Deployment Pattern

Kingdom daemons follow a consistent deployment pattern defined in kingdom.cue:
cronJobs: [Name=_]: #CronJob & {
  _name:       strings.TrimSuffix(Name, "-cronjob")
  _secretName: _kingdom_secret_name
  _system:     "kingdom"
  _container: {
    image: _images[_name]
  }
}
This ensures:
  • Consistent naming conventions
  • Shared secret management
  • Unified image versioning
  • Standard container configuration
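
For readers unfamiliar with CUE, the name derivation in the template behaves like the Python sketch below: strings.TrimSuffix removes the suffix only when it is present. The key completed-measurements-deletion-cronjob and the contents of the images table are assumptions for illustration; the real keys and image references live in the CUE configuration.

```python
from typing import Dict

SUFFIX = "-cronjob"


def daemon_name(cronjob_key: str) -> str:
    """Mirror strings.TrimSuffix(Name, "-cronjob") from the CUE template."""
    if cronjob_key.endswith(SUFFIX):
        return cronjob_key[: -len(SUFFIX)]
    return cronjob_key


def image_for(cronjob_key: str, images: Dict[str, str]) -> str:
    """Look up the container image by derived name, like _images[_name]."""
    return images[daemon_name(cronjob_key)]


# Hypothetical image table, standing in for _images in the CUE config.
IMAGES = {
    "completed-measurements-deletion": "kingdom/completed-measurements-deletion",
}
```

Because the image is looked up by the derived name, a CronJob key without a matching entry in the image table fails at lookup time rather than deploying with a wrong image.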

Monitoring and Alerting

Job Success Rate

Monitor CronJob completion status and failure rates

Deletion Metrics

Track number of records deleted per run

Execution Duration

Alert on jobs that take unusually long to complete

Dry Run Testing

Use dry-run mode to validate before enabling deletions
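
As one way to approach the execution-duration check, the Python sketch below computes a run's duration from the startTime and completionTime fields that Kubernetes records in a Job's status. The threshold and the function names are assumptions; a production setup would more likely rely on the cluster's existing metrics and alerting stack.

```python
from datetime import datetime, timedelta
from typing import Dict

# Kubernetes reports these timestamps in RFC 3339 form, e.g. 2024-05-01T06:40:00Z.
_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def job_duration(status: Dict[str, str]) -> timedelta:
    """Duration of a finished Job, taken from its status block."""
    start = datetime.strptime(status["startTime"], _TIME_FORMAT)
    end = datetime.strptime(status["completionTime"], _TIME_FORMAT)
    return end - start


def is_slow(status: Dict[str, str], threshold: timedelta) -> bool:
    """True when the run exceeded the (assumed) alerting threshold."""
    return job_duration(status) > threshold
```

The status dictionary corresponds to the .status object returned by kubectl get job with JSON output.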

Best Practices

Retention Policy Design

Set TTL values based on:
  • Legal/compliance retention requirements
  • Storage budget constraints
  • Query performance needs
  • Historical analysis requirements
Always test retention policies with --dry-run=true before enabling deletions:
# Review what would be deleted
kubectl logs -l app=completed-measurements-deletion-app
Track database storage metrics before and after deletion jobs to validate effectiveness.

Troubleshooting

Job Not Running:
# Check CronJob status
kubectl get cronjobs

# View recent job executions
kubectl get jobs --sort-by=.status.startTime

# Check job logs
kubectl logs job/completed-measurements-deletion-xxxxx
Excessive Deletions:
  • Enable dry-run mode immediately
  • Review TTL configuration
  • Check for clock skew or incorrect timestamps
  • Restore from backups if necessary
Insufficient Deletions:
  • Verify TTL is configured correctly
  • Check that jobs are running on schedule
  • Verify network connectivity to Data Server
  • Review job logs for errors

Network Policies

Each daemon has a corresponding network policy that:
  • Allows no ingress (jobs only initiate outbound connections)
  • Allows egress to gcp-kingdom-data-server
  • Denies all other traffic
This ensures daemons can only communicate with the Data Server and cannot be accessed externally.
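
A policy with this shape could look like the Python-built manifest below. The structure uses standard networking.k8s.io/v1 NetworkPolicy fields, but the pod labels, the policy name, and the port selection are assumptions for illustration and may not match the deployed policies.

```python
from typing import Dict


def network_policy(daemon_app: str, data_server_app: str,
                   port: int = 8443) -> Dict:
    """Build a NetworkPolicy: no ingress, egress to the Data Server only.

    Both `app` label values are assumed; check the real pod labels.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{daemon_app}-network-policy"},  # assumed name
        "spec": {
            "podSelector": {"matchLabels": {"app": daemon_app}},
            # Listing both types with no ingress rules denies all ingress.
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [],
            "egress": [{
                "to": [{"podSelector":
                        {"matchLabels": {"app": data_server_app}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }
```

Port 8443 is taken from the --internal-api-target value shown in the example configurations.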

Next Steps

Kingdom Overview

Return to Kingdom architecture overview

Duchy Daemons

Learn about Duchy background jobs
