Overview
The OpenSandbox Kubernetes Controller is a Kubernetes operator that manages sandbox environments through custom resources. It offers:
- Flexible Sandbox Creation: Pooled and non-pooled modes
- Batch and Individual Delivery: Single sandboxes or high-throughput batches
- Resource Pooling: Pre-warmed resource pools for rapid provisioning
- Optional Task Scheduling: Integrated task orchestration with customizable templates
- Comprehensive Monitoring: Real-time status tracking
Prerequisites
- Kubernetes cluster v1.22.4 or higher
- kubectl v1.11.3 or higher
- Go v1.24.0+ (for building from source)
- Docker v17.03+ (for building images)
- Access to a container registry
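If you script these prerequisite checks, a proper version comparison avoids mistakes such as string comparison treating 1.9 as newer than 1.22. The Go sketch below is a generic helper, not part of OpenSandbox: atLeast and parseVersion are hypothetical names, only numeric major.minor.patch components are compared, and suffixes such as -gke.100 are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.22.4", "1.22.4" or "go1.24.0" into [1 22 4].
// Missing components count as 0, so "17.03" becomes [17 3 0].
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(strings.TrimPrefix(s, "go"), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i] // drop suffixes such as "-gke.100" or "+k3s1"
	}
	parts := strings.Split(s, ".")
	if len(parts) > 3 {
		return v, fmt.Errorf("unexpected version format %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("bad version component %q: %w", p, err)
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether version have is greater than or equal to want.
func atLeast(have, want string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	w, err := parseVersion(want)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if h[i] != w[i] {
			return h[i] > w[i], nil
		}
	}
	return true, nil // all components equal
}

func main() {
	// Minimums come from the Prerequisites list; the "have" values are examples.
	checks := []struct{ tool, have, want string }{
		{"kubernetes", "v1.28.3", "v1.22.4"},
		{"kubectl", "v1.11.0", "v1.11.3"},
		{"go", "go1.24.0", "1.24.0"},
	}
	for _, c := range checks {
		ok, err := atLeast(c.have, c.want)
		if err != nil {
			fmt.Println(c.tool, "error:", err)
			continue
		}
		fmt.Printf("%-10s %s >= %s: %v\n", c.tool, c.have, c.want, ok)
	}
}
```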
Optional: Local Kubernetes with kind
For testing purposes, you can use kind to create a local cluster.
Deployment
Build and Push Images
Build both the controller and task-executor images and push them to your container registry.
For kind users: load the images directly into the cluster.
Install CRDs
Install the Custom Resource Definitions. This registers the BatchSandbox and Pool custom resource types.
Deploy the Operator
Deploy the controller manager to your cluster. You may need cluster-admin privileges for this step.
Configure OpenSandbox Server
Generate a Kubernetes configuration file for the server, then edit ~/.sandbox.toml with your cluster settings (see below).
Server Configuration
Configure the OpenSandbox server to use the Kubernetes runtime.
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| kubeconfig_path | string | null | Path to the kubeconfig file. Use null for in-cluster configuration |
| namespace | string | "opensandbox" | Kubernetes namespace for sandbox workloads |
| workload_provider | string | "batchsandbox" | Workload provider type (batchsandbox or agent-sandbox) |
| informer_enabled | boolean | true | Enable the watch-based cache (beta) |
| informer_resync_seconds | integer | 300 | Interval between full list refreshes, in seconds (beta) |
| informer_watch_timeout_seconds | integer | 60 | Interval after which the watch is restarted, in seconds (beta) |
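To make the table concrete, the following Go sketch models the options as a struct with the documented defaults and a few sanity checks. The type and method names are hypothetical, the OpenSandbox server reads these keys from ~/.sandbox.toml and may not be written in Go, and the checks beyond the allowed workload_provider values (non-empty namespace, positive informer intervals) are assumptions rather than documented rules.

```go
package main

import (
	"errors"
	"fmt"
)

// KubernetesRuntimeConfig mirrors the documented options (hypothetical Go names).
type KubernetesRuntimeConfig struct {
	KubeconfigPath              *string // nil corresponds to null: in-cluster configuration
	Namespace                   string
	WorkloadProvider            string
	InformerEnabled             bool
	InformerResyncSeconds       int
	InformerWatchTimeoutSeconds int
}

// DefaultConfig returns the defaults listed in the table.
func DefaultConfig() KubernetesRuntimeConfig {
	return KubernetesRuntimeConfig{
		KubeconfigPath:              nil,
		Namespace:                   "opensandbox",
		WorkloadProvider:            "batchsandbox",
		InformerEnabled:             true,
		InformerResyncSeconds:       300,
		InformerWatchTimeoutSeconds: 60,
	}
}

// Validate applies basic checks; only the provider check comes from the table.
func (c KubernetesRuntimeConfig) Validate() error {
	switch c.WorkloadProvider {
	case "batchsandbox", "agent-sandbox":
		// supported providers
	default:
		return fmt.Errorf("workload_provider must be batchsandbox or agent-sandbox, got %q", c.WorkloadProvider)
	}
	if c.Namespace == "" {
		return errors.New("namespace must not be empty")
	}
	if c.InformerEnabled && (c.InformerResyncSeconds <= 0 || c.InformerWatchTimeoutSeconds <= 0) {
		return errors.New("informer intervals must be positive when the informer is enabled")
	}
	return nil
}

func main() {
	cfg := DefaultConfig()
	fmt.Println("defaults valid:", cfg.Validate() == nil)

	cfg.WorkloadProvider = "deployment" // not a supported provider
	fmt.Println("invalid provider:", cfg.Validate())
}
```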
Informer Settings (Beta)
The informer is enabled by default and provides watch-based caching, which reduces API calls and improves performance:
- Set informer_enabled = false to disable it
- Tune informer_resync_seconds and informer_watch_timeout_seconds to fit your cluster's API limits
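A rough way to tune these settings is to estimate the request rate they imply. The sketch below assumes, following the option descriptions, that each resync triggers one full LIST and each watch timeout re-opens one WATCH; informerCallsPerHour is a hypothetical helper, and the real cost also depends on how many objects each LIST returns.

```go
package main

import "fmt"

// informerCallsPerHour estimates LIST and WATCH requests per hour, assuming one
// full LIST per resync and one new WATCH per watch timeout.
func informerCallsPerHour(resyncSeconds, watchTimeoutSeconds int) (lists, watches int) {
	if resyncSeconds <= 0 || watchTimeoutSeconds <= 0 {
		return 0, 0 // intervals must be positive; no estimate for invalid input
	}
	const hour = 3600
	return hour / resyncSeconds, hour / watchTimeoutSeconds
}

func main() {
	for _, s := range []struct{ resync, watch int }{
		{300, 60},  // documented defaults
		{900, 300}, // a gentler example for a rate-limited API server
	} {
		lists, watches := informerCallsPerHour(s.resync, s.watch)
		fmt.Printf("resync=%ds watch_timeout=%ds -> %d LIST + %d WATCH per hour\n",
			s.resync, s.watch, lists, watches)
	}
}
```

With the defaults (300 s and 60 s) this gives 12 LIST and 60 WATCH requests per hour; raising both intervals lowers the request rate at the cost of slower recovery if the cache drifts out of sync.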
Creating Sandbox Resources
Basic Non-Pooled Sandbox
Create a simple sandbox without resource pooling.
Resource Pool Configuration
Create a pool of pre-warmed resources.
Pooled Sandbox
Create sandboxes using a resource pool.
Sandbox with Heterogeneous Tasks
Create sandboxes with process-based task execution. These use a pool whose template includes a task-executor sidecar.
Monitoring Resources
View pool and sandbox status.
Performance Characteristics
Batch Delivery Performance
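BatchSandbox provides O(1) batch delivery: the number of API write operations it performs does not grow with batch size (see Write Operations Comparison below).
Test Environment:
- Controller: 12 CPU cores and 32 GB memory requested, 16 CPU cores and 64 GB limit
- BatchSandbox controller concurrency: 32
- Pool: busybox:latest with 0.1 CPU and 256 MB memory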
| Implementation | Batch delivery time (seconds) |
|---|---|
| SIG Agent-Sandbox (concurrency=1) | 76.35 |
| SIG Agent-Sandbox (concurrency=10) | 23.17 |
| SIG Agent-Sandbox (concurrency=50) | 33.85 |
| BatchSandbox | 0.92 |
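The gap is easier to read as ratios. This Go snippet only divides the published timings by the BatchSandbox time; ratio is a hypothetical helper name and no new measurements are involved.

```go
package main

import "fmt"

// ratio reports how many times longer other took than baseline.
func ratio(other, baseline float64) float64 {
	return other / baseline
}

func main() {
	// Batch delivery times in seconds, copied from the table above.
	const batchSandbox = 0.92
	rows := []struct {
		name    string
		seconds float64
	}{
		{"SIG Agent-Sandbox (concurrency=1)", 76.35},
		{"SIG Agent-Sandbox (concurrency=10)", 23.17},
		{"SIG Agent-Sandbox (concurrency=50)", 33.85},
	}
	for _, r := range rows {
		fmt.Printf("%-36s %5.1fx the BatchSandbox time\n", r.name, ratio(r.seconds, batchSandbox))
	}
}
```

At concurrency=1, agent-sandbox takes roughly 83 times as long as BatchSandbox; even its fastest configuration in the table (concurrency=10) takes roughly 25 times as long.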
Write Operations Comparison
BatchSandbox: a fixed number of write operations regardless of scale
- Create BatchSandbox
- Update annotation (batch allocation)
- Update status
SIG Agent-Sandbox: write operations that grow with the number of sandboxes N
- Create N SandboxClaim objects
- Create N Sandbox objects
- Update N Pods
- Update N Sandbox statuses
- Update N SandboxClaim statuses
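Counting one write per bullet above makes the scaling explicit: the first three operations belong to BatchSandbox, and the per-sandbox operations (SandboxClaim, Sandbox, Pod, and status updates) belong to SIG agent-sandbox. The sketch ignores retries and conflict re-updates, and the function names are hypothetical.

```go
package main

import "fmt"

// batchSandboxWrites counts the BatchSandbox writes: create the BatchSandbox,
// update its batch-allocation annotation, update its status. The batch size is
// ignored because the count does not depend on it.
func batchSandboxWrites(_ int) int {
	return 3
}

// agentSandboxWrites counts the per-sandbox writes: create a SandboxClaim,
// create a Sandbox, update the Pod, update the Sandbox status, and update the
// SandboxClaim status, i.e. 5 writes for each of the n sandboxes.
func agentSandboxWrites(n int) int {
	return 5 * n
}

func main() {
	for _, n := range []int{1, 100, 1000} {
		fmt.Printf("n=%-5d BatchSandbox=%d agent-sandbox=%d\n",
			n, batchSandboxWrites(n), agentSandboxWrites(n))
	}
}
```

For a batch of 1,000 sandboxes this is 3 writes versus 5,000.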
Relationship with agent-sandbox
BatchSandbox complements kubernetes-sigs/agent-sandbox with:
- Batch Sandbox Semantics: Significantly higher throughput for RL training scenarios
- Task Scheduling: Enables customized sandbox delivery (e.g., process injection)
When to use each:
- agent-sandbox: Traditional single-sandbox workflows
- BatchSandbox: High-throughput batch scenarios, RL training, and custom task orchestration
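As a toy illustration of batch semantics, the Go model below claims N pre-warmed pods from a pool in one step and records the result once, instead of creating a claim object per sandbox. Pool and AllocateBatch are hypothetical; the real controller persists allocations through the Kubernetes API (the batch-allocation annotation), whereas this sketch keeps everything in memory.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Pool is a toy model of a pre-warmed pool: the names of its idle pods.
type Pool struct {
	Idle []string
}

// AllocateBatch claims n idle pods for one batch in a single step and returns a
// comma-separated allocation record, standing in for a batch-allocation
// annotation. On failure no pods are consumed.
func (p *Pool) AllocateBatch(n int) (string, error) {
	if n <= 0 {
		return "", errors.New("batch size must be positive")
	}
	if n > len(p.Idle) {
		return "", fmt.Errorf("pool has %d idle pods, batch needs %d", len(p.Idle), n)
	}
	record := strings.Join(p.Idle[:n], ",")
	p.Idle = p.Idle[n:]
	return record, nil
}

func main() {
	pool := &Pool{Idle: []string{"pod-a", "pod-b", "pod-c", "pod-d"}}

	record, err := pool.AllocateBatch(3)
	fmt.Println(record, err, len(pool.Idle))

	_, err = pool.AllocateBatch(2) // only one idle pod is left
	fmt.Println(err)
}
```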
Cleanup
Delete batch sandboxes.
Troubleshooting
Sandboxes Not Starting
- Check operator logs: kubectl logs -n opensandbox-system deployment/opensandbox-controller-manager
- Verify CRDs are installed: kubectl get crds | grep sandbox
- Check resource quotas and limits
Task Execution Failures
- Ensure shareProcessNamespace: true is set in the pool template
- Verify the task-executor container has the SYS_PTRACE capability
- Check the task-executor container logs
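These checks can be automated against your pool template. The sketch below uses minimal stand-in types instead of the Kubernetes core/v1 structs so that it runs with the standard library only; in a real manifest the corresponding fields are spec.shareProcessNamespace and the container's securityContext.capabilities.add. The container name task-executor is an assumption, so pass whatever name your template actually uses.

```go
package main

import "fmt"

// Container and PodTemplate are minimal stand-ins for the core/v1 fields checked here.
type Container struct {
	Name            string
	CapabilitiesAdd []string // securityContext.capabilities.add
}

type PodTemplate struct {
	ShareProcessNamespace *bool // spec.shareProcessNamespace
	Containers            []Container
}

// checkTaskExecutor lists template problems that commonly break process-based tasks.
func checkTaskExecutor(t PodTemplate, executorName string) []string {
	var problems []string
	if t.ShareProcessNamespace == nil || !*t.ShareProcessNamespace {
		problems = append(problems, "shareProcessNamespace is not true")
	}
	found := false
	for _, c := range t.Containers {
		if c.Name != executorName {
			continue
		}
		found = true
		hasPtrace := false
		for _, capability := range c.CapabilitiesAdd {
			if capability == "SYS_PTRACE" {
				hasPtrace = true
			}
		}
		if !hasPtrace {
			problems = append(problems, executorName+" lacks SYS_PTRACE")
		}
	}
	if !found {
		problems = append(problems, "no container named "+executorName)
	}
	return problems
}

func main() {
	yes := true
	good := PodTemplate{
		ShareProcessNamespace: &yes,
		Containers: []Container{
			{Name: "sandbox"},
			{Name: "task-executor", CapabilitiesAdd: []string{"SYS_PTRACE"}},
		},
	}
	fmt.Println(checkTaskExecutor(good, "task-executor")) // []

	bad := PodTemplate{Containers: []Container{{Name: "task-executor"}}}
	for _, p := range checkTaskExecutor(bad, "task-executor") {
		fmt.Println("problem:", p)
	}
}
```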
Performance Issues
- Increase the controller concurrency in the controller-manager deployment
- Tune informer settings for your cluster
- Review pool buffer settings
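When reviewing buffer settings, it can help to reason about how a min/max buffer policy decides how many pods to pre-warm. The sketch below assumes a hypothetical policy with BufferMin, BufferMax, and PoolMax; the Pool resource's actual fields and refill rules may differ, so check its CRD before relying on this model.

```go
package main

import "fmt"

// PoolCapacity holds hypothetical buffer settings for a pre-warmed pool.
type PoolCapacity struct {
	BufferMin int // refill when idle pods drop below this
	BufferMax int // refill up to this many idle pods
	PoolMax   int // never exceed this many pods in total (idle + allocated)
}

// podsToCreate returns how many new pre-warmed pods the pool should start.
func podsToCreate(c PoolCapacity, idle, allocated int) int {
	if idle >= c.BufferMin {
		return 0 // buffer is healthy
	}
	want := c.BufferMax - idle             // top the buffer back up to BufferMax
	room := c.PoolMax - (idle + allocated) // respect the total pool limit
	if want > room {
		want = room
	}
	if want < 0 {
		return 0
	}
	return want
}

func main() {
	c := PoolCapacity{BufferMin: 2, BufferMax: 10, PoolMax: 20}
	fmt.Println(podsToCreate(c, 5, 3))  // 0: buffer healthy
	fmt.Println(podsToCreate(c, 1, 3))  // 9: refill to BufferMax
	fmt.Println(podsToCreate(c, 1, 15)) // 4: capped by PoolMax
}
```

With BufferMin=2, BufferMax=10, and PoolMax=20, a pool with 1 idle and 15 allocated pods can only add 4 more, so a larger PoolMax (or smaller batches) would be needed to restore the full buffer.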