Health Checks
Server Health Endpoint
The OpenSandbox server exposes a health check endpoint for monitoring service availability. Typical consumers include:
- Load balancers for health checks
- Monitoring systems for uptime tracking
- Kubernetes liveness/readiness probes
- Orchestration platforms
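A client-side probe for such an endpoint can be sketched in Python as follows. This is a minimal sketch, not the project's own tooling: the function name `is_healthy` is hypothetical, and the caller must supply the real health URL for the deployment, since the path is not stated here.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status.

    Any connection error, timeout, non-2xx response, or malformed URL
    is treated as "unhealthy" rather than raised to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP errors and refused connections,
        # OSError covers socket timeouts, ValueError covers bad URLs.
        return False
```

A call might look like `is_healthy("http://localhost:8080/health")`, where both the port and the `/health` path are assumptions to be replaced with the address your server actually exposes.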
Execd Health Check
Each sandbox container runs an execd daemon that exposes its own health endpoint on port 44772.
System Metrics
Metrics Endpoint
The execd API provides real-time system resource metrics for individual sandboxes.
Metrics Fields
| Field | Type | Description |
|---|---|---|
| cpu_count | float | Number of CPU cores available |
| cpu_used_pct | float | CPU usage percentage (0-100) |
| mem_total_mib | float | Total memory in MiB |
| mem_used_mib | float | Used memory in MiB |
| timestamp | int64 | Unix timestamp in milliseconds |
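The fields in the table map naturally onto a small typed record. The sketch below assumes the metrics response is a flat JSON object keyed by exactly these field names. The class name, helper names, and sample values are invented for illustration and are not taken from a real sandbox.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SandboxMetrics:
    cpu_count: float
    cpu_used_pct: float
    mem_total_mib: float
    mem_used_mib: float
    timestamp: int  # Unix time in milliseconds

    @property
    def mem_used_pct(self) -> float:
        """Memory usage as a percentage of total memory."""
        if self.mem_total_mib <= 0:
            return 0.0
        return 100.0 * self.mem_used_mib / self.mem_total_mib

    @property
    def sampled_at(self) -> datetime:
        """Sample time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp / 1000, tz=timezone.utc)


def parse_metrics(payload: str) -> SandboxMetrics:
    """Parse a JSON metrics payload (assumed flat) into a SandboxMetrics."""
    data = json.loads(payload)
    return SandboxMetrics(
        cpu_count=float(data["cpu_count"]),
        cpu_used_pct=float(data["cpu_used_pct"]),
        mem_total_mib=float(data["mem_total_mib"]),
        mem_used_mib=float(data["mem_used_mib"]),
        timestamp=int(data["timestamp"]),
    )


# Illustrative payload with made-up values.
sample = (
    '{"cpu_count": 4, "cpu_used_pct": 12.5, "mem_total_mib": 8192, '
    '"mem_used_mib": 2048, "timestamp": 1700000000000}'
)
metrics = parse_metrics(sample)
print(metrics.mem_used_pct)  # 25.0
```

Dividing the timestamp by 1000 matters here: the field is in milliseconds, while `datetime.fromtimestamp` expects seconds.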
Real-time Metrics Streaming
For continuous monitoring, use the Server-Sent Events (SSE) endpoint.
Sandbox Status Monitoring
Get Sandbox Details
Retrieve the current status of a sandbox.
Sandbox Lifecycle States
Kubernetes Monitoring
BatchSandbox Status
For Kubernetes deployments, monitor BatchSandbox resources. The status reports the following fields:
- DESIRED: Number of sandboxes requested
- TOTAL: Total sandboxes created
- ALLOCATED: Sandboxes successfully allocated
- READY: Sandboxes ready for use
- EXPIRE: Expiration time
Pool Status
Monitor resource pool availability.
Task Status
For a BatchSandbox with tasks, check the task status as well.
Logging Configuration
Server Log Levels
Configure logging in ~/.sandbox.toml.
Kubernetes Controller Logging
Console Output (Default)
File Logging with Rotation
| Parameter | Default | Description |
|---|---|---|
| --enable-file-log | false | Enable file logging |
| --log-file-path | /var/log/sandbox-controller/controller.log | Log file path |
| --log-max-size | 100 | Max file size in MB before rotation |
| --log-max-backups | 10 | Max number of old log files |
| --log-max-age | 30 | Max days to retain old logs |
| --log-compress | true | Compress rotated logs (gzip) |
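The size and backup limits together bound how much disk space the controller logs can occupy. The sketch below works through that arithmetic under two assumptions that the table does not confirm: every retained file can grow to the full size limit, and the backup count is enforced as a hard cap greater than zero. The function name is hypothetical, and compression is ignored, so the result is a pessimistic upper bound.

```python
def max_log_disk_mb(max_size_mb: int = 100, max_backups: int = 10) -> int:
    """Worst-case disk usage in MB: one active file plus the rotated backups.

    Defaults mirror --log-max-size and --log-max-backups from the table.
    Compression of rotated files is not modeled.
    """
    if max_size_mb < 0 or max_backups < 0:
        raise ValueError("limits must be non-negative")
    return max_size_mb * (max_backups + 1)


print(max_log_disk_mb())  # 1100
```

With the defaults, that is roughly 1.1 GB of headroom to reserve on the log volume before compression is taken into account.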
Production Configuration
Viewing Logs
Integration with Monitoring Systems
Prometheus Metrics
You can expose sandbox metrics to Prometheus by:
- Polling the /metrics endpoint periodically
- Converting JSON metrics to Prometheus format
- Using a metrics exporter sidecar
Kubernetes Events
Monitor Kubernetes events for sandbox lifecycle changes.
Custom Monitoring
Custom monitoring can be implemented with a Python script that polls the endpoints described above.
Best Practices
Set up health check endpoints
Configure health checks for both the server and individual sandboxes to enable:
- Automatic restart of failed containers
- Load balancer traffic routing
- Alert generation on service degradation
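Restart and alerting decisions are usually based on several consecutive failed probes rather than a single one, so that a brief hiccup does not trigger a restart. The sketch below shows that thresholding logic in isolation. The class name and the default threshold of 3 are illustrative assumptions, not OpenSandbox settings.

```python
class ProbeState:
    """Track consecutive probe failures against a failure threshold."""

    def __init__(self, failure_threshold: int = 3) -> None:
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True while the target counts as healthy."""
        if ok:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.failure_threshold


state = ProbeState(failure_threshold=3)
history = [state.record(ok) for ok in (True, False, False, False, True)]
print(history)  # [True, True, True, False, True]
```

The target is reported unhealthy only on the third failure in a row, and a single successful probe restores it.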
Monitor resource usage trends
Track CPU and memory usage over time to:
- Identify resource-intensive workloads
- Optimize resource limits
- Predict capacity needs
- Detect memory leaks
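One simple way to turn collected samples into a trend is a least-squares slope of memory usage over time. The sketch below assumes samples are `(timestamp, mem_used_mib)` pairs with the timestamp in milliseconds, matching the metrics fields. The function name is hypothetical, and a steadily positive slope is only a hint of a leak, not proof of one.

```python
from typing import Sequence, Tuple


def mem_growth_mib_per_min(samples: Sequence[Tuple[int, float]]) -> float:
    """Least-squares slope of memory usage, in MiB per minute.

    `samples` holds (timestamp_ms, mem_used_mib) pairs. Returns 0.0 when
    there are fewer than two samples or all timestamps are identical.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    xs = [t / 60_000 for t, _ in samples]  # milliseconds -> minutes
    ys = [float(m) for _, m in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


# Three made-up samples one minute apart, growing by 10 MiB each time.
print(mem_growth_mib_per_min([(0, 100), (60_000, 110), (120_000, 120)]))  # 10.0
```

Comparing the slope against a threshold chosen for the workload, over a window long enough to smooth out normal allocation spikes, gives a basic leak alert.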
Configure log rotation
Always enable log rotation in production to:
- Prevent disk space exhaustion
- Maintain historical logs for debugging
- Compress old logs to save space
- Comply with retention policies
Use structured logging
Enable JSON logging format for:
- Easy parsing by log aggregation tools
- Better searchability
- Integration with monitoring platforms
- Automated alerting