Overview
Daytona provides built-in observability features including OpenTelemetry (OTEL) integration, telemetry tracking, and metrics collection for monitoring sandbox operations and performance.
OpenTelemetry Integration
The Daytona SDK supports OpenTelemetry for distributed tracing of all sandbox operations.
Enable OTEL Tracing
Environment Variable Configuration
Enable OTEL using environment variables.
Automatic Span Creation
When OTEL is enabled, the SDK automatically creates spans for:
- Sandbox creation and management
- File operations
- Process execution
- Git operations
- Network requests
- Code execution
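To illustrate the general idea of automatic span creation, here is a minimal sketch of a decorator that records a span around each call only when tracing is switched on through an environment variable. It is not the Daytona SDK's implementation: the variable name `EXAMPLE_OTEL_ENABLED`, the span name `process.execute`, and the in-memory `recorded_spans` list are all hypothetical stand-ins.

```python
import functools
import os
import time

# Hypothetical toggle; the real variable name is whatever the SDK documents.
ENABLE_VAR = "EXAMPLE_OTEL_ENABLED"

# In-memory stand-in for a span exporter.
recorded_spans = []


def traced(span_name):
    """Wrap a function so each call is recorded as a span when tracing is on."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.environ.get(ENABLE_VAR, "").lower() != "true":
                # Tracing is off: call through without recording anything.
                return func(*args, **kwargs)
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Record the span whether the call succeeded or raised.
                recorded_spans.append({
                    "name": span_name,
                    "duration_s": time.perf_counter() - start,
                    "status": status,
                })
        return wrapper
    return decorator


@traced("process.execute")  # hypothetical span name
def run_command(command):
    # Placeholder for real work such as executing a process in a sandbox.
    return f"ran: {command}"
```

Because the check happens on every call, flipping the environment variable takes effect without re-importing the module. A real SDK would hand spans to an OpenTelemetry tracer rather than a list.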
Using Async Disposal
For Node.js applications, use async disposal to ensure traces are flushed.
OTEL Collector Configuration
Daytona uses a custom OpenTelemetry Collector for processing telemetry data.
Collector Components
The Daytona OTEL Collector includes:
- OTLP Receiver: Accepts traces, metrics, and logs via HTTP
- Daytona Exporter: Routes telemetry to organization-specific endpoints
- ClickHouse Exporter: Stores telemetry data for analysis
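As a rough picture of what an OTLP receiver accepts over HTTP, the sketch below builds a single-span request body in the standard OTLP/JSON encoding. The field layout follows the OpenTelemetry protocol. The service name, the scope name `example.sketch`, and any attribute keys passed in are illustrative assumptions, not values Daytona is known to emit.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name, span_name, duration_ns, attributes):
    """Build an OTLP/JSON request body holding a single span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ns
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}},
                ],
            },
            "scopeSpans": [{
                "scope": {"name": "example.sketch"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex chars
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are encoded as strings in OTLP/JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                    "attributes": [
                        {"key": key, "value": {"stringValue": str(value)}}
                        for key, value in attributes.items()
                    ],
                }],
            }],
        }],
    }


if __name__ == "__main__":
    body = build_otlp_trace_payload("demo-service", "sandbox.create", 5_000_000, {})
    print(json.dumps(body, indent=2))
```

A body like this would typically be sent as a `POST` with `Content-Type: application/json` to the receiver's `/v1/traces` path. Port 4318 is the conventional OTLP/HTTP default, though whether a given collector deployment uses it is an assumption to verify.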
Collector Configuration
Custom OTLP Endpoint
Configure a custom OTLP endpoint.
Telemetry and Metrics
SDK Telemetry
The SDK automatically tracks:
| Metric | Description |
|---|---|
| sandbox.create.duration | Time to create sandboxes |
| sandbox.start.duration | Time to start sandboxes |
| sandbox.stop.duration | Time to stop sandboxes |
| sandbox.delete.duration | Time to delete sandboxes |
| process.execute.duration | Process execution time |
| fs.operation.duration | File system operation time |
| git.operation.duration | Git operation time |
| http.request.duration | HTTP request duration |
| http.response.status_code | HTTP response codes |
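Duration metrics like the ones listed above are usually produced by timing a block of work and filing the result under the metric name. The following is a minimal sketch of that pattern using only the standard library. The `timed` and `summarize` helpers and the in-memory `durations` store are hypothetical and do not reflect how the SDK records these values internally.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Metric name -> list of observed durations in seconds.
durations = defaultdict(list)


@contextmanager
def timed(metric_name):
    """Record how long the wrapped block takes under the given metric name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed operations are timed too.
        durations[metric_name].append(time.perf_counter() - start)


def summarize(metric_name):
    """Return count, mean, and max for one metric, or None if it has no samples."""
    samples = durations.get(metric_name)
    if not samples:
        return None
    return {
        "count": len(samples),
        "mean_s": sum(samples) / len(samples),
        "max_s": max(samples),
    }


if __name__ == "__main__":
    with timed("sandbox.create.duration"):
        time.sleep(0.05)  # stand-in for the real operation
    print(summarize("sandbox.create.duration"))
```

Keeping raw samples makes it easy to compute other aggregates later. A production setup would more likely record into an OpenTelemetry histogram instrument.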
Trace Attributes
Spans include attributes such as:
Monitoring Sandbox State
Check Sandbox Status
Monitor Backup State
Monitor Lifecycle Configuration
Error Tracking
Monitor Error States
Custom Error Handling
Monitoring Best Practices
- Enable OTEL in production: Get visibility into all SDK operations.
- Set up dashboards: Use tools like Grafana to visualize telemetry data.
- Monitor resource usage: Track CPU, memory, and disk utilization.
- Track lifecycle events: Monitor auto-stop, auto-archive, and auto-delete events.
- Alert on errors: Set up alerts for sandboxes in error state.
- Use labels for filtering: Add labels to sandboxes for easier monitoring and grouping.
- Monitor costs: Track sandbox usage across regions and teams.
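Two of the practices above, alerting on error states and filtering by labels, can be combined in a few lines once sandbox records are available. The sketch below assumes each sandbox is a plain dictionary with `id`, `state`, and `labels` keys and that a failed sandbox reports the state `"error"`. That record shape and those state strings are hypothetical, so adapt them to whatever the SDK actually returns.

```python
from collections import Counter


def sandboxes_in_error(sandboxes, label_filter=None):
    """Return sandboxes whose state is "error", optionally filtered by labels."""
    label_filter = label_filter or {}
    return [
        sb for sb in sandboxes
        if sb["state"] == "error"
        and all(sb["labels"].get(k) == v for k, v in label_filter.items())
    ]


def error_counts_by_label(sandboxes, label_key):
    """Count error-state sandboxes per value of one label, e.g. per team."""
    return Counter(
        sb["labels"].get(label_key, "unlabeled")
        for sb in sandboxes
        if sb["state"] == "error"
    )


if __name__ == "__main__":
    # Hypothetical fleet snapshot used only for illustration.
    fleet = [
        {"id": "a", "state": "started", "labels": {"team": "ml"}},
        {"id": "b", "state": "error", "labels": {"team": "ml"}},
        {"id": "c", "state": "error", "labels": {"team": "web"}},
    ]
    for sb in sandboxes_in_error(fleet, {"team": "ml"}):
        print(f"ALERT: sandbox {sb['id']} is in error state")
    print(error_counts_by_label(fleet, "team"))
```

Grouping by a label such as `team` gives a quick view of where failures cluster, which is one reason consistent labeling pays off.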
Observability Stack
Recommended Tools
| Tool | Purpose |
|---|---|
| Grafana | Visualization and dashboards |
| Prometheus | Metrics collection and storage |
| Jaeger | Distributed tracing visualization |
| ClickHouse | Long-term telemetry storage |
| Loki | Log aggregation |
Example Dashboard Metrics
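Dashboards commonly derive a latency percentile and an error rate from raw samples. As a hedged illustration, the sketch below computes a nearest-rank percentile over duration samples and the share of 5xx responses among status codes. The choice of these two aggregates and the helper names are assumptions, not a list of metrics that Daytona ships.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def error_rate(status_codes):
    """Fraction of responses with a 5xx status code."""
    if not status_codes:
        return 0.0
    server_errors = sum(1 for code in status_codes if 500 <= code < 600)
    return server_errors / len(status_codes)


if __name__ == "__main__":
    create_durations_s = [1.2, 0.9, 1.1, 4.8, 1.0]  # hypothetical samples
    print("p95 create duration:", percentile(create_durations_s, 95))
    print("5xx rate:", error_rate([200, 200, 500, 404]))
```

Nearest-rank is used here because it always returns an observed value. Tools such as Prometheus estimate quantiles from histogram buckets instead, so numbers may differ slightly.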
Health Checks
The OTEL Collector provides health check endpoints.
Data Retention
Telemetry data retention in ClickHouse:
Related
- Auto Lifecycle - Monitor lifecycle events
- Resource Management - Track resource usage
- Regions - Monitor multi-region deployments