Overview
The bootstrap benchmark framework provides data-driven performance analysis of cluster setup scripts. It measures step-by-step execution times, collects resource usage statistics, and generates comprehensive reports across all CPU architectures.
Design Philosophy
The benchmarking system follows a measurement-first approach:
- Phase 1 (Current): Measure and collect data
- Phase 2 (Future): Optimize based on measurements
Architecture
The benchmark system consists of three core libraries:
1. Platform Detection (scripts/lib/platform.sh)
Provides unified platform detection across all architectures:
Exported Variables:
- PLATFORM_OS - Operating system (darwin/linux)
- PLATFORM_ARCH - CPU architecture (aarch64/x86_64)
- PLATFORM_NIX_SYSTEM - Nix system string (e.g., aarch64-darwin)
- PLATFORM_LINUX_SYSTEM - Linux system string (for cross-compilation)
- PLATFORM_IS_WSL - Boolean for WSL2 detection
- PLATFORM_DOCKER_ARCH - Docker platform string (linux/arm64 or linux/amd64)
| Environment | PLATFORM_OS | PLATFORM_ARCH | PLATFORM_NIX_SYSTEM | PLATFORM_IS_WSL |
|---|---|---|---|---|
| Apple Silicon Mac | darwin | aarch64 | aarch64-darwin | false |
| Intel Mac | darwin | x86_64 | x86_64-darwin | false |
| Linux x86_64 | linux | x86_64 | x86_64-linux | false |
| Linux aarch64 | linux | aarch64 | aarch64-linux | false |
| WSL2 (x86_64) | linux | x86_64 | x86_64-linux | true |
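The mapping in the table above can be derived from `uname`. The following is a minimal sketch only; the real platform.sh likely handles more edge cases:

```shell
# Sketch of deriving the exported platform variables (assumed logic, not the
# actual platform.sh implementation).
case "$(uname -s)" in
  Darwin) PLATFORM_OS=darwin ;;
  *)      PLATFORM_OS=linux ;;
esac
case "$(uname -m)" in
  arm64|aarch64) PLATFORM_ARCH=aarch64 ;;
  *)             PLATFORM_ARCH=x86_64 ;;
esac
PLATFORM_NIX_SYSTEM="${PLATFORM_ARCH}-${PLATFORM_OS}"
PLATFORM_LINUX_SYSTEM="${PLATFORM_ARCH}-linux"   # cross-compilation target
PLATFORM_IS_WSL=false
# WSL2 kernels identify themselves in /proc/version
if [ "$PLATFORM_OS" = linux ] && grep -qi microsoft /proc/version 2>/dev/null; then
  PLATFORM_IS_WSL=true
fi
case "$PLATFORM_ARCH" in
  aarch64) PLATFORM_DOCKER_ARCH=linux/arm64 ;;
  *)       PLATFORM_DOCKER_ARCH=linux/amd64 ;;
esac
export PLATFORM_OS PLATFORM_ARCH PLATFORM_NIX_SYSTEM PLATFORM_LINUX_SYSTEM \
       PLATFORM_IS_WSL PLATFORM_DOCKER_ARCH
```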
2. Timing Framework (scripts/lib/timing.sh)
Tracks per-step execution times with millisecond precision:
Each timed step records:
- Step name
- Duration (seconds with millisecond precision)
- Exit code
- Start/end timestamps (ISO 8601)
- Resource usage (if monitor.sh is available)
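The fields above could be captured by a wrapper along these lines. This is a sketch; `benchmark_step` and its output format are assumptions, not the real timing.sh API:

```shell
# Hypothetical per-step timing wrapper (assumed name and format).
benchmark_step() {
  step_name=$1; shift
  start_iso=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # ISO 8601 start timestamp
  start_ms=$(($(date +%s%N) / 1000000))      # millisecond precision (GNU date)
  "$@"
  exit_code=$?
  end_ms=$(($(date +%s%N) / 1000000))
  end_iso=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  dur_ms=$((end_ms - start_ms))
  printf '%s: %d.%03ds (exit %d) [%s .. %s]\n' \
    "$step_name" $((dur_ms / 1000)) $((dur_ms % 1000)) "$exit_code" \
    "$start_iso" "$end_iso"
  return "$exit_code"
}

benchmark_step "demo-step" sleep 1
```

Note that `%N` is a GNU date extension; on macOS a different millisecond clock source would be needed.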
3. Resource Monitoring (scripts/lib/monitor.sh)
Collects system resource usage during step execution:
Monitored Metrics:
| Metric | macOS | Linux / WSL2 |
|---|---|---|
| CPU Usage | top -l 1 -s 0 | /proc/stat |
| Memory Usage | vm_stat | /proc/meminfo |
| Docker Stats | docker stats --no-stream | Same |
| Disk I/O | iostat (if available) | /proc/diskstats |
- CPU: Average and peak usage
- Memory: Start, end, and peak values
- Docker: Running container count
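A memory sample per the table above could be taken like this. The helper name `mem_free_kb` is an assumption, not the real monitor.sh API:

```shell
# Sketch of a cross-platform free-memory sample in kB (assumed helper name).
mem_free_kb() {
  if [ -r /proc/meminfo ]; then
    # Linux / WSL2: MemAvailable is already reported in kB
    awk '/^MemAvailable:/ { print $2; exit }' /proc/meminfo
  else
    # macOS: free pages times the page size vm_stat reports in its header
    vm_stat | awk '
      /page size of/ { page = $8 }
      /^Pages free/  { gsub(/\./, "", $3); print $3 * page / 1024; exit }'
  fi
}

mem_free_kb
```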
Benchmark Command
Usage
- MODE - Bootstrap mode: bootstrap or full-bootstrap (default: bootstrap)
- RUNS - Number of benchmark runs (default: 3)
- --keep-logs - Keep previous benchmark logs instead of cleaning them
- --help - Show help message
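The original usage block is not reproduced here. Assuming MODE and RUNS are positional arguments, invocations would look something like:

```shell
benchmark                          # 3 runs of bootstrap mode (defaults)
benchmark full-bootstrap 5         # 5 runs of full-bootstrap mode
benchmark bootstrap 10 --keep-logs # keep earlier logs between runs
```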
What It Measures
The benchmark measures two bootstrap modes:
Bootstrap Mode (Fast Development)
- Kind cluster creation
- Kindnetd CNI (faster than Cilium)
- Single-node cluster
- Warm Nix cache
- Minimal observability stack
Full Bootstrap Mode (Production Parity)
- Kind cluster creation
- Cilium CNI + Hubble
- Multi-node cluster capability
- Full observability stack (Prometheus, Grafana, Loki, Tempo)
- ArgoCD bootstrap
- All applications deployed
Workflow
When you run benchmark, it:
1. Pre-flight Checks
   - Verifies docker, kind, kubectl are installed
   - Checks Docker daemon is running
   - Displays platform summary
2. Clean Previous Logs (unless --keep-logs)
   - Removes logs/benchmark/run_*.json
   - Removes monitor temporary files
3. Run Benchmark Loop, for each iteration:
   - Tear down existing cluster (cluster-down.sh)
   - Set BENCHMARK_MODE=1 and BENCHMARK_RUN_NUMBER
   - Execute bootstrap script with timing instrumentation
   - Save results to logs/benchmark/run_N.json
4. Aggregate Results
   - Calculate statistics (mean, median, min, max, stddev)
   - Generate summary table
   - Save to logs/benchmark/summary_TIMESTAMP.json
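Conceptually, the loop amounts to the following sketch (script paths are taken from this page; the orchestration details are assumed):

```shell
RUNS=${RUNS:-3}
mkdir -p logs/benchmark
for i in $(seq 1 "$RUNS"); do
  scripts/cluster-down.sh                        # start each run from a clean slate
  BENCHMARK_MODE=1 BENCHMARK_RUN_NUMBER=$i scripts/bootstrap.sh
  # bootstrap.sh writes its timing data to logs/benchmark/run_${i}.json
done
scripts/lib/aggregate-stats.py logs/benchmark/run_*.json   # summary table + JSON
```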
Example Output
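The original example output is not reproduced here. The aggregated summary table has roughly this shape (step names and values are illustrative placeholders):

```
Step                    Mean     Median   Min      Max      StdDev
create-kind-cluster     42.0s    42.1s    39.5s    43.9s    1.8s
deploy-observability    95.3s    94.8s    91.2s    101.0s   3.6s
```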
JSON Output Format
Per-Run Files (logs/benchmark/run_N.json)
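The exact schema is not shown on this page. Based on what the timing framework records per step, a run file plausibly looks like this (all field names and values are assumptions):

```json
{
  "run": 1,
  "mode": "bootstrap",
  "platform": "aarch64-darwin",
  "steps": [
    {
      "name": "create-kind-cluster",
      "duration_seconds": 42.137,
      "exit_code": 0,
      "start": "2024-01-01T12:00:00Z",
      "end": "2024-01-01T12:00:42Z",
      "resources": {
        "cpu_avg_percent": 35.2,
        "cpu_peak_percent": 88.0,
        "mem_start_mb": 2048,
        "mem_peak_mb": 4096,
        "docker_containers": 5
      }
    }
  ]
}
```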
Summary File (logs/benchmark/summary_TIMESTAMP.json)
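Similarly, a plausible shape for the summary file, holding the aggregated statistics per step (field names assumed):

```json
{
  "runs": 3,
  "mode": "bootstrap",
  "steps": {
    "create-kind-cluster": {
      "mean": 42.0,
      "median": 42.1,
      "min": 39.5,
      "max": 43.9,
      "stddev": 1.8
    }
  }
}
```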
Integration with Bootstrap Scripts
Bootstrap scripts are instrumented with the timing framework (see scripts/bootstrap.sh for an example).
Benefits During Normal Use
Even when not benchmarking, the timing framework provides:
- Step Progress: Clear indication of which step is running
- Duration Feedback: See how long each step took
- Early Failure Detection: Failed steps are immediately reported with exit codes
- Performance Awareness: Developers can spot slow steps during normal development
Statistics Aggregation
The scripts/lib/aggregate-stats.py Python script processes benchmark runs:
Calculated Statistics:
- Mean (Average): Sum of values / count
- Median: Middle value when sorted
- Minimum: Fastest run
- Maximum: Slowest run
- Standard Deviation: Measure of variance
Outputs:
- Console table: Human-readable summary
- JSON file: Machine-readable for analysis
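As a self-contained illustration of these statistics; this is not the actual aggregate-stats.py, which may differ in details (e.g., it could use sample rather than population standard deviation):

```shell
# Compute mean/median/min/max/stddev of durations, one number per line on stdin.
stats() {
  sort -n | awk '
    { v[NR] = $1; sum += $1; sumsq += $1 * $1 }
    END {
      mean = sum / NR
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      stddev = sqrt(sumsq / NR - mean * mean)   # population stddev
      printf "mean=%.3f median=%.3f min=%.3f max=%.3f stddev=%.3f\n",
             mean, median, v[1], v[NR], stddev
    }'
}

printf '40.1\n42.0\n43.9\n' | stats
# → mean=42.000 median=42.000 min=40.100 max=43.900 stddev=1.551
```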
Use Cases
1. Identify Bottlenecks
Find the slowest steps in your bootstrap process.
2. Compare Architectures
Benchmark the same mode on different platforms and compare summaries.
3. Measure Optimization Impact
Compare runs before and after an optimization.
4. CI/CD Performance Tracking
Track performance regressions over time.
Performance Tips
Pre-warming Caches
For consistent measurements, pre-warm caches (for example, by running one throwaway bootstrap before timing).
Reducing Variance
- Close background applications: Minimize CPU/memory competition
- Use consistent Docker resources: Set fixed CPU/memory limits
- Run multiple iterations: 5-10 runs for stable statistics
- Avoid network-dependent steps: Use local mirrors when possible
Troubleshooting
High Standard Deviation
If stddev is >10% of mean:
- Increase number of runs
- Check for background processes
- Verify network stability
- Look for non-deterministic steps
Missing Resource Data
If resource metrics are missing:
- Ensure monitor.sh is sourced
- Check platform support (macOS vs Linux)
- Verify required tools are installed (top, docker stats)
Benchmark Hangs
If the benchmark doesn't complete:
- Check kubectl wait timeouts
- Verify cluster has sufficient resources
- Look for pod crash loops
- Use the debug-k8s command to investigate
Future Optimizations (Phase 2)
Based on benchmark data, Phase 2 will consider:
- Parallelization: Run independent steps concurrently
- Nix Build Cache: Optimize nix store usage
- Image Pre-building: Cache OTel Collector images in R2
- Helm Repo Caching: Skip helm repo update when possible
- Batch kubectl apply: Reduce API server round trips
- Dynamic Node Scaling: Adjust kind cluster size based on workload
Best Practices
- Always run multiple iterations: Single runs are not reliable
- Document your environment: Include host specs in reports
- Use consistent conditions: Same Docker settings, same Nix cache state
- Track over time: Compare against historical data
- Focus on median: Less affected by outliers than mean
- Investigate high variance: Stddev >10% indicates instability
Next Steps
- Set up development environment
- Learn about nixidy modules
- Create Grafana dashboards