# Architecture Stack

The platform is organized into distinct layers, each providing specific capabilities:

| Layer | Components | Purpose |
|---|---|---|
| Kubernetes | Kind | Local Kubernetes cluster for development |
| CNI | Cilium + Hubble UI | eBPF-based networking and observability |
| Service Mesh | Istio (ambient mode) | L7 traffic management without sidecars |
| GitOps | ArgoCD + ApplicationSet | Declarative, Git-driven deployment |
| Ingress | Traefik | Edge routing with middleware (CORS, auth, rate-limit) |
| Observability | Prometheus, Grafana, Loki, Tempo, OTel Collector | Full metrics, logs, and traces |
| Storage | Garage (S3-compatible) | Object storage backend for Loki/Tempo |
| Database | PostgreSQL | Relational database for applications |
| Manifest Generation | Nixidy (Nix + Kustomize) | Type-safe, reproducible manifest generation |
## Design Principles

### 1. Infrastructure as Code

All infrastructure is defined declaratively using:

- Nix flakes for reproducible development environments and builds
- Nixidy for type-safe Kubernetes manifest generation
- Git as the single source of truth for all configuration
`flake.nix` defines all dependencies and build outputs.
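As a rough sketch of the flake's shape (the inputs and package names below are illustrative; the real flake also wires in nixidy and the custom OTel Collector build):

```nix
{
  description = "Platform development environment";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  inputs.flake-utils.url = "github:numtide/flake-utils";

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let pkgs = nixpkgs.legacyPackages.${system};
      in {
        # One dev shell pins the same tool versions for every contributor
        devShells.default = pkgs.mkShell {
          packages = with pkgs; [ kind kubectl kubernetes-helm sops age ];
        };
      });
}
```

Because every input is recorded in `flake.lock`, `nix develop` yields the same toolchain on every machine.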
### 2. Observability First

The platform implements the three pillars of observability:

- Metrics: Prometheus with remote write receiver for OTel metrics
- Logs: Loki with S3 backend for long-term storage
- Traces: Tempo with exemplars linked to metrics and logs
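A minimal OTel Collector pipeline config wiring the three pillars together might look like the following (service hostnames and ports are assumptions for this sketch):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  # Prometheus must run with --web.enable-remote-write-receiver
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  # Loki ingests OTLP logs natively under /otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
```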
### 3. Security by Default

Security is built into every layer:

- Network policies: Cilium enforces zero-trust networking
- mTLS: Istio provides automatic mutual TLS between services
- JWT authentication: Istio validates tokens at the waypoint proxy
- Secrets management: SOPS with age encryption for sensitive data
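For instance, JWT validation at a waypoint can be expressed with a `RequestAuthentication` plus an `AuthorizationPolicy` targeting the waypoint Gateway (namespace, issuer, and Gateway name here are hypothetical):

```yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: apps
spec:
  targetRefs:
    - kind: Gateway
      group: gateway.networking.k8s.io
      name: waypoint
  jwtRules:
    - issuer: "https://issuer.example.com"
      jwksUri: "https://issuer.example.com/.well-known/jwks.json"
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: apps
spec:
  targetRefs:
    - kind: Gateway
      group: gateway.networking.k8s.io
      name: waypoint
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["https://issuer.example.com/*"]
```

Requests without a valid token from the configured issuer are rejected before they reach the application.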
### 4. Developer Experience

The platform optimizes for fast iteration:

- Warm cluster support: Hash-based detection skips redundant rebuilds
- Parallel execution: Independent operations run concurrently
- Multiple profiles: Dev-fast (kindnetd), Cilium (Cilium CNI), and Full (Cilium + Istio)
- Hot reload: Watch mode for automatic manifest regeneration
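The warm-cluster check can be sketched as a small shell helper: hash the cluster config, and skip the rebuild when it matches the hash recorded by the last bootstrap. File names (`cluster-config.yaml`, `.cluster-hash`) are illustrative, not the platform's actual paths.

```shell
# Content hash of a config file
config_hash() {
  sha256sum "$1" | cut -d' ' -f1
}

# $1 = config file, $2 = stored-hash file
# True (exit 0) when no hash is recorded yet or the config has changed
needs_rebuild() {
  [ -f "$2" ] || return 0
  [ "$(cat "$2")" != "$(config_hash "$1")" ]
}

# Example bootstrap guard:
# if needs_rebuild cluster-config.yaml .cluster-hash; then
#   kind delete cluster
#   kind create cluster --config cluster-config.yaml
#   config_hash cluster-config.yaml > .cluster-hash
# fi
```

On a warm run the guard falls through in milliseconds, which is what makes the "instant warm" path of Dev-Fast mode possible.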
## Component Interactions

### Data Flow: Request Path

### Data Flow: Observability

### Data Flow: GitOps
## Bootstrap Modes

The platform supports three bootstrap modes optimized for different use cases:

### Dev-Fast Mode (Default)
- CNI: kindnetd (built-in)
- Nodes: 1 control-plane
- Service Mesh: None
- Time: ~120s cold, instant warm
- Use case: Rapid application development
### Cilium Mode
- CNI: Cilium with Hubble UI
- Nodes: 1 control-plane + 1 worker
- Service Mesh: None
- Time: ~200s cold
- Use case: Testing eBPF networking and observability
### Full Mode
- CNI: Cilium with Hubble UI
- Nodes: 1 control-plane + 2 workers
- Service Mesh: Istio ambient mode
- Time: ~300s cold
- Use case: Production-like environment with full L7 capabilities
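A Kind configuration for Full mode could look roughly like this (a sketch, assuming the default CNI is disabled so Cilium can be installed afterwards):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true   # skip kindnetd; Cilium is installed post-create
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```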
## Port Mappings

All services are exposed via NodePort on the control-plane node:

| Port | Service | Available In |
|---|---|---|
| 30081 | Traefik HTTP | All modes |
| 30090 | Prometheus | All modes |
| 30093 | Alertmanager | All modes |
| 30300 | Grafana (admin/admin) | All modes |
| 31235 | Hubble UI | Cilium, Full |
| 30080 | ArgoCD HTTP | Full (with ArgoCD) |
| 30443 | ArgoCD HTTPS | Full (with ArgoCD) |
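Because Kind nodes run as containers, reaching a NodePort from the host requires `extraPortMappings` on the control-plane node. A sketch for two of the ports above:

```yaml
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 30081   # Traefik HTTP
        hostPort: 30081
        protocol: TCP
      - containerPort: 30090   # Prometheus
        hostPort: 30090
        protocol: TCP
```

With this in place, `http://localhost:30090` on the host reaches Prometheus inside the cluster.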
## Technology Choices

### Why Cilium?

Cilium provides eBPF-based networking with significant advantages:

- Performance: Kernel-level packet processing without iptables overhead
- Visibility: Hubble UI shows real-time network flows and policies
- Security: Identity-based network policies independent of IP addresses
- Compatibility: Co-exists with Istio for combined L3/L4 + L7 capabilities
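Identity-based policy means rules select workloads by label rather than IP. A minimal example (the `app=frontend`/`app=api` labels are hypothetical):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
```

The policy keeps working when pods are rescheduled and change IPs, because Cilium tracks the label-derived identity, not the address.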
### Why Istio Ambient Mode?

Ambient mode eliminates sidecar proxies while maintaining L7 capabilities:

- Resource efficiency: Shared ztunnel DaemonSet vs. per-pod sidecars
- Operational simplicity: No init containers or pod mutation
- Selective L7: Waypoint proxies only where needed
- mTLS everywhere: Automatic encryption at L4 via ztunnel
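In practice, a namespace joins the mesh via the `istio.io/dataplane-mode=ambient` label, and a waypoint is just a Gateway API resource with the `istio-waypoint` class. A sketch (namespace name is illustrative):

```yaml
# kubectl label namespace apps istio.io/dataplane-mode=ambient
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: apps
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008
      protocol: HBONE
```

Workloads in the namespace get L4 mTLS from ztunnel immediately; the waypoint adds L7 policy only for the services that need it.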
### Why Nixidy?

Nixidy combines Nix's type safety with Kubernetes flexibility:

- Type checking: Catch errors before applying to cluster
- Reusability: Share modules across environments (local, staging, prod)
- Helm integration: Use Helm charts with Nix’s reproducibility
- Version pinning: Exact chart versions in flake.lock
### Why Custom OTel Collector?

The platform builds a custom OpenTelemetry Collector via `flake.nix`:
- Minimal size: Only includes required components
- Reproducibility: Exact dependencies pinned in flake.lock
- Security: No unnecessary receivers or exporters
- Performance: Optimized build with only needed processors
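A custom Collector is defined by a builder (OCB) manifest that lists only the components to compile in. A sketch of what such a manifest could contain (the distribution name and version pins are illustrative):

```yaml
dist:
  name: otelcol-custom
  description: Minimal collector for the platform
  output_path: ./otelcol-custom

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.109.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.109.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.109.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter v0.109.0
```

Anything not listed simply does not exist in the resulting binary, which is what delivers the size and attack-surface reductions above.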
## Next Steps

- **Kubernetes Setup**: Learn about the Kind cluster configuration and node topology
- **Networking**: Explore Cilium CNI and eBPF-based networking
- **Service Mesh**: Understand Istio ambient mode architecture
- **Observability**: Dive into the metrics, logs, and traces stack
- **GitOps**: Discover ArgoCD and Nixidy manifest generation