
Decentralized Design

Unlike traditional orchestrators like Kubernetes or Docker Swarm, Uncloud has no central control plane. There are no master nodes, no quorum requirements, and no single point of failure to maintain. Instead, Uncloud uses a peer-to-peer architecture where every machine is equal. Each machine maintains a complete copy of the cluster state and can accept commands from the CLI or web interface. This design favors Availability and Partition tolerance (AP) over strict Consistency in the CAP theorem.
If the network partitions, you can continue to interact with each partition separately to manage the services running in it. The system automatically reconciles state once the partition heals.

State Synchronization with Corrosion

Uncloud uses Corrosion, a distributed SQLite database built by Fly.io, to share cluster state between machines. Corrosion uses Conflict-Free Replicated Data Types (CRDTs) to enable eventually consistent state synchronization. Here’s how it works:
  • Every machine stores the complete cluster state in its local Corrosion database
  • Machines can modify their copy independently without coordination
  • State changes propagate through the mesh network to other machines
  • CRDTs automatically resolve conflicts from concurrent updates
  • All machines eventually converge to the same state
Corrosion runs as a systemd service (uncloud-corrosion) on each machine alongside the main daemon.
This approach eliminates the need for leader election, quorum management, or complex consensus protocols. The tradeoff is eventual consistency: machines may hold slightly different views of the state at any given moment, but they are guaranteed to converge.
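
The merge behavior the bullets above describe can be sketched with a toy last-writer-wins (LWW) map. This is a simplified illustration of the CRDT idea only, not Corrosion's actual data model:

```python
# Toy last-writer-wins (LWW) map: replicas accept independent writes
# and still converge. Illustrative only -- NOT Corrosion's data model.

class LWWMap:
    def __init__(self):
        self.entries = {}  # key -> (timestamp, value)

    def set(self, key, value, timestamp):
        # Keep the write with the newest timestamp.
        current = self.entries.get(key)
        if current is None or timestamp > current[0]:
            self.entries[key] = (timestamp, value)

    def merge(self, other):
        # Merging is commutative, associative, and idempotent,
        # so replicas converge regardless of message order.
        for key, (ts, value) in other.entries.items():
            self.set(key, value, ts)

# Two machines update their local copies independently...
a, b = LWWMap(), LWWMap()
a.set("service/web", "2 replicas", timestamp=1)
b.set("service/web", "3 replicas", timestamp=2)

# ...then exchange state in either order and converge.
a.merge(b)
b.merge(a)
assert a.entries == b.entries  # both now see "3 replicas"
```

Because the merge function is order-independent, no machine needs to coordinate with any other before writing, which is exactly what removes the need for a leader or quorum.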

WireGuard Mesh Networking

Uncloud creates a flat WireGuard overlay network between all machines in your cluster. This mesh network allows containers to communicate directly with each other regardless of which machine they’re running on.

Network Topology

The mesh network uses the 10.210.0.0/16 address space:
CIDR             Description
10.210.0.0/16    The entire WireGuard mesh network
10.210.X.0/24    /24 subnet assigned to machine X
10.210.X.1/32    Machine X's address (the first address in its subnet)
10.210.X.Y/32    Address of container Y running on machine X
Each machine gets a unique /24 subnet for itself and its containers. The machine takes the first IP (.1), and containers get IPs from .2 to .254.
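
The addressing scheme in the table can be reproduced with Python's standard ipaddress module. This is a sketch assuming machine X simply takes the X-th /24 of the mesh; Uncloud's actual allocator may assign subnets differently:

```python
import ipaddress

MESH = ipaddress.ip_network("10.210.0.0/16")

def machine_subnet(machine_index):
    """The /24 carved out of the mesh for machine number machine_index."""
    return list(MESH.subnets(new_prefix=24))[machine_index]

def machine_ip(machine_index):
    """The machine itself takes the first usable address (.1)."""
    return machine_subnet(machine_index)[1]

def container_ip(machine_index, container_index):
    """Containers take .2 through .254 within the machine's subnet."""
    assert 2 <= container_index <= 254
    return machine_subnet(machine_index)[container_index]

print(machine_subnet(3))   # 10.210.3.0/24
print(machine_ip(3))       # 10.210.3.1
print(container_ip(3, 7))  # 10.210.3.7
```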

Automatic Peer Discovery

When you add a new machine to the cluster:
  1. A unique WireGuard key pair is generated
  2. A new /24 subnet is allocated from the cluster address space
  3. The machine is registered in the shared cluster state
  4. Existing machines learn about it through state synchronization
  5. All machines automatically establish WireGuard tunnels
Uncloud uses NAT traversal techniques inspired by Talos KubeSpan to establish connections even when machines are behind firewalls.

Keepalive and Connection Maintenance

WireGuard tunnels use a 25-second keepalive interval to maintain connections through firewalls and NAT devices. This interval works reliably with most network infrastructure while minimizing bandwidth overhead.
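
In standard wg-quick configuration syntax, the keepalive is a single per-peer setting. A hypothetical peer entry for illustration only (Uncloud configures WireGuard programmatically; the key, subnet, and endpoint below are placeholders):

```ini
[Peer]
PublicKey = <peer-public-key>
AllowedIPs = 10.210.2.0/24
Endpoint = 203.0.113.10:51820
PersistentKeepalive = 25
```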

Core Components

Each machine in an Uncloud cluster runs these components:

uncloudd Daemon

The main daemon (uncloudd) runs on every machine as a systemd service. It handles:
  • Container lifecycle management through Docker
  • WireGuard mesh configuration
  • gRPC API for CLI and machine-to-machine communication
  • Request routing between machines using grpc-proxy
You can check daemon logs with: journalctl -u uncloud -f

Corrosion Database

The distributed state store runs as a separate systemd service (uncloud-corrosion). It provides:
  • CRDT-based SQLite database for cluster state
  • Automatic state replication between machines
  • Eventual consistency guarantees
  • Conflict-free concurrent updates
You can check Corrosion logs with: journalctl -u uncloud-corrosion -f

Caddy Reverse Proxy

Caddy runs as a container in global mode on machines with the appropriate role. It provides:
  • Automatic HTTPS with Let’s Encrypt
  • Reverse proxy for HTTP/HTTPS services
  • Dynamic configuration updates from cluster state
  • Health check integration
See the Ingress documentation for more details.

DNS Server

An embedded DNS server runs inside each uncloudd daemon to provide service discovery. It:
  • Resolves service names to container IPs
  • Forwards external queries to upstream DNS servers
  • Updates records automatically when containers start/stop
  • Provides multiple resolution modes (round-robin, nearest)
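
Round-robin resolution can be pictured as rotating through a service's container IPs on each query. A toy sketch with made-up records, not Uncloud's DNS implementation:

```python
from itertools import cycle

# Hypothetical records mapping a service name to its container IPs
# across machines (addresses follow the 10.210.X.Y scheme).
records = {
    "web": ["10.210.1.2", "10.210.2.2", "10.210.3.5"],
}
rotations = {name: cycle(ips) for name, ips in records.items()}

def resolve_round_robin(name):
    """Return the next container IP for the service, rotating each call."""
    return next(rotations[name])

print([resolve_round_robin("web") for _ in range(4)])
# ['10.210.1.2', '10.210.2.2', '10.210.3.5', '10.210.1.2']
```

Because records update automatically as containers start and stop, clients can keep resolving the same service name while the set of backing IPs changes underneath.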
See the Services documentation for DNS naming conventions.

Communication Flow

When you run a command like uc ls to list services:
  1. The CLI connects to any machine via SSH
  2. It makes a gRPC call to that machine’s uncloudd daemon
  3. The daemon queries its local Corrosion database for cluster state
  4. Results are returned to the CLI
When you deploy a service:
  1. CLI sends the service spec to the connected machine
  2. The daemon uses grpc-proxy to route container creation to target machines
  3. Each target machine starts containers and updates cluster state
  4. State changes propagate through Corrosion to all machines
  5. DNS servers and Caddy instances update automatically
You can connect to any machine in the cluster: each one holds the complete state and can execute any operation.
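
The deploy flow above can be sketched as a fan-out from the connected daemon to the target machines, each of which records its containers in the shared state. All names here are illustrative, not Uncloud's actual API:

```python
# Toy sketch of the deploy flow: the connected daemon fans container
# creation out to target machines, which update the shared state.

cluster_state = {}  # stand-in for the Corrosion-replicated state

def start_container(machine, spec):
    # Each target machine records its containers in the shared state;
    # Corrosion would then replicate this to every other machine.
    cluster_state.setdefault(machine, []).append(spec["image"])

def deploy(spec, targets):
    # The connected daemon routes creation to each target machine
    # (Uncloud uses grpc-proxy for this hop).
    for machine in targets:
        start_container(machine, spec)

deploy({"image": "nginx:alpine"}, targets=["machine-1", "machine-2"])
print(cluster_state)
# {'machine-1': ['nginx:alpine'], 'machine-2': ['nginx:alpine']}
```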

Resource Footprint

Uncloud is designed to be lightweight:
  • uncloudd daemon: ~30-50 MB RAM
  • uncloud-corrosion: ~20-30 MB RAM
  • Caddy container: ~40-60 MB RAM
  • Total overhead: ~150 MB RAM per machine
This minimal footprint leaves maximum resources available for your applications.

Further Reading

Machines

Learn about machine lifecycle and subnet allocation

Services

Understand service modes and container orchestration

Networking

Deep dive into WireGuard mesh and IP addressing

Design Document

Read the original design philosophy and goals
