
Overview

Hatch is a Firecracker-based microVM platform designed for agentic workloads. It provides a REST API for lifecycle management, wake-on-request capabilities, snapshot/restore for idle VMs, and subdomain-based reverse proxy routing.
[Diagram: Hatch architecture]

Core Components

Hatch’s architecture consists of several integrated components working together to provide serverless VM capabilities:

hatchd

Main daemon process that orchestrates all VM operations and exposes the REST API on port 8080

VM Manager

Manages VM lifecycle (create, stop, delete, snapshot, restore) and owns networking singletons

Proxy Server

Subdomain-based reverse proxy on port 9090 that routes HTTP requests to VMs and handles wake-on-request

SSH Gateway

TCP proxy that forwards SSH connections to VMs and can wake snapshotted VMs on connection attempts

PostgreSQL

Stores VM metadata, proxy routes, snapshot records, and image catalog

S3 Storage

Stores snapshot artifacts (memory dumps, CPU state, disk deltas) for VM pause/resume

Request Flow

HTTP Request Flow

1. Request arrives at proxy: the client sends an HTTP request to my-agent.hatch.local on port 9090.

2. Subdomain extraction: the proxy extracts the subdomain (my-agent) from the Host header and looks up the route in the database.

3. VM state check: the proxy checks the VM state:
  • Running: proceed to proxy
  • Snapshotted: wake the VM if auto_wake: true, then proxy
  • Other states: return an error

4. Wake if needed: if the VM is snapshotted:
  • Acquire the per-VM mutex to serialize concurrent wake requests
  • Download the snapshot from S3 (memory, vmstate, disk delta)
  • Reconstruct the rootfs from the base image plus the delta
  • Re-establish networking (TAP device, DHCP, iptables)
  • Load the snapshot into a new Firecracker process

5. Reverse proxy: forward the request to vm_guest_ip:target_port inside the VM.

6. Record access: update the last-access timestamp for idle detection.
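The state check in step 3 can be sketched as a small decision function. This is an illustrative Go sketch, not Hatch's actual API: the name nextAction and its signature are assumptions, and the real proxy performs the restore under the per-VM mutex described in step 4.

```go
package main

import (
	"fmt"
	"net/http"
)

// nextAction maps a VM state to the proxy's decision: proceed directly,
// wake the VM first, or fail with an HTTP status (0 means "no failure").
func nextAction(state string, autoWake bool) (wake bool, failStatus int) {
	switch state {
	case "running":
		return false, 0 // proceed straight to the reverse proxy
	case "snapshotted":
		if autoWake {
			return true, 0 // restore under the per-VM mutex, then proxy
		}
		return false, http.StatusServiceUnavailable
	default:
		return false, http.StatusBadGateway // stopped, error, starting, ...
	}
}

func main() {
	wake, _ := nextAction("snapshotted", true)
	fmt.Println("snapshotted + auto_wake -> wake first:", wake)
}
```

Keeping the decision pure like this makes the running/snapshotted/other branching easy to test independently of Firecracker or the database.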

SSH Connection Flow

1. Connection arrives: the SSH client connects to a host port (e.g., 16000).

2. SSH Gateway lookup: the gateway identifies the VM by its SSH port from the database.

3. Wake if snapshotted: if the VM is snapshotted, restore it before forwarding the connection (the client sees a slower handshake).

4. TCP tunnel: a bidirectional TCP pipe between the client and guest_ip:22.

Data Persistence Layer

PostgreSQL Schema

The database stores:
  • VMs table: VM metadata (ID, state, image_id, vcpu_count, mem_mib, guest_ip, guest_mac, ssh_port, work_dir)
  • Images table: Base image catalog (kernel path, rootfs path, boot args)
  • Routes table: Proxy route mappings (subdomain → vm_id, target_port, auto_wake flag)
  • Snapshots table: Snapshot records (S3 keys for memory/vmstate/disk, VM config JSON, size, timestamp)
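The VMs and Routes tables can be pictured as Go records. These structs are illustrative mirrors of the field lists above; the actual column names and DDL may differ:

```go
package main

import "fmt"

// VMRecord mirrors the VMs table's field list.
type VMRecord struct {
	ID        string
	State     string // "running", "snapshotted", "stopped", "error", ...
	ImageID   string
	VCPUCount int
	MemMiB    int
	GuestIP   string
	GuestMAC  string
	SSHPort   int
	WorkDir   string
}

// RouteRecord mirrors the Routes table: subdomain -> vm_id mapping.
type RouteRecord struct {
	Subdomain  string
	VMID       string
	TargetPort int
	AutoWake   bool
}

func main() {
	r := RouteRecord{Subdomain: "my-agent", VMID: "vm-1", TargetPort: 8000, AutoWake: true}
	fmt.Printf("%s -> %s:%d (auto_wake=%t)\n", r.Subdomain, r.VMID, r.TargetPort, r.AutoWake)
}
```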

S3 Storage Structure

Snapshots are stored in S3-compatible storage with the following structure:
snapshots/
  {vm_id}/
    {snapshot_id}/
      vmstate           # CPU registers and device state (uncompressed)
      memory.gz         # Full memory dump (gzip compressed)
      disk.delta.gz     # Block-level diff vs base image (gzip compressed)
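Object keys follow directly from that layout. A small sketch, assuming a hypothetical helper snapshotKeys (not part of Hatch's code):

```go
package main

import "fmt"

// snapshotKeys derives the three object keys for one snapshot
// from the snapshots/{vm_id}/{snapshot_id}/ layout.
func snapshotKeys(vmID, snapshotID string) (vmstate, memory, disk string) {
	prefix := fmt.Sprintf("snapshots/%s/%s", vmID, snapshotID)
	return prefix + "/vmstate", prefix + "/memory.gz", prefix + "/disk.delta.gz"
}

func main() {
	v, m, d := snapshotKeys("vm-42", "snap-7")
	fmt.Println(v) // snapshots/vm-42/snap-7/vmstate
	fmt.Println(m) // snapshots/vm-42/snap-7/memory.gz
	fmt.Println(d) // snapshots/vm-42/snap-7/disk.delta.gz
}
```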

Local Filesystem

Each VM gets a work directory:
{DATA_DIR}/vms/{vm_id}/
  rootfs.ext4              # Per-VM writable rootfs (copy of base image)
  firecracker.socket       # Firecracker API socket
  firecracker.{log,metrics,stdout,stderr}
  cidata-src/              # Temporary cloud-init seed files
  snapshots/{snap_id}/     # Local snapshot staging (deleted after S3 upload)

Component Interactions

The VM Manager is the central orchestrator. All components interact with it:
  • Proxy calls vmm.Restore() for wake-on-HTTP
  • SSH Gateway calls vmm.Restore() for wake-on-SSH
  • Idle Monitor calls vmm.Snapshot() for auto-pause
  • API handlers call vmm.CreateAndStart(), vmm.Stop(), vmm.Delete()

Concurrency Control

Hatch uses several synchronization mechanisms:
  • Per-VM wake mutex (sync.Map in proxy and SSH gateway): Prevents concurrent restore operations for the same VM
  • IP allocator mutex: Serializes IP allocation from the bridge subnet pool
  • DHCP server mutex: Protects dnsmasq hosts file writes and reload signals
  • SSH port map mutex: Guards the in-memory SSH port allocation table
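The per-VM wake mutex pattern can be sketched with sync.Map's LoadOrStore, which guarantees that all goroutines racing on the same VM ID share a single mutex. The helper name withWakeLock is illustrative, not Hatch's API:

```go
package main

import (
	"fmt"
	"sync"
)

// wakeLocks holds one *sync.Mutex per VM ID.
var wakeLocks sync.Map

// withWakeLock runs restore while holding that VM's mutex, so
// concurrent wake attempts for the same VM are serialized.
func withWakeLock(vmID string, restore func()) {
	mu, _ := wakeLocks.LoadOrStore(vmID, &sync.Mutex{})
	m := mu.(*sync.Mutex)
	m.Lock()
	defer m.Unlock()
	restore()
}

func main() {
	var wg sync.WaitGroup
	restores := 0
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			withWakeLock("vm-1", func() { restores++ }) // serialized: no data race
		}()
	}
	wg.Wait()
	fmt.Println("restores:", restores) // prints "restores: 5"
}
```

In the real flow, the first goroutine to take the lock performs the restore; later waiters re-check the VM state after acquiring the mutex and skip the restore if the VM is already running.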

Startup Reconciliation

On startup, the Manager reconciles orphaned resources from previous runs:
1. Mark stale VMs: VMs left in the “running” or “starting” state in the database are marked as “error”.

2. Flush iptables rules: remove all SSH forwarding rules in the configured port range.

3. Delete TAP devices: remove all fctap-* interfaces (no VMs are running after a restart).

4. Recreate bridge: delete and recreate fcbr0 for fresh ARP state.

5. Create loop devices: ensure the /dev/loop0-7 nodes exist for rootfs mounting.

This ensures Hatch always starts from a clean, known-good state.
