Overview
Hatch snapshots capture the complete VM state (CPU registers, memory contents, and disk changes) and upload it to S3-compatible storage. The VM can later be restored to exactly where it was paused, with the same IP, MAC address, and SSH port.
Snapshots enable the core serverless pattern: freeze idle VMs to zero compute cost, then wake them transparently on the next request.
Snapshot Artifacts
A snapshot consists of three components:
- vmstate: CPU registers, device state, Firecracker internal state (uncompressed, ~100 KB)
- memory.gz: Full memory dump of the guest (gzip compressed, typically 10-50% of configured RAM)
- disk.delta.gz: Block-level diff between current rootfs and base image (gzip compressed, often < 100 MB)
Snapshot Creation Flow
Pre-flight checks
- Verify S3 is configured
- Verify VM is in the running state
- Look up base image for disk delta computation
Create Firecracker snapshot
Call Firecracker API CreateSnapshot() to dump memory and vmstate to local files
Compute disk delta
Use rsync --only-write-batch to generate a binary patch from base image → current rootfs. This captures only the blocks modified by the VM, not the entire disk.
Persist snapshot record
Save metadata to the database, including the VM config JSON (for restore) and the artifact S3 keys
See internal/vmm/snapshot.go:33-156 for complete implementation.
Snapshot Record Schema
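Snapshot metadata is stored in PostgreSQL. Each record carries the S3 keys of the artifacts and a vm_config JSON field that holds the VM configuration needed for restore.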
Restore Process
Pre-flight checks
- Verify S3 is configured
- Verify VM is in the snapshotted state
- Look up latest snapshot from database
- Parse VM config JSON from snapshot record
Re-establish networking
Recreate TAP device, DHCP reservation, and SSH forwarding with original IP and MAC
Load snapshot into Firecracker
Start a new Firecracker process in snapshot mode with the memory and vmstate paths
See internal/vmm/snapshot.go:161-284 for complete implementation.
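For the "Load snapshot into Firecracker" step, the Firecracker API expects a PUT /snapshot/load request on a freshly started process that has not yet booted a guest. The sketch below only builds the request body; the type and function names and the example paths are assumptions, and how Hatch actually sends the request is not shown on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// memBackend tells Firecracker where guest memory comes from.
type memBackend struct {
	BackendType string `json:"backend_type"` // "File" or "Uffd"
	BackendPath string `json:"backend_path"`
}

// loadSnapshotRequest mirrors the body of Firecracker's PUT /snapshot/load.
type loadSnapshotRequest struct {
	SnapshotPath string     `json:"snapshot_path"` // the vmstate file
	MemBackend   memBackend `json:"mem_backend"`
	ResumeVM     bool       `json:"resume_vm"` // resume right after loading
}

// buildLoadRequest returns the JSON body for loading a snapshot from local files.
func buildLoadRequest(vmstatePath, memPath string) ([]byte, error) {
	return json.Marshal(loadSnapshotRequest{
		SnapshotPath: vmstatePath,
		MemBackend:   memBackend{BackendType: "File", BackendPath: memPath},
		ResumeVM:     true,
	})
}

func main() {
	// Hypothetical work-directory paths.
	body, err := buildLoadRequest("/work/vm1/vmstate", "/work/vm1/memory")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(body))
}
```

Setting resume_vm to true asks Firecracker to resume the guest as part of the load call, so no separate resume request is needed.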
Disk Delta Algorithm
Hatch uses rsync to compute and apply disk deltas:
Creating Delta (Snapshot)
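A sketch of how the delta-creating rsync call could be assembled from Go. The function name and paths are hypothetical. The --no-whole-file flag is an assumption on top of what this page states: rsync defaults to whole-file transfers between local paths, so the flag is included here to force the block-level delta algorithm.

```go
package main

import (
	"fmt"
	"os/exec"
)

// deltaCommand builds the rsync invocation that records, in batchPath, the
// changes needed to turn baseImage into currentRootfs. With --only-write-batch
// rsync writes the batch file and leaves baseImage untouched.
func deltaCommand(baseImage, currentRootfs, batchPath string) *exec.Cmd {
	return exec.Command("rsync",
		"--only-write-batch="+batchPath,
		"--no-whole-file", // assumption: force delta transfer for local files
		currentRootfs,     // source: the state to capture
		baseImage,         // destination: the state the delta is relative to
	)
}

func main() {
	cmd := deltaCommand("/images/base.ext4", "/work/vm1/rootfs.ext4", "/work/vm1/disk.delta")
	fmt.Println(cmd.Args) // print only; call cmd.Run() to execute
}
```

The source and destination order matters: the batch describes how to update the destination (the base image) so that it matches the source (the current rootfs).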
Applying Delta (Restore)
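A sketch of the restore direction under the same assumptions: copy the base image, then replay the batch onto the copy. The function name and paths are hypothetical, and the delta is assumed to have been downloaded and gunzipped already.

```go
package main

import (
	"fmt"
	"os/exec"
)

// applyDeltaCommands returns the two steps that rebuild a rootfs:
//  1. copy the base image (a reflink clone where the filesystem supports it),
//  2. replay the rsync batch onto that copy.
func applyDeltaCommands(baseImage, batchPath, rootfs string) []*exec.Cmd {
	return []*exec.Cmd{
		exec.Command("cp", "--reflink=auto", baseImage, rootfs),
		exec.Command("rsync", "--read-batch="+batchPath, rootfs),
	}
}

func main() {
	cmds := applyDeltaCommands("/images/base.ext4", "/work/vm1/disk.delta", "/work/vm1/rootfs.ext4")
	for _, cmd := range cmds {
		fmt.Println(cmd.Args) // print only; call cmd.Run() on each, in order, to execute
	}
}
```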
Using --reflink=auto with copy-on-write filesystems (btrfs, XFS with reflink) makes the initial copy instant and saves disk space.
Snapshot Compression
Hatch compresses memory dumps and disk deltas using gzip before uploading to S3:
- Memory: 40-60% reduction (depends on guest workload)
- Disk delta: 70-90% reduction (text files, executables compress well)
Resource Lifecycle
What gets cleaned up on snapshot?
✅ Destroyed:
- Firecracker process (killed)
- TAP device (deleted)
- DHCP reservation (removed from dnsmasq)
- SSH forwarding rules (iptables rules deleted)
- Machine handle (removed from in-memory map)
Preserved:
- IP allocation (kept reserved for restore)
- SSH port allocation (kept reserved)
- Work directory (contains rootfs for next restore)
- Database record (state changed to snapshotted)
What gets recreated on restore?
- Fresh Firecracker process
- New TAP device (same name)
- DHCP reservation (same MAC → IP mapping)
- SSH forwarding rules (same port → IP mapping)
- Machine handle (new entry in map)
Wake-on-Request Integration
Snapshots are central to Hatch’s serverless pattern. See Wake-on-Request for details on:
- Automatic snapshot on idle timeout
- Transparent restore on HTTP request
- Transparent restore on SSH connection
- Concurrent wake request serialization
Performance Characteristics
Snapshot Time
2-5 seconds for a typical VM (1 GB RAM, 10 GB disk)
- Pause: ~10ms
- Memory dump: ~500ms-1s
- Disk delta: ~500ms-2s
- S3 upload: 1-3s (depends on bandwidth)
Restore Time
3-8 seconds for a typical VM
- S3 download: 1-3s
- Disk reconstruction: ~500ms-1s
- Network setup: ~100ms
- Resume execution: ~10ms
- Guest network ready: ~1-2s (cloud-init, DHCP)
For HTTP requests, users experience this as slow first-byte time. For SSH, it appears as a slow handshake. Subsequent requests hit the running VM with normal latency.
Storage Requirements
For a VM with 1 GB RAM and 10 GB rootfs:
- vmstate: ~100 KB
- memory.gz: ~400-600 MB (depending on memory usage)
- disk.delta.gz: ~50-500 MB (depending on disk changes)
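Adding the three ranges gives roughly 450 MB to 1.1 GB of S3 storage per snapshot. The small sketch below does that arithmetic and scales it to a fleet; the fleet size of 100 VMs is a hypothetical figure, and the type and function names are illustrative.

```go
package main

import "fmt"

// sizeKB is an inclusive size range in kilobytes (1 MB = 1000 KB here).
type sizeKB struct{ Min, Max int }

// sum adds size ranges component-wise.
func sum(parts ...sizeKB) sizeKB {
	var t sizeKB
	for _, p := range parts {
		t.Min += p.Min
		t.Max += p.Max
	}
	return t
}

func main() {
	perSnapshot := sum(
		sizeKB{100, 100},         // vmstate: ~100 KB
		sizeKB{400_000, 600_000}, // memory.gz: ~400-600 MB
		sizeKB{50_000, 500_000},  // disk.delta.gz: ~50-500 MB
	)
	const vms = 100 // hypothetical fleet size, one snapshot per VM
	fmt.Printf("per snapshot: %.1f-%.1f MB\n",
		float64(perSnapshot.Min)/1e3, float64(perSnapshot.Max)/1e3)
	fmt.Printf("%d VMs: %.1f-%.1f GB\n", vms,
		float64(perSnapshot.Min*vms)/1e6, float64(perSnapshot.Max*vms)/1e6)
}
```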
Troubleshooting
Snapshot fails with 'compute disk delta' error
- Check rsync is installed: which rsync
- Verify base image path in database matches actual file location
- Ensure sufficient disk space in VM work directory
- Check VM rootfs is not corrupted: fsck.ext4 -n /path/to/rootfs.ext4
Restore fails with 'apply disk delta' error
- Verify S3 download completed (check file size)
- Ensure base image hasn’t been deleted/moved since snapshot
- Check disk space in VM work directory
- Try manually: rsync --read-batch=/path/to/disk.delta /path/to/test.ext4
Restored VM doesn't get network connectivity
- Verify TAP device exists: ip link show fctap-<vmid>
- Check DHCP reservation: cat /data/dhcp/hosts | grep <mac>
- Verify iptables SSH forward rule: iptables -t nat -L PREROUTING -n | grep <ssh_port>
- Allow 1-2 seconds for guest cloud-init to configure network
S3 upload/download is slow
- Check network bandwidth to S3 endpoint
- Consider using a closer S3 region
- For local development, use MinIO on same host (no network latency)
- Monitor S3 request metrics for throttling