Scale your services horizontally by running multiple container replicas across your cluster. Uncloud automatically distributes containers across machines and load balances traffic between them.
Quick start
Scale a service to 5 replicas:
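Using the `uc service scale` command (covered in more detail below):

```shell
uc service scale web 5
```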
Uncloud spreads the 5 containers across available machines in your cluster.
Scaling methods
You can scale services in two ways:
Using uc service scale
Scale an existing service:
```shell
# Scale to a specific number of replicas
uc service scale web 10

# Scale down
uc service scale web 3

# Use the service name or ID
uc service scale 9a8b7c6d5e4f3a2b 5
```
The command:

1. Inspects the current service state
2. Plans the changes (adding or removing containers)
3. Asks for confirmation when scaling down
4. Executes the deployment with rolling updates
Using Compose files
Set replicas in your compose.yaml:
```yaml
services:
  web:
    image: nginx
    scale: 5  # Run 5 replicas
```
Or use the deploy.replicas syntax:
```yaml
services:
  web:
    image: nginx
    deploy:
      replicas: 5
```
Deploy with:
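Assuming the standard `uc` CLI, deployment is a single command:

```shell
uc deploy
```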
How scaling works
Horizontal scaling
Uncloud distributes containers across available machines:
- 1 machine, 5 replicas: All 5 containers run on the single machine
- 3 machines, 5 replicas: Containers spread evenly (2, 2, 1 distribution)
- 5 machines, 5 replicas: One container per machine
Round-robin distribution
Uncloud uses a round-robin approach to spread containers:
```yaml
services:
  web:
    image: nginx
    scale: 6
```
With 3 machines (machine-1, machine-2, machine-3):
Container 1 → machine-1
Container 2 → machine-2
Container 3 → machine-3
Container 4 → machine-1
Container 5 → machine-2
Container 6 → machine-3
Result: 2 containers per machine.
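The mapping above is simple modulo arithmetic. A minimal sketch in Python (the function is illustrative only, not Uncloud's actual scheduler):

```python
def distribute(replicas: int, machines: list) -> dict:
    """Assign replicas to machines round-robin: container i goes to
    machine i modulo the number of machines."""
    counts = {m: 0 for m in machines}
    for i in range(replicas):
        counts[machines[i % len(machines)]] += 1
    return counts

# 6 replicas across 3 machines -> 2 containers per machine
print(distribute(6, ["machine-1", "machine-2", "machine-3"]))
# -> {'machine-1': 2, 'machine-2': 2, 'machine-3': 2}
```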
Scaling up
When you increase replicas:

1. Plan deployment: Uncloud calculates how many new containers to add and where to place them.
2. Start new containers: New containers start on the selected machines.
3. Health monitoring: Each new container is monitored for health before continuing.
4. Add to load balancer: Healthy containers are added to Caddy's load balancing pool.
Scaling down
When you decrease replicas:

1. Confirm scale down: Uncloud shows the plan and asks for confirmation (to prevent accidental data loss).
2. Remove from load balancer: Containers are removed from the Caddy configuration.
3. Stop containers: Containers are gracefully stopped (SIGTERM, then SIGKILL after a grace period).
4. Remove containers: Stopped containers are removed.
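Because containers receive SIGTERM before SIGKILL, applications should trap it and drain in-flight work before exiting. A minimal sketch in Python (the draining flag and handler name are illustrative):

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """Mark the process as draining so it stops accepting new work
    and finishes in-flight requests before the grace period ends."""
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the SIGTERM a container receives during scale-down.
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True once the handler has run
```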
Scaling to zero is not supported. Use uc rm <service> to remove a service entirely.
Load balancing
Uncloud automatically load balances traffic across replicas.
HTTP/HTTPS traffic
Caddy reverse proxy distributes requests to all healthy containers:
```yaml
services:
  web:
    image: myapp:latest
    scale: 5
    x-ports:
      - app.example.com:8000/https
```
Caddy configuration (auto-generated):
```
https://app.example.com {
    reverse_proxy 10.210.0.3:8000 10.210.1.4:8000 10.210.2.5:8000 10.210.0.6:8000 10.210.1.7:8000 {
        import common_proxy
    }
    log
}
```
Features:

- Round-robin load balancing
- Automatic health checking (passive and active)
- Failed requests are retried on other upstreams
- Unhealthy containers are automatically removed
Internal service discovery
Services communicate via DNS names that resolve to all healthy container IPs:
```yaml
services:
  api:
    image: api:latest
    scale: 3

  web:
    image: web:latest
    environment:
      # Resolves to all 3 API container IPs
      API_URL: http://api:8000
```
The api hostname returns all 3 container IPs. Your HTTP client (like curl, axios, fetch) handles load distribution.
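To see what the client sees, you can resolve a hostname to every address behind it. A minimal sketch in Python (inside the cluster you would resolve `api`; the example below uses `localhost` so it runs anywhere, and the naive random pick stands in for whatever distribution your HTTP client does):

```python
import random
import socket

def resolve_all(host, port):
    """Return every IP address the hostname resolves to
    (in Uncloud, one per healthy container)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# Inside the cluster, resolve_all("api", 8000) would return all
# container IPs; localhost is used here for demonstration.
ips = resolve_all("localhost", 8000)
target = random.choice(ips)  # naive client-side load distribution
print(target)
```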
Placement constraints
Control which machines can run your service containers.
Using x-machines extension
Restrict service to specific machines:
```yaml
services:
  web:
    image: nginx
    scale: 6
    x-machines:
      - us-east-1
      - us-east-2
      - eu-west-1
```
The 6 replicas spread evenly across these 3 machines (2 per machine).
Pin to single machine
For stateful services that can’t run across machines:
```yaml
services:
  database:
    image: postgres:16
    x-machines: db-server
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:
```
The string form of x-machines is a shorthand for pinning a service to a single machine.
Geographic distribution
```yaml
services:
  web:
    image: myapp:latest
    scale: 6
    x-machines:
      - us-east-1  # 2 replicas
      - us-east-2  # 2 replicas
      - eu-west-1  # 2 replicas
    x-ports:
      - app.example.com:8000/https
```
Users are served by the closest healthy container (via Caddy’s load balancing).
Scaling strategies
Manual scaling
Uncloud doesn't have built-in auto-scaling yet. Scale manually based on metrics:
```shell
# Monitor CPU/memory usage
uc ps

# Scale up when usage is high
uc service scale web 10

# Scale down when usage is low
uc service scale web 5
```
Scheduled scaling
Use cron jobs for time-based scaling:
```shell
# Scale up during business hours (9 AM)
0 9 * * * uc service scale web 10

# Scale down at night (6 PM)
0 18 * * * uc service scale web 3
```
Blue-green deployments
Deploy new version alongside old version:
```yaml
services:
  # Old version (blue)
  web-blue:
    image: myapp:v1
    scale: 5
    x-ports:
      - app.example.com:8000/https

  # New version (green)
  web-green:
    image: myapp:v2
    scale: 5
    x-ports:
      - app-staging.example.com:8000/https
```
Test the green deployment, then switch traffic by moving the production port to green:

```yaml
services:
  web-green:
    image: myapp:v2
    scale: 5
    x-ports:
      - app.example.com:8000/https  # Switch to production
```

Once green is serving production traffic, remove the old version with uc rm web-blue (scaling a service to zero is not supported).
Scaling with volumes
Shared volumes (read-only)
Multiple replicas can safely read from the same volume:
```yaml
services:
  web:
    image: nginx
    scale: 5
    volumes:
      # Read-only shared config
      - /etc/nginx/conf:/etc/nginx/conf:ro
```
Shared volumes (read-write)
Be careful with concurrent writes:
```yaml
services:
  app:
    image: myapp:latest
    scale: 3
    volumes:
      # Shared uploads directory
      - uploads:/app/uploads

volumes:
  uploads:
```
Ensure your application handles concurrent file access safely.
Per-replica volumes
For data that shouldn't be shared, rely on named volumes being local to each machine:

```yaml
services:
  worker:
    image: worker:latest
    scale: 3
    x-machines:
      - worker-1
      - worker-2
      - worker-3
    volumes:
      # Each machine gets its own local cache volume
      - cache:/tmp/cache

volumes:
  cache:
```
Global services
Run exactly one container on every machine:
```yaml
services:
  monitoring-agent:
    image: prometheus/node-exporter
    deploy:
      mode: global
```
Global services automatically scale as you add or remove machines:

- Add a machine → a new container starts
- Remove a machine → its container is removed
Global service use cases
- Monitoring agents: Collect metrics from each machine
- Log collectors: Forward logs from each machine
- Local caches: Provide caching on every machine
- Network tools: DNS resolvers, proxies
Scaling best practices
Start small
Begin with fewer replicas and scale up based on actual load:

```yaml
services:
  web:
    image: myapp:latest
    scale: 2  # Start with 2
```
Monitor performance and increase as needed.
Distribute across machines
Use multiple machines for high availability:

```yaml
services:
  web:
    image: myapp:latest
    scale: 6
    x-machines:
      - machine-1
      - machine-2
      - machine-3
```
If one machine fails, others continue serving traffic.
Plan for machine failures
Run enough replicas to handle machine failures:

- 3 machines → at least 4-6 replicas
- If one machine fails, the remaining replicas handle the load
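As a back-of-envelope check, you can compute the smallest replica count that still leaves enough capacity after machines fail, assuming the even round-robin spread described earlier. This is purely illustrative arithmetic, not an Uncloud feature:

```python
import math

def replicas_to_survive(machines: int, needed: int, failures: int = 1) -> int:
    """Smallest replica count that keeps at least `needed` replicas
    alive after `failures` machines (each holding the largest share
    of a round-robin spread) go down."""
    r = needed
    while True:
        # A failed machine takes down at most ceil(r / machines) replicas.
        worst_loss = failures * math.ceil(r / machines)
        if r - worst_loss >= needed:
            return r
        r += 1

# 3 machines, 3 replicas needed at peak -> run 5 to survive one failure
print(replicas_to_survive(machines=3, needed=3))  # -> 5
```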
Monitor before scaling
Check CPU and memory before scaling:

```shell
uc ps  # Shows resource usage
```
If containers are underutilized, you might not need more replicas.
Scaling limitations
Cannot scale to zero
Uncloud doesn’t support scaling to 0 replicas:
```shell
# This fails
uc service scale web 0
# Error: scaling to zero replicas is not supported
# Use 'uc rm web' instead
```
Reason: Uncloud derives service configuration from running containers. With zero containers, there’s no configuration to restore when scaling back up.
Global services cannot be scaled
Global services always run one replica per machine:
```yaml
services:
  agent:
    image: monitoring-agent
    deploy:
      mode: global
      # Cannot set scale or replicas
```
Volumes and scaling
Shared volumes with multiple replicas:
- Read-only: Safe to scale freely
- Read-write: Ensure the application handles concurrent access
- Database volumes: Don't scale beyond 1 replica unless using clustering
Real-world examples
Scale web application
```yaml
services:
  web:
    build: .
    scale: 5
    x-ports:
      - app.example.com:8000/https
    environment:
      DATABASE_URL: postgres://db:5432/myapp
      REDIS_URL: redis://cache:6379
    healthcheck:
      test: curl -f http://localhost:8000/health
      interval: 10s

  db:
    image: postgres:16
    x-machines: db-server
    volumes:
      - db-data:/var/lib/postgresql/data

  cache:
    image: redis:alpine
    x-machines: cache-server

volumes:
  db-data:
```
Scale the web tier horizontally, keep database and cache on dedicated servers.
Geographic distribution
```yaml
services:
  api:
    image: api:latest
    scale: 6
    x-machines:
      - us-east-1
      - us-east-2
      - us-west-1
      - eu-west-1
      - ap-south-1
      - ap-southeast-1
    x-ports:
      - api.example.com:8000/https
    healthcheck:
      test: curl -f http://localhost:8000/health
```
One replica per region for low latency worldwide.
Background workers
```yaml
services:
  worker:
    image: worker:latest
    scale: 10
    environment:
      QUEUE_URL: redis://queue:6379
      CONCURRENCY: 4

  queue:
    image: redis:alpine
    x-machines: queue-server
```
Scale workers independently of web tier.
Monitoring scaled services
List all containers
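Using the same `uc ps` command used elsewhere for monitoring:

```shell
uc ps
```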
Shows all containers across all machines with their status.
Inspect service
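The exact subcommand is assumed here; check `uc --help` for the precise form:

```shell
uc inspect web
```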
Shows:

- Service ID and name
- Number of replicas
- Container locations
- Health status
- Endpoints
Check logs
```shell
# Logs from all replicas
uc logs web

# Follow logs in real-time
uc logs -f web

# Logs from a specific container
uc logs <container-id>
```
View Caddy upstreams
Shows the generated Caddyfile with all upstream container IPs.
Next steps
Rolling Updates Update scaled services with zero downtime
Health Checks Ensure only healthy containers receive traffic
Docker Compose Define multi-service applications
Deploying Services Learn about service deployment basics