Scaling Dimensions
Horizontal Scaling
Process Count: Number of concurrent processes running
Ideal for handling more requests and providing redundancy
Vertical Scaling
CPU: CPU allocation in units (1000 = 1 full CPU)
Memory: RAM allocation in megabytes
GPU: Number of GPUs per process
Ideal for resource-intensive operations
Initial Scale Configuration
Define default scale settings in your `convox.yml`:
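A minimal sketch of such defaults, assuming a hypothetical `web` service (values are illustrative):

```yaml
services:
  web:
    build: .
    port: 3000
    scale:
      count: 2      # processes on first deployment
      cpu: 500      # half a CPU core (1000 = 1 full CPU)
      memory: 512   # MB of RAM per process
```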
Static count values are only used on first deployment. Subsequent count changes must be made via the CLI to prevent configuration drift.
Scale Attributes
| Attribute | Unit | Description |
|---|---|---|
| count | number | Number of processes (1-n) |
| cpu | CPU units | 1000 units = 1 full CPU core |
| memory | MB | RAM allocation in megabytes |
| gpu | number | GPUs allocated per process (requires GPU-enabled rack) |
Manual Scaling
Manually adjust your service scale using the Convox CLI.
Check Current Scale
View the current scale for all services.
Scale Process Count
Change the number of processes for a service.
Scale CPU and Memory
To change CPU or memory allocation, update your `convox.yml` and deploy.
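The steps above might look like the following (hypothetical service name `web`; check `convox scale --help` for the exact flags on your CLI version):

```shell
# View the current scale of every service in the app
convox scale

# Change the process count for the web service
convox scale web --count 3

# After editing cpu/memory in convox.yml, roll out the change
convox deploy
```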
Autoscaling
Configure autoscaling to automatically adjust process count based on resource utilization.
Basic Autoscaling
Define a scale range and target metrics.
Autoscaling Configuration
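A configuration matching the attributes below might look like this (hypothetical `web` service):

```yaml
services:
  web:
    scale:
      count: 2-10     # autoscale between 2 and 10 processes
      targets:
        cpu: 70       # target 70% CPU utilization
        memory: 80    # target 80% memory utilization
```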
| Attribute | Example | Description |
|---|---|---|
| count | 2-10 | Min-Max range for process count |
| targets.cpu | 70 | Target CPU utilization percentage |
| targets.memory | 80 | Target memory utilization percentage |
How Autoscaling Works
Scaling Formula
Kubernetes uses this formula to calculate the desired replica count:
desiredReplicas = ceil[currentReplicas × (currentMetricValue / targetMetricValue)]
For example:
- Current replicas: 4
- Current CPU utilization: 85%
- Target CPU utilization: 70%
- Calculation: ceil[4 × (85 / 70)] = ceil[4.86] = 5 replicas
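The formula can be sketched in Python (a minimal illustration of the arithmetic, not Convox code):

```python
import math

def desired_replicas(current: int, current_metric: float, target_metric: float) -> int:
    """Kubernetes HPA formula: ceil[current * (currentMetric / targetMetric)]."""
    return math.ceil(current * (current_metric / target_metric))

# The worked example above: 4 replicas at 85% CPU with a 70% target
print(desired_replicas(4, 85, 70))  # 5
```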
Multiple Targets
You can set both CPU and memory targets.
GPU Scaling
For machine learning, video processing, and scientific computing workloads, Convox supports GPU allocation.
Prerequisites
Use GPU-capable instances
Your rack must run on GPU-enabled instances:
- AWS: p3, p4, g4, g5 families
- GCP: N1 with GPU accelerators
- Azure: NC, ND, NV series
Basic GPU Configuration
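A sketch of a GPU service, assuming a hypothetical `trainer` service on a GPU-enabled rack:

```yaml
services:
  trainer:
    build: .
    scale:
      count: 1
      gpu: 1        # one whole GPU per process
      memory: 8192
```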
GPU with Autoscaling
Combine GPU allocation with autoscaling.
GPU Considerations
Whole Units Only
GPUs are allocated as whole units. You cannot request fractional GPUs.
Node Affinity
Processes requesting GPUs will only be scheduled on nodes with available GPUs.
CUDA Support
Use base images that include the NVIDIA CUDA toolkit (e.g., `nvidia/cuda:11.8.0-runtime-ubuntu22.04`).
Resource Costs
GPU instances are significantly more expensive than standard instances.
Example GPU Configuration
- Runs 2 processes
- Each process gets 2 GPUs
- Total GPU requirement: 4 GPUs
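The configuration described by the bullets above, sketched for a hypothetical `inference` service:

```yaml
services:
  inference:
    scale:
      count: 2   # 2 processes...
      gpu: 2     # ...each with 2 GPUs = 4 GPUs total
```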
Advanced Autoscaling
Custom Metrics with Datadog
Autoscale based on custom metrics from Datadog.
Prerequisites
Install Datadog
Install Datadog and the Datadog Cluster Agent on your rack. Use `datadog-agent-all-features.yaml` for the complete setup.
Configure external metrics
Follow Datadog’s documentation to configure the external metrics provider.
Create DatadogMetric
Define a custom metric.
Configure Service
Reference the metric in your `convox.yml`.
Deploy Datadog Service
Create a service for your app to send metrics.
Test Autoscaling
Generate load to trigger scaling.
Scaling Strategies
High Availability
For mission-critical services, maintain minimum redundancy:
- Always at least 3 instances running
- Survives multiple instance failures
- Rolling updates maintain capacity
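A sketch of such a high-availability range (hypothetical `web` service; values are illustrative):

```yaml
services:
  web:
    scale:
      count: 3-10   # never fewer than 3 processes
      targets:
        cpu: 70
```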
Cost Optimization
For development or low-traffic environments:
- Minimal resource usage when idle
- Aggressive scaling targets reduce costs
- Suitable for non-critical workloads
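A cost-optimized sketch (hypothetical `worker` service; a high CPU target delays scale-up, keeping the count near the minimum):

```yaml
services:
  worker:
    scale:
      count: 1-3    # small range, scales to 1 when idle
      targets:
        cpu: 85     # aggressive target: only scale when heavily loaded
```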
Burst Capacity
For services with unpredictable traffic spikes:
- Low baseline for normal traffic
- Can scale rapidly for traffic spikes
- Conservative targets trigger scaling early
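A burst-capacity sketch (hypothetical `api` service; a low CPU target triggers scale-up early, before a spike saturates existing processes):

```yaml
services:
  api:
    scale:
      count: 2-20   # low baseline, wide headroom for spikes
      targets:
        cpu: 50     # conservative target: scale early
```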
Best Practices
Start Conservative
Begin with lower resource allocations and scale up based on actual usage patterns.
Monitor Performance
Use monitoring tools to understand your application’s resource utilization over time.
Test Scaling
Load test your application to verify autoscaling behavior before production.
Set Appropriate Ranges
Define min/max count ranges that prevent both resource starvation and excessive costs.
Use Multiple Services
Separate different workload types into different services with independent scaling.
Plan for Failure
Always run at least 2 processes for critical services to ensure availability.
Consider Costs
Balance performance needs with infrastructure costs, especially for GPU instances.
Document Decisions
Record why you chose specific scale settings for future reference.
Troubleshooting
Processes Not Scaling
Symptoms: Process count doesn’t change despite high load
Possible causes:
- Autoscaling not configured
- Insufficient rack resources
- Metrics not being collected
Solutions:
- Verify autoscaling config in `convox.yml`
- Check rack capacity: `convox rack`
- Review Kubernetes HPA status: `kubectl get hpa -n convox-myapp`
Autoscaling Too Aggressive
Symptoms: Constantly scaling up and down
Possible causes:
- Targets set too low
- Application has bursty workload patterns
- Not enough processes at minimum
Solutions:
- Increase target percentages (e.g., 70 → 80)
- Increase minimum process count
- Smooth out workload with queues
Out of Memory Errors
Symptoms: Processes crash with OOM errors
Possible causes:
- Memory allocation too low
- Memory leaks in application
- Unexpected traffic spikes
Solutions:
- Increase memory allocation in `convox.yml`
- Profile application for memory leaks
- Implement better autoscaling targets
- Add memory-based autoscaling
GPU Not Available
Symptoms: Processes requesting GPU stuck in pending
Possible causes:
- No GPU nodes available
- NVIDIA device plugin not installed
- All GPUs already allocated
Solutions:
- Verify GPU instances in rack
- Check NVIDIA plugin: `kubectl get pods -n kube-system | grep nvidia`
- Review node resources: `kubectl describe nodes`
- Scale down other GPU processes
Next Steps
Health Checks
Configure health monitoring for scaled services
Rolling Updates
Understand how scaling affects deployments
Monitoring
Set up comprehensive monitoring and alerting
Load Balancing
Learn how traffic is distributed across processes