BioAgents workers can scale horizontally across multiple servers with no coordination required. All workers connect to the same Redis queue and automatically share the workload.
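This competing-consumers pattern can be illustrated with a small in-memory sketch (plain arrays stand in for the Redis-backed BullMQ queue; in production the pop is atomic in Redis, so no two workers ever receive the same job):

```typescript
// In-memory sketch: two workers draining one shared queue.
function takeJob(queue: number[]): number | undefined {
  return queue.shift(); // stands in for an atomic Redis pop
}

const queue = [1, 2, 3, 4, 5, 6];
const workerA: number[] = [];
const workerB: number[] = [];

// Alternate turns to mimic two workers polling concurrently
while (queue.length > 0) {
  const a = takeJob(queue);
  if (a !== undefined) workerA.push(a);
  const b = takeJob(queue);
  if (b !== undefined) workerB.push(b);
}
// Every job is processed exactly once, split across both workers
```

Adding a third worker to the loop would spread the same queue across three consumers with no changes anywhere else, which is why workers scale horizontally with no coordination.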

Scaling Architecture

Multi-Server Deployment

Server Roles

API Servers

Handle HTTP/WebSocket requests, enqueue jobs, broadcast notifications

Worker Servers

Process jobs from queue, execute AI workflows, update database

Redis

Central message broker for job queue and pub/sub

Setup Strategy

  1. Deploy Redis - Use managed service (Upstash, ElastiCache) for high availability
  2. Deploy API servers - Scale based on HTTP traffic and WebSocket connections
  3. Deploy workers - Scale based on queue depth and job processing needs

Worker Deployment

Prerequisites

Each worker server needs:
  • Docker 20.10+
  • Access to Redis (via REDIS_URL)
  • Access to Supabase database
  • LLM API keys (OpenAI, Anthropic, etc.)

Deploy to New Server

1. Install Docker

curl -fsSL https://get.docker.com | sh
2. Clone Repository

git clone https://github.com/bio-xyz/bioagents-agentkit.git
cd bioagents-agentkit
3. Configure Environment

cp .env.worker.example .env
nano .env
Required variables:
# External Redis (shared across all workers)
REDIS_URL=rediss://default:<password>@<your-redis>.upstash.io:6379

# Database
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=eyJ...

# LLM API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
4. Start Workers

# Start 2 worker containers
docker-compose -f docker-compose.worker.yml up -d --scale worker=2

# Verify workers are running
docker-compose -f docker-compose.worker.yml ps
5. Monitor Logs

docker-compose -f docker-compose.worker.yml logs -f
Look for:
redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized

Worker Configuration

services:
  worker:
    build: .
    command: ["bun", "run", "src/worker.ts"]
    
    environment:
      # Enable queue mode
      - USE_JOB_QUEUE=true
      
      # External Redis
      - REDIS_URL=${REDIS_URL}
      
      # Database
      - SUPABASE_URL=${SUPABASE_URL}
      - SUPABASE_ANON_KEY=${SUPABASE_ANON_KEY}
      
      # LLM API Keys
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
      
      # Worker concurrency
      - CHAT_QUEUE_CONCURRENCY=${CHAT_QUEUE_CONCURRENCY:-5}
      - DEEP_RESEARCH_QUEUE_CONCURRENCY=${DEEP_RESEARCH_QUEUE_CONCURRENCY:-3}
      
      # Production
      - NODE_ENV=production
    
    restart: unless-stopped
    
    # Allow long-running jobs to complete
    stop_grace_period: 8h
    
    # Resource limits
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 512M

Scaling Strategies

Scale Based on Queue Depth

Monitor queue depth and scale workers accordingly:
# Check waiting jobs
redis-cli -u $REDIS_URL LLEN bull:deep-research:waiting
redis-cli -u $REDIS_URL LLEN bull:chat:waiting
Scaling guidelines:
| Queue Depth | Recommended Workers | Response Time |
|-------------|---------------------|---------------|
| 0-10 jobs   | 2 workers           | < 5 minutes   |
| 10-30 jobs  | 4 workers           | < 10 minutes  |
| 30-50 jobs  | 6 workers           | < 15 minutes  |
| 50+ jobs    | 8+ workers          | < 20 minutes  |
Each deep research worker can handle ~3 concurrent jobs. Each chat worker can handle ~5 concurrent jobs.
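The guidelines above can be sketched as a lookup function (thresholds copied from the table; tune them for your own workload):

```typescript
// Map queue depth to a recommended worker count (thresholds from the table above)
function recommendedWorkers(queueDepth: number): number {
  if (queueDepth <= 10) return 2;
  if (queueDepth <= 30) return 4;
  if (queueDepth <= 50) return 6;
  return 8; // 50+ jobs: start at 8 and add more if depth keeps growing
}
```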

Auto-Scaling with Monitoring

Implement auto-scaling based on queue metrics:
import redis
import subprocess
import time

REDIS_URL = "redis://your-redis-host:6379"
MIN_WORKERS = 2
MAX_WORKERS = 10
SCALE_UP_THRESHOLD = 20
SCALE_DOWN_THRESHOLD = 5

def get_queue_depth():
    r = redis.from_url(REDIS_URL)
    chat_waiting = r.llen("bull:chat:waiting")
    research_waiting = r.llen("bull:deep-research:waiting")
    return chat_waiting + research_waiting

def get_current_workers():
    result = subprocess.run(
        ["docker-compose", "-f", "docker-compose.worker.yml", "ps", "-q"],
        capture_output=True,
        text=True
    )
    output = result.stdout.strip()
    # splitting an empty string would yield [''] (length 1), so guard for zero workers
    return len(output.split("\n")) if output else 0

def scale_workers(count):
    count = max(MIN_WORKERS, min(MAX_WORKERS, count))
    subprocess.run([
        "docker-compose", "-f", "docker-compose.worker.yml",
        "up", "-d", "--scale", f"worker={count}"
    ])
    print(f"Scaled to {count} workers")

while True:
    depth = get_queue_depth()
    current = get_current_workers()
    
    if depth > SCALE_UP_THRESHOLD:
        scale_workers(current + 2)
    elif depth < SCALE_DOWN_THRESHOLD and current > MIN_WORKERS:
        scale_workers(current - 1)
    
    time.sleep(60)  # Check every minute

Concurrency Tuning

Adjust concurrency per worker based on server resources:
CHAT_QUEUE_CONCURRENCY=2
DEEP_RESEARCH_QUEUE_CONCURRENCY=1
  • 2 chat jobs + 1 research job = ~1.5GB peak memory
  • Conservative but reliable
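A rough capacity check for this profile, using hypothetical per-job figures (~300MB per chat job and ~900MB per research job are assumptions chosen to match the ~1.5GB example above, not measured values):

```typescript
// Hypothetical per-job memory footprints (assumptions, not measured values)
const CHAT_JOB_MB = 300;
const RESEARCH_JOB_MB = 900;

// Estimate peak memory if every concurrency slot is busy at once
function peakMemoryMB(chatConcurrency: number, researchConcurrency: number): number {
  return chatConcurrency * CHAT_JOB_MB + researchConcurrency * RESEARCH_JOB_MB;
}
```

With the conservative settings above, `peakMemoryMB(2, 1)` gives 1500MB, comfortably inside the 2G container limit.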

Multi-Region Deployment

Deploy workers in multiple regions for global coverage. Multi-region deployments require:
  • Low-latency Redis (use Upstash Global or regional replicas)
  • Database replication or read replicas
  • Careful handling of cross-region network latency

Resource Planning

Worker Server Sizing

| Plan | vCPU | RAM  | Workers | Cost/mo |
|------|------|------|---------|---------|
| CX22 | 2    | 4GB  | 2       | $6      |
| CX32 | 4    | 8GB  | 4       | $12     |
| CX42 | 8    | 16GB | 8       | $24     |
| CX52 | 16   | 32GB | 16      | $48     |
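A small helper can pick the smallest plan for a target worker count (plan data copied from the sizing table above; verify names and prices against your provider):

```typescript
// Plans from the sizing table above, ordered smallest first
const plans = [
  { name: "CX22", vcpu: 2, ramGB: 4, workers: 2, costPerMonth: 6 },
  { name: "CX32", vcpu: 4, ramGB: 8, workers: 4, costPerMonth: 12 },
  { name: "CX42", vcpu: 8, ramGB: 16, workers: 8, costPerMonth: 24 },
  { name: "CX52", vcpu: 16, ramGB: 32, workers: 16, costPerMonth: 48 },
];

// Smallest plan that fits the requested worker count (undefined if none fits)
function smallestPlanFor(workers: number) {
  return plans.find((p) => p.workers >= workers);
}
```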

Cost Optimization

Use spot instances for burst capacity:
# AWS EC2 Spot
aws ec2 run-instances \
  --instance-type c6i.xlarge \
  --spot-instance-request-type one-time \
  --user-data file://worker-init.sh
Benefits:
  • 60-90% cost savings
  • Good for non-critical workers
Risks:
  • Can be terminated with 2-minute notice
  • Workers should handle graceful shutdown
Reserve minimum capacity for predictable workloads:
  • 1-year commitment: ~30% savings
  • 3-year commitment: ~50% savings
Strategy:
  • Reserve minimum worker capacity (e.g., 2 workers)
  • Use on-demand/spot for scaling above baseline
Scale workers based on time of day:
# Cron job: Scale up during business hours
0 9 * * 1-5 /opt/bioagents/scale-workers.sh 8

# Scale down at night
0 18 * * 1-5 /opt/bioagents/scale-workers.sh 2
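The same schedule can be written as a decision function (assumes the 09:00–18:00 Mon–Fri window from the cron entries above, evaluated in server-local time like cron):

```typescript
// Target worker count for a given time, mirroring the cron schedule above
function targetWorkers(now: Date): number {
  const day = now.getDay();   // 0 = Sunday ... 6 = Saturday (local time)
  const hour = now.getHours();
  const businessDay = day >= 1 && day <= 5;
  const businessHours = hour >= 9 && hour < 18;
  return businessDay && businessHours ? 8 : 2;
}
```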

High Availability

Worker Redundancy

Always run at least 2 workers to avoid a single point of failure:
# Minimum HA setup
docker-compose -f docker-compose.worker.yml up -d --scale worker=2
If one worker crashes, the other continues processing jobs. BullMQ automatically reassigns stalled jobs.

Graceful Shutdown

Workers use stop_grace_period: 8h to finish long-running jobs:
services:
  worker:
    stop_grace_period: 8h  # Allow deep research jobs to complete
Shutdown behavior:
  1. Docker sends SIGTERM to worker
  2. Worker stops accepting new jobs
  3. Worker continues processing active jobs
  4. After 8 hours, Docker sends SIGKILL (force stop)
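The sequence can be modeled with a small state sketch (an in-memory stand-in for the worker's shutdown handling, not the actual worker code):

```typescript
// Minimal model of the shutdown sequence above
class WorkerShutdown {
  accepting = true;
  active: string[];

  constructor(activeJobs: string[]) {
    this.active = [...activeJobs];
  }

  // Steps 1-2: SIGTERM arrives, stop accepting new jobs
  onSigterm(): void {
    this.accepting = false;
  }

  // Step 3: active jobs keep running until they finish
  finish(job: string): void {
    this.active = this.active.filter((j) => j !== job);
  }

  // Exit cleanly once drained (before the SIGKILL deadline)
  canExit(): boolean {
    return this.active.length === 0;
  }
}
```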
Never use docker-compose down without checking for active jobs. Use Bull Board to verify queue is empty first.

Redis Failover

Use managed Redis with automatic failover:
REDIS_URL=rediss://default:<password>@<your-redis>.upstash.io:6379
Features:
  • Multi-region replication
  • Automatic failover
  • TLS encryption
  • Pay-per-use pricing

Monitoring & Observability

Queue Metrics

Export queue metrics to monitoring systems:
const client = require('prom-client');
const { getChatQueue, getDeepResearchQueue } = require('./queue/queues');

const queueDepthGauge = new client.Gauge({
  name: 'bioagents_queue_depth',
  help: 'Number of jobs waiting in queue',
  labelNames: ['queue', 'state']
});

async function updateMetrics() {
  const chatQueue = getChatQueue();
  const researchQueue = getDeepResearchQueue();
  
  const chatCounts = await chatQueue.getJobCounts();
  const researchCounts = await researchQueue.getJobCounts();
  
  queueDepthGauge.set({ queue: 'chat', state: 'waiting' }, chatCounts.waiting);
  queueDepthGauge.set({ queue: 'chat', state: 'active' }, chatCounts.active);
  queueDepthGauge.set({ queue: 'deep-research', state: 'waiting' }, researchCounts.waiting);
  queueDepthGauge.set({ queue: 'deep-research', state: 'active' }, researchCounts.active);
}

setInterval(updateMetrics, 10000); // Every 10 seconds

Alerting

Set up alerts for queue health:
groups:
  - name: bioagents-queues
    rules:
      - alert: HighQueueDepth
        expr: bioagents_queue_depth{state="waiting"} > 50
        for: 10m
        annotations:
          summary: "Queue depth is high"
          description: "{{ $labels.queue }} has {{ $value }} waiting jobs"

      - alert: NoActiveWorkers
        expr: sum(up{job="bioagents-worker"}) == 0
        for: 1m
        annotations:
          summary: "No workers are running"
          description: "All workers are down - jobs will not be processed"

      - alert: HighJobFailureRate
        expr: rate(bioagents_job_failures_total[5m]) > 0.1
        for: 5m
        annotations:
          summary: "Job failure rate is high"
          description: "{{ $value }} jobs/sec are failing"

Troubleshooting

Workers Not Picking Up Jobs

Check Redis connection:
docker-compose -f docker-compose.worker.yml logs | grep -i redis
Expected output:
redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized
Verify Redis URL:
docker-compose -f docker-compose.worker.yml exec worker env | grep REDIS

Uneven Load Distribution

Symptom: Some workers process many jobs while others sit idle.

Cause: Different worker start times or concurrency settings.

Fix: Ensure all workers have identical configuration:
# Restart all workers simultaneously
docker-compose -f docker-compose.worker.yml down
docker-compose -f docker-compose.worker.yml up -d --scale worker=4

Memory Leaks

Monitor memory over time:
docker stats --no-stream
Implement periodic restarts:
# Cron job: Rolling restart every 24 hours
0 3 * * * /opt/bioagents/rolling-restart.sh
rolling-restart.sh
#!/bin/bash
# Restart workers one at a time to maintain capacity.
# docker-compose scales a single "worker" service, so address the
# individual containers by ID rather than by a numbered service name.

for id in $(docker-compose -f docker-compose.worker.yml ps -q worker); do
  echo "Restarting container $id..."
  docker restart "$id"
  sleep 60  # Wait 1 minute between restarts
done

Best Practices

  • Run at least 2 workers for redundancy
  • Use managed Redis with automatic failover
  • Configure stop_grace_period for graceful shutdown
  • Monitor queue depth and scale accordingly
  • Set up alerts for queue health
  • Test worker failure scenarios
  • Document scaling procedures
  • Use infrastructure as code (Terraform, CloudFormation)
  • Implement auto-scaling for cost optimization
  • Regular load testing to validate capacity

Next Steps

Job Queue

Learn about BullMQ architecture and configuration

Docker Setup

Deploy with docker-compose
