BioAgents workers can scale horizontally across multiple servers with no coordination required. All workers connect to the same Redis queue and automatically share the workload.
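This competing-consumers pattern can be illustrated with a small in-memory sketch (plain arrays stand in for the Redis-backed BullMQ queue; in production the pop is atomic in Redis, so no two workers ever receive the same job):

```typescript
// In-memory sketch: two workers draining one shared queue.
function takeJob(queue: number[]): number | undefined {
  return queue.shift(); // stands in for an atomic Redis pop
}

const queue = [1, 2, 3, 4, 5, 6];
const workerA: number[] = [];
const workerB: number[] = [];

// Alternate turns to mimic two workers polling concurrently
while (queue.length > 0) {
  const a = takeJob(queue);
  if (a !== undefined) workerA.push(a);
  const b = takeJob(queue);
  if (b !== undefined) workerB.push(b);
}
// Every job is processed exactly once, split across both workers
```

Adding a third worker to the loop would spread the same queue across three consumers with no changes anywhere else, which is why workers scale horizontally with no coordination.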

Scaling Architecture

Multi-Server Deployment

Server Roles

API Servers

Handle HTTP/WebSocket requests, enqueue jobs, broadcast notifications

Worker Servers

Process jobs from queue, execute AI workflows, update database

Redis

Central message broker for job queue and pub/sub

Setup Strategy

  1. Deploy Redis - Use managed service (Upstash, ElastiCache) for high availability
  2. Deploy API servers - Scale based on HTTP traffic and WebSocket connections
  3. Deploy workers - Scale based on queue depth and job processing needs

Worker Deployment

Prerequisites

Each worker server needs:
  • Docker 20.10+
  • Access to Redis (via REDIS_URL)
  • Access to Supabase database
  • LLM API keys (OpenAI, Anthropic, etc.)

Deploy to New Server

1. Install Docker

curl -fsSL https://get.docker.com | sh
2. Clone Repository

git clone https://github.com/bio-xyz/bioagents-agentkit.git
cd bioagents-agentkit
3. Configure Environment

cp .env.worker.example .env
nano .env
Required variables:
# External Redis (shared across all workers)
REDIS_URL=rediss://default:<password>@<your-redis>.upstash.io:6379

# Database
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=eyJ...

# LLM API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
4. Start Workers

# Start 2 worker containers
docker-compose -f docker-compose.worker.yml up -d --scale worker=2

# Verify workers are running
docker-compose -f docker-compose.worker.yml ps
5. Monitor Logs

docker-compose -f docker-compose.worker.yml logs -f
Look for:
redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized

Worker Configuration

services:
  worker:
    build: .
    command: ["bun", "run", "src/worker.ts"]
    
    environment:
      # Enable queue mode
      - USE_JOB_QUEUE=true
      
      # External Redis
      - REDIS_URL=${REDIS_URL}
      
      # Database
      - SUPABASE_URL=${SUPABASE_URL}
      - SUPABASE_ANON_KEY=${SUPABASE_ANON_KEY}
      
      # LLM API Keys
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
      
      # Worker concurrency
      - CHAT_QUEUE_CONCURRENCY=${CHAT_QUEUE_CONCURRENCY:-5}
      - DEEP_RESEARCH_QUEUE_CONCURRENCY=${DEEP_RESEARCH_QUEUE_CONCURRENCY:-3}
      
      # Production
      - NODE_ENV=production
    
    restart: unless-stopped
    
    # Allow long-running jobs to complete
    stop_grace_period: 8h
    
    # Resource limits
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 512M

Scaling Strategies

Scale Based on Queue Depth

Monitor queue depth and scale workers accordingly:
# Check waiting jobs
redis-cli -u $REDIS_URL LLEN bull:deep-research:waiting
redis-cli -u $REDIS_URL LLEN bull:chat:waiting
Scaling guidelines:
| Queue Depth | Recommended Workers | Response Time |
|-------------|---------------------|---------------|
| 0-10 jobs   | 2 workers           | < 5 minutes   |
| 10-30 jobs  | 4 workers           | < 10 minutes  |
| 30-50 jobs  | 6 workers           | < 15 minutes  |
| 50+ jobs    | 8+ workers          | < 20 minutes  |
Each deep research worker can handle ~3 concurrent jobs. Each chat worker can handle ~5 concurrent jobs.
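The guidelines above can be sketched as a lookup function (thresholds copied from the table; tune them for your own workload):

```typescript
// Map queue depth to a recommended worker count (thresholds from the table above)
function recommendedWorkers(queueDepth: number): number {
  if (queueDepth <= 10) return 2;
  if (queueDepth <= 30) return 4;
  if (queueDepth <= 50) return 6;
  return 8; // 50+ jobs: start at 8 and add more if depth keeps growing
}
```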

Auto-Scaling with Monitoring

Implement auto-scaling based on queue metrics:
import redis
import subprocess
import time

REDIS_URL = "redis://your-redis-host:6379"
MIN_WORKERS = 2
MAX_WORKERS = 10
SCALE_UP_THRESHOLD = 20
SCALE_DOWN_THRESHOLD = 5

def get_queue_depth():
    r = redis.from_url(REDIS_URL)
    chat_waiting = r.llen("bull:chat:waiting")
    research_waiting = r.llen("bull:deep-research:waiting")
    return chat_waiting + research_waiting

def get_current_workers():
    result = subprocess.run(
        ["docker-compose", "-f", "docker-compose.worker.yml", "ps", "-q"],
        capture_output=True,
        text=True
    )
    output = result.stdout.strip()
    # splitting an empty string would yield [''] (length 1), so guard for zero workers
    return len(output.split("\n")) if output else 0

def scale_workers(count):
    count = max(MIN_WORKERS, min(MAX_WORKERS, count))
    subprocess.run([
        "docker-compose", "-f", "docker-compose.worker.yml",
        "up", "-d", "--scale", f"worker={count}"
    ])
    print(f"Scaled to {count} workers")

while True:
    depth = get_queue_depth()
    current = get_current_workers()
    
    if depth > SCALE_UP_THRESHOLD:
        scale_workers(current + 2)
    elif depth < SCALE_DOWN_THRESHOLD and current > MIN_WORKERS:
        scale_workers(current - 1)
    
    time.sleep(60)  # Check every minute

Concurrency Tuning

Adjust concurrency per worker based on server resources:
CHAT_QUEUE_CONCURRENCY=2
DEEP_RESEARCH_QUEUE_CONCURRENCY=1
  • 2 chat jobs + 1 research job = ~1.5GB peak memory
  • Conservative but reliable
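A rough capacity check for this profile, using hypothetical per-job figures (~300MB per chat job and ~900MB per research job are assumptions chosen to match the ~1.5GB example above, not measured values):

```typescript
// Hypothetical per-job memory footprints (assumptions, not measured values)
const CHAT_JOB_MB = 300;
const RESEARCH_JOB_MB = 900;

// Estimate peak memory if every concurrency slot is busy at once
function peakMemoryMB(chatConcurrency: number, researchConcurrency: number): number {
  return chatConcurrency * CHAT_JOB_MB + researchConcurrency * RESEARCH_JOB_MB;
}
```

With the conservative settings above, `peakMemoryMB(2, 1)` gives 1500MB, comfortably inside the 2G container limit.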

Multi-Region Deployment

Deploy workers in multiple regions for global coverage. Multi-region deployments require:
  • Low-latency Redis (use Upstash Global or regional replicas)
  • Database replication or read replicas
  • Careful handling of cross-region network latency

Resource Planning

Worker Server Sizing

| Plan | vCPU | RAM  | Workers | Cost/mo |
|------|------|------|---------|---------|
| CX22 | 2    | 4GB  | 2       | $6      |
| CX32 | 4    | 8GB  | 4       | $12     |
| CX42 | 8    | 16GB | 8       | $24     |
| CX52 | 16   | 32GB | 16      | $48     |
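A small helper can pick the smallest plan for a target worker count (plan data copied from the sizing table above; verify names and prices against your provider):

```typescript
// Plans from the sizing table above, ordered smallest first
const plans = [
  { name: "CX22", vcpu: 2, ramGB: 4, workers: 2, costPerMonth: 6 },
  { name: "CX32", vcpu: 4, ramGB: 8, workers: 4, costPerMonth: 12 },
  { name: "CX42", vcpu: 8, ramGB: 16, workers: 8, costPerMonth: 24 },
  { name: "CX52", vcpu: 16, ramGB: 32, workers: 16, costPerMonth: 48 },
];

// Smallest plan that fits the requested worker count (undefined if none fits)
function smallestPlanFor(workers: number) {
  return plans.find((p) => p.workers >= workers);
}
```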

Cost Optimization

Use spot instances for burst capacity:
# AWS EC2 Spot
aws ec2 run-instances \
  --instance-type c6i.xlarge \
  --spot-instance-request-type one-time \
  --user-data file://worker-init.sh
Benefits:
  • 60-90% cost savings
  • Good for non-critical workers
Risks:
  • Can be terminated with 2-minute notice
  • Workers should handle graceful shutdown
Reserve minimum capacity for predictable workloads:
  • 1-year commitment: ~30% savings
  • 3-year commitment: ~50% savings
Strategy:
  • Reserve minimum worker capacity (e.g., 2 workers)
  • Use on-demand/spot for scaling above baseline
Scale workers based on time of day:
# Cron job: Scale up during business hours
0 9 * * 1-5 /opt/bioagents/scale-workers.sh 8

# Scale down at night
0 18 * * 1-5 /opt/bioagents/scale-workers.sh 2
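The same schedule can be written as a decision function (assumes the 09:00–18:00 Mon–Fri window from the cron entries above, evaluated in server-local time like cron):

```typescript
// Target worker count for a given time, mirroring the cron schedule above
function targetWorkers(now: Date): number {
  const day = now.getDay();   // 0 = Sunday ... 6 = Saturday (local time)
  const hour = now.getHours();
  const businessDay = day >= 1 && day <= 5;
  const businessHours = hour >= 9 && hour < 18;
  return businessDay && businessHours ? 8 : 2;
}
```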

High Availability

Worker Redundancy

Always run at least 2 workers to avoid a single point of failure:
# Minimum HA setup
docker-compose -f docker-compose.worker.yml up -d --scale worker=2
If one worker crashes, the other continues processing jobs. BullMQ automatically reassigns stalled jobs.

Graceful Shutdown

Workers use stop_grace_period: 8h to finish long-running jobs:
services:
  worker:
    stop_grace_period: 8h  # Allow deep research jobs to complete
Shutdown behavior:
  1. Docker sends SIGTERM to worker
  2. Worker stops accepting new jobs
  3. Worker continues processing active jobs
  4. After 8 hours, Docker sends SIGKILL (force stop)
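The sequence can be modeled with a small state sketch (an in-memory stand-in for the worker's shutdown handling, not the actual worker code):

```typescript
// Minimal model of the shutdown sequence above
class WorkerShutdown {
  accepting = true;
  active: string[];

  constructor(activeJobs: string[]) {
    this.active = [...activeJobs];
  }

  // Steps 1-2: SIGTERM arrives, stop accepting new jobs
  onSigterm(): void {
    this.accepting = false;
  }

  // Step 3: active jobs keep running until they finish
  finish(job: string): void {
    this.active = this.active.filter((j) => j !== job);
  }

  // Exit cleanly once drained (before the SIGKILL deadline)
  canExit(): boolean {
    return this.active.length === 0;
  }
}
```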
Never use docker-compose down without checking for active jobs. Use Bull Board to verify queue is empty first.

Redis Failover

Use managed Redis with automatic failover:
REDIS_URL=rediss://default:<password>@<your-redis>.upstash.io:6379
Features:
  • Multi-region replication
  • Automatic failover
  • TLS encryption
  • Pay-per-use pricing

Monitoring & Observability

Queue Metrics

Export queue metrics to monitoring systems:
const client = require('prom-client');
const { getChatQueue, getDeepResearchQueue } = require('./queue/queues');

const queueDepthGauge = new client.Gauge({
  name: 'bioagents_queue_depth',
  help: 'Number of jobs waiting in queue',
  labelNames: ['queue', 'state']
});

async function updateMetrics() {
  const chatQueue = getChatQueue();
  const researchQueue = getDeepResearchQueue();
  
  const chatCounts = await chatQueue.getJobCounts();
  const researchCounts = await researchQueue.getJobCounts();
  
  queueDepthGauge.set({ queue: 'chat', state: 'waiting' }, chatCounts.waiting);
  queueDepthGauge.set({ queue: 'chat', state: 'active' }, chatCounts.active);
  queueDepthGauge.set({ queue: 'deep-research', state: 'waiting' }, researchCounts.waiting);
  queueDepthGauge.set({ queue: 'deep-research', state: 'active' }, researchCounts.active);
}

setInterval(updateMetrics, 10000); // Every 10 seconds

Alerting

Set up alerts for queue health:
groups:
  - name: bioagents-queues
    rules:
      - alert: HighQueueDepth
        expr: bioagents_queue_depth{state="waiting"} > 50
        for: 10m
        annotations:
          summary: "Queue depth is high"
          description: "{{ $labels.queue }} has {{ $value }} waiting jobs"

      - alert: NoActiveWorkers
        expr: sum(up{job="bioagents-worker"}) == 0
        for: 1m
        annotations:
          summary: "No workers are running"
          description: "All workers are down - jobs will not be processed"

      - alert: HighJobFailureRate
        expr: rate(bioagents_job_failures_total[5m]) > 0.1
        for: 5m
        annotations:
          summary: "Job failure rate is high"
          description: "{{ $value }} jobs/sec are failing"

Troubleshooting

Workers Not Picking Up Jobs

Check Redis connection:
docker-compose -f docker-compose.worker.yml logs | grep -i redis
Expected output:
redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized
Verify Redis URL:
docker-compose -f docker-compose.worker.yml exec worker env | grep REDIS

Uneven Load Distribution

Symptom: Some workers process many jobs while others sit idle.

Cause: Different worker start times or concurrency settings.

Fix: Ensure all workers have identical configuration:
# Restart all workers simultaneously
docker-compose -f docker-compose.worker.yml down
docker-compose -f docker-compose.worker.yml up -d --scale worker=4

Memory Leaks

Monitor memory over time:
docker stats --no-stream
Implement periodic restarts:
# Cron job: Rolling restart every 24 hours
0 3 * * * /opt/bioagents/rolling-restart.sh
rolling-restart.sh
#!/bin/bash
# Restart workers one at a time to maintain capacity.
# docker-compose scales a single "worker" service, so address the
# individual containers by ID rather than by a numbered service name.

for id in $(docker-compose -f docker-compose.worker.yml ps -q worker); do
  echo "Restarting container $id..."
  docker restart "$id"
  sleep 60  # Wait 1 minute between restarts
done

Best Practices

  • Run at least 2 workers for redundancy
  • Use managed Redis with automatic failover
  • Configure stop_grace_period for graceful shutdown
  • Monitor queue depth and scale accordingly
  • Set up alerts for queue health
  • Test worker failure scenarios
  • Document scaling procedures
  • Use infrastructure as code (Terraform, CloudFormation)
  • Implement auto-scaling for cost optimization
  • Regular load testing to validate capacity

Next Steps

Job Queue

Learn about BullMQ architecture and configuration

Docker Setup

Deploy with docker-compose
