
Overview

BioAgents implements per-user rate limiting using a Redis-backed sliding window algorithm. Rate limits prevent abuse and ensure fair resource allocation across users.
Rate limiting is only active when USE_JOB_QUEUE=true (requires Redis). When disabled, all requests are allowed.

Configuration

Environment Variables

Configure rate limits in your .env file:
# Job Queue (required for rate limiting)
USE_JOB_QUEUE=true
REDIS_URL=redis://localhost:6379

# Rate Limits (optional, defaults shown)
CHAT_RATE_LIMIT_PER_MINUTE=10
DEEP_RESEARCH_RATE_LIMIT_PER_5MIN=3

Default Limits

If not specified, these defaults are used:
// src/middleware/rateLimiter.ts
const RATE_LIMITS: Record<string, RateLimitConfig> = {
  chat: {
    max: parseInt(process.env.CHAT_RATE_LIMIT_PER_MINUTE || "10"),
    window: 60, // 1 minute
  },
  "deep-research": {
    max: parseInt(process.env.DEEP_RESEARCH_RATE_LIMIT_PER_5MIN || "3"),
    window: 300, // 5 minutes
  },
};

How It Works

Sliding Window Algorithm

BioAgents uses Redis sorted sets to implement a sliding window:
  1. Each request is stored with timestamp as score
  2. Old entries outside the window are removed
  3. Current request count is checked against limit
  4. If under limit, request is allowed and recorded
  5. If over limit, request is rejected with 429 status
// Simplified algorithm
const key = `ratelimit:${action}:${userId}`;
const now = Math.floor(Date.now() / 1000);
const windowStart = now - config.window;

// Remove old entries
await redis.zremrangebyscore(key, 0, windowStart);

// Count current requests
const currentCount = await redis.zcard(key);

if (currentCount >= config.max) {
  // Rate limit exceeded
  return { allowed: false, remaining: 0 };
}

// Add current request and refresh the key's TTL so idle keys expire
await redis.zadd(key, now, `${now}-${Math.random()}`);
await redis.expire(key, config.window);
return { allowed: true, remaining: config.max - currentCount - 1 };

Advantages of Sliding Window

  • Precise: Tracks exact request timestamps, not fixed time blocks
  • Fair: No burst allowance at window boundaries
  • Efficient: O(log N) Redis operations
  • Scalable: Works across multiple API servers
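The counting behavior of the algorithm can be sketched without Redis using an in-memory array; `SlidingWindow` below is purely illustrative and not part of the BioAgents codebase:

```typescript
class SlidingWindow {
  private timestamps: number[] = [];

  constructor(private max: number, private windowSec: number) {}

  check(nowSec: number): { allowed: boolean; remaining: number } {
    const windowStart = nowSec - this.windowSec;
    // Drop entries outside the window (the zremrangebyscore step)
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.max) {
      return { allowed: false, remaining: 0 };
    }
    // Record the request (the zadd step)
    this.timestamps.push(nowSec);
    return { allowed: true, remaining: this.max - this.timestamps.length };
  }
}
```

Redis sorted sets play the role of the `timestamps` array here, which is what lets the same window be shared across multiple API servers.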

Using Rate Limit Middleware

Basic Usage

Apply to Elysia routes:
import { Elysia } from "elysia";
import { authResolver } from "../middleware/authResolver";
import { rateLimitMiddleware } from "../middleware/rateLimiter";

const app = new Elysia()
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),  // Must run first
        rateLimitMiddleware("chat"),
      ],
    },
    (app) => app.post("/api/chat", chatHandler)
  );
Important: authResolver must run before rateLimitMiddleware because rate limiting requires request.auth.userId.

Deep Research Example

const app = new Elysia()
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("deep-research"),
      ],
    },
    (app) => app.post("/api/deep-research/start", deepResearchHandler)
  );

Rate Limit Response

HTTP Headers

All responses include rate limit headers:
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 60
  • X-RateLimit-Limit: Maximum requests allowed
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Seconds until the oldest request ages out of the window, freeing capacity
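A middleware can derive these headers from a rate limit check result before attaching them to the response. The helper below is a hypothetical sketch (BioAgents' middleware may shape this differently):

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetIn: number;
}

// Build the X-RateLimit-* header map from a check result
function rateLimitHeaders(
  limit: number,
  result: RateLimitResult
): Record<string, string> {
  return {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(result.remaining),
    "X-RateLimit-Reset": String(result.resetIn),
  };
}
```

In Elysia, these values would typically be assigned onto `set.headers` inside the middleware.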

429 Response

When rate limit is exceeded:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 45

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Try again in 45 seconds.",
  "retryAfter": 45
}
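In a sliding window, the reset value can be derived from the oldest timestamp still inside the window. The computation below is an assumption about how `retryAfter` might be produced, shown for illustration:

```typescript
// Seconds until the oldest in-window request ages out, freeing one slot.
// In Redis, oldestTsSec would come from ZRANGE key 0 0 WITHSCORES.
function computeResetIn(
  oldestTsSec: number,
  windowSec: number,
  nowSec: number
): number {
  return Math.max(0, oldestTsSec + windowSec - nowSec);
}

// e.g. oldest request at t=100, 60-second window, now t=115
computeResetIn(100, 60, 115); // → 45
```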

Programmatic Rate Checking

You can check rate limits programmatically; note that each allowed call records a request against the limit, just like the middleware does:
import { checkRateLimit } from "../middleware/rateLimiter";

const result = await checkRateLimit(userId, "chat");

if (result.allowed) {
  console.log(`Remaining: ${result.remaining}`);
} else {
  console.log(`Rate limited. Reset in ${result.resetIn}s`);
}

RateLimitResult Interface

interface RateLimitResult {
  allowed: boolean;      // Whether request is allowed
  remaining: number;     // Requests remaining
  resetIn: number;       // Seconds until reset
}

Custom Rate Limits

Adding New Action Types

Extend the rate limit configuration:
// src/middleware/rateLimiter.ts
const RATE_LIMITS: Record<string, RateLimitConfig> = {
  chat: {
    max: parseInt(process.env.CHAT_RATE_LIMIT_PER_MINUTE || "10"),
    window: 60,
  },
  "deep-research": {
    max: parseInt(process.env.DEEP_RESEARCH_RATE_LIMIT_PER_5MIN || "3"),
    window: 300,
  },
  // Add custom action
  "data-analysis": {
    max: parseInt(process.env.DATA_ANALYSIS_RATE_LIMIT_PER_HOUR || "20"),
    window: 3600, // 1 hour
  },
};
Update the type:
export type RateLimitAction = "chat" | "deep-research" | "data-analysis";

export function rateLimitMiddleware(action: RateLimitAction) {
  // ...
}

Environment Variables

Add to .env:
DATA_ANALYSIS_RATE_LIMIT_PER_HOUR=20

Usage

const app = new Elysia()
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("data-analysis"),
      ],
    },
    (app) => app.post("/api/analyze", analyzeHandler)
  );

Per-Route Rate Limits

Apply different limits to different routes:
const app = new Elysia()
  // Stricter limit for expensive operations
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("deep-research"),  // 3 per 5 min
      ],
    },
    (app) => app.post("/api/deep-research/start", deepResearchHandler)
  )
  // More lenient for cheap operations
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("chat"),  // 10 per min
      ],
    },
    (app) => app.post("/api/chat", chatHandler)
  );

Bypassing Rate Limits

Conditional Bypass

Skip rate limiting when job queue is disabled:
// Rate limiter automatically bypasses when USE_JOB_QUEUE=false
if (!isJobQueueEnabled()) {
  return {
    allowed: true,
    remaining: 999,
    resetIn: 0,
  };
}

Admin/Whitelist Bypass

Implement custom bypass logic:
export function rateLimitMiddleware(action: RateLimitAction) {
  return async ({ request, set }: { request: Request; set: any }) => {
    const auth = (request as any).auth;
    
    // Skip rate limit for admin users
    if (auth?.role === "admin") {
      return;
    }
    
    // Check whitelist
    const whitelistedUsers = [
      "user-uuid-1",
      "user-uuid-2",
    ];
    
    if (whitelistedUsers.includes(auth?.userId)) {
      return;
    }
    
    // Normal rate limit check
    const result = await checkRateLimit(auth.userId, action);
    
    if (!result.allowed) {
      set.status = 429;
      return {
        error: "Rate limit exceeded",
        message: `Too many requests. Try again in ${result.resetIn} seconds.`,
        retryAfter: result.resetIn,
      };
    }
  };
}

Error Handling

Rate limiter gracefully handles Redis failures:
try {
  // Redis operations
  const results = await multi.exec();
  // ...
} catch (error) {
  // On Redis error, allow request but log warning
  logger.error({ error, userId, action }, "rate_limit_check_failed");
  return {
    allowed: true,
    remaining: 999,
    resetIn: 0,
  };
}
Fail-Open Design: If Redis is unavailable, requests are allowed to prevent service outages. Monitor Redis health to catch issues.

Monitoring

Structured Logging

Rate limit events are logged:
// Request allowed
logger.info(
  {
    userId,
    action,
    currentCount: 5,
    max: 10,
    remaining: 5,
  },
  "rate_limit_checked"
);

// Rate limit exceeded
logger.warn(
  {
    userId,
    action,
    currentCount: 10,
    max: 10,
    resetIn: 45,
  },
  "rate_limit_exceeded"
);

// Redis error
logger.error(
  { error, userId, action },
  "rate_limit_check_failed"
);

Metrics Tracking

Track rate limit metrics:
import { checkRateLimit } from "../middleware/rateLimiter";

// Check all users' rate limits
async function getRateLimitMetrics(userIds: string[]) {
  const metrics = await Promise.all(
    userIds.map(async (userId) => {
      const chatLimit = await checkRateLimit(userId, "chat");
      const researchLimit = await checkRateLimit(userId, "deep-research");
      
      return {
        userId,
        chat: {
          remaining: chatLimit.remaining,
          allowed: chatLimit.allowed,
        },
        research: {
          remaining: researchLimit.remaining,
          allowed: researchLimit.allowed,
        },
      };
    })
  );
  
  return metrics;
}

Client-Side Handling

Respecting Rate Limits

interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetIn: number;
}

let rateLimitInfo: RateLimitInfo | null = null;

async function makeRequest(url: string, body: any) {
  // Check if rate limited
  if (rateLimitInfo && rateLimitInfo.remaining === 0) {
    throw new Error(
      `Rate limited. Try again in ${rateLimitInfo.resetIn} seconds.`
    );
  }
  
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  
  // Update rate limit info from headers
  rateLimitInfo = {
    limit: parseInt(response.headers.get('X-RateLimit-Limit') || '0'),
    remaining: parseInt(response.headers.get('X-RateLimit-Remaining') || '0'),
    resetIn: parseInt(response.headers.get('X-RateLimit-Reset') || '0'),
  };
  
  // Handle 429
  if (response.status === 429) {
    const error = await response.json();
    throw new Error(error.message);
  }
  
  return response.json();
}

Exponential Backoff

async function makeRequestWithRetry(
  url: string,
  body: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await makeRequest(url, body);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (message.includes('Rate limited') && i < maxRetries - 1) {
        // Exponential backoff, capped at 30 seconds
        const delay = Math.min(1000 * Math.pow(2, i), 30000);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}

Best Practices

Choose limits based on:
  • Resource cost (LLM tokens, API calls)
  • Expected user behavior
  • Server capacity
Example:
  • Cheap operations: 60 per minute
  • Medium operations: 10 per minute
  • Expensive operations: 3 per 5 minutes
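These guidelines can be encoded as a simple lookup; the tiers and numbers below just restate the example above and are not BioAgents defaults:

```typescript
type OperationCost = "cheap" | "medium" | "expensive";

interface RateLimitConfig {
  max: number;
  window: number; // seconds
}

// Map an operation's resource cost to a suggested limit
function limitForCost(cost: OperationCost): RateLimitConfig {
  switch (cost) {
    case "cheap":
      return { max: 60, window: 60 }; // 60 per minute
    case "medium":
      return { max: 10, window: 60 }; // 10 per minute
    case "expensive":
      return { max: 3, window: 300 }; // 3 per 5 minutes
  }
}
```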
Return rate limit headers on ALL responses, not just 429s. Clients need this info to avoid hitting limits.
Consider different limits for different user tiers:
const limits = {
  free: { max: 10, window: 60 },
  pro: { max: 100, window: 60 },
  enterprise: { max: 1000, window: 60 },
};

const config = limits[user.tier] || limits.free;
Rate limiting depends on Redis. Set up monitoring:
  • Redis connection status
  • Redis memory usage
  • Rate limit check failures
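A liveness probe for the first of these checks might look like the sketch below; `RedisLike` is an assumed minimal client interface (ioredis's `ping()` resolves to "PONG"):

```typescript
interface RedisLike {
  ping(): Promise<string>;
}

// Resolve true if Redis answers a PING within timeoutMs, false otherwise
async function redisHealthy(
  client: RedisLike,
  timeoutMs = 1000
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([
      client.ping().then((pong) => pong === "PONG"),
      timeout,
    ]);
  } catch {
    return false; // ping threw: treat as unhealthy
  } finally {
    clearTimeout(timer);
  }
}
```

A probe like this could back a `/health` endpoint or a periodic alerting check, which matters given the fail-open design: a Redis outage silently disables rate limiting.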
Clearly document rate limits in your API documentation so clients know what to expect.

Testing Rate Limits

Manual Testing

# Make rapid requests
for i in {1..15}; do
  curl -X POST http://localhost:3000/api/chat \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <token>" \
    -d '{"message": "Hello"}' \
    -i | grep -E "X-RateLimit|429"
  sleep 1
done

Unit Tests

import { describe, test, expect } from "bun:test";
import { checkRateLimit } from "./rateLimiter";

describe("Rate Limiter", () => {
  test("should allow requests under limit", async () => {
    const userId = "test-user-1";
    
    for (let i = 0; i < 10; i++) {
      const result = await checkRateLimit(userId, "chat");
      expect(result.allowed).toBe(true);
      expect(result.remaining).toBe(9 - i);
    }
  });
  
  test("should block requests over limit", async () => {
    const userId = "test-user-2";
    
    // Use up the limit
    for (let i = 0; i < 10; i++) {
      await checkRateLimit(userId, "chat");
    }
    
    // Next request should be blocked
    const result = await checkRateLimit(userId, "chat");
    expect(result.allowed).toBe(false);
    expect(result.remaining).toBe(0);
  });
});

Next Steps

Payment Protocols

Combine rate limiting with payment gating

WebSockets

Rate limit WebSocket connections and messages

Authentication

Learn about auth requirements for rate limiting

Job Queue

Understand Redis and BullMQ setup
