
Overview

BioAgents implements per-user rate limiting using a Redis-backed sliding window algorithm. Rate limits prevent abuse and ensure fair resource allocation across users.
Rate limiting is only active when USE_JOB_QUEUE=true (requires Redis). When disabled, all requests are allowed.

Configuration

Environment Variables

Configure rate limits in your .env file:
# Job Queue (required for rate limiting)
USE_JOB_QUEUE=true
REDIS_URL=redis://localhost:6379

# Rate Limits (optional, defaults shown)
CHAT_RATE_LIMIT_PER_MINUTE=10
DEEP_RESEARCH_RATE_LIMIT_PER_5MIN=3

Default Limits

If not specified, these defaults are used:
// src/middleware/rateLimiter.ts
const RATE_LIMITS: Record<string, RateLimitConfig> = {
  chat: {
    max: parseInt(process.env.CHAT_RATE_LIMIT_PER_MINUTE || "10"),
    window: 60, // 1 minute
  },
  "deep-research": {
    max: parseInt(process.env.DEEP_RESEARCH_RATE_LIMIT_PER_5MIN || "3"),
    window: 300, // 5 minutes
  },
};

How It Works

Sliding Window Algorithm

BioAgents uses Redis sorted sets to implement a sliding window:
  1. Each request is stored with timestamp as score
  2. Old entries outside the window are removed
  3. Current request count is checked against limit
  4. If under limit, request is allowed and recorded
  5. If over limit, request is rejected with 429 status
// Simplified algorithm
const key = `ratelimit:${action}:${userId}`;
const now = Math.floor(Date.now() / 1000);
const windowStart = now - config.window;

// Remove old entries
await redis.zremrangebyscore(key, 0, windowStart);

// Count current requests
const currentCount = await redis.zcard(key);

if (currentCount >= config.max) {
  // Rate limit exceeded
  return { allowed: false, remaining: 0 };
}

// Add current request and refresh the key's TTL so idle keys expire
await redis.zadd(key, now, `${now}-${Math.random()}`);
await redis.expire(key, config.window);
return { allowed: true, remaining: config.max - currentCount - 1 };

Advantages of Sliding Window

  • Precise: Tracks exact request timestamps, not fixed time blocks
  • Fair: No burst allowance at window boundaries
  • Efficient: O(log N) Redis operations
  • Scalable: Works across multiple API servers
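The counting behavior of the algorithm can be sketched without Redis using an in-memory array; `SlidingWindow` below is purely illustrative and not part of the BioAgents codebase:

```typescript
class SlidingWindow {
  private timestamps: number[] = [];

  constructor(private max: number, private windowSec: number) {}

  check(nowSec: number): { allowed: boolean; remaining: number } {
    const windowStart = nowSec - this.windowSec;
    // Drop entries outside the window (the zremrangebyscore step)
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.max) {
      return { allowed: false, remaining: 0 };
    }
    // Record the request (the zadd step)
    this.timestamps.push(nowSec);
    return { allowed: true, remaining: this.max - this.timestamps.length };
  }
}
```

Redis sorted sets play the role of the `timestamps` array here, which is what lets the same window be shared across multiple API servers.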

Using Rate Limit Middleware

Basic Usage

Apply to Elysia routes:
import { Elysia } from "elysia";
import { authResolver } from "../middleware/authResolver";
import { rateLimitMiddleware } from "../middleware/rateLimiter";

const app = new Elysia()
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),  // Must run first
        rateLimitMiddleware("chat"),
      ],
    },
    (app) => app.post("/api/chat", chatHandler)
  );
Important: authResolver must run before rateLimitMiddleware because rate limiting requires request.auth.userId.

Deep Research Example

const app = new Elysia()
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("deep-research"),
      ],
    },
    (app) => app.post("/api/deep-research/start", deepResearchHandler)
  );

Rate Limit Response

HTTP Headers

All responses include rate limit headers:
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 60
  • X-RateLimit-Limit: Maximum requests allowed
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Seconds until the oldest request ages out of the window, freeing capacity
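A middleware can derive these headers from a rate limit check result before attaching them to the response. The helper below is a hypothetical sketch (BioAgents' middleware may shape this differently):

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetIn: number;
}

// Build the X-RateLimit-* header map from a check result
function rateLimitHeaders(
  limit: number,
  result: RateLimitResult
): Record<string, string> {
  return {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(result.remaining),
    "X-RateLimit-Reset": String(result.resetIn),
  };
}
```

In Elysia, these values would typically be assigned onto `set.headers` inside the middleware.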

429 Response

When rate limit is exceeded:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 45

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Try again in 45 seconds.",
  "retryAfter": 45
}
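In a sliding window, the reset value can be derived from the oldest timestamp still inside the window. The computation below is an assumption about how `retryAfter` might be produced, shown for illustration:

```typescript
// Seconds until the oldest in-window request ages out, freeing one slot.
// In Redis, oldestTsSec would come from ZRANGE key 0 0 WITHSCORES.
function computeResetIn(
  oldestTsSec: number,
  windowSec: number,
  nowSec: number
): number {
  return Math.max(0, oldestTsSec + windowSec - nowSec);
}

// e.g. oldest request at t=100, 60-second window, now t=115
computeResetIn(100, 60, 115); // → 45
```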

Programmatic Rate Checking

You can check rate limits programmatically; note that each allowed call records a request against the limit, just like the middleware does:
import { checkRateLimit } from "../middleware/rateLimiter";

const result = await checkRateLimit(userId, "chat");

if (result.allowed) {
  console.log(`Remaining: ${result.remaining}`);
} else {
  console.log(`Rate limited. Reset in ${result.resetIn}s`);
}

RateLimitResult Interface

interface RateLimitResult {
  allowed: boolean;      // Whether request is allowed
  remaining: number;     // Requests remaining
  resetIn: number;       // Seconds until reset
}

Custom Rate Limits

Adding New Action Types

Extend the rate limit configuration:
// src/middleware/rateLimiter.ts
const RATE_LIMITS: Record<string, RateLimitConfig> = {
  chat: {
    max: parseInt(process.env.CHAT_RATE_LIMIT_PER_MINUTE || "10"),
    window: 60,
  },
  "deep-research": {
    max: parseInt(process.env.DEEP_RESEARCH_RATE_LIMIT_PER_5MIN || "3"),
    window: 300,
  },
  // Add custom action
  "data-analysis": {
    max: parseInt(process.env.DATA_ANALYSIS_RATE_LIMIT_PER_HOUR || "20"),
    window: 3600, // 1 hour
  },
};
Update the type:
export type RateLimitAction = "chat" | "deep-research" | "data-analysis";

export function rateLimitMiddleware(action: RateLimitAction) {
  // ...
}

Environment Variables

Add to .env:
DATA_ANALYSIS_RATE_LIMIT_PER_HOUR=20

Usage

const app = new Elysia()
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("data-analysis"),
      ],
    },
    (app) => app.post("/api/analyze", analyzeHandler)
  );

Per-Route Rate Limits

Apply different limits to different routes:
const app = new Elysia()
  // Stricter limit for expensive operations
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("deep-research"),  // 3 per 5 min
      ],
    },
    (app) => app.post("/api/deep-research/start", deepResearchHandler)
  )
  // More lenient for cheap operations
  .guard(
    {
      beforeHandle: [
        authResolver({ required: true }),
        rateLimitMiddleware("chat"),  // 10 per min
      ],
    },
    (app) => app.post("/api/chat", chatHandler)
  );

Bypassing Rate Limits

Conditional Bypass

Skip rate limiting when job queue is disabled:
// Rate limiter automatically bypasses when USE_JOB_QUEUE=false
if (!isJobQueueEnabled()) {
  return {
    allowed: true,
    remaining: 999,
    resetIn: 0,
  };
}

Admin/Whitelist Bypass

Implement custom bypass logic:
export function rateLimitMiddleware(action: RateLimitAction) {
  return async ({ request, set }: { request: Request; set: any }) => {
    const auth = (request as any).auth;
    
    // Skip rate limit for admin users
    if (auth?.role === "admin") {
      return;
    }
    
    // Check whitelist
    const whitelistedUsers = [
      "user-uuid-1",
      "user-uuid-2",
    ];
    
    if (whitelistedUsers.includes(auth?.userId)) {
      return;
    }
    
    // Normal rate limit check
    const result = await checkRateLimit(auth.userId, action);
    
    if (!result.allowed) {
      set.status = 429;
      return {
        error: "Rate limit exceeded",
        message: `Too many requests. Try again in ${result.resetIn} seconds.`,
        retryAfter: result.resetIn,
      };
    }
  };
}

Error Handling

Rate limiter gracefully handles Redis failures:
try {
  // Redis operations
  const results = await multi.exec();
  // ...
} catch (error) {
  // On Redis error, allow request but log warning
  logger.error({ error, userId, action }, "rate_limit_check_failed");
  return {
    allowed: true,
    remaining: 999,
    resetIn: 0,
  };
}
Fail-Open Design: If Redis is unavailable, requests are allowed to prevent service outages. Monitor Redis health to catch issues.

Monitoring

Structured Logging

Rate limit events are logged:
// Request allowed
logger.info(
  {
    userId,
    action,
    currentCount: 5,
    max: 10,
    remaining: 5,
  },
  "rate_limit_checked"
);

// Rate limit exceeded
logger.warn(
  {
    userId,
    action,
    currentCount: 10,
    max: 10,
    resetIn: 45,
  },
  "rate_limit_exceeded"
);

// Redis error
logger.error(
  { error, userId, action },
  "rate_limit_check_failed"
);

Metrics Tracking

Track rate limit metrics:
import { checkRateLimit } from "../middleware/rateLimiter";

// Check all users' rate limits
async function getRateLimitMetrics(userIds: string[]) {
  const metrics = await Promise.all(
    userIds.map(async (userId) => {
      const chatLimit = await checkRateLimit(userId, "chat");
      const researchLimit = await checkRateLimit(userId, "deep-research");
      
      return {
        userId,
        chat: {
          remaining: chatLimit.remaining,
          allowed: chatLimit.allowed,
        },
        research: {
          remaining: researchLimit.remaining,
          allowed: researchLimit.allowed,
        },
      };
    })
  );
  
  return metrics;
}

Client-Side Handling

Respecting Rate Limits

interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetIn: number;
}

let rateLimitInfo: RateLimitInfo | null = null;

async function makeRequest(url: string, body: any) {
  // Check if rate limited
  if (rateLimitInfo && rateLimitInfo.remaining === 0) {
    throw new Error(
      `Rate limited. Try again in ${rateLimitInfo.resetIn} seconds.`
    );
  }
  
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  
  // Update rate limit info from headers
  rateLimitInfo = {
    limit: parseInt(response.headers.get('X-RateLimit-Limit') || '0'),
    remaining: parseInt(response.headers.get('X-RateLimit-Remaining') || '0'),
    resetIn: parseInt(response.headers.get('X-RateLimit-Reset') || '0'),
  };
  
  // Handle 429
  if (response.status === 429) {
    const error = await response.json();
    throw new Error(error.message);
  }
  
  return response.json();
}

Exponential Backoff

async function makeRequestWithRetry(
  url: string,
  body: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await makeRequest(url, body);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (message.includes('Rate limited') && i < maxRetries - 1) {
        // Exponential backoff, capped at 30 seconds
        const delay = Math.min(1000 * Math.pow(2, i), 30000);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}

Best Practices

Choose limits based on:
  • Resource cost (LLM tokens, API calls)
  • Expected user behavior
  • Server capacity
Example:
  • Cheap operations: 60 per minute
  • Medium operations: 10 per minute
  • Expensive operations: 3 per 5 minutes
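These guidelines can be encoded as a simple lookup; the tiers and numbers below just restate the example above and are not BioAgents defaults:

```typescript
type OperationCost = "cheap" | "medium" | "expensive";

interface RateLimitConfig {
  max: number;
  window: number; // seconds
}

// Map an operation's resource cost to a suggested limit
function limitForCost(cost: OperationCost): RateLimitConfig {
  switch (cost) {
    case "cheap":
      return { max: 60, window: 60 }; // 60 per minute
    case "medium":
      return { max: 10, window: 60 }; // 10 per minute
    case "expensive":
      return { max: 3, window: 300 }; // 3 per 5 minutes
  }
}
```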
Return rate limit headers on ALL responses, not just 429s. Clients need this info to avoid hitting limits.
Consider different limits for different user tiers:
const limits = {
  free: { max: 10, window: 60 },
  pro: { max: 100, window: 60 },
  enterprise: { max: 1000, window: 60 },
};

const config = limits[user.tier] || limits.free;
Rate limiting depends on Redis. Set up monitoring:
  • Redis connection status
  • Redis memory usage
  • Rate limit check failures
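A liveness probe for the first of these checks might look like the sketch below; `RedisLike` is an assumed minimal client interface (ioredis's `ping()` resolves to "PONG"):

```typescript
interface RedisLike {
  ping(): Promise<string>;
}

// Resolve true if Redis answers a PING within timeoutMs, false otherwise
async function redisHealthy(
  client: RedisLike,
  timeoutMs = 1000
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([
      client.ping().then((pong) => pong === "PONG"),
      timeout,
    ]);
  } catch {
    return false; // ping threw: treat as unhealthy
  } finally {
    clearTimeout(timer);
  }
}
```

A probe like this could back a `/health` endpoint or a periodic alerting check, which matters given the fail-open design: a Redis outage silently disables rate limiting.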
Clearly document rate limits in your API documentation so clients know what to expect.

Testing Rate Limits

Manual Testing

# Make rapid requests
for i in {1..15}; do
  curl -X POST http://localhost:3000/api/chat \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <token>" \
    -d '{"message": "Hello"}' \
    -i | grep -E "X-RateLimit|429"
  sleep 1
done

Unit Tests

import { describe, test, expect } from "bun:test";
import { checkRateLimit } from "./rateLimiter";

describe("Rate Limiter", () => {
  test("should allow requests under limit", async () => {
    const userId = "test-user-1";
    
    for (let i = 0; i < 10; i++) {
      const result = await checkRateLimit(userId, "chat");
      expect(result.allowed).toBe(true);
      expect(result.remaining).toBe(9 - i);
    }
  });
  
  test("should block requests over limit", async () => {
    const userId = "test-user-2";
    
    // Use up the limit
    for (let i = 0; i < 10; i++) {
      await checkRateLimit(userId, "chat");
    }
    
    // Next request should be blocked
    const result = await checkRateLimit(userId, "chat");
    expect(result.allowed).toBe(false);
    expect(result.remaining).toBe(0);
  });
});

Next Steps

Payment Protocols

Combine rate limiting with payment gating

WebSockets

Rate limit WebSocket connections and messages

Authentication

Learn about auth requirements for rate limiting

Job Queue

Understand Redis and BullMQ setup
