Rate limiting protects your chatbot from abuse and controls costs by limiting how many messages users can send. This guide covers the chatbot’s multi-layer rate limiting implementation.

Rate limiting architecture

The chatbot implements three layers of rate limiting:
  1. IP-based rate limiting - Prevents abuse from anonymous users
  2. User-based rate limiting - Enforces message quotas per user type
  3. API gateway limits - Provider-level rate limits (external)
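In the request path the layers run in order: the IP check first, then the per-user quota; the provider's API limits apply after the request leaves the app. A minimal sketch of that ordering with stubbed checks (the names mirror the real helpers, but the bodies here are placeholders, not the production logic):

```typescript
// Sketch of the layered request path; the real checks live in
// lib/ratelimit.ts and app/(chat)/api/chat/route.ts.
type Check = (req: { ip?: string; userId?: string }) => void;

// Layer 1: anonymous abuse protection (stand-in for checkIpRateLimit).
const ipCheck: Check = (req) => {
  if (req.ip === "203.0.113.9") throw new Error("rate_limit:chat");
};

// Layer 2: per-user quota (stand-in for the entitlements lookup).
const userCheck: Check = (req) => {
  if (req.userId === "over-quota") throw new Error("rate_limit:chat");
};

// Run the layers in order; the first failure short-circuits.
function runRateLimitLayers(req: { ip?: string; userId?: string }) {
  for (const check of [ipCheck, userCheck]) check(req);
  return "ok"; // layer 3 (the provider's own limits) applies downstream
}
```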

IP-based rate limiting

IP rate limiting uses Redis to track requests by IP address:
lib/ratelimit.ts
import { createClient } from "redis";
import { isProductionEnvironment } from "@/lib/constants";
import { ChatbotError } from "@/lib/errors";

const MAX_MESSAGES_PER_DAY = 10;
const TTL_SECONDS = 60 * 60 * 24;

let client: ReturnType<typeof createClient> | null = null;

function getClient() {
  if (!client && process.env.REDIS_URL) {
    client = createClient({ url: process.env.REDIS_URL });
    // Swallow connection errors; a Redis outage must not crash the request path.
    client.on("error", () => {});
    client.connect().catch(() => {
      client = null;
    });
  }
  return client;
}

export async function checkIpRateLimit(ip: string | undefined) {
  if (!isProductionEnvironment || !ip) return;

  const redis = getClient();
  if (!redis?.isReady) return;

  try {
    const key = `ip-rate-limit:${ip}`;
    const [count] = await redis
      .multi()
      .incr(key)
      .expire(key, TTL_SECONDS, "NX")
      .exec();

    if (typeof count === "number" && count > MAX_MESSAGES_PER_DAY) {
      throw new ChatbotError("rate_limit:chat");
    }
  } catch (error) {
    if (error instanceof ChatbotError) throw error;
  }
}
IP rate limiting only runs in production (isProductionEnvironment) and gracefully degrades if Redis is unavailable.

How IP rate limiting works

Step 1: Extract IP address

Get the client IP from the request:
app/(chat)/api/chat/route.ts
import { ipAddress } from "@vercel/functions";

await checkIpRateLimit(ipAddress(request));
Step 2: Increment counter

Use Redis MULTI to atomically increment and set expiry:
const key = `ip-rate-limit:${ip}`;
const [count] = await redis
  .multi()
  .incr(key)
  .expire(key, TTL_SECONDS, "NX")
  .exec();
The "NX" flag sets the expiry only when the key has none, so subsequent messages don't reset the 24-hour window.
Step 3: Check threshold

Throw a rate limit error if the threshold is exceeded:
if (typeof count === "number" && count > MAX_MESSAGES_PER_DAY) {
  throw new ChatbotError("rate_limit:chat");
}
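This is a fixed window: the first message creates the key with a 24-hour TTL, and later messages only increment it. The same semantics, sketched in memory without Redis (purely illustrative, mirroring the INCR + EXPIRE NX pattern above):

```typescript
// In-memory fixed-window counter mirroring the Redis INCR + EXPIRE NX pattern.
const MAX_MESSAGES_PER_DAY = 10;
const TTL_MS = 24 * 60 * 60 * 1000;

const windows = new Map<string, { count: number; expiresAt: number }>();

function checkFixedWindow(ip: string, now = Date.now()) {
  const entry = windows.get(ip);
  if (!entry || entry.expiresAt <= now) {
    // Like EXPIRE ... NX on the first INCR: the window starts at the first message.
    windows.set(ip, { count: 1, expiresAt: now + TTL_MS });
    return;
  }
  entry.count += 1; // like INCR: later messages do not reset the TTL
  if (entry.count > MAX_MESSAGES_PER_DAY) {
    throw new Error("rate_limit:chat");
  }
}
```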

User-based rate limiting

Authenticated users have message quotas based on their user type:
lib/ai/entitlements.ts
import type { UserType } from "@/app/(auth)/auth";

type Entitlements = {
  maxMessagesPerDay: number;
};

export const entitlementsByUserType: Record<UserType, Entitlements> = {
  guest: {
    maxMessagesPerDay: 10,
  },
  regular: {
    maxMessagesPerDay: 10,
  },
};

Implementing user rate limits

app/(chat)/api/chat/route.ts
import { entitlementsByUserType } from "@/lib/ai/entitlements";
import { getMessageCountByUserId } from "@/lib/db/queries";

const userType: UserType = session.user.type;

const messageCount = await getMessageCountByUserId({
  id: session.user.id,
  differenceInHours: 24,
});

if (messageCount > entitlementsByUserType[userType].maxMessagesPerDay) {
  return new ChatbotError("rate_limit:chat").toResponse();
}
This queries the database to count messages from the user in the last 24 hours:
lib/db/queries.ts
export async function getMessageCountByUserId({
  id,
  differenceInHours,
}: {
  id: string;
  differenceInHours: number;
}) {
  try {
    const twentyFourHoursAgo = new Date(
      Date.now() - differenceInHours * 60 * 60 * 1000
    );

    const [stats] = await db
      .select({ count: count(message.id) })
      .from(message)
      .innerJoin(chat, eq(message.chatId, chat.id))
      .where(
        and(
          eq(chat.userId, id),
          gte(message.createdAt, twentyFourHoursAgo),
          eq(message.role, "user")
        )
      )
      .execute();

    return stats?.count ?? 0;
  } catch (_error) {
    throw new ChatbotError(
      "bad_request:database",
      "Failed to get message count by user id"
    );
  }
}
Database-based rate limiting is less performant than Redis but doesn’t require additional infrastructure. Consider moving to Redis for high-traffic applications.
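The query above is equivalent to filtering the user's messages by author role and timestamp. The window arithmetic, sketched over in-memory data (illustrative only; the real code runs the Drizzle query shown above):

```typescript
// Illustrative in-memory equivalent of getMessageCountByUserId.
type Message = { userId: string; role: "user" | "assistant"; createdAt: number };

function countRecentUserMessages(
  messages: Message[],
  userId: string,
  differenceInHours: number,
  now = Date.now()
) {
  // Same window math as the query: now minus N hours, in milliseconds.
  const windowStart = now - differenceInHours * 60 * 60 * 1000;
  return messages.filter(
    (m) => m.userId === userId && m.role === "user" && m.createdAt >= windowStart
  ).length;
}
```

Only messages with `role: "user"` count toward the quota, so long assistant replies never eat into a user's allowance.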

Configuring rate limits

Modify the constants in lib/ratelimit.ts:
const MAX_MESSAGES_PER_DAY = 10;  // Increase for more generous limits
const TTL_SECONDS = 60 * 60 * 24; // Change window size
Add new user types with different quotas:
lib/ai/entitlements.ts
export const entitlementsByUserType: Record<UserType, Entitlements> = {
  guest: {
    maxMessagesPerDay: 10,
  },
  regular: {
    maxMessagesPerDay: 10,
  },
  premium: {
    maxMessagesPerDay: 100,
  },
  enterprise: {
    maxMessagesPerDay: Infinity,
  },
};
You can also enforce per-model limits. This assumes a getMessageCountByUserIdAndModel query, which you would write alongside getMessageCountByUserId:
const modelLimits: Record<string, number> = {
  "gpt-4": 5,
  "gpt-4o-mini": 10,
  "claude-4.5-sonnet": 10,
};

const modelMessageCount = await getMessageCountByUserIdAndModel({
  id: session.user.id,
  model: selectedChatModel,
  differenceInHours: 24,
});

if (modelMessageCount > (modelLimits[selectedChatModel] ?? Infinity)) {
  return new ChatbotError("rate_limit:chat").toResponse();
}
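The getMessageCountByUserIdAndModel helper is not part of the template; it would mirror getMessageCountByUserId with an extra model filter, assuming stored messages record which model produced them. The filtering logic, sketched over in-memory data:

```typescript
// Hypothetical per-model counting; assumes each stored message records its model.
type ModelMessage = { userId: string; model: string; createdAt: number };

function countByUserAndModel(
  messages: ModelMessage[],
  userId: string,
  model: string,
  differenceInHours: number,
  now = Date.now()
) {
  const windowStart = now - differenceInHours * 60 * 60 * 1000;
  // Same shape as the Drizzle query, plus one extra predicate on model.
  return messages.filter(
    (m) => m.userId === userId && m.model === model && m.createdAt >= windowStart
  ).length;
}
```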

Error handling

Rate limit errors use the centralized error system:
lib/errors.ts
export class ChatbotError extends Error {
  type: ErrorType;
  surface: Surface;
  statusCode: number;

  constructor(errorCode: ErrorCode, cause?: string) {
    super();
    const [type, surface] = errorCode.split(":");
    this.type = type as ErrorType;
    this.surface = surface as Surface;
    this.message = getMessageByErrorCode(errorCode);
    this.statusCode = getStatusCodeByType(this.type);
  }

  toResponse() {
    const code: ErrorCode = `${this.type}:${this.surface}`;
    return Response.json(
      { code, message: this.message },
      { status: this.statusCode }
    );
  }
}
The rate limit error maps to a 429 status code and returns this user-facing message:
case "rate_limit:chat":
  return "You have exceeded your maximum number of messages for the day. Please try again later.";
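The type-to-status mapping lives in getStatusCodeByType in lib/errors.ts. A plausible sketch of that table (the exact set of types in your copy may differ; only the rate_limit → 429 mapping is guaranteed by this guide):

```typescript
// Sketch of the error type → HTTP status mapping; see lib/errors.ts for the real table.
type ErrorType = "bad_request" | "unauthorized" | "forbidden" | "rate_limit" | "offline";

function getStatusCodeByType(type: ErrorType): number {
  switch (type) {
    case "bad_request":
      return 400;
    case "unauthorized":
      return 401;
    case "forbidden":
      return 403;
    case "rate_limit":
      return 429; // the status clients should treat as "retry later"
    case "offline":
      return 503;
  }
}
```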

Redis setup

To enable IP-based rate limiting, configure Redis:
Step 1: Deploy Redis

Use Vercel KV, Upstash, or any Redis provider. With Vercel KV, create the store in the dashboard, then pull its connection string into your local environment:
# After creating the store in the Vercel dashboard
vercel env pull
Step 2: Set environment variable

Add REDIS_URL to your environment:
REDIS_URL=redis://default:password@host:port
Step 3: Test connection

The rate limiter will automatically connect when REDIS_URL is present:
function getClient() {
  if (!client && process.env.REDIS_URL) {
    client = createClient({ url: process.env.REDIS_URL });
    client.on("error", () => {});
    client.connect().catch(() => {
      client = null;
    });
  }
  return client;
}
If Redis is unavailable, the rate limiter gracefully degrades and allows requests through. This prevents Redis outages from breaking your chatbot.

Monitoring rate limits

Track rate limit hits and adjust limits accordingly:
export async function checkIpRateLimit(ip: string | undefined) {
  // ...
  try {
    const key = `ip-rate-limit:${ip}`;
    const [count] = await redis
      .multi()
      .incr(key)
      .expire(key, TTL_SECONDS, "NX")
      .exec();

    if (typeof count === "number" && count > MAX_MESSAGES_PER_DAY) {
      console.log(`Rate limit exceeded for IP: ${ip}, count: ${count}`);
      throw new ChatbotError("rate_limit:chat");
    }
  } catch (error) {
    if (error instanceof ChatbotError) throw error;
  }
}
Consider integrating with observability tools like:
  • Vercel Observability (logs and analytics) for request-level visibility
  • Datadog for Redis metrics
  • Sentry for error reporting
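Whichever tool you forward to, a structured log line is easier to aggregate and alert on than free text. A minimal sketch (the field names here are illustrative, not a convention any of those tools require):

```typescript
// Emit rate-limit hits as structured JSON so log-based alerting can count them.
function logRateLimitHit(ip: string, count: number): string {
  const line = JSON.stringify({
    event: "rate_limit_exceeded",
    ip,
    count,
    at: new Date().toISOString(),
  });
  console.log(line);
  return line;
}
```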

Bot detection

The chatbot uses BotID for additional bot protection:
app/(chat)/api/chat/route.ts
import { checkBotId } from "botid/server";

const [botResult, session] = await Promise.all([checkBotId(), auth()]);

if (botResult.isBot) {
  return new ChatbotError("unauthorized:chat").toResponse();
}
This runs in parallel with authentication for better performance.

Advanced patterns

Implement more sophisticated rate limiting with sliding windows:
async function checkSlidingWindow(userId: string) {
  const now = Date.now();
  const windowStart = now - 24 * 60 * 60 * 1000;

  const key = `rate-limit:${userId}`;

  // node-redis v4 commands are camelCase (zRemRangeByScore, zAdd, zCard)
  await redis
    .multi()
    .zRemRangeByScore(key, 0, windowStart) // drop entries older than the window
    .zAdd(key, { score: now, value: `${now}-${Math.random()}` }) // unique member per request
    .expire(key, TTL_SECONDS)
    .exec();

  const count = await redis.zCard(key);

  if (count > MAX_MESSAGES_PER_DAY) {
    throw new ChatbotError("rate_limit:chat");
  }
}
Implement burst allowance with token buckets:
async function checkTokenBucket(userId: string) {
  const key = `bucket:${userId}`;
  const refillRate = 10 / (24 * 60 * 60); // 10 per day
  const bucketSize = 5; // Allow bursts of 5
  
  const data = await redis.get(key);
  const { tokens, lastRefill } = data 
    ? JSON.parse(data) 
    : { tokens: bucketSize, lastRefill: Date.now() };
  
  const now = Date.now();
  const timePassed = (now - lastRefill) / 1000;
  const newTokens = Math.min(
    bucketSize,
    tokens + timePassed * refillRate
  );
  
  if (newTokens < 1) {
    throw new ChatbotError("rate_limit:chat");
  }
  
  await redis.set(
    key,
    JSON.stringify({
      tokens: newTokens - 1,
      lastRefill: now,
    }),
    { EX: TTL_SECONDS } // node-redis v4 option names are uppercase
  );
}
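Note that this read-modify-write sequence can race under concurrent requests; a Lua script or WATCH/MULTI would make it atomic. The refill arithmetic itself is easy to verify in memory with an injectable clock (illustrative, not the Redis-backed version above):

```typescript
// In-memory token bucket with an injectable clock, mirroring the refill math above.
const BUCKET_SIZE = 5;
const REFILL_PER_SECOND = 10 / (24 * 60 * 60); // 10 tokens per day

type Bucket = { tokens: number; lastRefill: number };

function takeToken(bucket: Bucket, now: number): boolean {
  // Refill proportionally to elapsed time, capped at the bucket size.
  const elapsedSeconds = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(BUCKET_SIZE, bucket.tokens + elapsedSeconds * REFILL_PER_SECOND);
  bucket.lastRefill = now;
  if (bucket.tokens < 1) return false; // rate limited
  bucket.tokens -= 1;
  return true;
}
```

At this refill rate one token returns roughly every 2.4 hours, so a user can burst 5 messages and then settle into the 10-per-day pace.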
Apply different limits based on user location:
import { geolocation } from "@vercel/functions";

const { country } = geolocation(request);

const limitsByCountry: Record<string, number> = {
  US: 20,
  GB: 20,
  default: 10,
};

const limit = limitsByCountry[country ?? "default"] ?? limitsByCountry.default; // country is undefined outside Vercel

if (messageCount > limit) {
  throw new ChatbotError("rate_limit:chat");
}

Testing rate limits

Test rate limiting in development:
// Temporarily lower limits for testing
const MAX_MESSAGES_PER_DAY = 2; // Instead of 10

// The IP layer already skips non-production environments
// (checkIpRateLimit returns early unless isProductionEnvironment),
// so only the user-based limit needs an explicit escape hatch:
if (process.env.NODE_ENV === "development") {
  return; // Skip rate limiting
}
Never deploy with rate limiting disabled. Always test with realistic limits in a staging environment.
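With the limit dropped to 2, a quick unit-style check confirms the third message is rejected. This sketch exercises only the counting logic; in staging you would send real requests against the running endpoint and assert on the 429 response:

```typescript
// Test-only sketch: a counter with a lowered limit of 2.
function makeLimiter(max: number) {
  let count = 0;
  return () => {
    count += 1;
    if (count > max) throw new Error("rate_limit:chat");
  };
}
```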

Next steps

  • Learn about streaming to understand how rate limits interact with long-running requests
  • Review the upgrading guide for rate limit changes in new versions
  • Explore building AI tools that respect rate limits
