Rate limiting protects your chatbot from abuse and controls costs by limiting how many messages users can send. This guide covers the chatbot’s multi-layer rate limiting implementation.

Rate limiting architecture

The chatbot implements three layers of rate limiting:
  1. IP-based rate limiting - Prevents abuse from anonymous users
  2. User-based rate limiting - Enforces message quotas per user type
  3. API gateway limits - Provider-level rate limits (external)
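In the request path the layers run in order: the IP check first, then the per-user quota; the provider's API limits apply after the request leaves the app. A minimal sketch of that ordering with stubbed checks (the names mirror the real helpers, but the bodies here are placeholders, not the production logic):

```typescript
// Sketch of the layered request path; the real checks live in
// lib/ratelimit.ts and app/(chat)/api/chat/route.ts.
type Check = (req: { ip?: string; userId?: string }) => void;

// Layer 1: anonymous abuse protection (stand-in for checkIpRateLimit).
const ipCheck: Check = (req) => {
  if (req.ip === "203.0.113.9") throw new Error("rate_limit:chat");
};

// Layer 2: per-user quota (stand-in for the entitlements lookup).
const userCheck: Check = (req) => {
  if (req.userId === "over-quota") throw new Error("rate_limit:chat");
};

// Run the layers in order; the first failure short-circuits.
function runRateLimitLayers(req: { ip?: string; userId?: string }) {
  for (const check of [ipCheck, userCheck]) check(req);
  return "ok"; // layer 3 (the provider's own limits) applies downstream
}
```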

IP-based rate limiting

IP rate limiting uses Redis to track requests by IP address:
lib/ratelimit.ts
import { createClient } from "redis";
import { isProductionEnvironment } from "@/lib/constants";
import { ChatbotError } from "@/lib/errors";

const MAX_MESSAGES_PER_DAY = 10;
const TTL_SECONDS = 60 * 60 * 24;

let client: ReturnType<typeof createClient> | null = null;

function getClient() {
  if (!client && process.env.REDIS_URL) {
    client = createClient({ url: process.env.REDIS_URL });
    // Swallow connection errors; a Redis outage must not crash the request path.
    client.on("error", () => {});
    client.connect().catch(() => {
      client = null;
    });
  }
  return client;
}

export async function checkIpRateLimit(ip: string | undefined) {
  if (!isProductionEnvironment || !ip) return;

  const redis = getClient();
  if (!redis?.isReady) return;

  try {
    const key = `ip-rate-limit:${ip}`;
    const [count] = await redis
      .multi()
      .incr(key)
      .expire(key, TTL_SECONDS, "NX")
      .exec();

    if (typeof count === "number" && count > MAX_MESSAGES_PER_DAY) {
      throw new ChatbotError("rate_limit:chat");
    }
  } catch (error) {
    if (error instanceof ChatbotError) throw error;
  }
}
IP rate limiting only runs in production (isProductionEnvironment) and gracefully degrades if Redis is unavailable.

How IP rate limiting works

Step 1: Extract IP address

Get the client IP from the request:
app/(chat)/api/chat/route.ts
import { ipAddress } from "@vercel/functions";

await checkIpRateLimit(ipAddress(request));
Step 2: Increment counter

Use Redis MULTI to atomically increment and set expiry:
const key = `ip-rate-limit:${ip}`;
const [count] = await redis
  .multi()
  .incr(key)
  .expire(key, TTL_SECONDS, "NX")
  .exec();
The "NX" flag sets the expiry only when the key has none, so subsequent messages don't reset the 24-hour window.
Step 3: Check threshold

Throw a rate limit error if the threshold is exceeded:
if (typeof count === "number" && count > MAX_MESSAGES_PER_DAY) {
  throw new ChatbotError("rate_limit:chat");
}
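This is a fixed window: the first message creates the key with a 24-hour TTL, and later messages only increment it. The same semantics, sketched in memory without Redis (purely illustrative, mirroring the INCR + EXPIRE NX pattern above):

```typescript
// In-memory fixed-window counter mirroring the Redis INCR + EXPIRE NX pattern.
const MAX_MESSAGES_PER_DAY = 10;
const TTL_MS = 24 * 60 * 60 * 1000;

const windows = new Map<string, { count: number; expiresAt: number }>();

function checkFixedWindow(ip: string, now = Date.now()) {
  const entry = windows.get(ip);
  if (!entry || entry.expiresAt <= now) {
    // Like EXPIRE ... NX on the first INCR: the window starts at the first message.
    windows.set(ip, { count: 1, expiresAt: now + TTL_MS });
    return;
  }
  entry.count += 1; // like INCR: later messages do not reset the TTL
  if (entry.count > MAX_MESSAGES_PER_DAY) {
    throw new Error("rate_limit:chat");
  }
}
```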

User-based rate limiting

Authenticated users have message quotas based on their user type:
lib/ai/entitlements.ts
import type { UserType } from "@/app/(auth)/auth";

type Entitlements = {
  maxMessagesPerDay: number;
};

export const entitlementsByUserType: Record<UserType, Entitlements> = {
  guest: {
    maxMessagesPerDay: 10,
  },
  regular: {
    maxMessagesPerDay: 10,
  },
};

Implementing user rate limits

app/(chat)/api/chat/route.ts
import { entitlementsByUserType } from "@/lib/ai/entitlements";
import { getMessageCountByUserId } from "@/lib/db/queries";

const userType: UserType = session.user.type;

const messageCount = await getMessageCountByUserId({
  id: session.user.id,
  differenceInHours: 24,
});

if (messageCount > entitlementsByUserType[userType].maxMessagesPerDay) {
  return new ChatbotError("rate_limit:chat").toResponse();
}
This queries the database to count messages from the user in the last 24 hours:
lib/db/queries.ts
export async function getMessageCountByUserId({
  id,
  differenceInHours,
}: {
  id: string;
  differenceInHours: number;
}) {
  try {
    const twentyFourHoursAgo = new Date(
      Date.now() - differenceInHours * 60 * 60 * 1000
    );

    const [stats] = await db
      .select({ count: count(message.id) })
      .from(message)
      .innerJoin(chat, eq(message.chatId, chat.id))
      .where(
        and(
          eq(chat.userId, id),
          gte(message.createdAt, twentyFourHoursAgo),
          eq(message.role, "user")
        )
      )
      .execute();

    return stats?.count ?? 0;
  } catch (_error) {
    throw new ChatbotError(
      "bad_request:database",
      "Failed to get message count by user id"
    );
  }
}
Database-based rate limiting is less performant than Redis but doesn’t require additional infrastructure. Consider moving to Redis for high-traffic applications.
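The query above is equivalent to filtering the user's messages by author role and timestamp. The window arithmetic, sketched over in-memory data (illustrative only; the real code runs the Drizzle query shown above):

```typescript
// Illustrative in-memory equivalent of getMessageCountByUserId.
type Message = { userId: string; role: "user" | "assistant"; createdAt: number };

function countRecentUserMessages(
  messages: Message[],
  userId: string,
  differenceInHours: number,
  now = Date.now()
) {
  // Same window math as the query: now minus N hours, in milliseconds.
  const windowStart = now - differenceInHours * 60 * 60 * 1000;
  return messages.filter(
    (m) => m.userId === userId && m.role === "user" && m.createdAt >= windowStart
  ).length;
}
```

Only messages with `role: "user"` count toward the quota, so long assistant replies never eat into a user's allowance.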

Configuring rate limits

Modify the constants in lib/ratelimit.ts:
const MAX_MESSAGES_PER_DAY = 10;  // Increase for more generous limits
const TTL_SECONDS = 60 * 60 * 24; // Change window size
Add new user types with different quotas:
lib/ai/entitlements.ts
export const entitlementsByUserType: Record<UserType, Entitlements> = {
  guest: {
    maxMessagesPerDay: 10,
  },
  regular: {
    maxMessagesPerDay: 10,
  },
  premium: {
    maxMessagesPerDay: 100,
  },
  enterprise: {
    maxMessagesPerDay: Infinity,
  },
};
You can also enforce per-model limits. This assumes a getMessageCountByUserIdAndModel query, which you would write alongside getMessageCountByUserId:
const modelLimits: Record<string, number> = {
  "gpt-4": 5,
  "gpt-4o-mini": 10,
  "claude-4.5-sonnet": 10,
};

const modelMessageCount = await getMessageCountByUserIdAndModel({
  id: session.user.id,
  model: selectedChatModel,
  differenceInHours: 24,
});

if (modelMessageCount > (modelLimits[selectedChatModel] ?? Infinity)) {
  return new ChatbotError("rate_limit:chat").toResponse();
}
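The getMessageCountByUserIdAndModel helper is not part of the template; it would mirror getMessageCountByUserId with an extra model filter, assuming stored messages record which model produced them. The filtering logic, sketched over in-memory data:

```typescript
// Hypothetical per-model counting; assumes each stored message records its model.
type ModelMessage = { userId: string; model: string; createdAt: number };

function countByUserAndModel(
  messages: ModelMessage[],
  userId: string,
  model: string,
  differenceInHours: number,
  now = Date.now()
) {
  const windowStart = now - differenceInHours * 60 * 60 * 1000;
  // Same shape as the Drizzle query, plus one extra predicate on model.
  return messages.filter(
    (m) => m.userId === userId && m.model === model && m.createdAt >= windowStart
  ).length;
}
```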

Error handling

Rate limit errors use the centralized error system:
lib/errors.ts
export class ChatbotError extends Error {
  type: ErrorType;
  surface: Surface;
  statusCode: number;

  constructor(errorCode: ErrorCode, cause?: string) {
    super();
    const [type, surface] = errorCode.split(":");
    this.type = type as ErrorType;
    this.surface = surface as Surface;
    this.message = getMessageByErrorCode(errorCode);
    this.statusCode = getStatusCodeByType(this.type);
  }

  toResponse() {
    const code: ErrorCode = `${this.type}:${this.surface}`;
    return Response.json(
      { code, message: this.message },
      { status: this.statusCode }
    );
  }
}
The rate limit error maps to a 429 status code and returns this user-facing message:
case "rate_limit:chat":
  return "You have exceeded your maximum number of messages for the day. Please try again later.";
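The type-to-status mapping lives in getStatusCodeByType in lib/errors.ts. A plausible sketch of that table (the exact set of types in your copy may differ; only the rate_limit → 429 mapping is guaranteed by this guide):

```typescript
// Sketch of the error type → HTTP status mapping; see lib/errors.ts for the real table.
type ErrorType = "bad_request" | "unauthorized" | "forbidden" | "rate_limit" | "offline";

function getStatusCodeByType(type: ErrorType): number {
  switch (type) {
    case "bad_request":
      return 400;
    case "unauthorized":
      return 401;
    case "forbidden":
      return 403;
    case "rate_limit":
      return 429; // the status clients should treat as "retry later"
    case "offline":
      return 503;
  }
}
```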

Redis setup

To enable IP-based rate limiting, configure Redis:
Step 1: Deploy Redis

Use Vercel KV, Upstash, or any Redis provider. With Vercel KV, create the store in the dashboard, then pull its connection string into your local environment:
# After creating the store in the Vercel dashboard
vercel env pull
Step 2: Set environment variable

Add REDIS_URL to your environment:
REDIS_URL=redis://default:password@host:port
Step 3: Test connection

The rate limiter will automatically connect when REDIS_URL is present:
function getClient() {
  if (!client && process.env.REDIS_URL) {
    client = createClient({ url: process.env.REDIS_URL });
    client.on("error", () => {});
    client.connect().catch(() => {
      client = null;
    });
  }
  return client;
}
If Redis is unavailable, the rate limiter gracefully degrades and allows requests through. This prevents Redis outages from breaking your chatbot.

Monitoring rate limits

Track rate limit hits and adjust limits accordingly:
export async function checkIpRateLimit(ip: string | undefined) {
  // ...
  try {
    const key = `ip-rate-limit:${ip}`;
    const [count] = await redis
      .multi()
      .incr(key)
      .expire(key, TTL_SECONDS, "NX")
      .exec();

    if (typeof count === "number" && count > MAX_MESSAGES_PER_DAY) {
      console.log(`Rate limit exceeded for IP: ${ip}, count: ${count}`);
      throw new ChatbotError("rate_limit:chat");
    }
  } catch (error) {
    if (error instanceof ChatbotError) throw error;
  }
}
Consider integrating with observability tools like:
  • Vercel Observability (logs and analytics) for request-level visibility
  • Datadog for Redis metrics
  • Sentry for error reporting
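Whichever tool you forward to, a structured log line is easier to aggregate and alert on than free text. A minimal sketch (the field names here are illustrative, not a convention any of those tools require):

```typescript
// Emit rate-limit hits as structured JSON so log-based alerting can count them.
function logRateLimitHit(ip: string, count: number): string {
  const line = JSON.stringify({
    event: "rate_limit_exceeded",
    ip,
    count,
    at: new Date().toISOString(),
  });
  console.log(line);
  return line;
}
```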

Bot detection

The chatbot uses BotID for additional bot protection:
app/(chat)/api/chat/route.ts
import { checkBotId } from "botid/server";

const [botResult, session] = await Promise.all([checkBotId(), auth()]);

if (botResult.isBot) {
  return new ChatbotError("unauthorized:chat").toResponse();
}
This runs in parallel with authentication for better performance.

Advanced patterns

Implement more sophisticated rate limiting with sliding windows:
async function checkSlidingWindow(userId: string) {
  const now = Date.now();
  const windowStart = now - 24 * 60 * 60 * 1000;

  const key = `rate-limit:${userId}`;

  // node-redis v4 commands are camelCase (zRemRangeByScore, zAdd, zCard)
  await redis
    .multi()
    .zRemRangeByScore(key, 0, windowStart) // drop entries older than the window
    .zAdd(key, { score: now, value: `${now}-${Math.random()}` }) // unique member per request
    .expire(key, TTL_SECONDS)
    .exec();

  const count = await redis.zCard(key);

  if (count > MAX_MESSAGES_PER_DAY) {
    throw new ChatbotError("rate_limit:chat");
  }
}
Implement burst allowance with token buckets:
async function checkTokenBucket(userId: string) {
  const key = `bucket:${userId}`;
  const refillRate = 10 / (24 * 60 * 60); // 10 per day
  const bucketSize = 5; // Allow bursts of 5
  
  const data = await redis.get(key);
  const { tokens, lastRefill } = data 
    ? JSON.parse(data) 
    : { tokens: bucketSize, lastRefill: Date.now() };
  
  const now = Date.now();
  const timePassed = (now - lastRefill) / 1000;
  const newTokens = Math.min(
    bucketSize,
    tokens + timePassed * refillRate
  );
  
  if (newTokens < 1) {
    throw new ChatbotError("rate_limit:chat");
  }
  
  await redis.set(
    key,
    JSON.stringify({
      tokens: newTokens - 1,
      lastRefill: now,
    }),
    { EX: TTL_SECONDS } // node-redis v4 option names are uppercase
  );
}
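Note that this read-modify-write sequence can race under concurrent requests; a Lua script or WATCH/MULTI would make it atomic. The refill arithmetic itself is easy to verify in memory with an injectable clock (illustrative, not the Redis-backed version above):

```typescript
// In-memory token bucket with an injectable clock, mirroring the refill math above.
const BUCKET_SIZE = 5;
const REFILL_PER_SECOND = 10 / (24 * 60 * 60); // 10 tokens per day

type Bucket = { tokens: number; lastRefill: number };

function takeToken(bucket: Bucket, now: number): boolean {
  // Refill proportionally to elapsed time, capped at the bucket size.
  const elapsedSeconds = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(BUCKET_SIZE, bucket.tokens + elapsedSeconds * REFILL_PER_SECOND);
  bucket.lastRefill = now;
  if (bucket.tokens < 1) return false; // rate limited
  bucket.tokens -= 1;
  return true;
}
```

At this refill rate one token returns roughly every 2.4 hours, so a user can burst 5 messages and then settle into the 10-per-day pace.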
Apply different limits based on user location:
import { geolocation } from "@vercel/functions";

const { country } = geolocation(request);

const limitsByCountry: Record<string, number> = {
  US: 20,
  GB: 20,
  default: 10,
};

const limit = limitsByCountry[country ?? "default"] ?? limitsByCountry.default; // country is undefined outside Vercel

if (messageCount > limit) {
  throw new ChatbotError("rate_limit:chat");
}

Testing rate limits

Test rate limiting in development:
// Temporarily lower limits for testing
const MAX_MESSAGES_PER_DAY = 2; // Instead of 10

// The IP layer already skips non-production environments
// (checkIpRateLimit returns early unless isProductionEnvironment),
// so only the user-based limit needs an explicit escape hatch:
if (process.env.NODE_ENV === "development") {
  return; // Skip rate limiting
}
Never deploy with rate limiting disabled. Always test with realistic limits in a staging environment.
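With the limit dropped to 2, a quick unit-style check confirms the third message is rejected. This sketch exercises only the counting logic; in staging you would send real requests against the running endpoint and assert on the 429 response:

```typescript
// Test-only sketch: a counter with a lowered limit of 2.
function makeLimiter(max: number) {
  let count = 0;
  return () => {
    count += 1;
    if (count > max) throw new Error("rate_limit:chat");
  };
}
```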

Next steps

  • Learn about streaming to understand how rate limits interact with long-running requests
  • Review the upgrading guide for rate limit changes in new versions
  • Explore building AI tools that respect rate limits
