Rate limiting controls how many requests a user, API key, IP address, or any identifier can make within a given time window. Unkey provides globally distributed rate limiting that works at the edge without you managing any infrastructure.

Why Rate Limiting?

Prevent abuse

Stop bad actors from hammering your endpoints, scraping data, or launching DDoS attacks

Protect infrastructure costs

Limit expensive operations (AI calls, database queries) before they blow up your bill

Fair usage enforcement

Ensure no single user monopolizes shared resources or degrades service for others

Compliance & SLAs

Enforce contractual limits (“10,000 requests/month on Basic plan”)

How It Works

Unkey’s rate limiting uses a sliding window algorithm for smooth, accurate enforcement:
  1. Choose an identifier. Decide what you’re limiting: user ID, API key, IP address, organization ID, or any string that uniquely identifies the requester.
  2. Set the limit. Define how many requests are allowed and over what duration. Example: 100 requests per minute.
  3. Check on each request. Call limiter.limit(identifier) and Unkey tells you whether to allow or reject the request.

Rate Limiting Approaches

Unkey offers two complementary ways to implement rate limiting:
| Approach | Best For | How It Works |
| --- | --- | --- |
| Standalone | Any endpoint, public or private | You call limiter.limit() with any identifier; works with or without API keys |
| Key-attached | API key authenticated endpoints | Rate limits are configured per key and automatically enforced during keys.verify() |
Use both! Apply standalone rate limiting to public endpoints (login, signup) and key-attached limits to authenticated API calls.

Standalone Rate Limiting

Protect any endpoint with identifier-based rate limiting.
import { Ratelimit } from "@unkey/ratelimit";

const limiter = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "api",      // Group related limits
  limit: 10,             // 10 requests...
  duration: "60s",       // ...per minute
});

export async function handler(req: Request) {
  // Use any identifier: user ID, IP, session, etc.
  // getClientIP is your own helper (not shown) that extracts the caller's IP address.
  const identifier = req.headers.get("x-user-id") ?? getClientIP(req);
  
  const { success, remaining, reset } = await limiter.limit(identifier);
  
  if (!success) {
    return new Response("Too many requests", {
      status: 429,
      headers: {
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": reset.toString(),
        "Retry-After": Math.ceil((reset - Date.now()) / 1000).toString()
      }
    });
  }
  
  // Request allowed — continue
  return new Response(`Success! ${remaining} requests remaining.`);
}

Configuration Options

rootKey (string, required)
Your Unkey root key with the ratelimit.*.limit permission.

namespace (string, required)
Logical grouping for your rate limits. Separate namespaces are isolated from each other. Examples: "api", "login", "webhooks".

limit (number, required)
Maximum number of requests allowed in the duration window.

duration (string | number, required)
Time window for the limit. String format: "30s", "5m", "1h", "1d". Number format: milliseconds (e.g., 60000 for 1 minute).

timeout (object, optional)
Configure behavior when Unkey is unreachable:
timeout: {
  ms: 3000,  // Wait max 3 seconds
  fallback: (identifier) => ({
    success: true,  // Allow on timeout (or false to deny)
    limit: 0,
    remaining: 0,
    reset: Date.now()
  })
}

onError (function, optional)
Error handler for network failures:
onError: (err, identifier) => {
  console.error(`Rate limit error for ${identifier}:`, err);
  return { success: true, limit: 0, remaining: 0, reset: Date.now() };
}
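
The string and number forms of duration describe the same window. As a rough illustration, the helper below converts the string forms listed above into milliseconds. Note that durationToMs is a hypothetical helper written for this example, not part of @unkey/ratelimit, and it only handles the four units shown here.

```typescript
// Hypothetical helper for illustration; not part of the Unkey SDK.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

function durationToMs(duration: string | number): number {
  // Numbers are already milliseconds.
  if (typeof duration === "number") return duration;

  // Accept an integer followed by one of the units s, m, h, d.
  const match = /^(\d+)([smhd])$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}
```

With this sketch, "60s" and 60000 resolve to the same window, which is why the examples in this guide use the two forms interchangeably.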

Key-Attached Rate Limiting

Configure rate limits directly on API keys — they’re automatically enforced during verification.
import { Unkey } from "@unkey/api";

const unkey = new Unkey({ rootKey: process.env.UNKEY_ROOT_KEY });

try {
  const { meta, data } = await unkey.keys.create({
    apiId: "api_...",
    name: "Free Tier Key",
    ratelimits: [
      {
        name: "requests",
        limit: 100,
        duration: 60000,  // 100 requests per minute
      },
      {
        name: "ai-calls",
        limit: 10,
        duration: 3600000,  // 10 AI calls per hour
      }
    ]
  });
} catch (err) {
  console.error(err);
}

Multiple Rate Limits per Key

Apply different limits to different operation types:
try {
  const { meta, data } = await unkey.keys.create({
    apiId: "api_...",
    ratelimits: [
      {
        name: "requests",
        limit: 1000,
        duration: 60000,  // 1000 general requests/minute
      },
      {
        name: "search",
        limit: 100,
        duration: 60000,  // 100 search queries/minute
      },
      {
        name: "exports",
        limit: 10,
        duration: 3600000,  // 10 exports/hour
      }
    ]
  });
} catch (err) {
  console.error(err);
}
Then check specific limits during verification:
// For a search endpoint
const searchResult = await unkey.keys.verifyKey({
  key: "sk_...",
  ratelimits: [{ name: "search" }]
});

// For an export endpoint
const exportResult = await unkey.keys.verifyKey({
  key: "sk_...",
  ratelimits: [{ name: "exports", cost: 1 }]
});

Algorithms & Architecture

Sliding Window Algorithm

Unkey uses sliding windows to provide smooth rate limiting without the “burst at window reset” problem.
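Fixed windows allow burst exploitation:
  • Limit: 100 requests per minute
  • User sends 100 requests at 00:59 (mm:ss)
  • Window resets at 01:00
  • User sends 100 more at 01:01
  • Result: 200 requests in 2 seconds ❌
A sliding window instead evaluates each request against the trailing 60 seconds, so the second burst is rejected until earlier requests age out of the window.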
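
To make the contrast with fixed windows concrete, here is a minimal local simulation. It is not Unkey's implementation: it assumes a common sliding-window approximation that weights the previous window's count by how much of it still overlaps the trailing window, and the function names and exact formula are assumptions for illustration.

```typescript
// Illustrative local model only; not how Unkey is implemented.
const WINDOW_MS = 60_000;
const LIMIT = 100;

// Fixed window: the counter starts from zero in every new window.
function fixedWindowAllowed(timestamps: number[]): number {
  const counts = new Map<number, number>();
  let allowed = 0;
  for (const t of timestamps) {
    const window = Math.floor(t / WINDOW_MS);
    const used = counts.get(window) ?? 0;
    if (used < LIMIT) {
      counts.set(window, used + 1);
      allowed++;
    }
  }
  return allowed;
}

// Sliding window (approximation): the previous window still counts,
// weighted by the fraction of it that overlaps the trailing window.
function slidingWindowAllowed(timestamps: number[]): number {
  const counts = new Map<number, number>();
  let allowed = 0;
  for (const t of timestamps) {
    const window = Math.floor(t / WINDOW_MS);
    const elapsed = (t % WINDOW_MS) / WINDOW_MS; // fraction of current window elapsed
    const previous = counts.get(window - 1) ?? 0;
    const current = counts.get(window) ?? 0;
    const estimated = previous * (1 - elapsed) + current;
    if (estimated + 1 <= LIMIT) {
      counts.set(window, current + 1);
      allowed++;
    }
  }
  return allowed;
}

// 100 requests at 00:59 followed by 100 more at 01:01 (timestamps in ms).
const burst: number[] = [
  ...new Array<number>(100).fill(59_000),
  ...new Array<number>(100).fill(61_000),
];

console.log(fixedWindowAllowed(burst), slidingWindowAllowed(burst));
```

In this model the fixed window admits all 200 requests, while the sliding approximation admits 101: the first 100, plus one request that fits because a small part of the earlier window has already aged out.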

Global Consistency

Rate limits are enforced consistently across all regions. A user can’t bypass limits by hitting different geographic endpoints.
See real-time global performance metrics at ratelimit.unkey.com — latency and throughput benchmarks updated live.

Advanced Features

Custom Overrides

Give specific users higher (or lower) limits without code changes.
  1. Go to Ratelimit → Select namespace → Overrides tab
  2. Click Add Override
  3. Enter identifier and custom limits
  4. Changes propagate globally in ~60 seconds

Per-User vs Per-Endpoint Limits

// Rate limit by user across all endpoints
const { success } = await limiter.limit(`user:${userId}`);
Use when: You want to cap total requests per user regardless of which endpoint they hit.
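
The per-endpoint variant follows from the same idea: fold the endpoint into the identifier so that each user gets a separate counter per route. This is a sketch. buildIdentifier and the user:<id>:<endpoint> format are conventions assumed for this example, since the identifier can be any string.

```typescript
// Hypothetical helper; the identifier format is a convention, not an Unkey requirement.
function buildIdentifier(userId: string, endpoint?: string): string {
  return endpoint ? `user:${userId}:${endpoint}` : `user:${userId}`;
}

// Per-user: one counter shared by every endpoint.
const perUser = buildIdentifier("u_123");

// Per-endpoint: a separate counter for each user and route.
const perEndpoint = buildIdentifier("u_123", "/search");

// Either string can then be passed to limiter.limit(...).
```

The per-endpoint form fits cases where some routes are costlier than others and should not share a budget with cheap ones.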

Cost-Based Limiting

Different operations can consume different amounts from the limit:
// Normal request: costs 1 (default)
await limiter.limit(userId);

// Expensive AI operation: costs 10
await limiter.limit(userId, { cost: 10 });

// With a limit of 100/minute:
// - 100 normal requests, OR
// - 10 expensive requests, OR
// - Mix: 50 normal + 5 expensive
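
The arithmetic in the comments above can be checked with a small local model. Budget is a hypothetical stand-in for a single window's counter, not the Unkey client, and it ignores window expiry entirely.

```typescript
// Hypothetical local model of one rate limit window; not the Unkey client.
class Budget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records the cost and returns true if it fits in the remaining budget.
  consume(cost: number = 1): boolean {
    if (this.used + cost > this.limit) return false;
    this.used += cost;
    return true;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

const budget = new Budget(100);
let accepted = 0;

// 50 normal requests at cost 1 each.
for (let i = 0; i < 50; i++) if (budget.consume()) accepted++;

// 5 expensive requests at cost 10 each.
for (let i = 0; i < 5; i++) if (budget.consume(10)) accepted++;

// The budget is now exhausted, so one more normal request is refused.
const oneMore = budget.consume();
```

Here all 55 requests are accepted because 50 × 1 + 5 × 10 equals the limit of 100, and the next request is refused.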

Timeout & Fallback

Configure resilient behavior when Unkey is unreachable:
const limiter = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "api",
  limit: 100,
  duration: "60s",
  timeout: {
    ms: 3000,  // Wait max 3 seconds
    fallback: (identifier) => ({
      success: true,  // Allow on timeout (fail open)
      // OR: success: false  // Deny on timeout (fail closed)
      limit: 0,
      remaining: 0,
      reset: Date.now()
    })
  },
  onError: (err, identifier) => {
    console.error(`Rate limit error for ${identifier}:`, err);
    // Log to monitoring service
    return { success: true, limit: 0, remaining: 0, reset: Date.now() };
  }
});
Fail open (allow on timeout) prioritizes availability over strict enforcement. Fail closed (deny on timeout) prioritizes security over availability. Choose based on your requirements.
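
One way to keep that choice explicit is to generate the fallback from a named mode, so that a sensitive namespace such as login can fail closed while general API traffic fails open. makeFallback, FailureMode and FallbackResult are hypothetical names for this sketch; the returned object mirrors the fallback shape used above.

```typescript
// Hypothetical names for illustration; the object shape mirrors the fallback above.
type FallbackResult = {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number;
};

type FailureMode = "open" | "closed";

function makeFallback(mode: FailureMode): (identifier: string) => FallbackResult {
  return () => ({
    success: mode === "open", // open: allow the request, closed: deny it
    limit: 0,
    remaining: 0,
    reset: Date.now(),
  });
}

// Intended use, assuming the timeout option described above:
//   timeout: { ms: 3000, fallback: makeFallback("closed") }
const failOpen = makeFallback("open");
const failClosed = makeFallback("closed");
```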

Response Format

Every rate limit check returns:
| Field | Type | Description |
| --- | --- | --- |
| success | boolean | true if request is allowed |
| limit | number | The configured limit |
| remaining | number | Requests left in current window |
| reset | number | Unix timestamp (ms) when window resets |

Handling Rate Limit Responses

const { success, remaining, reset, limit } = await limiter.limit(identifier);

if (!success) {
  // Clamp at zero in case the window has already reset
  const retryAfter = Math.max(0, Math.ceil((reset - Date.now()) / 1000));

  // The body is the first argument to Response; the options object has no body field
  return new Response(
    JSON.stringify({
      error: "Too many requests",
      retryAfter: retryAfter,
      resetAt: new Date(reset).toISOString()
    }),
    {
      status: 429,
      headers: {
        "Content-Type": "application/json",
        "X-RateLimit-Limit": limit.toString(),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": reset.toString(),
        "Retry-After": retryAfter.toString()
      }
    }
  );
}

// Include rate limit info in successful responses
return new Response("Success", {
  headers: {
    "X-RateLimit-Limit": limit.toString(),
    "X-RateLimit-Remaining": remaining.toString(),
    "X-RateLimit-Reset": reset.toString()
  }
});
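
Because the same headers appear on both the rejected and the successful response, it can help to build them in one place. The sketch below assumes only the four response fields listed in the table above; rateLimitHeaders and LimitResult are hypothetical names, and the now parameter exists only to make the function easy to test.

```typescript
// Hypothetical helper; field names follow the response format described above.
interface LimitResult {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number; // Unix timestamp in milliseconds
}

function rateLimitHeaders(
  result: LimitResult,
  now: number = Date.now()
): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": result.limit.toString(),
    "X-RateLimit-Remaining": Math.max(0, result.remaining).toString(),
    "X-RateLimit-Reset": result.reset.toString(),
  };

  // Retry-After is only meaningful on rejected requests; never emit a negative value.
  if (!result.success) {
    const seconds = Math.max(0, Math.ceil((result.reset - now) / 1000));
    headers["Retry-After"] = seconds.toString();
  }
  return headers;
}
```

The returned object can be passed directly as the headers option of a Response, for both the 429 and the success path.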

Common Patterns

// Use identifier prefixes to apply different overrides
const planPrefix = user.plan; // "free", "pro", "enterprise"
const identifier = `${planPrefix}:${user.id}`;

// In dashboard, set overrides:
// free:*       → 100/min
// pro:*        → 1000/min
// enterprise:* → 10000/min

const { success } = await limiter.limit(identifier);

// Different limits for different endpoint types
const apiLimiter = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "api",
  limit: 1000,
  duration: "60s"
});

const authLimiter = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "auth",
  limit: 10,
  duration: "60s"
});

// Slower responses as users approach limit
const { success, remaining, limit } = await limiter.limit(userId);

if (success) {
  const percentUsed = (limit - remaining) / limit;
  
  if (percentUsed > 0.9) {
    // >90% used: add 500ms delay
    await new Promise(r => setTimeout(r, 500));
  } else if (percentUsed > 0.75) {
    // >75% used: add 200ms delay
    await new Promise(r => setTimeout(r, 200));
  }
}

// Allow short bursts but limit sustained rate
const shortTerm = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "burst",
  limit: 20,
  duration: "1s"  // 20 requests/second
});

const longTerm = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "sustained",
  limit: 1000,
  duration: "60s"  // 1000 requests/minute
});

// Check both
const [burst, sustained] = await Promise.all([
  shortTerm.limit(userId),
  longTerm.limit(userId)
]);

if (!burst.success || !sustained.success) {
  return new Response("Rate limit exceeded", { status: 429 });
}

Best Practices

Choose appropriate windows

  • Seconds: Real-time APIs, live updates
  • Minutes: Standard APIs, search
  • Hours: Expensive operations, AI calls
  • Days: Free tier quotas, trial limits

Return helpful headers

Always include X-RateLimit-* headers so clients know their limit status and when to retry.

Use multiple namespaces

Separate rate limits for different endpoint categories (auth, api, webhooks) for better control.

Monitor and adjust

Watch analytics to see which identifiers are hitting limits. Adjust thresholds based on real usage.

Combine with usage limits

Use rate limits for frequency control and usage limits (credits) for total volume quotas.

Implement fallback behavior

Configure timeout and error handlers to maintain availability during network issues.

Next Steps

Identities

Share rate limits across multiple keys per user

Analytics

Track rate limit violations and usage patterns

API Reference

Complete rate limiting API documentation

Quickstart

Framework-specific implementation guides
