
Overview

Permission Mongo implements rate limiting to protect the API from abuse and ensure fair resource allocation across all tenants. Rate limits are enforced per tenant and per endpoint.
Rate limiting is currently implemented at the application level. For production deployments with multiple instances, consider using a distributed rate limiter with Redis.

Rate Limit Headers

The API includes rate limit information in response headers:
  • X-RateLimit-Limit (string): Maximum number of requests allowed in the current time window
  • X-RateLimit-Remaining (string): Number of requests remaining in the current time window
  • X-RateLimit-Reset (string): Unix timestamp (seconds) when the rate limit resets

Example Response Headers

curl -v "http://localhost:8080/users" \
  -H "Authorization: Bearer YOUR_TOKEN"
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
X-Request-ID: a1b2c3d4e5f6
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1735689600

Rate Limit Configuration

Configure rate limits in your config.yaml:
server:
  rate_limit:
    enabled: true
    requests_per_second: 100
    burst_size: 200
    per_tenant: true
  • enabled (boolean, default: true): Enable or disable rate limiting
  • requests_per_second (integer, default: 100): Average number of requests allowed per second
  • burst_size (integer, default: 200): Maximum burst of requests allowed (uses the token bucket algorithm)
  • per_tenant (boolean, default: true): Apply rate limits per tenant rather than globally

Default Rate Limits

By Endpoint Type

Endpoint Type                  Rate Limit      Burst
Read operations (GET)          1000 req/min    1500
Write operations (POST, PUT)   500 req/min     750
Delete operations              200 req/min     300
Batch operations               100 req/min     150
Aggregate queries              100 req/min     150
Version operations             500 req/min     750

By Tenant Tier

If you implement tiered plans:
Tier           Rate Limit      Burst
Free           100 req/min     150
Starter        500 req/min     750
Professional   2000 req/min    3000
Enterprise     10000 req/min   15000

Rate Limit Exceeded

When the rate limit is exceeded, the API returns status code 429 Too Many Requests:
curl -X GET "http://localhost:8080/users" \
  -H "Authorization: Bearer YOUR_TOKEN"
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "rate limit exceeded, try again later"
  },
  "meta": {
    "request_id": "f8a9b0c1d2e3"
  }
}

Response Headers on Rate Limit

HTTP/1.1 429 Too Many Requests
Content-Type: application/json; charset=utf-8
X-Request-ID: f8a9b0c1d2e3
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735689660
Retry-After: 60
  • Retry-After (string): Number of seconds to wait before retrying

Rate Limiting Algorithms

Token Bucket Algorithm

Permission Mongo uses the token bucket algorithm for rate limiting:
  1. Bucket Capacity: Set by burst_size
  2. Refill Rate: Set by requests_per_second
  3. Token Cost: Each request consumes 1 token
  4. Bursts Allowed: Clients can burst up to the bucket capacity before being throttled

How It Works

Initial state: 200 tokens (burst_size = 200)
Refill rate: 100 tokens/second

Request 1: allowed (199 tokens remaining)
Request 2: allowed (198 tokens remaining)
...
Request 200: allowed (0 tokens remaining)
Request 201: rate limited (no tokens available)

Wait 1 second: 100 tokens refilled
Request 202: allowed (99 tokens remaining)
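
The refill logic described above can be sketched in Python. This is a minimal, illustrative in-memory implementation, not Permission Mongo's actual code:

import time

class TokenBucket:
    """Illustrative token bucket: refills continuously, allows bursts up to capacity."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate            # tokens added per second (requests_per_second)
        self.capacity = burst       # maximum tokens held (burst_size)
        self.tokens = float(burst)  # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1        # each request costs 1 token
            return True
        return False                # bucket empty: rate limited

A bucket created with rate=100 and burst=200 reproduces the walkthrough: the first 200 calls succeed immediately, the 201st is rejected, and capacity recovers at 100 tokens per second.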

Per-Tenant Isolation

When per_tenant: true, each tenant has independent rate limits:
  • Tenant A: 1000 req/min
  • Tenant B: 1000 req/min
  • Total system: 2000 req/min
This prevents one tenant from exhausting the API for others.

Distributed Rate Limiting

The built-in rate limiter is per-instance. For production deployments with multiple API instances, use Redis-based distributed rate limiting:

Redis-Based Rate Limiting

Configure Redis for distributed rate limiting:
server:
  rate_limit:
    enabled: true
    backend: redis
    requests_per_second: 100
    burst_size: 200
    redis:
      url: redis://localhost:6379
      key_prefix: "ratelimit:"
      ttl: 60s
Benefits:
  • Consistent limits across all instances
  • Shared state in Redis
  • Atomic operations for accuracy
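
A distributed limiter can be sketched with redis-py. For brevity this sketch uses a fixed-window counter (atomic INCR per tenant per window) rather than the token bucket above; the function and key-naming scheme are hypothetical, not Permission Mongo's API:

import time

def ratelimit_key(prefix: str, tenant: str, window: int, now: float) -> str:
    """Build a per-tenant key for the current fixed window (hypothetical scheme)."""
    return f"{prefix}{tenant}:{int(now // window)}"

def allow_request(client, tenant: str, limit: int = 100,
                  window: int = 1, prefix: str = "ratelimit:") -> bool:
    """Fixed-window check; `client` is assumed to be a redis.Redis instance."""
    key = ratelimit_key(prefix, tenant, window, time.time())
    count = client.incr(key)            # atomic across all API instances
    if count == 1:
        client.expire(key, window * 2)  # let stale windows expire on their own
    return count <= limit

Because INCR is atomic in Redis, every API instance sees the same counter, which is what gives consistent limits across the fleet.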

Best Practices

Client-Side Best Practices

  1. Check Headers: Monitor X-RateLimit-Remaining
  2. Implement Backoff: Use exponential backoff on 429 errors
  3. Respect Retry-After: Wait the specified time before retrying
  4. Batch Requests: Use batch endpoints to reduce request count
  5. Cache Responses: Cache GET responses to reduce API calls

Example: Exponential Backoff

async function fetchWithRetry(
  url: string,
  options: RequestInit,
  maxRetries = 3
) {
  let retries = 0;
  let delay = 1000; // Start with 1 second

  while (retries < maxRetries) {
    try {
      const response = await fetch(url, options);

      if (response.status === 429) {
        // Rate limited
        const retryAfter = response.headers.get('Retry-After');
        const waitTime = retryAfter 
          ? parseInt(retryAfter, 10) * 1000 
          : delay;

        console.log(`Rate limited. Waiting ${waitTime}ms...`);
        await sleep(waitTime);

        retries++;
        delay *= 2; // Exponential backoff
        continue;
      }

      return response;
    } catch (error) {
      if (retries === maxRetries - 1) throw error;
      
      await sleep(delay);
      retries++;
      delay *= 2;
    }
  }

  throw new Error('Max retries exceeded');
}

function sleep(ms: number) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Example: Request Throttling

import time
from typing import Optional

import requests

class RateLimitedClient:
    def __init__(self, base_url: str, token: str):
        self.base_url = base_url
        self.token = token
        self.remaining = None
        self.reset_time = None
    
    def _update_limits(self, response):
        """Update rate limit state from response headers"""
        self.remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
        self.reset_time = int(response.headers.get('X-RateLimit-Reset', 0))
    
    def _should_wait(self) -> Optional[float]:
        """Check if we should wait before making request"""
        if self.remaining is not None and self.remaining < 10:
            # Less than 10 requests remaining, wait until reset
            if self.reset_time:
                wait_time = self.reset_time - time.time()
                if wait_time > 0:
                    return wait_time
        return None
    
    def request(self, method: str, path: str, **kwargs):
        # Check if we should wait
        wait_time = self._should_wait()
        if wait_time:
            print(f"Rate limit low, waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
        
        # Make request
        response = requests.request(
            method,
            f"{self.base_url}{path}",
            headers={'Authorization': f'Bearer {self.token}'},
            **kwargs
        )
        
        # Update limits
        self._update_limits(response)
        
        return response

Exempt Endpoints

These endpoints are exempt from rate limiting:
  • /health - Health check
  • /ready - Readiness check
  • /metrics - Prometheus metrics

Quotas vs Rate Limits

Rate Limits (Time-Based)

  • What: Requests per time window (per second/minute)
  • When: Short-term throttling
  • Reset: Automatic after time window

Quotas (Volume-Based)

  • What: Total requests per billing period (per month)
  • When: Long-term usage limits
  • Reset: At billing cycle
Quotas are not currently implemented in Permission Mongo but can be added based on your billing model.
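
Since quotas are left to your billing model, here is one hypothetical shape such a tracker could take: a volume counter that resets at each calendar-month boundary (a stand-in for your billing cycle). None of these names exist in Permission Mongo:

from datetime import datetime, timezone

class MonthlyQuota:
    """Hypothetical volume-based quota: N total requests per calendar month."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0
        self.period = self._current_period()

    @staticmethod
    def _current_period() -> str:
        now = datetime.now(timezone.utc)
        return f"{now.year}-{now.month:02d}"  # e.g. "2025-01"

    def consume(self) -> bool:
        period = self._current_period()
        if period != self.period:             # new billing period: reset automatically
            self.period, self.used = period, 0
        if self.used >= self.limit:
            return False                      # quota exhausted until next period
        self.used += 1
        return True

Unlike the token bucket, nothing refills mid-period; the counter only resets when the billing period rolls over.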

Monitoring Rate Limits

Prometheus Metrics

Rate limit metrics are exposed at /metrics:
# Total rate limited requests
http_rate_limited_total{tenant="abc"} 42

# Rate limit remaining by tenant
http_rate_limit_remaining{tenant="abc"} 958

# Rate limit utilization (0-1)
http_rate_limit_utilization{tenant="abc"} 0.042

Grafana Dashboard

Create alerts for:
  • High rate limit utilization (>80%)
  • Frequent 429 errors
  • Abnormal traffic patterns
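
The first two alerts could be expressed as Prometheus alerting rules over the metrics shown above. The thresholds and rule names here are illustrative suggestions, not shipped defaults:

# Hypothetical Prometheus alerting rules for rate limit monitoring
groups:
  - name: rate-limits
    rules:
      - alert: HighRateLimitUtilization
        expr: http_rate_limit_utilization > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} is above 80% of its rate limit"
      - alert: Frequent429s
        expr: rate(http_rate_limited_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} is being rate limited frequently"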

Increasing Rate Limits

For Development

Increase limits in config.yaml:
server:
  rate_limit:
    enabled: false  # Disable for local development

For Production

To raise effective limits in production:
  1. Upgrade your tenant tier
  2. Request a limit increase from your administrator
  3. Implement caching strategies
  4. Optimize your API usage

Troubleshooting

Issue: Getting 429 Too Quickly

Solutions:
  1. Check X-RateLimit-Remaining before making requests
  2. Implement request queuing
  3. Use batch endpoints to reduce request count
  4. Cache responses when possible

Issue: Inconsistent Limits Across Instances

Solution: Switch to Redis-based distributed rate limiting

Issue: Rate Limit Not Resetting

Check:
  1. Server time synchronization (NTP)
  2. Redis connectivity (if using distributed limiting)
  3. Rate limit configuration

FAQ

Q: Are rate limits per user or per tenant?
A: By default, per tenant. All users in a tenant share the same rate limit.

Q: Can I increase my rate limit?
A: Yes, upgrade your tenant tier or request an increase from your administrator.

Q: Do failed requests count against the limit?
A: Yes, all requests (successful or not) count against the rate limit, except for 429 responses from the rate limiter itself.

Q: How do I test rate limiting locally?
A: Set very low limits in config.yaml or use a load testing tool like wrk or k6.

Q: Are websockets rate limited?
A: Permission Mongo currently only supports an HTTP REST API; websockets are not implemented.
