
Overview

Permission Mongo implements rate limiting to protect the API from abuse and ensure fair resource allocation across all tenants. Rate limits are enforced per tenant and per endpoint.
Rate limiting is currently implemented at the application level. For production deployments with multiple instances, consider using a distributed rate limiter with Redis.

Rate Limit Headers

The API includes rate limit information in response headers:
  • X-RateLimit-Limit (string): Maximum number of requests allowed in the current time window
  • X-RateLimit-Remaining (string): Number of requests remaining in the current time window
  • X-RateLimit-Reset (string): Unix timestamp (seconds) when the rate limit resets

Example Response Headers

curl -v "http://localhost:8080/users" \
  -H "Authorization: Bearer YOUR_TOKEN"
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
X-Request-ID: a1b2c3d4e5f6
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1735689600

Rate Limit Configuration

Configure rate limits in your config.yaml:
server:
  rate_limit:
    enabled: true
    requests_per_second: 100
    burst_size: 200
    per_tenant: true
  • enabled (boolean, default: true): Enable or disable rate limiting
  • requests_per_second (integer, default: 100): Average number of requests allowed per second
  • burst_size (integer, default: 200): Maximum burst of requests allowed (uses the token bucket algorithm)
  • per_tenant (boolean, default: true): Apply rate limits per tenant rather than globally

Default Rate Limits

By Endpoint Type

Endpoint Type                  Rate Limit      Burst
Read operations (GET)          1000 req/min    1500
Write operations (POST, PUT)   500 req/min     750
Delete operations              200 req/min     300
Batch operations               100 req/min     150
Aggregate queries              100 req/min     150
Version operations             500 req/min     750

By Tenant Tier

If you implement tiered plans:
Tier           Rate Limit      Burst
Free           100 req/min     150
Starter        500 req/min     750
Professional   2000 req/min    3000
Enterprise     10000 req/min   15000

Rate Limit Exceeded

When the rate limit is exceeded, the API returns status code 429 Too Many Requests:
curl -X GET "http://localhost:8080/users" \
  -H "Authorization: Bearer YOUR_TOKEN"
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "rate limit exceeded, try again later"
  },
  "meta": {
    "request_id": "f8a9b0c1d2e3"
  }
}

Response Headers on Rate Limit

HTTP/1.1 429 Too Many Requests
Content-Type: application/json; charset=utf-8
X-Request-ID: f8a9b0c1d2e3
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735689660
Retry-After: 60
  • Retry-After (string): Number of seconds to wait before retrying

Rate Limiting Algorithms

Token Bucket Algorithm

Permission Mongo uses the token bucket algorithm for rate limiting:
  1. Bucket Capacity: Set by burst_size
  2. Refill Rate: Set by requests_per_second
  3. Token Cost: Each request consumes 1 token
  4. Bursts Allowed: Clients can burst up to the bucket capacity before being throttled

How It Works

Initial state: 200 tokens (burst_size = 200)
Refill rate: 100 tokens/second

Request 1: allowed (199 tokens remaining)
Request 2: allowed (198 tokens remaining)
...
Request 200: allowed (0 tokens remaining)
Request 201: rate limited (no tokens available)

Wait 1 second: 100 tokens refilled
Request 202: allowed (99 tokens remaining)
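
The refill logic described above can be sketched in Python. This is a minimal, illustrative in-memory implementation, not Permission Mongo's actual code:

import time

class TokenBucket:
    """Illustrative token bucket: refills continuously, allows bursts up to capacity."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate            # tokens added per second (requests_per_second)
        self.capacity = burst       # maximum tokens held (burst_size)
        self.tokens = float(burst)  # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1        # each request costs 1 token
            return True
        return False                # bucket empty: rate limited

A bucket created with rate=100 and burst=200 reproduces the walkthrough: the first 200 calls succeed immediately, the 201st is rejected, and capacity recovers at 100 tokens per second.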

Per-Tenant Isolation

When per_tenant: true, each tenant has independent rate limits:
  • Tenant A: 1000 req/min
  • Tenant B: 1000 req/min
  • Total system: 2000 req/min
This prevents one tenant from exhausting the API for others.

Distributed Rate Limiting

The built-in rate limiter is per-instance. For production deployments with multiple API instances, use Redis-based distributed rate limiting:

Redis-Based Rate Limiting

Configure Redis for distributed rate limiting:
server:
  rate_limit:
    enabled: true
    backend: redis
    requests_per_second: 100
    burst_size: 200
    redis:
      url: redis://localhost:6379
      key_prefix: "ratelimit:"
      ttl: 60s
Benefits:
  • Consistent limits across all instances
  • Shared state in Redis
  • Atomic operations for accuracy
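
A distributed limiter can be sketched with redis-py. For brevity this sketch uses a fixed-window counter (atomic INCR per tenant per window) rather than the token bucket above; the function and key-naming scheme are hypothetical, not Permission Mongo's API:

import time

def ratelimit_key(prefix: str, tenant: str, window: int, now: float) -> str:
    """Build a per-tenant key for the current fixed window (hypothetical scheme)."""
    return f"{prefix}{tenant}:{int(now // window)}"

def allow_request(client, tenant: str, limit: int = 100,
                  window: int = 1, prefix: str = "ratelimit:") -> bool:
    """Fixed-window check; `client` is assumed to be a redis.Redis instance."""
    key = ratelimit_key(prefix, tenant, window, time.time())
    count = client.incr(key)            # atomic across all API instances
    if count == 1:
        client.expire(key, window * 2)  # let stale windows expire on their own
    return count <= limit

Because INCR is atomic in Redis, every API instance sees the same counter, which is what gives consistent limits across the fleet.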

Best Practices

Client-Side Best Practices

  1. Check Headers: Monitor X-RateLimit-Remaining
  2. Implement Backoff: Use exponential backoff on 429 errors
  3. Respect Retry-After: Wait the specified time before retrying
  4. Batch Requests: Use batch endpoints to reduce request count
  5. Cache Responses: Cache GET responses to reduce API calls

Example: Exponential Backoff

async function fetchWithRetry(
  url: string,
  options: RequestInit,
  maxRetries = 3
) {
  let retries = 0;
  let delay = 1000; // Start with 1 second

  while (retries < maxRetries) {
    try {
      const response = await fetch(url, options);

      if (response.status === 429) {
        // Rate limited
        const retryAfter = response.headers.get('Retry-After');
        const waitTime = retryAfter 
          ? parseInt(retryAfter, 10) * 1000 
          : delay;

        console.log(`Rate limited. Waiting ${waitTime}ms...`);
        await sleep(waitTime);

        retries++;
        delay *= 2; // Exponential backoff
        continue;
      }

      return response;
    } catch (error) {
      if (retries === maxRetries - 1) throw error;
      
      await sleep(delay);
      retries++;
      delay *= 2;
    }
  }

  throw new Error('Max retries exceeded');
}

function sleep(ms: number) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Example: Request Throttling

import time
from typing import Optional

import requests

class RateLimitedClient:
    def __init__(self, base_url: str, token: str):
        self.base_url = base_url
        self.token = token
        self.remaining = None
        self.reset_time = None
    
    def _update_limits(self, response):
        """Update rate limit state from response headers"""
        self.remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
        self.reset_time = int(response.headers.get('X-RateLimit-Reset', 0))
    
    def _should_wait(self) -> Optional[float]:
        """Check if we should wait before making request"""
        if self.remaining is not None and self.remaining < 10:
            # Less than 10 requests remaining, wait until reset
            if self.reset_time:
                wait_time = self.reset_time - time.time()
                if wait_time > 0:
                    return wait_time
        return None
    
    def request(self, method: str, path: str, **kwargs):
        # Check if we should wait
        wait_time = self._should_wait()
        if wait_time:
            print(f"Rate limit low, waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
        
        # Make request
        response = requests.request(
            method,
            f"{self.base_url}{path}",
            headers={'Authorization': f'Bearer {self.token}'},
            **kwargs
        )
        
        # Update limits
        self._update_limits(response)
        
        return response

Exempt Endpoints

These endpoints are exempt from rate limiting:
  • /health - Health check
  • /ready - Readiness check
  • /metrics - Prometheus metrics

Quotas vs Rate Limits

Rate Limits (Time-Based)

  • What: Requests per time window (per second/minute)
  • When: Short-term throttling
  • Reset: Automatic after time window

Quotas (Volume-Based)

  • What: Total requests per billing period (per month)
  • When: Long-term usage limits
  • Reset: At billing cycle
Quotas are not currently implemented in Permission Mongo but can be added based on your billing model.
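
Since quotas are left to your billing model, here is one hypothetical shape such a tracker could take: a volume counter that resets at each calendar-month boundary (a stand-in for your billing cycle). None of these names exist in Permission Mongo:

from datetime import datetime, timezone

class MonthlyQuota:
    """Hypothetical volume-based quota: N total requests per calendar month."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0
        self.period = self._current_period()

    @staticmethod
    def _current_period() -> str:
        now = datetime.now(timezone.utc)
        return f"{now.year}-{now.month:02d}"  # e.g. "2025-01"

    def consume(self) -> bool:
        period = self._current_period()
        if period != self.period:             # new billing period: reset automatically
            self.period, self.used = period, 0
        if self.used >= self.limit:
            return False                      # quota exhausted until next period
        self.used += 1
        return True

Unlike the token bucket, nothing refills mid-period; the counter only resets when the billing period rolls over.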

Monitoring Rate Limits

Prometheus Metrics

Rate limit metrics are exposed at /metrics:
# Total rate limited requests
http_rate_limited_total{tenant="abc"} 42

# Rate limit remaining by tenant
http_rate_limit_remaining{tenant="abc"} 958

# Rate limit utilization (0-1)
http_rate_limit_utilization{tenant="abc"} 0.042

Grafana Dashboard

Create alerts for:
  • High rate limit utilization (>80%)
  • Frequent 429 errors
  • Abnormal traffic patterns
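
The first two alerts could be expressed as Prometheus alerting rules over the metrics shown above. The thresholds and rule names here are illustrative suggestions, not shipped defaults:

# Hypothetical Prometheus alerting rules for rate limit monitoring
groups:
  - name: rate-limits
    rules:
      - alert: HighRateLimitUtilization
        expr: http_rate_limit_utilization > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} is above 80% of its rate limit"
      - alert: Frequent429s
        expr: rate(http_rate_limited_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} is being rate limited frequently"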

Increasing Rate Limits

For Development

Increase limits in config.yaml:
server:
  rate_limit:
    enabled: false  # Disable for local development

For Production

To raise effective limits in production:
  1. Upgrade your tenant tier
  2. Request a limit increase from your administrator
  3. Implement caching strategies
  4. Optimize your API usage

Troubleshooting

Issue: Getting 429 Too Quickly

Solutions:
  1. Check X-RateLimit-Remaining before making requests
  2. Implement request queuing
  3. Use batch endpoints to reduce request count
  4. Cache responses when possible

Issue: Inconsistent Limits Across Instances

Solution: Switch to Redis-based distributed rate limiting

Issue: Rate Limit Not Resetting

Check:
  1. Server time synchronization (NTP)
  2. Redis connectivity (if using distributed limiting)
  3. Rate limit configuration

FAQ

Q: Are rate limits per user or per tenant?
A: By default, per tenant. All users in a tenant share the same rate limit.

Q: Can I increase my rate limit?
A: Yes, upgrade your tenant tier or request an increase from your administrator.

Q: Do failed requests count against the limit?
A: Yes, all requests (successful or not) count against the rate limit, except for 429 responses from the rate limiter itself.

Q: How do I test rate limiting locally?
A: Set very low limits in config.yaml or use a load testing tool like wrk or k6.

Q: Are websockets rate limited?
A: Permission Mongo currently only supports an HTTP REST API; websockets are not implemented.
