
Overview

Scalekit enforces rate limits to ensure fair usage and maintain API availability for all customers. Rate limits protect the API infrastructure from abuse and keep performance consistent for every client.

Rate Limit Policy

Rate limits are applied per environment and are calculated based on:
  • API endpoint: Different endpoints may have different rate limits
  • Authentication credentials: Limits are tracked per client_id
  • Time window: Limits reset after a specific time period
Rate limits may vary based on your Scalekit plan. Contact support for enterprise rate limit requirements.

Rate Limit Headers

API responses include rate limit information in the response headers:
  • X-RateLimit-Limit (integer): Maximum number of requests allowed in the current time window
  • X-RateLimit-Remaining (integer): Number of requests remaining in the current time window
  • X-RateLimit-Reset (integer): Unix timestamp at which the rate limit window resets

Example Response Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1704067200

Rate Limit Exceeded

When you exceed the rate limit, the API returns a 429 Too Many Requests error:
{
  "code": 429,
  "message": "Rate limit exceeded",
  "details": [
    {
      "@type": "type.googleapis.com/scalekit.v1.errdetails.ErrorInfo",
      "error_code": "RATE_LIMIT_EXCEEDED",
      "retry_after": 60
    }
  ]
}
  • retry_after (integer): Number of seconds to wait before retrying the request

Best Practices

Monitor Rate Limit Headers

Always check rate limit headers in API responses to avoid hitting limits:
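For example, a small helper can read the three headers and flag when quota is running low. This is an illustrative sketch, not part of the Scalekit SDK: the `checkRateLimit` function, the 10% warning threshold, and the use of a Map in place of a real response's headers are all assumptions for the example.

```javascript
// Read the rate limit headers from a response and decide whether to
// slow down. `headers` is anything with a get() method (fetch's
// Headers works; a Map stands in below for illustration).
function checkRateLimit(headers) {
  const limit = parseInt(headers.get('X-RateLimit-Limit'), 10)
  const remaining = parseInt(headers.get('X-RateLimit-Remaining'), 10)
  const reset = parseInt(headers.get('X-RateLimit-Reset'), 10) // Unix seconds

  return {
    remaining,
    nearLimit: remaining < limit * 0.1, // under 10% of quota left
    secondsUntilReset: Math.max(0, reset - Math.floor(Date.now() / 1000)),
  }
}

// Example using header values like the ones shown earlier
const headers = new Map([
  ['X-RateLimit-Limit', '1000'],
  ['X-RateLimit-Remaining', '50'],
  ['X-RateLimit-Reset', String(Math.floor(Date.now() / 1000) + 30)],
])
const status = checkRateLimit(headers)
console.log(status.nearLimit) // true: only 50 of 1000 requests remain
```

A client can use `nearLimit` to proactively slow down before the API starts returning 429 responses.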

Implement Exponential Backoff

When you encounter rate limits, use exponential backoff to retry requests:
async function makeRequestWithBackoff(requestFn, maxRetries = 3) {
  let retries = 0

  while (true) {
    try {
      return await requestFn()
    } catch (error) {
      // Rethrow anything that is not a rate limit error, and give up
      // once the retry budget is spent
      if (error.status !== 429 || retries >= maxRetries) {
        throw error
      }
      const backoffTime = Math.pow(2, retries) * 1000 // 1s, 2s, 4s, ...
      // A production client could honor the retry_after value from the
      // 429 response instead of a computed delay
      await new Promise(resolve => setTimeout(resolve, backoffTime))
      retries++
    }
  }
}

// Usage
const organizations = await makeRequestWithBackoff(() =>
  scalekit.organization.listOrganization()
)

Batch Requests

When possible, use batch operations or pagination to reduce the number of API calls:
// Instead of making multiple individual requests
for (const orgId of organizationIds) {
  await scalekit.organization.getOrganization(orgId) // Multiple API calls
}

// Use pagination to fetch multiple resources in one request
const organizations = await scalekit.organization.listOrganization({
  pageSize: 100 // Fetch up to 100 organizations per request
})

Cache Responses

Cache API responses when appropriate to reduce redundant requests:
const cache = new Map()
const CACHE_TTL = 300000 // 5 minutes

async function getCachedOrganization(organizationId) {
  const cacheKey = `org_${organizationId}`
  const cached = cache.get(cacheKey)

  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data
  }

  const organization = await scalekit.organization.getOrganization(organizationId)
  cache.set(cacheKey, { data: organization, timestamp: Date.now() })

  return organization
}

Rate Limit Tiers

Different API endpoints may have different rate limits based on their resource intensity:

Endpoint Category    Typical Limit      Notes
Read operations      Higher limits      GET requests for listing and retrieving resources
Write operations     Moderate limits    POST, PUT, PATCH requests for creating/updating resources
Delete operations    Lower limits       DELETE requests for removing resources
Authentication       Special limits     Token generation and validation endpoints

Contact Scalekit support if you need higher rate limits for your use case.

Handling Rate Limits in Production

Queue-Based Architecture

For high-volume applications, implement a queue-based system:
1. Add API requests to a queue
2. Process requests at a controlled rate
3. Monitor rate limit headers
4. Adjust processing speed based on remaining quota
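The steps above can be sketched as a minimal in-process queue. This is illustrative only: the `RateLimitedQueue` class and the `minIntervalMs` spacing parameter are assumptions for the example, not part of the Scalekit SDK.

```javascript
// Minimal in-process queue that releases requests at a controlled rate.
class RateLimitedQueue {
  constructor(minIntervalMs) {
    this.minIntervalMs = minIntervalMs // minimum gap between requests
    this.queue = []
    this.running = false
  }

  // Step 1: add a request to the queue; resolves with its result
  enqueue(requestFn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ requestFn, resolve, reject })
      this.drain()
    })
  }

  // Step 2: process queued requests one at a time, spaced out
  async drain() {
    if (this.running) return
    this.running = true
    while (this.queue.length > 0) {
      const { requestFn, resolve, reject } = this.queue.shift()
      try {
        // Steps 3-4: a real worker would read the X-RateLimit-*
        // headers here and widen the interval as quota runs low
        resolve(await requestFn())
      } catch (err) {
        reject(err)
      }
      await new Promise(r => setTimeout(r, this.minIntervalMs))
    }
    this.running = false
  }
}

// Usage: three jobs, processed in order, at most one per 10 ms
const queue = new RateLimitedQueue(10)
const done = Promise.all([1, 2, 3].map(n => queue.enqueue(async () => n * 2)))
done.then(results => console.log(results)) // [ 2, 4, 6 ]
```

Because all requests funnel through one drain loop, a single place in the code can observe quota and adjust pacing for the whole application.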

Distributed Rate Limiting

If your application runs on multiple servers, coordinate rate limiting across instances:
  • Use a shared cache (Redis, Memcached) to track API usage
  • Distribute quota across application instances
  • Implement circuit breakers to prevent cascading failures
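One common way to share quota across instances is a fixed-window counter keyed on the client and the current window. The sketch below is a simplified illustration: an in-memory Map stands in for the shared store, and the `allowRequest` function, key scheme, and window size are assumptions, not Scalekit APIs. With Redis, the same logic maps onto INCR plus EXPIRE on the window key.

```javascript
// Fixed-window counter shared across instances. In production the
// store would be Redis or Memcached; a Map stands in here.
const store = new Map()

function allowRequest(clientId, limit, windowSeconds, nowMs = Date.now()) {
  // Every instance sharing the store computes the same window key
  const window = Math.floor(nowMs / 1000 / windowSeconds)
  const key = `ratelimit:${clientId}:${window}`
  const count = (store.get(key) || 0) + 1
  store.set(key, count) // with Redis: INCR key, then EXPIRE key windowSeconds
  return count <= limit
}

// Allow at most 3 requests per 60-second window
const t = Date.now()
const decisions = [1, 2, 3, 4].map(() => allowRequest('client_abc', 3, 60, t))
console.log(decisions) // [ true, true, true, false ]
```

Because the counter lives in the shared store rather than in process memory, every instance sees the same running total and the combined fleet stays within the limit.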

Monitoring and Alerts

Set up monitoring to track rate limit usage:
  • Log rate limit headers from API responses
  • Alert when remaining quota drops below a threshold
  • Track 429 errors in your application logs
  • Monitor retry patterns and backoff behavior

Next Steps

API Overview

Learn about API structure and versioning

Authentication

Set up API authentication
