
Overview

Scalekit enforces rate limits to ensure fair usage and maintain API availability for all customers. Rate limits protect the API infrastructure from abuse and keep performance consistent for every client.

Rate Limit Policy

Rate limits are applied per environment and are calculated based on:
  • API endpoint: Different endpoints may have different rate limits
  • Authentication credentials: Limits are tracked per client_id
  • Time window: Limits reset after a specific time period
Rate limits may vary based on your Scalekit plan. Contact support for enterprise rate limit requirements.

Rate Limit Headers

API responses include rate limit information in the response headers:
  • X-RateLimit-Limit (integer): Maximum number of requests allowed in the current time window
  • X-RateLimit-Remaining (integer): Number of requests remaining in the current time window
  • X-RateLimit-Reset (integer): Unix timestamp at which the rate limit window resets

Example Response Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1704067200

Rate Limit Exceeded

When you exceed the rate limit, the API returns a 429 Too Many Requests error:
{
  "code": 429,
  "message": "Rate limit exceeded",
  "details": [
    {
      "@type": "type.googleapis.com/scalekit.v1.errdetails.ErrorInfo",
      "error_code": "RATE_LIMIT_EXCEEDED",
      "retry_after": 60
    }
  ]
}
  • retry_after (integer): Number of seconds to wait before retrying the request

Best Practices

Monitor Rate Limit Headers

Always check rate limit headers in API responses to avoid hitting limits:
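For example, a small helper can read the three headers and flag when quota is running low. This is an illustrative sketch, not part of the Scalekit SDK: the `checkRateLimit` function, the 10% warning threshold, and the use of a Map in place of a real response's headers are all assumptions for the example.

```javascript
// Read the rate limit headers from a response and decide whether to
// slow down. `headers` is anything with a get() method (fetch's
// Headers works; a Map stands in below for illustration).
function checkRateLimit(headers) {
  const limit = parseInt(headers.get('X-RateLimit-Limit'), 10)
  const remaining = parseInt(headers.get('X-RateLimit-Remaining'), 10)
  const reset = parseInt(headers.get('X-RateLimit-Reset'), 10) // Unix seconds

  return {
    remaining,
    nearLimit: remaining < limit * 0.1, // under 10% of quota left
    secondsUntilReset: Math.max(0, reset - Math.floor(Date.now() / 1000)),
  }
}

// Example using header values like the ones shown earlier
const headers = new Map([
  ['X-RateLimit-Limit', '1000'],
  ['X-RateLimit-Remaining', '50'],
  ['X-RateLimit-Reset', String(Math.floor(Date.now() / 1000) + 30)],
])
const status = checkRateLimit(headers)
console.log(status.nearLimit) // true: only 50 of 1000 requests remain
```

A client can use `nearLimit` to proactively slow down before the API starts returning 429 responses.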

Implement Exponential Backoff

When you encounter rate limits, use exponential backoff to retry requests:
async function makeRequestWithBackoff(requestFn, maxRetries = 3) {
  let retries = 0

  while (true) {
    try {
      return await requestFn()
    } catch (error) {
      // Rethrow anything that is not a rate limit error, and give up
      // once the retry budget is spent
      if (error.status !== 429 || retries >= maxRetries) {
        throw error
      }
      const backoffTime = Math.pow(2, retries) * 1000 // 1s, 2s, 4s, ...
      // A production client could honor the retry_after value from the
      // 429 response instead of a computed delay
      await new Promise(resolve => setTimeout(resolve, backoffTime))
      retries++
    }
  }
}

// Usage
const organizations = await makeRequestWithBackoff(() =>
  scalekit.organization.listOrganization()
)

Batch Requests

When possible, use batch operations or pagination to reduce the number of API calls:
// Instead of making multiple individual requests
for (const orgId of organizationIds) {
  await scalekit.organization.getOrganization(orgId) // Multiple API calls
}

// Use pagination to fetch multiple resources in one request
const organizations = await scalekit.organization.listOrganization({
  pageSize: 100 // Fetch up to 100 organizations per request
})

Cache Responses

Cache API responses when appropriate to reduce redundant requests:
const cache = new Map()
const CACHE_TTL = 300000 // 5 minutes

async function getCachedOrganization(organizationId) {
  const cacheKey = `org_${organizationId}`
  const cached = cache.get(cacheKey)

  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data
  }

  const organization = await scalekit.organization.getOrganization(organizationId)
  cache.set(cacheKey, { data: organization, timestamp: Date.now() })

  return organization
}

Rate Limit Tiers

Different API endpoints may have different rate limits based on their resource intensity:

Endpoint Category    Typical Limit      Notes
Read operations      Higher limits      GET requests for listing and retrieving resources
Write operations     Moderate limits    POST, PUT, PATCH requests for creating/updating resources
Delete operations    Lower limits       DELETE requests for removing resources
Authentication       Special limits     Token generation and validation endpoints

Contact Scalekit support if you need higher rate limits for your use case.

Handling Rate Limits in Production

Queue-Based Architecture

For high-volume applications, implement a queue-based system:
1. Add API requests to a queue
2. Process requests at a controlled rate
3. Monitor rate limit headers
4. Adjust processing speed based on remaining quota
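The steps above can be sketched as a minimal in-process queue. This is illustrative only: the `RateLimitedQueue` class and the `minIntervalMs` spacing parameter are assumptions for the example, not part of the Scalekit SDK.

```javascript
// Minimal in-process queue that releases requests at a controlled rate.
class RateLimitedQueue {
  constructor(minIntervalMs) {
    this.minIntervalMs = minIntervalMs // minimum gap between requests
    this.queue = []
    this.running = false
  }

  // Step 1: add a request to the queue; resolves with its result
  enqueue(requestFn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ requestFn, resolve, reject })
      this.drain()
    })
  }

  // Step 2: process queued requests one at a time, spaced out
  async drain() {
    if (this.running) return
    this.running = true
    while (this.queue.length > 0) {
      const { requestFn, resolve, reject } = this.queue.shift()
      try {
        // Steps 3-4: a real worker would read the X-RateLimit-*
        // headers here and widen the interval as quota runs low
        resolve(await requestFn())
      } catch (err) {
        reject(err)
      }
      await new Promise(r => setTimeout(r, this.minIntervalMs))
    }
    this.running = false
  }
}

// Usage: three jobs, processed in order, at most one per 10 ms
const queue = new RateLimitedQueue(10)
const done = Promise.all([1, 2, 3].map(n => queue.enqueue(async () => n * 2)))
done.then(results => console.log(results)) // [ 2, 4, 6 ]
```

Because all requests funnel through one drain loop, a single place in the code can observe quota and adjust pacing for the whole application.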

Distributed Rate Limiting

If your application runs on multiple servers, coordinate rate limiting across instances:
  • Use a shared cache (Redis, Memcached) to track API usage
  • Distribute quota across application instances
  • Implement circuit breakers to prevent cascading failures
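One common way to share quota across instances is a fixed-window counter keyed on the client and the current window. The sketch below is a simplified illustration: an in-memory Map stands in for the shared store, and the `allowRequest` function, key scheme, and window size are assumptions, not Scalekit APIs. With Redis, the same logic maps onto INCR plus EXPIRE on the window key.

```javascript
// Fixed-window counter shared across instances. In production the
// store would be Redis or Memcached; a Map stands in here.
const store = new Map()

function allowRequest(clientId, limit, windowSeconds, nowMs = Date.now()) {
  // Every instance sharing the store computes the same window key
  const window = Math.floor(nowMs / 1000 / windowSeconds)
  const key = `ratelimit:${clientId}:${window}`
  const count = (store.get(key) || 0) + 1
  store.set(key, count) // with Redis: INCR key, then EXPIRE key windowSeconds
  return count <= limit
}

// Allow at most 3 requests per 60-second window
const t = Date.now()
const decisions = [1, 2, 3, 4].map(() => allowRequest('client_abc', 3, 60, t))
console.log(decisions) // [ true, true, true, false ]
```

Because the counter lives in the shared store rather than in process memory, every instance sees the same running total and the combined fleet stays within the limit.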

Monitoring and Alerts

Set up monitoring to track rate limit usage:
  • Log rate limit headers from API responses
  • Alert when remaining quota drops below a threshold
  • Track 429 errors in your application logs
  • Monitor retry patterns and backoff behavior

Next Steps

API Overview

Learn about API structure and versioning

Authentication

Set up API authentication
