Overview

The rate limit middleware provides adaptive rate limiting using the BBR (Bottleneck Bandwidth and RTT) algorithm. It automatically adjusts the rate limit based on system load to prevent service overload while maximizing throughput.

Installation

go get github.com/go-kratos/kratos/v2/middleware/ratelimit
go get github.com/go-kratos/aegis/ratelimit

Server Middleware

The Server function creates a server-side rate limiting middleware:
func Server(opts ...Option) middleware.Middleware

Basic Usage

import (
    "log"

    "github.com/go-kratos/kratos/v2"
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "github.com/go-kratos/kratos/v2/transport/grpc"
    "github.com/go-kratos/kratos/v2/transport/http"
)

func main() {
    // Create HTTP server with rate limiting
    httpSrv := http.NewServer(
        http.Address(":8000"),
        http.Middleware(
            ratelimit.Server(),
        ),
    )
    
    // Create gRPC server with rate limiting
    grpcSrv := grpc.NewServer(
        grpc.Address(":9000"),
        grpc.Middleware(
            ratelimit.Server(),
        ),
    )
    
    app := kratos.New(
        kratos.Server(httpSrv, grpcSrv),
    )
    
    if err := app.Run(); err != nil {
        log.Fatal(err)
    }
}
By default, the middleware uses the BBR limiter which automatically adapts to system load.

BBR Algorithm

BBR (Bottleneck Bandwidth and RTT) is an adaptive algorithm that:
  • Monitors system CPU usage and request latency
  • Calculates optimal concurrency limit
  • Adapts to changing load conditions
  • Prevents cascading failures
  • Maximizes throughput while maintaining stability

How BBR Works

  1. Measures system metrics: CPU usage, request count, latency
  2. Calculates max concurrency: Based on historical data
  3. Allows or rejects requests: Based on current load vs. capacity
  4. Adapts continuously: Updates limits as conditions change

Error Response

When the rate limit is exceeded, the middleware returns:
var ErrLimitExceed = errors.New(429, "RATELIMIT", "service unavailable due to rate limit exceeded")
This returns:
  • HTTP status: 429 Too Many Requests
  • gRPC code: RESOURCE_EXHAUSTED
  • Reason: "RATELIMIT"
  • Message: "service unavailable due to rate limit exceeded"

Configuration Options

WithLimiter

Use a custom limiter implementation:
import (
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/aegis/ratelimit/bbr"
)

func WithLimiter(limiter ratelimit.Limiter) Option

Custom BBR Configuration

Create a custom BBR limiter with specific settings:
import (
    "time"

    "github.com/go-kratos/aegis/ratelimit/bbr"
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
)

// Create custom BBR limiter
limiter := bbr.NewLimiter(
    bbr.WithWindow(10 * time.Second),     // Sliding window size
    bbr.WithBucket(100),                   // Number of buckets
    bbr.WithCPUThreshold(800),             // CPU threshold (800 = 80%)
)

// Use in middleware
ratelimit.Server(
    ratelimit.WithLimiter(limiter),
)

Custom Limiter Implementation

You can implement your own limiter:
import "github.com/go-kratos/aegis/ratelimit"

type Limiter interface {
    Allow() (DoneFunc, error)
}

type DoneFunc func(DoneInfo)

type DoneInfo struct {
    Err error
}

Token Bucket Example

import (
    "sync"
    "time"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/kratos/v2/errors"
)

type tokenBucketLimiter struct {
    rate   int           // Tokens per second
    burst  int           // Maximum burst size
    tokens int           // Current tokens
    last   time.Time     // Last refill time
    mu     sync.Mutex
}

func NewTokenBucketLimiter(rate, burst int) ratelimit.Limiter {
    return &tokenBucketLimiter{
        rate:   rate,
        burst:  burst,
        tokens: burst,
        last:   time.Now(),
    }
}

func (l *tokenBucketLimiter) Allow() (ratelimit.DoneFunc, error) {
    l.mu.Lock()
    defer l.mu.Unlock()
    
    // Refill tokens; only advance the timestamp when at least one
    // token is added, so fractional refills are not lost to integer
    // truncation under frequent calls
    now := time.Now()
    elapsed := now.Sub(l.last).Seconds()
    if refill := int(elapsed * float64(l.rate)); refill > 0 {
        l.tokens = min(l.burst, l.tokens+refill)
        l.last = now
    }
    
    // Check if a token is available
    if l.tokens <= 0 {
        return nil, errors.New(429, "RATELIMIT", "rate limit exceeded")
    }
    
    // Consume token
    l.tokens--
    
    return func(info ratelimit.DoneInfo) {
        // No-op for token bucket
    }, nil
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

Usage with Custom Limiter

import "github.com/go-kratos/kratos/v2/middleware/ratelimit"

// 100 requests per second with burst of 200
limiter := NewTokenBucketLimiter(100, 200)

ratelimit.Server(
    ratelimit.WithLimiter(limiter),
)

Complete Example

package main

import (
    "log"
    "time"
    
    "github.com/go-kratos/kratos/v2"
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "github.com/go-kratos/kratos/v2/middleware/recovery"
    "github.com/go-kratos/kratos/v2/transport/http"
    
    "github.com/go-kratos/aegis/ratelimit/bbr"
)

func main() {
    // Create custom BBR limiter
    limiter := bbr.NewLimiter(
        bbr.WithWindow(10 * time.Second),
        bbr.WithBucket(100),
        bbr.WithCPUThreshold(800), // 80% CPU threshold
    )
    
    // Create HTTP server with rate limiting
    httpSrv := http.NewServer(
        http.Address(":8000"),
        http.Middleware(
            recovery.Recovery(),
            ratelimit.Server(
                ratelimit.WithLimiter(limiter),
            ),
        ),
    )
    
    app := kratos.New(
        kratos.Name("ratelimit-example"),
        kratos.Server(httpSrv),
    )
    
    if err := app.Run(); err != nil {
        log.Fatal(err)
    }
}

Testing Rate Limiting

# Install Apache Bench
sudo apt-get install apache2-utils

# Test with 1000 requests, 100 concurrent
ab -n 1000 -c 100 http://localhost:8000/api/hello

# Or use hey
go install github.com/rakyll/hey@latest
hey -n 1000 -c 100 http://localhost:8000/api/hello
You should see some requests returning 429 status code when the rate limit is exceeded.

Monitoring Rate Limiting

Track rate limiting metrics:
import (
    "context"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/kratos/v2/middleware"
    kratoslimit "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "go.opentelemetry.io/otel/metric"
)

// Custom rate limit middleware with metrics
func RateLimitWithMetrics(limiter ratelimit.Limiter, meter metric.Meter) middleware.Middleware {
    // Create counters (creation errors ignored for brevity)
    allowedCounter, _ := meter.Int64Counter("ratelimit_allowed_total")
    rejectedCounter, _ := meter.Int64Counter("ratelimit_rejected_total")
    
    return func(handler middleware.Handler) middleware.Handler {
        return func(ctx context.Context, req any) (any, error) {
            done, err := limiter.Allow()
            if err != nil {
                // Rejected
                rejectedCounter.Add(ctx, 1)
                return nil, kratoslimit.ErrLimitExceed
            }
            
            // Allowed
            allowedCounter.Add(ctx, 1)
            reply, err := handler(ctx, req)
            done(ratelimit.DoneInfo{Err: err})
            return reply, err
        }
    }
}

Per-User Rate Limiting

Implement per-user rate limiting with a custom middleware that keeps one limiter per user. The `Limiter` interface has no access to the request context, so the per-user lookup must happen in the middleware itself:
import (
    "context"
    "sync"
    
    jwtv5 "github.com/golang-jwt/jwt/v5"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/aegis/ratelimit/bbr"
    "github.com/go-kratos/kratos/v2/middleware"
    "github.com/go-kratos/kratos/v2/middleware/auth/jwt"
    kratoslimit "github.com/go-kratos/kratos/v2/middleware/ratelimit"
)

// PerUserRateLimit keeps one limiter per authenticated user
func PerUserRateLimit() middleware.Middleware {
    var limiters sync.Map
    
    return func(handler middleware.Handler) middleware.Handler {
        return func(ctx context.Context, req any) (any, error) {
            // Get user ID from JWT claims
            claims, ok := jwt.FromContext(ctx)
            if !ok {
                return handler(ctx, req) // No auth, no limit
            }
            mc, ok := claims.(jwtv5.MapClaims)
            if !ok {
                return handler(ctx, req)
            }
            userID, ok := mc["user_id"].(string)
            if !ok {
                return handler(ctx, req)
            }
            
            // Get or create the limiter for this user
            limiterI, _ := limiters.LoadOrStore(userID, bbr.NewLimiter())
            limiter := limiterI.(ratelimit.Limiter)
            
            // Apply rate limit
            done, err := limiter.Allow()
            if err != nil {
                return nil, kratoslimit.ErrLimitExceed
            }
            
            reply, err := handler(ctx, req)
            done(ratelimit.DoneInfo{Err: err})
            return reply, err
        }
    }
}
Note that BBR adapts to process-wide CPU load, so all per-user BBR instances share the same CPU signal; a fixed-rate limiter (such as the token bucket above) is often a better fit for per-user quotas.

Selective Rate Limiting

Apply rate limiting to specific routes:
import (
    "context"
    
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "github.com/go-kratos/kratos/v2/middleware/selector"
)

// Apply rate limiting only to /api/* routes
http.Middleware(
    selector.Server(
        ratelimit.Server(),
    ).Prefix("/api").Build(),
)

// Exclude health checks from rate limiting
http.Middleware(
    selector.Server(
        ratelimit.Server(),
    ).Match(func(ctx context.Context, operation string) bool {
        return operation != "/healthz" && operation != "/metrics"
    }).Build(),
)

Best Practices

The default BBR limiter is suitable for most use cases. It adapts to system load automatically and prevents overload.
BBR uses CPU usage and request latency as signals. Monitor these metrics to understand rate limiting behavior.
The default CPU threshold is 80%. Adjust based on your service characteristics:
bbr.WithCPUThreshold(700) // 70% for CPU-intensive services
bbr.WithCPUThreshold(900) // 90% for I/O-bound services
Include a Retry-After header or retry guidance in error responses:
if errors.Is(err, ratelimit.ErrLimitExceed) {
    // Add Retry-After header
    if tr, ok := transport.FromServerContext(ctx); ok {
        tr.ReplyHeader().Set("Retry-After", "60")
    }
    return nil, err
}
For public APIs, implement per-user or per-IP rate limiting to prevent abuse.
Always test rate limiting under realistic load conditions to verify behavior.
Combine server-side rate limiting with a client-side circuit breaker for comprehensive protection (circuitbreaker.Client is a client middleware and does not belong in the server chain):
// Server side
http.Middleware(
    recovery.Recovery(),
    ratelimit.Server(),
)

// Client side
http.WithMiddleware(
    circuitbreaker.Client(),
)

BBR vs Token Bucket

| Feature       | BBR                        | Token Bucket                  |
|---------------|----------------------------|-------------------------------|
| Adaptive      | Yes, adapts to system load | No, fixed rate                |
| CPU-aware     | Yes                        | No                            |
| Latency-aware | Yes                        | No                            |
| Configuration | Minimal                    | Requires rate tuning          |
| Use case      | Backend services           | Public APIs with known limits |

Client-Side Rate Limiting

While the middleware is designed for servers, you can implement client-side rate limiting:
import (
    "context"
    "time"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/aegis/ratelimit/bbr"
    "github.com/go-kratos/kratos/v2/middleware"
)

// Client rate limiting middleware
func ClientRateLimit(limiter ratelimit.Limiter) middleware.Middleware {
    return func(handler middleware.Handler) middleware.Handler {
        return func(ctx context.Context, req any) (any, error) {
            done, err := limiter.Allow()
            if err != nil {
                // Back off once, then retry; give up if still limited
                select {
                case <-ctx.Done():
                    return nil, ctx.Err()
                case <-time.After(time.Second):
                }
                if done, err = limiter.Allow(); err != nil {
                    return nil, err
                }
            }
            
            reply, err := handler(ctx, req)
            done(ratelimit.DoneInfo{Err: err})
            return reply, err
        }
    }
}

// Use in client
limiter := bbr.NewLimiter()
http.WithMiddleware(
    ClientRateLimit(limiter),
)

Troubleshooting

Too Many 429 Errors

  1. Check CPU usage - may be too high
  2. Increase CPU threshold
  3. Scale horizontally (add more instances)
  4. Optimize slow endpoints

Not Rejecting Requests

  1. Verify BBR configuration
  2. Check if CPU threshold is too high
  3. Ensure middleware is properly registered
  4. Test with sufficient load

Source Reference

The rate limit middleware implementation can be found in:
  • middleware/ratelimit/ratelimit.go:32 - Server middleware
  • middleware/ratelimit/ratelimit.go:13 - ErrLimitExceed error
  • middleware/ratelimit/ratelimit.go:21 - WithLimiter option

Next Steps

Circuit Breaker

Add circuit breaker protection

Metrics

Monitor rate limiting effectiveness