Overview

The rate limit middleware provides adaptive rate limiting using the BBR (Bottleneck Bandwidth and RTT) algorithm. It automatically adjusts the rate limit based on system load to prevent service overload while maximizing throughput.

Installation

go get github.com/go-kratos/kratos/v2/middleware/ratelimit
go get github.com/go-kratos/aegis/ratelimit

Server Middleware

The Server function creates a server-side rate limiting middleware:
func Server(opts ...Option) middleware.Middleware

Basic Usage

import (
    "log"

    "github.com/go-kratos/kratos/v2"
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "github.com/go-kratos/kratos/v2/transport/grpc"
    "github.com/go-kratos/kratos/v2/transport/http"
)

func main() {
    // Create HTTP server with rate limiting
    httpSrv := http.NewServer(
        http.Address(":8000"),
        http.Middleware(
            ratelimit.Server(),
        ),
    )
    
    // Create gRPC server with rate limiting
    grpcSrv := grpc.NewServer(
        grpc.Address(":9000"),
        grpc.Middleware(
            ratelimit.Server(),
        ),
    )
    
    app := kratos.New(
        kratos.Server(httpSrv, grpcSrv),
    )
    
    if err := app.Run(); err != nil {
        log.Fatal(err)
    }
}
By default, the middleware uses the BBR limiter which automatically adapts to system load.

BBR Algorithm

BBR (Bottleneck Bandwidth and RTT) is an adaptive algorithm that:
  • Monitors system CPU usage and request latency
  • Calculates optimal concurrency limit
  • Adapts to changing load conditions
  • Prevents cascading failures
  • Maximizes throughput while maintaining stability

How BBR Works

  1. Measures system metrics: CPU usage, request count, latency
  2. Calculates max concurrency: Based on historical data
  3. Allows or rejects requests: Based on current load vs. capacity
  4. Adapts continuously: Updates limits as conditions change

Error Response

When the rate limit is exceeded, the middleware returns:
var ErrLimitExceed = errors.New(429, "RATELIMIT", "service unavailable due to rate limit exceeded")
This returns:
  • HTTP status: 429 Too Many Requests
  • gRPC code: RESOURCE_EXHAUSTED
  • Reason: "RATELIMIT"
  • Message: "service unavailable due to rate limit exceeded"

Configuration Options

WithLimiter

Use a custom limiter implementation:
import (
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/aegis/ratelimit/bbr"
)

func WithLimiter(limiter ratelimit.Limiter) Option

Custom BBR Configuration

Create a custom BBR limiter with specific settings:
import (
    "time"

    "github.com/go-kratos/aegis/ratelimit/bbr"
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
)

// Create custom BBR limiter
limiter := bbr.NewLimiter(
    bbr.WithWindow(10 * time.Second),     // Sliding window size
    bbr.WithBucket(100),                   // Number of buckets
    bbr.WithCPUThreshold(800),             // CPU threshold (800 = 80%)
)

// Use in middleware
ratelimit.Server(
    ratelimit.WithLimiter(limiter),
)

Custom Limiter Implementation

You can implement your own limiter:
import "github.com/go-kratos/aegis/ratelimit"

type Limiter interface {
    Allow() (DoneFunc, error)
}

type DoneFunc func(DoneInfo)

type DoneInfo struct {
    Err error
}

Token Bucket Example

import (
    "sync"
    "time"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/kratos/v2/errors"
)

type tokenBucketLimiter struct {
    rate   int           // Tokens per second
    burst  int           // Maximum burst size
    tokens int           // Current tokens
    last   time.Time     // Last refill time
    mu     sync.Mutex
}

func NewTokenBucketLimiter(rate, burst int) ratelimit.Limiter {
    return &tokenBucketLimiter{
        rate:   rate,
        burst:  burst,
        tokens: burst,
        last:   time.Now(),
    }
}

func (l *tokenBucketLimiter) Allow() (ratelimit.DoneFunc, error) {
    l.mu.Lock()
    defer l.mu.Unlock()
    
    // Refill tokens; only advance the timestamp when at least one
    // token is added, so fractional refills are not lost to integer
    // truncation under frequent calls
    now := time.Now()
    elapsed := now.Sub(l.last).Seconds()
    if refill := int(elapsed * float64(l.rate)); refill > 0 {
        l.tokens = min(l.burst, l.tokens+refill)
        l.last = now
    }
    
    // Check if a token is available
    if l.tokens <= 0 {
        return nil, errors.New(429, "RATELIMIT", "rate limit exceeded")
    }
    
    // Consume token
    l.tokens--
    
    return func(info ratelimit.DoneInfo) {
        // No-op for token bucket
    }, nil
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

Usage with Custom Limiter

import "github.com/go-kratos/kratos/v2/middleware/ratelimit"

// 100 requests per second with burst of 200
limiter := NewTokenBucketLimiter(100, 200)

ratelimit.Server(
    ratelimit.WithLimiter(limiter),
)

Complete Example

package main

import (
    "log"
    "time"
    
    "github.com/go-kratos/kratos/v2"
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "github.com/go-kratos/kratos/v2/middleware/recovery"
    "github.com/go-kratos/kratos/v2/transport/http"
    
    "github.com/go-kratos/aegis/ratelimit/bbr"
)

func main() {
    // Create custom BBR limiter
    limiter := bbr.NewLimiter(
        bbr.WithWindow(10 * time.Second),
        bbr.WithBucket(100),
        bbr.WithCPUThreshold(800), // 80% CPU threshold
    )
    
    // Create HTTP server with rate limiting
    httpSrv := http.NewServer(
        http.Address(":8000"),
        http.Middleware(
            recovery.Recovery(),
            ratelimit.Server(
                ratelimit.WithLimiter(limiter),
            ),
        ),
    )
    
    app := kratos.New(
        kratos.Name("ratelimit-example"),
        kratos.Server(httpSrv),
    )
    
    if err := app.Run(); err != nil {
        log.Fatal(err)
    }
}

Testing Rate Limiting

# Install Apache Bench
sudo apt-get install apache2-utils

# Test with 1000 requests, 100 concurrent
ab -n 1000 -c 100 http://localhost:8000/api/hello

# Or use hey
go install github.com/rakyll/hey@latest
hey -n 1000 -c 100 http://localhost:8000/api/hello
You should see some requests returning 429 status code when the rate limit is exceeded.

Monitoring Rate Limiting

Track rate limiting metrics:
import (
    "context"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/kratos/v2/middleware"
    kratoslimit "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "go.opentelemetry.io/otel/metric"
)

// Custom rate limit middleware with metrics
func RateLimitWithMetrics(limiter ratelimit.Limiter, meter metric.Meter) middleware.Middleware {
    // Create counters (creation errors ignored for brevity)
    allowedCounter, _ := meter.Int64Counter("ratelimit_allowed_total")
    rejectedCounter, _ := meter.Int64Counter("ratelimit_rejected_total")
    
    return func(handler middleware.Handler) middleware.Handler {
        return func(ctx context.Context, req any) (any, error) {
            done, err := limiter.Allow()
            if err != nil {
                // Rejected
                rejectedCounter.Add(ctx, 1)
                return nil, kratoslimit.ErrLimitExceed
            }
            
            // Allowed
            allowedCounter.Add(ctx, 1)
            reply, err := handler(ctx, req)
            done(ratelimit.DoneInfo{Err: err})
            return reply, err
        }
    }
}

Per-User Rate Limiting

Implement per-user rate limiting with a custom middleware that keeps one limiter per user. The `Limiter` interface has no access to the request context, so the per-user lookup must happen in the middleware itself:
import (
    "context"
    "sync"
    
    jwtv5 "github.com/golang-jwt/jwt/v5"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/aegis/ratelimit/bbr"
    "github.com/go-kratos/kratos/v2/middleware"
    "github.com/go-kratos/kratos/v2/middleware/auth/jwt"
    kratoslimit "github.com/go-kratos/kratos/v2/middleware/ratelimit"
)

// PerUserRateLimit keeps one limiter per authenticated user
func PerUserRateLimit() middleware.Middleware {
    var limiters sync.Map
    
    return func(handler middleware.Handler) middleware.Handler {
        return func(ctx context.Context, req any) (any, error) {
            // Get user ID from JWT claims
            claims, ok := jwt.FromContext(ctx)
            if !ok {
                return handler(ctx, req) // No auth, no limit
            }
            mc, ok := claims.(jwtv5.MapClaims)
            if !ok {
                return handler(ctx, req)
            }
            userID, ok := mc["user_id"].(string)
            if !ok {
                return handler(ctx, req)
            }
            
            // Get or create the limiter for this user
            limiterI, _ := limiters.LoadOrStore(userID, bbr.NewLimiter())
            limiter := limiterI.(ratelimit.Limiter)
            
            // Apply rate limit
            done, err := limiter.Allow()
            if err != nil {
                return nil, kratoslimit.ErrLimitExceed
            }
            
            reply, err := handler(ctx, req)
            done(ratelimit.DoneInfo{Err: err})
            return reply, err
        }
    }
}
Note that BBR adapts to process-wide CPU load, so all per-user BBR instances share the same CPU signal; a fixed-rate limiter (such as the token bucket above) is often a better fit for per-user quotas.

Selective Rate Limiting

Apply rate limiting to specific routes:
import (
    "context"
    
    "github.com/go-kratos/kratos/v2/middleware/ratelimit"
    "github.com/go-kratos/kratos/v2/middleware/selector"
)

// Apply rate limiting only to /api/* routes
http.Middleware(
    selector.Server(
        ratelimit.Server(),
    ).Prefix("/api").Build(),
)

// Exclude health checks from rate limiting
http.Middleware(
    selector.Server(
        ratelimit.Server(),
    ).Match(func(ctx context.Context, operation string) bool {
        return operation != "/healthz" && operation != "/metrics"
    }).Build(),
)

Best Practices

The default BBR limiter is suitable for most use cases. It adapts to system load automatically and prevents overload.
BBR uses CPU usage and request latency as signals. Monitor these metrics to understand rate limiting behavior.
The default CPU threshold is 80%. Adjust based on your service characteristics:
bbr.WithCPUThreshold(700) // 70% for CPU-intensive services
bbr.WithCPUThreshold(900) // 90% for I/O-bound services
Include a Retry-After header or retry guidance in error responses:
if errors.Is(err, ratelimit.ErrLimitExceed) {
    // Add Retry-After header
    if tr, ok := transport.FromServerContext(ctx); ok {
        tr.ReplyHeader().Set("Retry-After", "60")
    }
    return nil, err
}
For public APIs, implement per-user or per-IP rate limiting to prevent abuse.
Always test rate limiting under realistic load conditions to verify behavior.
Combine server-side rate limiting with a client-side circuit breaker for comprehensive protection (circuitbreaker.Client is a client middleware and does not belong in the server chain):
// Server side
http.Middleware(
    recovery.Recovery(),
    ratelimit.Server(),
)

// Client side
http.WithMiddleware(
    circuitbreaker.Client(),
)

BBR vs Token Bucket

| Feature       | BBR                        | Token Bucket                  |
|---------------|----------------------------|-------------------------------|
| Adaptive      | Yes, adapts to system load | No, fixed rate                |
| CPU-aware     | Yes                        | No                            |
| Latency-aware | Yes                        | No                            |
| Configuration | Minimal                    | Requires rate tuning          |
| Use case      | Backend services           | Public APIs with known limits |

Client-Side Rate Limiting

While the middleware is designed for servers, you can implement client-side rate limiting:
import (
    "context"
    "time"
    
    "github.com/go-kratos/aegis/ratelimit"
    "github.com/go-kratos/aegis/ratelimit/bbr"
    "github.com/go-kratos/kratos/v2/middleware"
)

// Client rate limiting middleware
func ClientRateLimit(limiter ratelimit.Limiter) middleware.Middleware {
    return func(handler middleware.Handler) middleware.Handler {
        return func(ctx context.Context, req any) (any, error) {
            done, err := limiter.Allow()
            if err != nil {
                // Back off once, then retry; give up if still limited
                select {
                case <-ctx.Done():
                    return nil, ctx.Err()
                case <-time.After(time.Second):
                }
                if done, err = limiter.Allow(); err != nil {
                    return nil, err
                }
            }
            
            reply, err := handler(ctx, req)
            done(ratelimit.DoneInfo{Err: err})
            return reply, err
        }
    }
}

// Use in client
limiter := bbr.NewLimiter()
http.WithMiddleware(
    ClientRateLimit(limiter),
)

Troubleshooting

Too Many 429 Errors

  1. Check CPU usage - may be too high
  2. Increase CPU threshold
  3. Scale horizontally (add more instances)
  4. Optimize slow endpoints

Not Rejecting Requests

  1. Verify BBR configuration
  2. Check if CPU threshold is too high
  3. Ensure middleware is properly registered
  4. Test with sufficient load

Source Reference

The rate limit middleware implementation can be found in:
  • middleware/ratelimit/ratelimit.go:32 - Server middleware
  • middleware/ratelimit/ratelimit.go:13 - ErrLimitExceed error
  • middleware/ratelimit/ratelimit.go:21 - WithLimiter option

Next Steps

Circuit Breaker

Add circuit breaker protection

Metrics

Monitor rate limiting effectiveness