
Overview

The circuit breaker middleware implements the circuit breaker pattern to prevent cascading failures when calling external services. It monitors error rates and automatically stops sending requests to failing services, giving them time to recover.

Installation

go get github.com/go-kratos/kratos/v2/middleware/circuitbreaker
go get github.com/go-kratos/aegis/circuitbreaker

Client Middleware

The Client function creates a client-side circuit breaker middleware:
func Client(opts ...Option) middleware.Middleware
Circuit breaker is designed for client-side use only. It protects your service from failing dependencies.

Basic Usage

import (
    "github.com/go-kratos/kratos/v2/middleware/circuitbreaker"
    "github.com/go-kratos/kratos/v2/transport/http"
    "github.com/go-kratos/kratos/v2/transport/grpc"
)

// HTTP client with circuit breaker
conn, err := http.NewClient(
    context.Background(),
    http.WithEndpoint("127.0.0.1:8000"),
    http.WithMiddleware(
        circuitbreaker.Client(),
    ),
)

// gRPC client with circuit breaker
conn, err := grpc.DialInsecure(
    context.Background(),
    grpc.WithEndpoint("127.0.0.1:9000"),
    grpc.WithMiddleware(
        circuitbreaker.Client(),
    ),
)

How It Works

The circuit breaker has three states:
                     Success rate improves
      ┌─────────────────────────────────────────────────┐
      v                                                 │
┌───────────┐   Errors    ┌───────────┐   Test    ┌───────────┐
│  CLOSED   │ ──────────> │   OPEN    │ ────────> │ HALF-OPEN │
│ (Normal)  │   exceed    │ (Blocked) │  request  │ (Testing) │
└───────────┘  threshold  └───────────┘           └───────────┘
                                ^                       │
                                │                       │
                                └── Errors continue ────┘

States

  1. CLOSED - Normal operation, requests pass through
  2. OPEN - Circuit is open, requests are rejected immediately
  3. HALF-OPEN - Testing if service has recovered, allows limited requests
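
The transitions can be sketched as a minimal, stand-alone state machine. This is illustrative only: it uses a simple consecutive-failure counter and a `tryProbe` hook standing in for a cool-down timer, not the probabilistic SRE algorithm Kratos actually uses.

```go
package main

import "fmt"

type state int

const (
	closed state = iota
	open
	halfOpen
)

// breaker is an illustrative sketch of the three-state transition logic.
type breaker struct {
	st       state
	failures int
	limit    int
}

func (b *breaker) onResult(ok bool) {
	switch b.st {
	case closed:
		if ok {
			b.failures = 0
			return
		}
		if b.failures++; b.failures >= b.limit {
			b.st = open // errors exceeded the threshold: trip the breaker
		}
	case halfOpen:
		if ok {
			b.st, b.failures = closed, 0 // test request succeeded: recover
		} else {
			b.st = open // test request failed: block again
		}
	}
}

// tryProbe models the cool-down expiring: one test request is allowed.
func (b *breaker) tryProbe() {
	if b.st == open {
		b.st = halfOpen
	}
}

func main() {
	b := &breaker{limit: 3}
	for i := 0; i < 3; i++ {
		b.onResult(false)
	}
	fmt.Println(b.st == open) // true: tripped after 3 consecutive failures
	b.tryProbe()
	b.onResult(true)
	fmt.Println(b.st == closed) // true: probe succeeded, back to normal
}
```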

SRE Algorithm

The default implementation uses Google SRE's adaptive throttling algorithm. Rather than flipping hard open, it rejects each request with probability:

    P(reject) = max(0, (requests − K × accepts) / (requests + 1))

Where:
  • requests = total requests in the window
  • accepts = successful (non-error) requests in the window
  • K = throttling ratio (typically 2)
While K × accepts ≥ requests, nothing is rejected. As failures push accepts down relative to requests, the rejection probability climbs toward 1 and the circuit effectively opens.
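
The rejection probability can be sketched as a small stdlib-only helper (the function name is ours for illustration; this is not the aegis implementation):

```go
package main

import "fmt"

// rejectProbability computes the SRE drop probability:
// max(0, (requests - K*accepts) / (requests + 1)).
func rejectProbability(requests, accepts, k float64) float64 {
	p := (requests - k*accepts) / (requests + 1)
	if p < 0 {
		return 0
	}
	return p
}

func main() {
	// Healthy window: 100 requests, 99 accepted, K = 2: nothing is rejected.
	fmt.Println(rejectProbability(100, 99, 2)) // 0
	// Degraded window: only 20 of 100 requests succeeded.
	fmt.Printf("%.2f\n", rejectProbability(100, 20, 2)) // 0.59
}
```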

Error Response

When the circuit is open, requests are rejected with:
var ErrNotAllowed = errors.New(503, "CIRCUITBREAKER", "request failed due to circuit breaker triggered")
This returns:
  • HTTP status: 503 Service Unavailable
  • gRPC code: UNAVAILABLE
  • Reason: "CIRCUITBREAKER"

Configuration Options

WithCircuitBreaker

Use a custom circuit breaker generator function:
import (
    "github.com/go-kratos/aegis/circuitbreaker/sre"
    "github.com/go-kratos/kratos/v2/middleware/circuitbreaker"
)

circuitbreaker.Client(
    circuitbreaker.WithCircuitBreaker(func() circuitbreaker.CircuitBreaker {
        return sre.NewBreaker(
            sre.WithRequest(100),      // Minimum requests before calculating
            sre.WithSuccess(0.5),      // Success ratio (0.5 = 50%)
            sre.WithBucket(10),        // Number of buckets in window
            sre.WithWindow(time.Second * 30), // Time window
        )
    }),
)

WithGroup

Use a shared circuit breaker group. Note that the group package lives under github.com/go-kratos/kratos/v2/internal/group, so only code inside the Kratos module itself can import it; application code should normally use WithCircuitBreaker, which wraps a group internally:
import (
    "github.com/go-kratos/aegis/circuitbreaker/sre"
    "github.com/go-kratos/kratos/v2/internal/group"
    "github.com/go-kratos/kratos/v2/middleware/circuitbreaker"
)

// Create a shared group (group.NewGroup takes a func() interface{} factory)
cbGroup := group.NewGroup(func() interface{} {
    return sre.NewBreaker()
})

// Use in multiple clients
circuitbreaker.Client(
    circuitbreaker.WithGroup(cbGroup),
)

Per-Operation Circuit Breakers

By default, circuit breakers are created per operation. Each endpoint gets its own circuit breaker:
// Each operation has its own circuit breaker
GET /api/users       -> circuit breaker 1
GET /api/posts       -> circuit breaker 2
POST /api/users      -> circuit breaker 3
This is implemented in the middleware:
info, _ := transport.FromClientContext(ctx)
breaker := opt.group.Get(info.Operation()).(circuitbreaker.CircuitBreaker)

Triggering Conditions

The circuit breaker triggers on:
  • 500 Internal Server Error
  • 503 Service Unavailable
  • 504 Gateway Timeout
if err != nil && (
    errors.IsInternalServer(err) ||
    errors.IsServiceUnavailable(err) ||
    errors.IsGatewayTimeout(err)
) {
    breaker.MarkFailed()
} else {
    breaker.MarkSuccess()
}
Other errors (400, 401, 404, etc.) are not considered circuit breaker failures.
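
The classification above amounts to a small predicate over status codes. The helper below is a stand-alone illustration; the middleware itself checks Kratos errors via errors.IsInternalServer and friends rather than raw integers:

```go
package main

import "fmt"

// isBreakerFailure mirrors the classification above: only server-side
// errors (500, 503, 504) count against the circuit breaker.
func isBreakerFailure(code int) bool {
	switch code {
	case 500, 503, 504:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isBreakerFailure(503)) // true: marks the breaker failed
	fmt.Println(isBreakerFailure(404)) // false: client errors are ignored
}
```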

Complete Example

package main

import (
    "context"
    "log"
    "time"
    
    "github.com/go-kratos/aegis/circuitbreaker/sre"
    "github.com/go-kratos/kratos/v2/middleware/circuitbreaker"
    "github.com/go-kratos/kratos/v2/middleware/recovery"
    "github.com/go-kratos/kratos/v2/transport/http"
)

func main() {
    // Create custom circuit breaker
    cb := circuitbreaker.WithCircuitBreaker(func() circuitbreaker.CircuitBreaker {
        return sre.NewBreaker(
            sre.WithRequest(10),           // Min 10 requests to evaluate
            sre.WithSuccess(0.6),          // 60% success rate required
            sre.WithBucket(10),            // 10 buckets
            sre.WithWindow(10 * time.Second), // 10 second window
        )
    })
    
    // Create HTTP client with circuit breaker
    conn, err := http.NewClient(
        context.Background(),
        http.WithEndpoint("127.0.0.1:8000"),
        http.WithMiddleware(
            recovery.Recovery(),
            circuitbreaker.Client(cb),
        ),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    
    // Make requests (the Kratos HTTP client exposes Invoke/Do, not Get)
    for i := 0; i < 100; i++ {
        var resp any
        if err := conn.Invoke(context.Background(), "GET", "/api/data", nil, &resp); err != nil {
            log.Printf("Request %d failed: %v", i, err)
            time.Sleep(time.Second)
            continue
        }
        log.Printf("Request %d succeeded: %v", i, resp)
    }
}

Testing Circuit Breaker

package main

import (
    "context"
    "testing"

    "github.com/go-kratos/aegis/circuitbreaker/sre"
    "github.com/go-kratos/kratos/v2/errors"
    "github.com/go-kratos/kratos/v2/middleware"
    "github.com/go-kratos/kratos/v2/middleware/circuitbreaker"
    "github.com/go-kratos/kratos/v2/transport"
)

// mockTransport supplies the operation name the middleware uses to look up
// a breaker; without a client transport in the context, the middleware
// cannot resolve an operation.
type mockTransport struct{}

func (mockTransport) Kind() transport.Kind            { return transport.KindHTTP }
func (mockTransport) Endpoint() string                { return "" }
func (mockTransport) Operation() string               { return "/test.Service/Test" }
func (mockTransport) RequestHeader() transport.Header { return nil }
func (mockTransport) ReplyHeader() transport.Header   { return nil }

func TestCircuitBreaker(t *testing.T) {
    // Create breaker with a low threshold
    cb := circuitbreaker.Client(
        circuitbreaker.WithCircuitBreaker(func() circuitbreaker.CircuitBreaker {
            return sre.NewBreaker(
                sre.WithRequest(5),
                sre.WithSuccess(0.5),
            )
        }),
    )

    // Handler that always fails
    var failHandler middleware.Handler = func(ctx context.Context, req any) (any, error) {
        return nil, errors.InternalServer("TEST", "test error")
    }

    handler := cb(failHandler)
    ctx := transport.NewClientContext(context.Background(), mockTransport{})

    // Trigger failures
    for i := 0; i < 10; i++ {
        _, _ = handler(ctx, "test")
    }

    // The SRE algorithm rejects probabilistically rather than hard-open,
    // so retry a few times and expect at least one rejection.
    var rejected bool
    for i := 0; i < 20; i++ {
        if _, err := handler(ctx, "test"); errors.Is(err, circuitbreaker.ErrNotAllowed) {
            rejected = true
            break
        }
    }
    if !rejected {
        t.Error("expected circuit breaker to reject requests")
    }
}

Monitoring Circuit Breakers

Track circuit breaker state and metrics:
import (
    "context"

    "github.com/go-kratos/aegis/circuitbreaker"
    "github.com/go-kratos/kratos/v2/middleware"
    kcb "github.com/go-kratos/kratos/v2/middleware/circuitbreaker"
    "go.opentelemetry.io/otel/metric"
)

func CircuitBreakerWithMetrics(
    breaker circuitbreaker.CircuitBreaker,
    meter metric.Meter,
) middleware.Middleware {
    // Create metrics
    allowedCounter, _ := meter.Int64Counter("cb_allowed_total")
    rejectedCounter, _ := meter.Int64Counter("cb_rejected_total")
    failedCounter, _ := meter.Int64Counter("cb_failed_total")
    successCounter, _ := meter.Int64Counter("cb_success_total")

    return func(handler middleware.Handler) middleware.Handler {
        return func(ctx context.Context, req any) (any, error) {
            if err := breaker.Allow(); err != nil {
                rejectedCounter.Add(ctx, 1)
                // Return the middleware's standard error so callers see
                // 503 with reason CIRCUITBREAKER, not a bare aegis error.
                return nil, kcb.ErrNotAllowed
            }

            allowedCounter.Add(ctx, 1)
            reply, err := handler(ctx, req)

            if err != nil {
                failedCounter.Add(ctx, 1)
                breaker.MarkFailed()
            } else {
                successCounter.Add(ctx, 1)
                breaker.MarkSuccess()
            }

            return reply, err
        }
    }
}

Best Practices

Circuit breakers should only be used on the client side to protect your service from failing dependencies.
// Good: Client-side
http.NewClient(
    ctx,
    http.WithMiddleware(circuitbreaker.Client()),
)

// Bad: Server-side (not supported)
http.NewServer(
    http.Middleware(circuitbreaker.Client()), // Wrong!
)
Set thresholds based on your service’s SLA and traffic patterns:
  • High traffic: Higher request threshold (100+), since the window fills quickly and more samples reduce noise
  • Low traffic: Lower request threshold (10-50), since a high floor may never be reached before the window rolls over
  • Critical services: Higher success ratio (0.8-0.9)
  • Non-critical: Lower success ratio (0.5-0.6)
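These guidelines can be sketched as a parameter picker. The helper, its names, and its exact cutoffs are illustrative assumptions, not part of Kratos or aegis:

```go
package main

import (
	"fmt"
	"time"
)

// breakerParams collects the tuning knobs discussed above.
type breakerParams struct {
	MinRequests  int
	SuccessRatio float64
	Window       time.Duration
}

// paramsFor is a hypothetical helper mapping traffic and criticality to
// breaker settings.
func paramsFor(requestsPerSecond float64, critical bool) breakerParams {
	p := breakerParams{Window: 10 * time.Second}
	if requestsPerSecond >= 100 {
		p.MinRequests = 100 // high traffic: window fills fast, extra samples reduce noise
	} else {
		p.MinRequests = 20 // low traffic: a high floor might never be reached
	}
	if critical {
		p.SuccessRatio = 0.85 // critical dependency: trip early
	} else {
		p.SuccessRatio = 0.55 // non-critical: tolerate more errors
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", paramsFor(500, true))
	fmt.Printf("%+v\n", paramsFor(5, false))
}
```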
Combine the circuit breaker with timeouts (and retries, if your stack provides them) for comprehensive resilience. Kratos core does not ship timeout or retry middleware, but the HTTP client accepts a request timeout option:
http.NewClient(
    ctx,
    http.WithTimeout(5 * time.Second),
    http.WithMiddleware(
        recovery.Recovery(),
        circuitbreaker.Client(),
    ),
)
Handle circuit breaker errors gracefully with fallbacks:
data, err := client.GetData(ctx)
if errors.IsServiceUnavailable(err) {
    // Use cached data or default value
    data = cache.Get("data")
}
Track when circuits open and close to identify problematic services.
Set up alerts when circuits open to notify on-call engineers of service degradation.

Fallback Patterns

Cache Fallback

func GetUserWithFallback(ctx context.Context, userID string) (*User, error) {
    // Try remote service
    user, err := client.GetUser(ctx, userID)
    if err != nil {
        if errors.IsServiceUnavailable(err) {
            // Fallback to cache
            if cached := cache.Get(userID); cached != nil {
                return cached.(*User), nil
            }
        }
        return nil, err
    }
    
    // Update cache
    cache.Set(userID, user, 5*time.Minute)
    return user, nil
}

Default Value Fallback

func GetConfigWithFallback(ctx context.Context) (*Config, error) {
    config, err := client.GetConfig(ctx)
    if err != nil {
        if errors.IsServiceUnavailable(err) {
            // Return default config
            return &Config{
                Timeout: 30 * time.Second,
                Retries: 3,
            }, nil
        }
        return nil, err
    }
    return config, nil
}

Troubleshooting

Circuit Opens Too Frequently

  1. Increase success ratio threshold
  2. Increase minimum request count
  3. Check if downstream service is actually unhealthy
  4. Review error handling in client

Circuit Never Opens

  1. Decrease success ratio threshold
  2. Decrease minimum request count
  3. Verify errors are being properly classified
  4. Check if sufficient traffic exists

Circuit Stuck Open

  1. Check if downstream service has recovered
  2. Verify half-open state is working
  3. Review circuit breaker configuration
  4. Check for proper error handling

Source Reference

The circuit breaker middleware implementation can be found in:
  • middleware/circuitbreaker/circuitbreaker.go:44 - Client middleware
  • middleware/circuitbreaker/circuitbreaker.go:15 - ErrNotAllowed error
  • middleware/circuitbreaker/circuitbreaker.go:29 - WithCircuitBreaker option
  • middleware/circuitbreaker/circuitbreaker.go:21 - WithGroup option

Next Steps

Rate Limiting

Add rate limiting protection

Recovery

Handle panics gracefully
