Overview
The circuit breaker middleware implements the circuit breaker pattern to prevent cascading failures when calling external services. It monitors error rates and automatically stops sending requests to failing services, giving them time to recover.
Installation
go get github.com/go-kratos/kratos/v2/middleware/circuitbreaker
go get github.com/go-kratos/aegis/circuitbreaker
Client Middleware
The Client function creates a client-side circuit breaker middleware:
func Client ( opts ... Option ) middleware . Middleware
Circuit breaker is designed for client-side use only. It protects your service from failing dependencies.
Basic Usage
import (
" github.com/go-kratos/kratos/v2/middleware/circuitbreaker "
" github.com/go-kratos/kratos/v2/transport/http "
" github.com/go-kratos/kratos/v2/transport/grpc "
)
// HTTP client with circuit breaker
conn , err := http . NewClient (
context . Background (),
http . WithEndpoint ( "127.0.0.1:8000" ),
http . WithMiddleware (
circuitbreaker . Client (),
),
)
// gRPC client with circuit breaker
conn , err := grpc . DialInsecure (
context . Background (),
grpc . WithEndpoint ( "127.0.0.1:9000" ),
grpc . WithMiddleware (
circuitbreaker . Client (),
),
)
How It Works
The circuit breaker has three states:
Success rate improves
┌──────────────────────────┐
│ │
│ v
┌──────────┐ Errors ┌──────────┐ Test ┌──────────┐
│ CLOSED │──────────>│ OPEN │─────────>│ HALF-OPEN│
│ (Normal) │ exceed │ (Blocked) │ request │ (Testing) │
└──────────┘ threshold └──────────┘ └──────────┘
^ │
│ │
└───────────── Errors continue ───────────┘
States
CLOSED - Normal operation, requests pass through
OPEN - Circuit is open, requests are rejected immediately
HALF-OPEN - Testing if service has recovered, allows limited requests
SRE Algorithm
The default implementation uses Google SRE’s circuit breaker algorithm:
Formula: requests / (requests + accepts) < K
Where:
requests = Total requests in window
accepts = Successful requests (non-error)
K = Threshold (typically 2.0)
When the ratio exceeds K, the circuit opens.
Error Response
When the circuit is open, requests are rejected with:
var ErrNotAllowed = errors . New ( 503 , "CIRCUITBREAKER" , "request failed due to circuit breaker triggered" )
This returns:
HTTP status: 503 Service Unavailable
gRPC code: UNAVAILABLE
Reason: "CIRCUITBREAKER"
Configuration Options
WithCircuitBreaker
Use a custom circuit breaker generator function:
import (
" github.com/go-kratos/aegis/circuitbreaker/sre "
" github.com/go-kratos/kratos/v2/middleware/circuitbreaker "
)
circuitbreaker . Client (
circuitbreaker . WithCircuitBreaker ( func () circuitbreaker . CircuitBreaker {
return sre . NewBreaker (
sre . WithRequest ( 100 ), // Minimum requests before calculating
sre . WithSuccess ( 0.5 ), // Success ratio (0.5 = 50%)
sre . WithBucket ( 10 ), // Number of buckets in window
sre . WithWindow ( time . Second * 30 ), // Time window
)
}),
)
WithGroup
Use a shared circuit breaker group:
import (
" github.com/go-kratos/aegis/circuitbreaker "
" github.com/go-kratos/kratos/v2/internal/group "
" github.com/go-kratos/kratos/v2/middleware/circuitbreaker "
)
// Create shared group
group := group . NewGroup ( func () circuitbreaker . CircuitBreaker {
return sre . NewBreaker ()
})
// Use in multiple clients
circuitbreaker . Client (
circuitbreaker . WithGroup ( group ),
)
Per-Operation Circuit Breakers
By default, circuit breakers are created per operation . Each endpoint gets its own circuit breaker:
// Each operation has its own circuit breaker
GET / api / users -> circuit breaker 1
GET / api / posts -> circuit breaker 2
POST / api / users -> circuit breaker 3
This is implemented in the middleware:
info , _ := transport . FromClientContext ( ctx )
breaker := opt . group . Get ( info . Operation ())
Triggering Conditions
The circuit breaker triggers on:
500 Internal Server Error
503 Service Unavailable
504 Gateway Timeout
if err != nil && (
errors . IsInternalServer ( err ) ||
errors . IsServiceUnavailable ( err ) ||
errors . IsGatewayTimeout ( err )
) {
breaker . MarkFailed ()
} else {
breaker . MarkSuccess ()
}
Other errors (400, 401, 404, etc.) are not considered circuit breaker failures.
Complete Example
package main
import (
" context "
" log "
" time "
" github.com/go-kratos/aegis/circuitbreaker/sre "
" github.com/go-kratos/kratos/v2/middleware/circuitbreaker "
" github.com/go-kratos/kratos/v2/middleware/recovery "
" github.com/go-kratos/kratos/v2/transport/http "
)
func main () {
// Create custom circuit breaker
cb := circuitbreaker . WithCircuitBreaker ( func () circuitbreaker . CircuitBreaker {
return sre . NewBreaker (
sre . WithRequest ( 10 ), // Min 10 requests to evaluate
sre . WithSuccess ( 0.6 ), // 60% success rate required
sre . WithBucket ( 10 ), // 10 buckets
sre . WithWindow ( 10 * time . Second ), // 10 second window
)
})
// Create HTTP client with circuit breaker
conn , err := http . NewClient (
context . Background (),
http . WithEndpoint ( "127.0.0.1:8000" ),
http . WithMiddleware (
recovery . Recovery (),
circuitbreaker . Client ( cb ),
),
)
if err != nil {
log . Fatal ( err )
}
defer conn . Close ()
// Make requests
for i := 0 ; i < 100 ; i ++ {
resp , err := conn . Get ( context . Background (), "/api/data" )
if err != nil {
log . Printf ( "Request %d failed: %v " , i , err )
time . Sleep ( time . Second )
continue
}
log . Printf ( "Request %d succeeded: %v " , i , resp )
}
}
Testing Circuit Breaker
package main
import (
" context "
" testing "
" time "
" github.com/go-kratos/aegis/circuitbreaker/sre "
" github.com/go-kratos/kratos/v2/errors "
" github.com/go-kratos/kratos/v2/middleware "
" github.com/go-kratos/kratos/v2/middleware/circuitbreaker "
)
func TestCircuitBreaker ( t * testing . T ) {
// Create breaker with low threshold
cb := circuitbreaker . Client (
circuitbreaker . WithCircuitBreaker ( func () circuitbreaker . CircuitBreaker {
return sre . NewBreaker (
sre . WithRequest ( 5 ),
sre . WithSuccess ( 0.5 ),
)
}),
)
// Handler that always fails
failHandler := func ( ctx context . Context , req any ) ( any , error ) {
return nil , errors . InternalServer ( "TEST" , "test error" )
}
handler := cb ( failHandler )
// Trigger failures
for i := 0 ; i < 10 ; i ++ {
handler ( context . Background (), "test" )
}
// Circuit should be open now
_ , err := handler ( context . Background (), "test" )
if err != circuitbreaker . ErrNotAllowed {
t . Errorf ( "expected circuit breaker error, got %v " , err )
}
}
Monitoring Circuit Breakers
Track circuit breaker state and metrics:
import (
" context "
" github.com/go-kratos/aegis/circuitbreaker "
" github.com/go-kratos/kratos/v2/middleware "
" go.opentelemetry.io/otel/metric "
)
func CircuitBreakerWithMetrics (
breaker circuitbreaker . CircuitBreaker ,
meter metric . Meter ,
) middleware . Middleware {
// Create metrics
allowedCounter , _ := meter . Int64Counter ( "cb_allowed_total" )
rejectedCounter , _ := meter . Int64Counter ( "cb_rejected_total" )
failedCounter , _ := meter . Int64Counter ( "cb_failed_total" )
successCounter , _ := meter . Int64Counter ( "cb_success_total" )
return func ( handler middleware . Handler ) middleware . Handler {
return func ( ctx context . Context , req any ) ( any , error ) {
if err := breaker . Allow (); err != nil {
rejectedCounter . Add ( ctx , 1 )
return nil , circuitbreaker . ErrNotAllowed
}
allowedCounter . Add ( ctx , 1 )
reply , err := handler ( ctx , req )
if err != nil {
failedCounter . Add ( ctx , 1 )
breaker . MarkFailed ()
} else {
successCounter . Add ( ctx , 1 )
breaker . MarkSuccess ()
}
return reply , err
}
}
}
Best Practices
Circuit breakers should only be used on the client side to protect your service from failing dependencies. // Good: Client-side
http . NewClient (
ctx ,
http . WithMiddleware ( circuitbreaker . Client ()),
)
// Bad: Server-side (not supported)
http . NewServer (
http . Middleware ( circuitbreaker . Client ()), // Wrong!
)
Tune Thresholds Carefully
Set thresholds based on your service’s SLA and traffic patterns:
High traffic: Lower request threshold (10-50)
Low traffic: Higher request threshold (100+)
Critical services: Higher success ratio (0.8-0.9)
Non-critical: Lower success ratio (0.5-0.6)
Combine with Retry and Timeout
Use circuit breaker with retry and timeout for comprehensive resilience: http . WithMiddleware (
recovery . Recovery (),
timeout . Timeout ( 5 * time . Second ),
circuitbreaker . Client (),
retry . Client (),
)
Handle circuit breaker errors gracefully with fallbacks: data , err := client . GetData ( ctx )
if errors . IsServiceUnavailable ( err ) {
// Use cached data or default value
data = cache . Get ( "data" )
}
Monitor Circuit Breaker State
Track when circuits open and close to identify problematic services.
Set up alerts when circuits open to notify on-call engineers of service degradation.
Fallback Patterns
Cache Fallback
func GetUserWithFallback ( ctx context . Context , userID string ) ( * User , error ) {
// Try remote service
user , err := client . GetUser ( ctx , userID )
if err != nil {
if errors . IsServiceUnavailable ( err ) {
// Fallback to cache
if cached := cache . Get ( userID ); cached != nil {
return cached .( * User ), nil
}
}
return nil , err
}
// Update cache
cache . Set ( userID , user , 5 * time . Minute )
return user , nil
}
Default Value Fallback
func GetConfigWithFallback ( ctx context . Context ) ( * Config , error ) {
config , err := client . GetConfig ( ctx )
if err != nil {
if errors . IsServiceUnavailable ( err ) {
// Return default config
return & Config {
Timeout : 30 * time . Second ,
Retries : 3 ,
}, nil
}
return nil , err
}
return config , nil
}
Troubleshooting
Circuit Opens Too Frequently
Increase success ratio threshold
Increase minimum request count
Check if downstream service is actually unhealthy
Review error handling in client
Circuit Never Opens
Decrease success ratio threshold
Decrease minimum request count
Verify errors are being properly classified
Check if sufficient traffic exists
Circuit Stuck Open
Check if downstream service has recovered
Verify half-open state is working
Review circuit breaker configuration
Check for proper error handling
Source Reference
The circuit breaker middleware implementation can be found in:
middleware/circuitbreaker/circuitbreaker.go:44 - Client middleware
middleware/circuitbreaker/circuitbreaker.go:15 - ErrNotAllowed error
middleware/circuitbreaker/circuitbreaker.go:29 - WithCircuitBreaker option
middleware/circuitbreaker/circuitbreaker.go:21 - WithGroup option
Next Steps
Rate Limiting Add rate limiting protection
Recovery Handle panics gracefully