
Fail Fast, Fail Safe: Understanding the Circuit Breaker Pattern for Microservices

January 20, 2025

Circuit Breaker

Microservices

Software Architecture

Service failures are inevitable in a microservice architecture. Although no universal solution exists for every scenario, having the right tools available can be extremely useful for managing various failure modes. One widely used method for dealing with service failures is the Circuit Breaker pattern, which helps prevent cascading failures and keeps the system resilient.

Cascading Failures

Consider a scenario with a group of microservices in which a message service relies on an external provider, Mailgun, to send emails on behalf of the other services. The message service must perform an outbound call to the Mailgun API every time it sends a message.

[Diagram: Services A, B, and C call the Message Service, which calls the Mailgun API; every request fails with a 500 error.]

But what happens when the Mailgun API goes down? One simple approach might be to implement a retry mechanism, where the message service retries the request to Mailgun a set number of times before returning an error to the other services. However, this approach could lead to several issues:

  • The messaging service consumes valuable resources retrying, which could delay other requests that don't depend on Mailgun.
  • Clients are kept waiting even when the request is likely to fail.
  • If Mailgun is overwhelmed, retries could make the problem worse.

So, how can we fail fast and prevent cascading failures when we know a request is likely to fail? This is where the Circuit Breaker pattern comes into play.

Circuit Breaker Pattern

A circuit breaker surrounds an outbound call and tracks its failures. When the failure rate surpasses a defined threshold, the circuit breaker trips. Once this happens, subsequent requests to the circuit breaker will immediately return an error, without triggering the outbound function.

[Diagram: the Message Service wraps its outbound call to Mailgun in a circuit breaker; if the circuit has tripped it returns immediately, otherwise it performs the outbound call.]

While the circuit breaker is tripped, several fallback strategies can be applied:

  • Immediately return an error to the service or client.
  • Serve a cached response from a prior successful request.
  • Redirect the request to a secondary service, such as Qiscus, which serves as an alternative messaging platform.

This straightforward approach helps prevent other services or even clients from waiting unnecessarily and avoids overloading server resources for operations that are likely to fail.

How It Works

The circuit breaker operates in three distinct states: closed, open, and half-open.

[Diagram: state transitions between closed, open, and half-open; too many failures move closed to open, a timeout moves open to half-open, success moves half-open back to closed, and failure moves it back to open. While open, requests fail immediately.]
  • Closed: the circuit breaker functions normally, allowing the outbound function to be called without any restrictions.
  • Open: it immediately fails any request and prevents the outbound function from being called.
  • Half-Open: the circuit breaker allows a limited number of calls to the outbound function.
    • If these calls succeed, it transitions back to the closed state.
    • If they fail, it returns to the open state.

The circuit breaker continuously monitors the ratio of failed to successful calls within a specific time window. If this failure ratio exceeds a predefined threshold, it trips and enters the open state.

At determined intervals, the circuit breaker switches to the half-open state and permits a limited number of function calls. If these calls fail, the circuit breaker stays open. If they succeed, the circuit breaker returns to the closed state.

Practical Overview

There are many open-source packages available for implementing the Circuit Breaker pattern, each with its own specifics. However, the general approach remains consistent: the circuit breaker wraps standard HTTP calls in a function that manages the state of the circuit and tracks the success or failure of requests.

In Go, we can implement this concept using the CircuitBreaker struct below, which tracks the circuit's state, total requests, and counts of successful and failed requests. A mutex is used to safely update these values in a concurrent environment, ensuring there are no race conditions.

pkg/circuit_breaker.go
type CircuitBreaker struct {
    State        string
    RequestCount uint32
    SuccessCount uint32
    FailureCount uint32
    MutexLock    sync.Mutex
}

The main functionality of the Circuit Breaker is to check if a request can be executed based on its current state. When the Execute method is called, it first checks if the request should be allowed. If not, it returns an error immediately. If allowed, it executes the outbound request and then updates the success/failure counts.

pkg/circuit_breaker.go
func (cb *CircuitBreaker) Execute(
    req func() (interface{}, error),
) (interface{}, error) {
    if err := cb.preRequest(); err != nil {
        return nil, err
    }

    res, err := req()

    if err := cb.postRequest(err); err != nil {
        return nil, err
    }

    return res, nil
}

From the previous scenario, we can wrap SendMail, the function that sends mail to the external Mailgun service, as the outbound call. This function is passed to the Execute method, which manages the request, tracks successes and failures, and handles state transitions.

internal/outbound/mailgun.go
func (i impl) SendMail(mail mail.Spec) (interface{}, error) {
    fn := func() (interface{}, error) {
        // Illustrative outbound call; a real http.Post also takes a
        // content type and an io.Reader body.
        return http.Post("<MAILGUN_URL>", mail)
    }

    return i.CircuitBreaker.Execute(fn)
}

After each request, the circuit breaker updates the success/failure counts and recalculates the failure ratio. This determines the circuit's state and controls whether further requests are allowed, preventing cascading failures and giving the system time to recover.

Wrapping It Up

The circuit breaker pattern is essential in microservices architectures, where distributed services can fail independently. It helps manage failures by monitoring requests, tracking successes and failures, and controlling flow based on system health. By encapsulating risky operations, it prevents cascading failures, ensuring more reliable and resilient communication between services.
