concept

Cascading Failure

created 2026-05-25 distributed-systems · reliability · cascading-failure · coupling · patterns

When one slow or failing dependency takes down everything that depends on it, even though the dependency itself didn’t crash. The mechanism is synchronous coupling + thread exhaustion + retry amplification.

The textbook example

Checkout calls Inventory directly (synchronous HTTP). Inventory slows down — doesn’t crash, just slows. Checkout threads pile up waiting. 300, 400 threads in. Checkout is out of threads. Anything depending on Checkout is now also down.

One slow dependency, and every service in the call chain above it is broken.
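The thread pile-up follows from Little's law: average requests in flight equal arrival rate times latency. The sketch below works through that arithmetic. The arrival rate of 100 requests per second and the 400-thread pool are illustrative assumptions, not measurements from a real system.

```python
def busy_threads(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average in-flight requests = arrival rate * time in system."""
    return requests_per_second * latency_seconds


POOL_SIZE = 400        # hypothetical Checkout worker-thread limit
ARRIVAL_RATE = 100.0   # hypothetical requests per second hitting Checkout

for latency in (0.05, 0.5, 2.0, 4.0):  # Inventory response time in seconds
    needed = busy_threads(ARRIVAL_RATE, latency)
    status = "ok" if needed < POOL_SIZE else "pool exhausted"
    print(f"latency={latency}s  threads busy={needed:.0f}  {status}")
```

Traffic did not change in this model. Only Inventory's latency did, and that alone is enough to consume the whole pool.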

The mechanism

  • Synchronous coupling is the conduit: without it, slowness can’t propagate as an outage.
  • Thread / connection-pool exhaustion is the resource that runs out.
  • Retries pile up, each retry consuming another thread/connection, amplifying the original slowdown. (See: “retry storm.”)
  • Health checks fail because pools are saturated, so load balancers pull those instances out of rotation and shift their traffic to the remaining “healthy” instances, which then saturate too.
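Retry amplification compounds across layers. As a rough worst-case model, assume every call to the bottom dependency fails and each layer retries independently. The function and parameter names here are hypothetical, and real systems with jitter, budgets, or partial failures will behave better than this bound.

```python
def worst_case_attempts(retries_per_layer: int, layers: int) -> int:
    """Attempts hitting the bottom dependency for ONE user request,
    when every call fails and every layer retries independently."""
    return (1 + retries_per_layer) ** layers


# One layer with 3 retries: the original call plus 3 retries.
print(worst_case_attempts(retries_per_layer=3, layers=1))  # 4
# Three layers, each retrying 3 times: 4 * 4 * 4.
print(worst_case_attempts(retries_per_layer=3, layers=3))  # 64
```

This is why retrying at only one layer, or capping retries with a budget, is a common recommendation: the multiplier grows exponentially with call depth.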

The fix

Decouple the producer from the consumer in three dimensions:

Dimension     | What it means
Time          | They don’t need to run at the same moment.
Availability  | One can be down without taking the other down.
Speed         | The fast one isn’t held hostage by the slow one.

A queue between them does all three. Inventory slows → messages accumulate in the queue → Checkout doesn’t care. It writes and moves on. That’s the whole point of a queue.
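A minimal in-process sketch of that decoupling, using Python's standard `queue.Queue` as a stand-in for a real message broker. The `checkout` and `inventory_worker` functions and the 50 ms delay are hypothetical, chosen only to make the slow consumer visible.

```python
import queue
import threading
import time

orders = queue.Queue()        # stand-in for a real message broker
reserved: list[str] = []


def inventory_worker() -> None:
    """Slow consumer: drains the queue at its own pace."""
    while True:
        order_id = orders.get()
        if order_id is None:   # sentinel: stop the worker
            break
        time.sleep(0.05)       # simulated slow Inventory
        reserved.append(order_id)


def checkout(order_id: str) -> None:
    """Producer: writes the message and returns without waiting for Inventory."""
    orders.put(order_id)


worker = threading.Thread(target=inventory_worker)
worker.start()

start = time.perf_counter()
for i in range(5):
    checkout(f"order-{i}")
checkout_elapsed = time.perf_counter() - start   # time spent by Checkout only

orders.put(None)
worker.join()
print(f"checkout took {checkout_elapsed:.4f}s; inventory reserved {len(reserved)} orders")
```

Checkout finishes almost immediately while Inventory takes its time. This only applies when Checkout does not need Inventory's answer before responding to its own caller. If it does, the call is still synchronous in effect and the queue alone does not help.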

Other patterns that decouple, or at least contain the failure:

  • Circuit breakers — fail fast when the dependency is unhealthy, don’t pile up retries.
  • Bulkheads — separate thread pools per dependency so saturation in one doesn’t drown the others.
  • Timeouts at every hop — never wait forever.
  • Async / event-driven calls — don’t block at all.
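To make the first pattern in the list above concrete, here is a minimal circuit-breaker sketch. It is a single-threaded illustration under simple assumptions (consecutive-failure counting, a fixed cool-down, one trial call after the cool-down), not a production implementation. The class name, parameters, and thresholds are all hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: let this call through as a trial (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()     # open, or re-open after a failed trial
            raise
        self._failures = 0                          # success closes the circuit
        self._opened_at = None
        return result
```

While the circuit is open, callers get an immediate `CircuitOpenError` instead of holding a thread for the duration of a timeout, which is what stops the pile-up. A real deployment would also need locking for concurrent callers, and would normally combine the breaker with per-call timeouts.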

When the queue itself becomes the cascade

A queue prevents synchronous cascading failure, but introduces its own failure mode: unbounded growth. See back-pressure. Without bounds, the queue is the new thing that fills memory and takes down the broker (which then takes down everyone using it).
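One way to bound the queue is to reject new work when it is full, so the producer sees the pressure instead of the broker absorbing it. The sketch below uses the standard library's `queue.Queue(maxsize=...)`. The bound of 3 and the `try_enqueue` helper are hypothetical, and a real broker would expose its own limits and rejection signals.

```python
import queue

MAX_PENDING = 3                          # hypothetical bound
pending = queue.Queue(maxsize=MAX_PENDING)


def try_enqueue(message: str) -> bool:
    """Return True if accepted, False if the queue is full and the caller must back off."""
    try:
        pending.put_nowait(message)
        return True
    except queue.Full:
        return False


# No consumer is running, so only the first MAX_PENDING messages fit.
results = [try_enqueue(f"order-{i}") for i in range(5)]
print(results)  # [True, True, True, False, False]
```

What the producer does on `False` is a design choice: shed the request, return an error to the caller, or slow down and retry later. Any of these keeps memory use bounded, at the cost of pushing the problem back to the sender.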

See also