Back Pressure

Back pressure is the umbrella term for how a slow consumer pushes back on a fast producer. Without it, the producer happily writes faster than the consumer can read, the queue grows without bound, and memory, disk, or your on-call patience runs out first.

Two flavors of failure

Loud failure	Quiet failure
The broker runs out of memory at 3:47 AM. Pager screams. Everyone wakes up.	Nothing crashes. The queue just grows. Messages get processed two hours late. For a fraud check on a credit-card transaction, an answer that arrives two hours late is the same as no answer.

Loud failures get fixed. Quiet failures often don’t — until customers notice.

Three techniques

1. Bounded queues (start here)

Cap the queue size. When it’s full, the producer either blocks or fails fast. Reach for this one first because it fails loudly: errors surface, alerts fire, and you find out while you still have time to do something.
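As a minimal sketch of both producer behaviors, here is Python's standard-library queue.Queue with a maxsize. The bound of 3 and the helper names publish_fail_fast and publish_blocking are made up for illustration; a real broker client will have its own equivalents.

```python
import queue

# Hypothetical bound for illustration; size it from real memory headroom.
MAX_DEPTH = 3

jobs = queue.Queue(maxsize=MAX_DEPTH)


def publish_fail_fast(q, message):
    """Enqueue without waiting; return False if the queue is full."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        # Surface the rejection to the caller (log, metric, error response).
        return False


def publish_blocking(q, message, timeout_s):
    """Wait up to timeout_s for space; return False if none opened up."""
    try:
        q.put(message, timeout=timeout_s)
        return True
    except queue.Full:
        return False


# With no consumer draining the queue, only the first MAX_DEPTH puts succeed.
accepted = [publish_fail_fast(jobs, n) for n in range(5)]
```

The blocking variant takes a timeout on purpose: a producer that blocks forever turns a loud failure back into a quiet one.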

2. Auto-scale the consumer side

If queue depth crosses a threshold, add more workers. Works well when:

Consumers are stateless (no per-worker affinity).
Workload is spiky (under a steady load, autoscaling is wasted effort; provision for the load instead).
Downstream dependencies can absorb the extra parallelism.

Bad fit when consumers serialize on a shared resource (e.g. database row locks) — adding workers just adds contention.
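One way to turn queue depth into a worker count is sketched below. It is a proportional variant of the threshold rule, not any particular autoscaler's algorithm, and the function name and all four parameters are assumptions for illustration.

```python
import math


def desired_workers(queue_depth, per_worker_backlog, min_workers, max_workers):
    """Return a worker count that keeps each worker's backlog near a target.

    All four parameters are hypothetical tuning knobs for this sketch.
    max_workers caps parallelism so downstream dependencies are not swamped.
    """
    if per_worker_backlog <= 0:
        raise ValueError("per_worker_backlog must be positive")
    needed = math.ceil(queue_depth / per_worker_backlog)
    # Clamp to the configured floor and ceiling.
    return max(min_workers, min(max_workers, needed))


idle = desired_workers(0, per_worker_backlog=100, min_workers=2, max_workers=20)
busy = desired_workers(950, per_worker_backlog=100, min_workers=2, max_workers=20)
flooded = desired_workers(10_000, per_worker_backlog=100, min_workers=2, max_workers=20)
```

The ceiling matters as much as the scaling rule: it is where you encode how much extra parallelism the downstream side can actually absorb.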

3. Credit-based flow control

The consumer tells the producer how many messages it’s ready for. The producer sends that many and then stops. Nothing moves unless the downstream side has explicitly asked for it.

This is the model behind reactive streams: Project Reactor, RxJava, Akka Streams, Java’s Flow API. It’s the most disciplined of the three, but it also carries the most overhead and is the hardest to retrofit. If you’re already in a reactive stack, though, you get most of it for free.
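The credit mechanism itself is small. The sketch below is a hand-rolled, single-threaded toy, not the Reactive Streams API; the class name CreditedProducer and its grant method are invented for illustration, and the sent list stands in for the network.

```python
from collections import deque


class CreditedProducer:
    """Sends a message only while the consumer has granted credit."""

    def __init__(self, messages):
        self._pending = deque(messages)
        self._credits = 0
        self.sent = []  # Stand-in for the wire; a real producer would transmit.

    def grant(self, n):
        """Called by the consumer: 'I am ready for n more messages.'"""
        if n <= 0:
            raise ValueError("credit must be positive")
        self._credits += n
        self._drain()

    def _drain(self):
        # Each send consumes one credit; with zero credit nothing moves.
        while self._credits > 0 and self._pending:
            self.sent.append(self._pending.popleft())
            self._credits -= 1


producer = CreditedProducer(["a", "b", "c", "d", "e"])
before_any_credit = list(producer.sent)   # nothing sent yet
producer.grant(2)
after_first_grant = list(producer.sent)   # exactly two messages moved
producer.grant(1)
after_second_grant = list(producer.sent)  # one more moved
```

The producer never decides how much to send; it only spends what the consumer has handed it, which is why the queue between them cannot outgrow the outstanding credit.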

The takeaway

Every queue has a limit. Either you pick it and plan what happens when you hit it, or the OS picks it for you by killing the process. The second version is always more expensive than the first.

What to check today

Find your queues. For each one: is there a max size configured? If not, the OS has configured one for you, and you’ll find out what it is the hard way.
For unbounded queues: pick a bound from memory headroom ÷ estimated message size, and decide what producers do when the queue is full (block, fail fast, or spill to disk).
Make sure DLQ depth and main-queue depth both alert. See dead-letter-queue.
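For the bound-sizing step in the checklist above, the arithmetic can be as rough as the sketch below. The function name, the safety_factor parameter, and the example numbers are all assumptions for illustration; the point is only that headroom is divided by message size, with some margin held back.

```python
def queue_bound(memory_headroom_bytes, est_message_bytes, safety_factor=0.5):
    """Return the largest message count that fits in a fraction of headroom.

    safety_factor is a hypothetical margin for per-message overhead and
    for messages larger than the estimate.
    """
    if est_message_bytes <= 0:
        raise ValueError("message size must be positive")
    usable = memory_headroom_bytes * safety_factor
    return int(usable // est_message_bytes)


# Illustrative numbers only: 2 GiB of headroom, 4 KiB per message.
bound = queue_bound(2 * 1024**3, 4 * 1024)
```

Treat the result as a starting point and adjust it once you have real message-size and depth metrics.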