Apache Kafka

Kafka is a log. Not a queue, not a traditional message broker: a distributed, append-only commit log that consumers read at their own pace and can rewind.

The model

Producers append messages to topics, partitioned for parallelism.
Messages are written to an append-only log and stay there for the retention window (7 days, 30 days, forever — configurable).
Consumers track their own offsets (one position per partition, committed per consumer group). Any consumer can rewind at any time.
The log is the history. New consumers can join later and replay everything.
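
To make the model concrete, here is a minimal in-memory sketch. `ToyLog`, its methods, and the byte-sum partitioner are hypothetical stand-ins invented for illustration (real Kafka's default partitioner hashes key bytes with murmur2, and committed offsets live on the brokers). The point is only that reading never deletes anything, each consumer group keeps its own offsets, and rewinding is just moving a number.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for one topic: N append-only partitions (hypothetical, not the Kafka API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]
        # Committed offsets per (consumer group, partition): index of the next message to read.
        self.offsets: defaultdict[tuple[str, int], int] = defaultdict(int)

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Producer side: choose a partition from the key, append, return (partition, offset)."""
        partition = sum(key.encode()) % len(self.partitions)  # toy partitioner, stable across runs
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def poll(self, group: str, partition: int, max_records: int = 10) -> list[str]:
        """Consumer side: read from the group's offset and advance it. Nothing is deleted."""
        start = self.offsets[(group, partition)]
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(batch)
        return batch

    def seek(self, group: str, partition: int, offset: int) -> None:
        """Rewind (or skip ahead) by moving the group's offset; the data never moves."""
        self.offsets[(group, partition)] = offset


log = ToyLog(num_partitions=1)
for i in range(3):
    log.append("user-42", f"event-{i}")

print(log.poll("billing", 0))    # ['event-0', 'event-1', 'event-2']
print(log.poll("billing", 0))    # [] (caught up)
print(log.poll("analytics", 0))  # ['event-0', 'event-1', 'event-2'] (late joiner replays history)
log.seek("billing", 0, 0)
print(log.poll("billing", 0))    # ['event-0', 'event-1', 'event-2'] (rewound)
```

Two groups reading the same partition never interfere: `analytics` joining late costs `billing` nothing.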

When to reach for Kafka

Event sourcing — the log is the source of truth.
Stream processing — Kafka Streams, ksqlDB, Flink integrations.
Inter-team data pipelines — Team A produces; Team B, C, D each consume independently with their own offsets.
Replay-as-a-feature — ship a new fraud-detection service Tuesday, reset its offset to 30 days ago Wednesday, let it catch up on a month of history by lunchtime. No re-emission needed.
High throughput — millions of messages per second at the high end.
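
Replay-as-a-feature only works if the retention window still covers the period you rewind to. The sketch below models that check plus the timestamp-to-offset lookup; the helper name and the rule "everything younger than the retention window is still on disk" are simplifying assumptions. In real Kafka, the Java consumer's `offsetsForTimes` performs the lookup and `kafka-consumer-groups.sh --reset-offsets --to-datetime` rewinds a whole group.

```python
import bisect
from datetime import datetime, timedelta


def reset_offset_by_time(
    timestamps: list[datetime],
    log_start_offset: int,
    now: datetime,
    lookback: timedelta,
    retention: timedelta,
) -> int:
    """Toy "rewind this consumer group to `lookback` ago" (hypothetical helper).

    timestamps[i] is the append time of offset log_start_offset + i. Assumes the
    timestamps are non-decreasing and that everything younger than `retention`
    is still on disk, while older records may already be deleted.
    """
    if lookback > retention:
        # Part of the requested window may already be gone; fail loudly instead
        # of silently replaying only part of the history.
        raise ValueError("lookback exceeds retention; replay would be incomplete")
    target = now - lookback
    # First retained offset appended at or after `target`
    # (the log-end offset if nothing is that recent).
    return log_start_offset + bisect.bisect_left(timestamps, target)


now = datetime(2025, 6, 4, 9, 0)
# 60 daily events; 40-day retention has already deleted offsets 0-19.
retained = [now - timedelta(days=d) for d in range(40, 0, -1)]
print(reset_offset_by_time(retained, 20, now, timedelta(days=30), timedelta(days=40)))  # 30
```

With 7-day retention, the "reset to 30 days ago" move from the fraud-detection example would raise here instead of quietly replaying a week.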

The catch

Kafka has real operational weight:

A full Kafka deployment (brokers, KRaft controllers or ZooKeeper before 4.0, a schema registry, monitoring) is a system you have to operate.
Running it for 3,500 messages a day isn’t an architecture decision — it’s a resume decision.
You pay for it every time someone has to learn it, tune it, or debug it at 4 AM.
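
For scale, the arithmetic behind that 3,500-messages-a-day figure:

```latex
\frac{3{,}500\ \text{messages/day}}{86{,}400\ \text{s/day}} \approx 0.04\ \text{messages/s}
\qquad
\frac{86{,}400\ \text{s/day}}{3{,}500\ \text{messages/day}} \approx 25\ \text{s between messages}
```

That is roughly one message every 25 seconds; a cluster that can handle a million messages per second would be running at about one 25-millionth of that rate.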

Hosted alternatives (Confluent Cloud, AWS MSK, Redpanda Cloud) remove most of the ops pain, but the conceptual surface area (partitions, consumer groups, offsets, retention) stays.
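
Of that conceptual surface, the consumer-group/partition relationship tends to bite first: within one group, each partition is read by exactly one consumer, so consumers beyond the partition count sit idle. A hypothetical round-robin sketch (Kafka ships several real assignment strategies; this is a simplification, not any of them verbatim):

```python
def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Toy consumer-group assignment: every partition goes to exactly one consumer.

    Hypothetical simplification; Kafka's real assignors are more involved.
    """
    if not consumers:
        return {}
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign_round_robin([0, 1, 2], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}
print(assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}  (c4 sits idle)
```

The partition count is therefore the ceiling on a group's parallelism, which is why it has to be chosen up front.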

Recent update (2025)

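Kafka 4.0 added share groups (KIP-932, “Queues for Kafka”) as an early-access feature, giving you queue-style consumption natively: several consumers can read the same partition and acknowledge records individually. So the old “Kafka can’t do queues” line is out of date, but the log is still the reason to reach for it. If you don’t need replay, you probably don’t need Kafka.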
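
A rough way to see the difference: in a classic consumer group a partition belongs to one consumer, while a share group hands individual records to whichever consumer asks, locks them while in flight, and makes released records available again. The toy below is a hypothetical model of that behavior (class and method names are invented; real share groups also use lock timeouts and delivery-attempt limits), not the Kafka client API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyShareGroup:
    """Queue-style consumption of one partition, loosely modeled on share groups.

    Hypothetical illustration, not the Kafka client API: any consumer can take
    the next record, records are locked while in flight, and released records
    become available for redelivery.
    """

    records: list[str]
    next_offset: int = 0
    in_flight: dict[int, str] = field(default_factory=dict)  # offset -> consumer holding it
    released: list[int] = field(default_factory=list)        # offsets waiting for redelivery
    acked: set[int] = field(default_factory=set)

    def poll(self, consumer: str) -> Optional[int]:
        """Hand the next available offset to `consumer`, or None if nothing is available."""
        if self.released:
            offset = self.released.pop(0)  # redeliver released records first
        elif self.next_offset < len(self.records):
            offset = self.next_offset
            self.next_offset += 1
        else:
            return None
        self.in_flight[offset] = consumer
        return offset

    def ack(self, offset: int) -> None:
        """Per-record acknowledgement: this record is done for good."""
        del self.in_flight[offset]
        self.acked.add(offset)

    def release(self, offset: int) -> None:
        """Give the record back (in real share groups, an expired lock has a similar effect)."""
        del self.in_flight[offset]
        self.released.append(offset)


group = ToyShareGroup(records=["a", "b", "c"])
first = group.poll("worker-1")   # 0
second = group.poll("worker-2")  # 1: two consumers, same partition
group.ack(first)
group.release(second)            # worker-2 failed, so offset 1 goes back
print(group.poll("worker-1"))    # 1 (redelivered to a different consumer)
```

Note what the toy does not model: the log underneath is still there, so a separate consumer group could replay the same records independently.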

Delivery semantics

Kafka advertises “exactly-once semantics,” but with a big asterisk: the guarantee only covers what happens inside the Kafka cluster (idempotent producer-to-broker writes, plus transactions that atomically write to multiple topics and commit consumer offsets). The moment your consumer writes to an external database or calls an external API, you’re back to idempotent processing on the consumer side. See delivery-guarantees.
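
A minimal sketch of consumer-side idempotency, assuming each message carries a unique, stable ID (the `message_id` parameter and the `IdempotentConsumer` class are hypothetical). The classic failure it guards against: the consumer writes to the database, crashes before committing its offset, and Kafka redelivers the same message on restart.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each message's side effect at most once, even if it is delivered twice.

    Assumption: every message has a unique, stable ID. Both the balances and the
    processed-ID set live in memory here purely for illustration.
    """

    balances: dict[str, int] = field(default_factory=dict)
    processed_ids: set[str] = field(default_factory=set)

    def handle(self, message_id: str, account: str, amount: int) -> bool:
        if message_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.balances[account] = self.balances.get(account, 0) + amount
        self.processed_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
consumer.handle("msg-1", "alice", 100)
# Crash after the write but before the offset commit, so Kafka redelivers msg-1.
consumer.handle("msg-1", "alice", 100)
print(consumer.balances)  # {'alice': 100}
```

In production, the processed-ID record and the side effect have to land in the same database transaction; otherwise a crash between the two reopens the gap.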

When NOT to pick Kafka

You don’t need replay → use SQS or RabbitMQ.
You need rich per-message routing → use RabbitMQ.
You don’t want to run a cluster → use SQS or a managed RabbitMQ service.