tool

Apache Kafka

created 2026-05-25 messaging · kafka · log · streaming · event-sourcing · distributed-systems

Kafka is a log. Not a queue, not a broker — a distributed, append-only commit log that consumers read at their own pace and can rewind.

The model

  • Producers append messages to topics, partitioned for parallelism.
  • Messages are written to an append-only log and stay there for the retention window (7 days, 30 days, forever — configurable).
  • Consumers track their own offset (position in the log). Any consumer at any time can rewind.
  • The log is the history. New consumers can join later and replay everything.
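
The four points above fit in a toy in-memory model. This is a sketch of the idea, not Kafka's client API: `ToyTopic`, `ToyConsumer`, `poll` and `seek` are names invented here, and `crc32` stands in for whatever hash the real partitioner uses.

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only topic (not Kafka's API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple:
        # Same key -> same partition, so ordering per key is preserved.
        partition = crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1  # (partition, offset)


class ToyConsumer:
    """Each consumer owns its offsets; reading never removes anything from the log."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition: int, max_records: int = 10) -> list:
        start = self.offsets[partition]
        records = self.topic.partitions[partition][start:start + max_records]
        self.offsets[partition] = start + len(records)
        return records

    def seek(self, partition: int, offset: int) -> None:
        # Rewinding is just moving the consumer's own pointer.
        self.offsets[partition] = offset


topic = ToyTopic(num_partitions=1)
for i in range(3):
    topic.append(key="order-42", value=f"event-{i}")

billing = ToyConsumer(topic)
first_pass = billing.poll(partition=0)

billing.seek(partition=0, offset=0)  # rewind and read the same history again
replayed = billing.poll(partition=0)

fraud = ToyConsumer(topic)  # joins later, starts from offset 0
late_joiner = fraud.poll(partition=0)
```

All three reads return the same three events. Nothing was consumed in the queue sense; only the offsets moved.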

When to reach for Kafka

  • Event sourcing — the log is the source of truth.
  • Stream processing — Kafka Streams, ksqlDB, Flink integrations.
  • Inter-team data pipelines — Team A produces; Team B, C, D each consume independently with their own offsets.
  • Replay-as-a-feature — ship a new fraud-detection service Tuesday, reset its offset to 30 days ago Wednesday, let it catch up on a month of history by lunchtime. No re-emission needed.
  • High throughput — millions of messages per second at the high end.
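
Replaying "from 30 days ago" means turning a timestamp into an offset. Real clients expose a lookup for this (the Java consumer's `offsetsForTimes`, for example). The sketch below only imitates the idea on a list of timestamps. `offset_for_time` and the one-record-per-day partition are made up for illustration, and the replay only works if the topic's retention still covers the window.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone
from typing import Optional


def offset_for_time(timestamps_ms: list, target_ms: int) -> Optional[int]:
    """Return the earliest offset whose timestamp is >= target_ms, or None.

    Assumes timestamps are non-decreasing, which a real log does not strictly guarantee.
    """
    offset = bisect_left(timestamps_ms, target_ms)
    return offset if offset < len(timestamps_ms) else None


now = datetime(2026, 5, 25, tzinfo=timezone.utc)
day_ms = 24 * 60 * 60 * 1000
now_ms = int(now.timestamp() * 1000)

# Hypothetical partition: one record per day for the last 45 days, oldest first.
timestamps_ms = [now_ms - days_ago * day_ms for days_ago in range(45, 0, -1)]

target_ms = int((now - timedelta(days=30)).timestamp() * 1000)
start_offset = offset_for_time(timestamps_ms, target_ms)
records_to_replay = len(timestamps_ms) - start_offset
```

Here the new service would seek to offset 15 and read the 30 most recent records. The 15 older ones stay in the log, untouched.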

The catch

Kafka has real operational weight:

  • A full Kafka deployment (brokers, ZooKeeper on pre-4.0 versions or KRaft controllers on current ones, usually a schema registry, monitoring) is a system to operate.
  • Running it for 3,500 messages a day isn’t an architecture decision — it’s a resume decision.
  • You pay for it every time someone has to learn it, tune it, or debug it at 4 AM.
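
To put the 3,500-messages-a-day figure in perspective, here is the arithmetic. The 1,000,000 msg/s comparison point is an assumption standing in for "millions of messages per second", not a benchmark.

```python
messages_per_day = 3_500
seconds_per_day = 24 * 60 * 60

rate_per_second = messages_per_day / seconds_per_day
seconds_between_messages = seconds_per_day / messages_per_day

# Assumed comparison point: a cluster sized for one million messages per second.
assumed_cluster_rate = 1_000_000
headroom = assumed_cluster_rate / rate_per_second

print(f"{rate_per_second:.3f} msg/s, one message every {seconds_between_messages:.1f} s")
# prints "0.041 msg/s, one message every 24.7 s"
```

That is roughly one message every 25 seconds, about seven orders of magnitude below the load Kafka is built for.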

Hosted alternatives — Confluent Cloud, AWS MSK, Redpanda Cloud — remove most of the ops pain but the conceptual surface area (partitions, consumer groups, offsets, retention) stays.

Recent update (2025)

Kafka 4.0 introduced share groups (KIP-932, shipped as early access in that release), which give you queue-style consumption natively. So the old “Kafka can’t do queues” line is out of date, but the log is still the reason to reach for it. If you don’t need replay, you probably don’t need Kafka.
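
A rough way to picture the difference, as a toy rather than broker behaviour: a classic consumer group hands out whole partitions, so one partition means one busy worker, while queue-style consumption hands out individual records. The function names and the round-robin hand-out are assumptions made for this sketch.

```python
from itertools import cycle

records = [f"job-{i}" for i in range(6)]  # all in a single partition
workers = ["w1", "w2", "w3"]


def consumer_group_assignment(records: list, workers: list) -> dict:
    """Partition-level assignment: one member owns the whole partition."""
    owner = workers[0]
    return {w: (list(records) if w == owner else []) for w in workers}


def queue_style_assignment(records: list, workers: list) -> dict:
    """Record-level hand-out: any member may take the next record.

    Round-robin is a simplification chosen for determinism, not how a broker decides.
    """
    taken = {w: [] for w in workers}
    for record, worker in zip(records, cycle(workers)):
        taken[worker].append(record)
    return taken


by_partition = consumer_group_assignment(records, workers)
by_record = queue_style_assignment(records, workers)
```

With a single partition, `by_partition` leaves two of the three workers idle, while `by_record` spreads the six jobs across all of them.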

Delivery semantics

Kafka advertises “exactly-once semantics” but with a big asterisk: that guarantee only covers what happens inside the Kafka cluster (idempotent producer-to-broker writes and transactional writes across topics). The moment your consumer writes to an external database or calls an external API, you’re back to idempotent processing on the consumer side. See delivery-guarantees and idempotency.
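
One common way to get idempotent processing at that boundary is to let the external store reject duplicates. A minimal sketch, assuming SQLite as the stand-in database and the record's (topic, partition, offset) as the dedupe key. The `payments` table and `apply_record` are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the external database
db.execute(
    """
    CREATE TABLE payments (
        topic TEXT NOT NULL,
        partition_id INTEGER NOT NULL,
        record_offset INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL,
        PRIMARY KEY (topic, partition_id, record_offset)
    )
    """
)


def apply_record(topic: str, partition: int, offset: int, amount_cents: int) -> bool:
    """Write the record's effect once; return False if it was already applied."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT OR IGNORE INTO payments VALUES (?, ?, ?, ?)",
            (topic, partition, offset, amount_cents),
        )
    return cursor.rowcount == 1


first = apply_record("payments", 0, 17, 1250)
redelivered = apply_record("payments", 0, 17, 1250)  # same record delivered again
total = db.execute("SELECT SUM(amount_cents) FROM payments").fetchone()[0]
```

The redelivery is a no-op, so the total stays at 1250. Keying on the offset only guards against the same record being delivered twice. If a producer can emit the same business event as two separate records, a business key (an order or payment ID) is the safer dedupe key.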

When NOT to pick Kafka

  • You don’t need replay → use sqs or rabbitmq.
  • You need rich per-message routing → use rabbitmq.
  • You don’t want to run a cluster → use sqs or a managed RabbitMQ.

See also

  • rabbitmq — broker with rich routing, no log
  • sqs — managed queue, no replay
  • delivery-guarantees — the “exactly-once” asterisk
  • idempotency — required at the consumer-to-external-system boundary
  • back-pressure — Kafka’s consumer-pull model is inherently back-pressured but partition lag still needs alerting
  • bullmq, pgmq — what kula actually uses; Kafka is heavy for personal-scale projects