concept

Streaming as Runtime

created 2026-05-28 ai · agents · streaming · reactive · observability · architecture · mozaik

Streaming as runtime is the idea that LLM streaming is not just a UX nicety (tokens appearing faster) but a runtime control channel: partial model output becomes typed semantic events that other participants observe and act on while inference is still happening. The framing line from mozaik:

“Token streaming improves the interface. Reactive streaming changes the runtime.”

Two Uses of Streaming

| | Token streaming (UI) | Reactive streaming (runtime) |
|---|---|---|
| Audience | The end user’s screen | Other agents/tools/observers |
| Purpose | Perceived latency | Control, coordination, safety |
| Unit | Raw tokens/chunks | Typed semantic events |
| Covered by | streaming-chat-architecture | this page |

These are orthogonal — a system can do both. kulify projects already do the first (SSE / ReadableStream → UI in master-bot, vorma, fajb-next); this concept is about the second.

Semantic Events

Streamed output is wrapped as typed event objects with a consistent shape across internal (own model) and external (peer) sources, so any participant can handle a partial response uniformly. Agents distinguish origin via onInternalEvent() / onExternalEvent() (and the typed onReasoning / onFunctionCall / onExternalReasoning … family — see reactive-agents).
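
A minimal sketch of what a uniform event shape could look like in TypeScript. The `SemanticEvent` union, its field names, the `summarize` helper, and the `Participant` class are hypothetical and are not mozaik's actual types; only the `onInternalEvent` / `onExternalEvent` names come from the text above, and their signatures here are assumed.

```typescript
// Hypothetical event shape: names and fields are illustrative, not mozaik's API.
type Origin = "internal" | "external";

type SemanticEvent =
  | { kind: "reasoning"; origin: Origin; agentId: string; text: string }
  | { kind: "function_call"; origin: Origin; agentId: string; name: string; args: string };

// Because both origins share one shape, a single formatter handles either.
function summarize(event: SemanticEvent): string {
  switch (event.kind) {
    case "reasoning":
      return `${event.agentId} is reasoning: ${event.text}`;
    case "function_call":
      return `${event.agentId} calls ${event.name}(${event.args})`;
  }
}

class Participant {
  log: string[] = [];

  // Route by origin; the handler signatures are assumed for this sketch.
  dispatch(event: SemanticEvent): void {
    if (event.origin === "internal") this.onInternalEvent(event);
    else this.onExternalEvent(event);
  }

  onInternalEvent(event: SemanticEvent): void {
    this.log.push(`internal: ${summarize(event)}`);
  }

  onExternalEvent(event: SemanticEvent): void {
    this.log.push(`external: ${summarize(event)}`);
  }
}

const reviewer = new Participant();
reviewer.dispatch({ kind: "reasoning", origin: "internal", agentId: "reviewer", text: "check the query" });
reviewer.dispatch({ kind: "function_call", origin: "external", agentId: "planner", name: "search", args: '{"q":"fees"}' });
console.log(reviewer.log.join("\n"));
```

The discriminated union lets the compiler check that every event kind is handled, while the `origin` field carries the internal/external distinction without changing the shape.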

Five Capabilities It Unlocks

  1. Inference interception — a participant (e.g. a safety reviewer) watches the stream and intervenes mid-generation when it sees an unsafe or wrong direction.
  2. Abort and redirect — cancel generation the moment enough information exists, instead of paying for the full completion. Direct token-cost win.
  3. Live agent handoff — delegate to a specialist mid-stream as context emerges, rather than after a full turn.
  4. Observability — real-time visibility into reasoning, function calls, and tool results during execution, not just in post-hoc logs.
  5. Multi-agent collaboration — concurrent participants reacting to one another instead of a rigid sequential pipeline (see agent-swarms).
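
Interception and abort-and-redirect (items 1 to 3) can be sketched as a watcher that inspects each partial output and cancels the stream. This is an illustration under stated assumptions, not mozaik's implementation: `fakeModelStream`, `runWithWatcher`, the `Verdict` type, and the `sql-safety-agent` name are all hypothetical, and a synchronous generator stands in for a real model stream. A real client would more likely use an async iterator and pass the same `AbortSignal` to its HTTP request.

```typescript
// Verdict a watcher returns after each token; the shape is assumed for this sketch.
type Verdict = { action: "continue" } | { action: "abort"; redirectTo: string };

interface RunResult {
  consumed: string[];
  skipped: number; // tokens that were never generated
  redirectTo: string | null;
}

// Stand-in for a model stream: stops producing once the signal is aborted.
function* fakeModelStream(tokens: string[], signal: AbortSignal): Generator<string> {
  for (const token of tokens) {
    if (signal.aborted) return;
    yield token;
  }
}

function runWithWatcher(tokens: string[], watch: (soFar: string[]) => Verdict): RunResult {
  const controller = new AbortController();
  const consumed: string[] = [];
  let redirectTo: string | null = null;

  for (const token of fakeModelStream(tokens, controller.signal)) {
    consumed.push(token);
    const verdict = watch(consumed);
    if (verdict.action === "abort") {
      redirectTo = verdict.redirectTo;
      controller.abort(); // the generator checks this before yielding the next token
    }
  }
  return { consumed, skipped: tokens.length - consumed.length, redirectTo };
}

const result = runWithWatcher(
  ["Running", "DROP", "TABLE", "users", "now"],
  (soFar) =>
    soFar[soFar.length - 1] === "DROP"
      ? { action: "abort", redirectTo: "sql-safety-agent" }
      : { action: "continue" },
);
console.log(result);
```

In this run the watcher aborts after the second token, so three of the five tokens are never generated, and the `redirectTo` field names the specialist that would take over.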

A consequence: policy/safety enforcement becomes an emergent property of who is subscribed to the stream, not an external wrapper bolted around the agent.

Design Tension: Back-Pressure

If listeners can act on a live stream, what happens when one is slow? mozaik delivers events to all subscribers without waiting for any of them to finish, so a slow listener never blocks producers; this is a deliberate producer-priority choice. That is one answer to back-pressure; the trade-off is that a slow observer may miss the window to intercept. Any “streaming as runtime” system has to pick a stance here.
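
One way to picture a producer-priority stance is a bounded mailbox per subscriber that drops the oldest pending event when full. This is a sketch of one possible policy, not a description of how mozaik implements delivery; `Mailbox`, `EventBus`, and the capacities are hypothetical, and integers stand in for events.

```typescript
// Bounded per-subscriber mailbox; one possible policy, not mozaik's implementation.
class Mailbox<T> {
  private queue: T[] = [];
  private capacity: number; // assumed to be at least 1
  dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Called by the producer. Never waits: when full, the oldest pending event is discarded.
  offer(event: T): void {
    if (this.queue.length >= this.capacity) {
      this.queue.shift();
      this.dropped += 1;
    }
    this.queue.push(event);
  }

  // Called by the listener at its own pace.
  take(max: number): T[] {
    return this.queue.splice(0, max);
  }
}

class EventBus<T> {
  private mailboxes: Mailbox<T>[] = [];

  subscribe(capacity: number): Mailbox<T> {
    const mailbox = new Mailbox<T>(capacity);
    this.mailboxes.push(mailbox);
    return mailbox;
  }

  // Publishing costs the same regardless of how slow any listener is.
  publish(event: T): void {
    for (const mailbox of this.mailboxes) mailbox.offer(event);
  }
}

const bus = new EventBus<number>();
const fastListener = bus.subscribe(8);
const slowListener = bus.subscribe(2);

for (let seq = 1; seq <= 5; seq++) bus.publish(seq);

console.log(fastListener.take(10)); // [1, 2, 3, 4, 5]
console.log(slowListener.take(10), slowListener.dropped); // [4, 5] 3
```

The slow listener loses events 1 to 3, which is the "missed window" trade-off in concrete form. A system that must not lose events would instead block the producer or buffer without bound, shifting the cost to latency or memory.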

Relevance to Kulify

The abort-and-redirect and live-observer capabilities map cleanly onto the investorchat supervisor design and the broader ai-agent-architectures multi-agent direction: a cheap watcher that aborts a wrong-headed generation early is straight token savings, and mid-stream handoff fits multi-domain queries.