Streaming as Runtime

The idea is that LLM streaming is not just a UX nicety (tokens appearing faster) but a runtime control channel: partial model output becomes typed semantic events that other participants observe and act on while inference is still in progress. The framing line from mozaik:

“Token streaming improves the interface. Reactive streaming changes the runtime.”

Two Uses of Streaming

	Token streaming (UI)	Reactive streaming (runtime)
Audience	The end user’s screen	Other agents/tools/observers
Purpose	Perceived latency	Control, coordination, safety
Unit	Raw tokens/chunks	Typed semantic events
Covered by	streaming-chat-architecture	this page

These are orthogonal: a system can do both. Kulify projects already do the first (SSE / ReadableStream → UI in master-bot, vorma, fajb-next); this concept is about the second.

Semantic Events

Streamed output is wrapped as typed event objects with a consistent shape across internal (own model) and external (peer) sources, so any participant can handle a partial response uniformly. Agents distinguish origin via onInternalEvent() / onExternalEvent() (and the typed onReasoning / onFunctionCall / onExternalReasoning … family — see reactive-agents).
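
A minimal sketch of what a uniform event shape could look like, assuming a discriminated union keyed on `kind` plus an `origin` field. The type and field names (`SemanticEvent`, `Origin`, `source`, `dispatch`, `describe`, `LoggingParticipant`) are hypothetical illustrations, not mozaik's actual types; only the onInternalEvent / onExternalEvent handler names come from the text above.

```typescript
// Hypothetical event shape: field names are assumptions, not mozaik's real API.
type Origin = "internal" | "external";

type SemanticEvent =
  | { kind: "reasoning"; origin: Origin; source: string; text: string }
  | { kind: "function_call"; origin: Origin; source: string; name: string; args: string }
  | { kind: "text_delta"; origin: Origin; source: string; text: string };

// Handler names mirror the onInternalEvent() / onExternalEvent() pair.
interface Participant {
  onInternalEvent(event: SemanticEvent): void;
  onExternalEvent(event: SemanticEvent): void;
}

// Route one event to the matching handler based on where it came from.
function dispatch(participant: Participant, event: SemanticEvent): void {
  if (event.origin === "internal") {
    participant.onInternalEvent(event);
  } else {
    participant.onExternalEvent(event);
  }
}

// Because the shape is the same for both origins, one formatter serves both handlers.
function describe(event: SemanticEvent): string {
  switch (event.kind) {
    case "reasoning":
      return `${event.source} is reasoning: ${event.text}`;
    case "function_call":
      return `${event.source} calls ${event.name}(${event.args})`;
    case "text_delta":
      return `${event.source} says: ${event.text}`;
  }
}

// Example participant that records what it saw and from whom.
class LoggingParticipant implements Participant {
  readonly log: string[] = [];

  onInternalEvent(event: SemanticEvent): void {
    this.log.push(`[own] ${describe(event)}`);
  }

  onExternalEvent(event: SemanticEvent): void {
    this.log.push(`[peer] ${describe(event)}`);
  }
}
```

The point of the shared shape is that `describe` never needs to know whether a partial response came from the participant's own model or from a peer; only the routing step does.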

Five Capabilities It Unlocks

Inference interception — a participant (e.g. a safety reviewer) watches the stream and intervenes mid-generation when it sees an unsafe or wrong direction.
Abort and redirect — cancel generation the moment enough information exists, instead of paying for the full completion. Direct token-cost win.
Live agent handoff — delegate to a specialist mid-stream as context emerges, rather than after a full turn.
Observability — real-time visibility into reasoning, function calls, and tool results during execution, not just in post-hoc logs.
Multi-agent collaboration — concurrent participants reacting to one another instead of a rigid sequential pipeline (see agent-swarms).

A consequence: policy/safety enforcement becomes an emergent property of who is subscribed to the stream, not an external wrapper bolted around the agent.
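
The abort-and-redirect capability listed above can be sketched as follows, assuming the model call accepts a standard `AbortSignal`. `fakeModelStream`, `streamUntil`, and `StreamResult` are hypothetical names, and the generator is only a stand-in for a real streaming client.

```typescript
// Stand-in for a model call: yields chunks until finished or until the signal aborts.
// A real client would hand the same AbortSignal to its HTTP request instead.
async function* fakeModelStream(
  chunks: string[],
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (const chunk of chunks) {
    if (signal.aborted) return;
    await Promise.resolve(); // stand-in for waiting on the network
    yield chunk;
  }
}

interface StreamResult {
  text: string;
  chunksConsumed: number;
  aborted: boolean;
}

// Consume the stream and let `shouldStop` inspect the partial text after every chunk.
async function streamUntil(
  chunks: string[],
  shouldStop: (partial: string) => boolean,
): Promise<StreamResult> {
  const controller = new AbortController();
  let text = "";
  let chunksConsumed = 0;

  for await (const chunk of fakeModelStream(chunks, controller.signal)) {
    text += chunk;
    chunksConsumed += 1;
    if (shouldStop(text)) {
      controller.abort(); // cancel generation as soon as enough information exists
      break;
    }
  }

  return { text, chunksConsumed, aborted: controller.signal.aborted };
}
```

For example, a router that only needs the first line of the response could call `streamUntil(chunks, (partial) => /ROUTE: \w+\n/.test(partial))` and hand the partial text to a specialist; the remaining chunks are never requested.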

Design Tension: Back-Pressure

If listeners can act on a live stream, what happens when one is slow? mozaik delivers events asynchronously to all subscribers, so a slow listener never blocks producers; this is a deliberate producer-priority choice. That is one answer to back-pressure. The trade-off is that a slow observer may miss the window to intercept. Any “streaming as runtime” system has to pick a stance here.
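
One possible producer-priority stance, sketched under assumptions: each subscriber gets a bounded mailbox, the producer never waits, and the oldest event is dropped when a mailbox is full. `Mailbox`, `EventBus`, the capacity parameter, and the drop-oldest policy are all hypothetical choices for illustration; this is not a description of how mozaik implements delivery.

```typescript
// One subscriber's bounded inbox. Capacity and drop-oldest eviction are assumptions.
class Mailbox<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  dropped = 0;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  // Called by the producer: never waits, evicts the oldest event when full.
  offer(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift();
      this.dropped += 1;
    }
    this.items.push(item);
  }

  // Called by the subscriber at its own pace; returns and clears pending events.
  drain(): T[] {
    return this.items.splice(0, this.items.length);
  }
}

class EventBus<T> {
  private readonly mailboxes: Mailbox<T>[] = [];

  subscribe(capacity: number): Mailbox<T> {
    const mailbox = new Mailbox<T>(capacity);
    this.mailboxes.push(mailbox);
    return mailbox;
  }

  // Producer-priority: publishing costs one push per subscriber,
  // regardless of how quickly any subscriber drains its mailbox.
  publish(event: T): void {
    for (const mailbox of this.mailboxes) {
      mailbox.offer(event);
    }
  }
}
```

The `dropped` counter makes the cost of this stance visible: a subscriber with a small mailbox that drains rarely keeps only the most recent events, which is exactly the "missed interception window" described above. A consumer-priority design would instead make `publish` wait for slow mailboxes.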

Relevance to Kulify

The abort-and-redirect and live-observer capabilities map cleanly onto the investorchat supervisor design and the broader ai-agent-architectures multi-agent direction: a cheap watcher that aborts a wrong-headed generation early is straight token savings, and mid-stream handoff fits multi-domain queries.