concept

Headless Browser Pool

created 2026-06-07 chromedp · headless · chrome · pool · queue · back-pressure · memory · go


A bounded pool of pre-warmed headless Chrome contexts sitting behind your HTTP handlers, plus a bounded wait queue that turns overflow into an HTTP 429 with a Retry-After header instead of an OOM kill.

This is the only sane way to expose Chrome-driven rendering (screenshots, PDFs, JS-evaluated scraping) as a public HTTP API. The naive alternatives — spawning a Chrome per request, or a single shared browser — both fail.

Why naive doesn’t work

Naive approach | Failure mode
--- | ---
Spawn Chrome per request | 1–3s cold start, 300–500MB RAM per process, easy OOM under load
Single Chrome, single context | Tab state cross-contaminates between requests (cookies, storage, in-flight navigations)
Single Chrome, many contexts, unbounded | One slow render blocks N others; concurrent count is unlimited → memory blows up

Headless Chrome uses roughly 300–500MB per context during an active render (image decoding, JS heap, layer compositing). A 16GB box can sustain maybe 20 truly concurrent renders before it starts swapping; a 4GB box, 4–6. So the pool size is a deliberate cap derived from the host’s RAM, not from the concurrency we would like to offer.
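As arithmetic, the cap is the RAM left after a reserve, divided by the per-render footprint, rounded down. A minimal sketch; the reserve figures used below are hypothetical and should be replaced with what you actually measure for the browser process, the Go service and the OS:

```go
package main

// maxSlots returns how many concurrent renders fit in host RAM.
// All sizes are in MiB. reserveMiB (hypothetical) covers everything that is
// not a render: browser process, Go service, OS, page cache. perRenderMiB is
// the measured p95 RSS of one active render. Integer division rounds down so
// the pool never over-commits memory.
func maxSlots(hostMiB, reserveMiB, perRenderMiB int) int {
	if perRenderMiB <= 0 || hostMiB <= reserveMiB {
		return 0
	}
	return (hostMiB - reserveMiB) / perRenderMiB
}
```

With a 500MB per-render figure, maxSlots(16384, 6144, 500) returns 20 and maxSlots(4096, 1024, 500) returns 6, in line with the estimates above.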

The pattern

Request ──→ Pool.Acquire(ctx) ──→ ┌── slot 1 (warm) ──┐
                                  │── slot 2 (warm) ──│  ← N pre-warmed
                                  │── slot 3 (warm) ──│    contexts
                                  └── slot N (warm) ──┘

                                  No slot? → wait queue (bounded)

                                  Queue full? → ErrPoolBusy → 429 + Retry-After

Components

  1. One Chrome process (multi-process is over-engineering at this scale).
  2. N tab contexts (chromedp.NewContext(rootCtx)), created at boot, pre-warmed by navigating to about:blank once so the V8 heap is initialized.
  3. A buffered channel of size N holding “available slot” tokens.
  4. A wait queue — also bounded — for requests that arrive when all slots are busy.
  5. Per-render timeout (30s default) enforced via context.WithTimeout on the acquired context.

Acquire / release

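    // Assumed types: Pool has idle chan *Slot (capacity N) and
    // queue chan struct{} (capacity = max waiters); ErrPoolBusy is a sentinel error.
    func (p *Pool) Acquire(ctx context.Context) (*Slot, error) {
        // Fast path: a warm slot is free right now.
        select {
        case s := <-p.idle:
            return s, nil
        case <-ctx.Done():
            return nil, ctx.Err()
        default:
        }
        // Slow path: reserve a place in the bounded wait queue, or fail fast.
        select {
        case p.queue <- struct{}{}:
            defer func() { <-p.queue }() // give the queue spot back on return
        default:
            return nil, ErrPoolBusy
        }
        select {
        case s := <-p.idle:
            return s, nil
        case <-ctx.Done():
            return nil, ctx.Err()
        }
    }

    func (p *Pool) Release(s *Slot) {
        if s.healthy() {
            p.idle <- s // never blocks: at most N slots exist
        } else {
            p.replace(s) // hypothetical: build a fresh slot and put it on p.idle
        }
    }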

The shape (semaphore + bounded queue) is the standard back-pressure idiom applied to a non-trivial resource. See back-pressure for the general pattern, and delivery-guarantees for why we return 429 rather than silently buffering.

Sizing

  • Slots: default 3–5 on a 4–8GB host. Measure RSS under a load of 20 concurrent requests and pick the largest N whose 95th-percentile RSS stays under (host RAM − safety margin).
  • Queue depth: slots × expected p95 render time in seconds. Past that, the user is better off retrying than waiting, so return 429 with a Retry-After based on the observed queue drain rate.
  • Per-render timeout: 30s. Anything longer is a stuck page or an attack; kill it and free the slot.

Health checks

A slot can go bad: page crashed, context leaked listeners, V8 heap fragmented. After release, sanity-check the slot:

  • Run chromedp.Evaluate on a trivial expression (1+1); it must return 2 within 1s.
  • If the slot has served a set number of renders (default 50–100), retire it and create a new one.
  • If a render ends with ctx.Err() == context.DeadlineExceeded, mark the slot dirty; the next Acquire that receives it should replace it rather than reuse it.

Speed levers (per render)

  • Block ad/tracker domains at the network layer. Many target pages spend half their render time on doubleclick / GA / Hotjar. A 30-line blocklist cuts p50 latency 30–50%.
  • Disable images when the output doesn’t need them. (We already skip Chrome entirely for url-intel/overview’s /v1/metadata, but ?images=false is a useful screenshot flag.)
  • Pre-warm with about:blank so the first real navigation isn’t paying V8 init cost.
  • networkidle2 beats networkidle0 for screenshots (Puppeteer’s names: networkidle2 waits until there are at most 2 network connections for 500ms, networkidle0 until there are none). Many pages keep long-poll connections open that never go fully idle, even though the visual content is done.
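The matching side of a blocklist needs nothing from Chrome. A sketch of suffix matching on the request host; the domains are an illustrative subset, and wiring the predicate into the browser (for instance through the DevTools Protocol’s request interception) is not shown:

```go
package main

import (
	"net/url"
	"strings"
)

// blockedHosts is an illustrative fragment of an ad/tracker blocklist.
var blockedHosts = []string{
	"doubleclick.net",
	"google-analytics.com",
	"googletagmanager.com",
	"hotjar.com",
}

// isBlocked reports whether rawURL's host is a blocked domain or a subdomain
// of one: "stats.g.doubleclick.net" matches "doubleclick.net", while
// "notdoubleclick.net" does not (the match must start at a label boundary).
func isBlocked(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range blockedHosts {
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}
```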

Failure modes the pool prevents

Without pool | With pool
--- | ---
Burst of 50 requests → 50 Chrome processes → OOM kill | 50 requests → 5 slots busy + 10 queued + 35 immediate 429s
One stuck render holds memory forever | 30s timeout kills it, slot returns to idle
Slow renders pile up behind one another | Bounded queue means you fail fast, client retries with backoff
Cookie / storage contamination between users | Each render uses an isolated chromedp.NewContext, cleared on release
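The first row can be checked with a stripped-down copy of the Acquire logic from earlier: slots are plain ints instead of Chrome contexts, and nothing is released during the burst, as if every render were still running. A sketch, not the production pool:

```go
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// ErrPoolBusy is the fail-fast error for a full wait queue.
var ErrPoolBusy = errors.New("pool busy")

// pool: idle holds slot tokens, queue bounds the number of waiters.
type pool struct {
	idle  chan int
	queue chan struct{}
}

func newPool(slots, queueDepth int) *pool {
	p := &pool{idle: make(chan int, slots), queue: make(chan struct{}, queueDepth)}
	for i := 0; i < slots; i++ {
		p.idle <- i
	}
	return p
}

func (p *pool) acquire(ctx context.Context) (int, error) {
	select {
	case s := <-p.idle: // a slot is free right now
		return s, nil
	default:
	}
	select {
	case p.queue <- struct{}{}: // reserved a place in the wait queue
		defer func() { <-p.queue }()
	default:
		return 0, ErrPoolBusy
	}
	select {
	case s := <-p.idle:
		return s, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// burst fires n concurrent acquires with no slot ever released and reports
// how many got a slot, how many are parked in the queue, and how many were
// rejected with ErrPoolBusy (the ones the HTTP layer turns into 429s).
func burst(n, slots, queueDepth int) (acquired, queued, rejected int) {
	p := newPool(slots, queueDepth)
	ctx, cancel := context.WithCancel(context.Background())
	var mu sync.Mutex
	var wg sync.WaitGroup
	got, busy := 0, 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, err := p.acquire(ctx)
			mu.Lock()
			defer mu.Unlock()
			if err == nil {
				got++
			} else if errors.Is(err, ErrPoolBusy) {
				busy++
			}
		}()
	}
	// Wait until every request holds a slot, a queue spot, or a rejection.
	for {
		mu.Lock()
		a, r, q := got, busy, len(p.queue)
		mu.Unlock()
		if a+r+q == n {
			cancel() // queued callers give up; their errors are not counted
			wg.Wait()
			return a, q, r
		}
		time.Sleep(time.Millisecond)
	}
}
```

burst(50, 5, 10) returns 5, 10, 35, the split in the table: everything past the fifteenth request is rejected immediately instead of adding memory pressure.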

Graceful shutdown

On SIGTERM, the service should:

  1. Stop accepting new requests (router returns 503).
  2. Drain in-flight renders — wait up to min(maxRenderTimeout, 30s).
  3. Close all chromedp contexts, then the root browser.
  4. Exit.

In url-intel/overview’s cmd/server/main.go, this is the srv.Shutdown(ctx) path with the 30s deadline.

Alternatives considered

  • Playwright instead of chromedp: Node-native, better ergonomics. We pick chromedp because the service is Go and chromedp/headless-shell is a ~100MB base image vs ~1GB for full Playwright + Chromium.
  • A managed render service (Browserless, ScrapingBee): zero ops, but you pay per render and become a margin layer on top of someone else’s product. We’re selling URL rendering, so we own the renderer.
  • Per-request docker spawn: clean isolation but cold start is fatal for a sync HTTP API. Maybe for an async job queue (out of scope for v1).

Where this matters in kulify

Currently only url-intel/overview uses it. Future products in the mini-apps series that need rendering (e.g. an SEO/preview checker, or a “render markdown to image” tool) reuse this package, and the internal render package becomes shared infrastructure.