concept
Headless Browser Pool
A bounded pool of pre-warmed headless Chrome contexts sitting behind your
HTTP handlers, with a bounded wait queue that converts overflow into
429 Retry-After instead of OOM.
This is the only sane way to expose Chrome-driven rendering (screenshots, PDFs, JS-evaluated scraping) as a public HTTP API. The naive alternatives — spawning a Chrome per request, or a single shared browser — both fail.
Why naive doesn’t work
| Naive approach | Failure mode |
|---|---|
| Spawn Chrome per request | 1–3s cold start, 300–500MB RAM per process, easy OOM under load |
| Single Chrome, single context | Tab state cross-contaminates between requests (cookies, storage, in-flight navigations) |
| Single Chrome, many contexts, unbounded | One slow render blocks N others; concurrent count is unlimited → memory blows up |
Headless Chrome’s per-context memory is roughly 300–500MB during an active render (image decoding, JS heap, layer compositing). A 16GB box can sustain maybe 20 truly-concurrent renders before swap. A 4GB box: 4–6. So the pool size is a deliberate cap derived from the host’s RAM, not from “concurrency we want.”
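The RAM arithmetic above can be written down as a small helper. This is a minimal sketch: `maxSlots` and the reserve figures are assumptions chosen to reproduce the rough numbers in the text, not measured values. Measure your own per-render RSS before trusting the result.

```go
package main

import "fmt"

// maxSlots returns how many concurrent renders fit in RAM.
// hostMB is total RAM, reserveMB is headroom for the OS, the Go service and
// Chrome's idle baseline (an assumed figure), and perRenderMB is the
// worst-case memory of one active render.
func maxSlots(hostMB, reserveMB, perRenderMB int) int {
	if perRenderMB <= 0 || hostMB <= reserveMB {
		return 0
	}
	return (hostMB - reserveMB) / perRenderMB
}

func main() {
	// 4GB host, 1.5GB reserved (assumed), 500MB per render.
	fmt.Println(maxSlots(4096, 1536, 500)) // 5
	// 16GB host, 6GB reserved (assumed), 500MB per render.
	fmt.Println(maxSlots(16384, 6144, 500)) // 20
}
```

The point of the helper is the direction of the derivation: RAM in, slot count out, never the other way round.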
The pattern
    Request ──→ Pool.Acquire(ctx) ──→ ┌── slot 1 (warm) ──┐
                                      │── slot 2 (warm) ──│  ← N pre-warmed
                                      │── slot 3 (warm) ──│    contexts
                                      └── slot N (warm) ──┘
                                                │
                                      No slot? → wait queue (bounded)
                                                │
                                      Queue full? → ErrPoolBusy → 429 + Retry-After
Components
- One Chrome process (multi-process is over-engineering at this scale).
- N tab contexts (`chromedp.NewContext(rootCtx)`), created at boot, pre-warmed by navigating to `about:blank` once so the V8 heap is initialized.
- A buffered channel of size N holding “available slot” tokens.
- A wait queue — also bounded — for requests that arrive when all slots are busy.
- Per-render timeout (30s default) enforced via `context.WithTimeout` on the acquired context.
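The component list above maps onto a small struct and a boot-time constructor. This is a sketch under stated assumptions: `Slot`, `WarmFunc`, `NewPool` and `RenderContext` are hypothetical names, and the warm hook stands in for the real chromedp navigation to `about:blank` so the example runs with the standard library alone.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Slot stands in for one tab context. In a chromedp build it would hold the
// context and cancel func returned by chromedp.NewContext(rootCtx).
type Slot struct {
	ID     int
	Warmed bool
}

// Pool carries the components from the list above.
type Pool struct {
	idle          chan *Slot    // buffered, capacity N: available slots
	queue         chan struct{} // buffered, capacity = wait-queue depth
	renderTimeout time.Duration // per-render deadline
}

// WarmFunc is a hypothetical hook that pre-warms one slot, for example by
// navigating the tab to about:blank.
type WarmFunc func(ctx context.Context, s *Slot) error

// NewPool creates and warms n slots at boot, failing fast if any warm-up fails.
func NewPool(ctx context.Context, n, queueDepth int, warm WarmFunc) (*Pool, error) {
	p := &Pool{
		idle:          make(chan *Slot, n),
		queue:         make(chan struct{}, queueDepth),
		renderTimeout: 30 * time.Second,
	}
	for i := 1; i <= n; i++ {
		s := &Slot{ID: i}
		if err := warm(ctx, s); err != nil {
			return nil, fmt.Errorf("warm slot %d: %w", i, err)
		}
		p.idle <- s
	}
	return p, nil
}

// RenderContext derives the per-render deadline from the request context.
func (p *Pool) RenderContext(parent context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(parent, p.renderTimeout)
}

// demo builds a 3-slot pool with a 6-deep queue and a no-op warm hook.
func demo() (idle, queueCap int, err error) {
	warm := func(_ context.Context, s *Slot) error { s.Warmed = true; return nil }
	p, err := NewPool(context.Background(), 3, 6, warm)
	if err != nil {
		return 0, 0, err
	}
	return len(p.idle), cap(p.queue), nil
}

func main() {
	fmt.Println(demo()) // 3 6 <nil>
}
```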
Acquire / release
    Acquire(ctx):
        // Fast path: take an idle slot without touching the queue.
        select {
        case slot := <-pool.idle: return slot, nil
        default:
        }
        // All slots busy: reserve a queue spot, or fail fast if the queue is full.
        select {
        case pool.queue <- struct{}{}:
            defer func() { <-pool.queue }() // give the spot back on return
        default:
            return nil, ErrPoolBusy
        }
        // Queued: wait for a slot or for the caller to give up.
        select {
        case slot := <-pool.idle: return slot, nil
        case <-ctx.Done():        return nil, ctx.Err()
        }

    Release(slot):
        if slot.healthy() { pool.idle <- slot } else { pool.replace(slot) }
The shape — semaphore + bounded queue — is the standard back-pressure idiom applied to a non-trivial resource. See back-pressure for the general pattern; delivery-guarantees for why you choose 429 over silently buffering.
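For reference, here is a runnable sketch of the semaphore-plus-bounded-queue shape using only the standard library. It takes a non-blocking fast path, then a non-blocking queue reservation, then a blocking wait. `Slot` is a placeholder for a real tab context, health checks are left out, and `burst` is a hypothetical helper that exists only to show the overflow behaviour.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrPoolBusy means every slot is busy and the wait queue is full.
var ErrPoolBusy = errors.New("render pool busy")

// Slot stands in for one pre-warmed tab context.
type Slot struct{ ID int }

type Pool struct {
	idle  chan *Slot    // available slots, capacity N
	queue chan struct{} // wait-queue reservations, capacity = queue depth
}

func NewPool(slots, queueDepth int) *Pool {
	p := &Pool{
		idle:  make(chan *Slot, slots),
		queue: make(chan struct{}, queueDepth),
	}
	for i := 1; i <= slots; i++ {
		p.idle <- &Slot{ID: i}
	}
	return p
}

func (p *Pool) Acquire(ctx context.Context) (*Slot, error) {
	select { // fast path: an idle slot is available right now
	case s := <-p.idle:
		return s, nil
	default:
	}
	select { // reserve a queue spot, or fail fast when the queue is full
	case p.queue <- struct{}{}:
		defer func() { <-p.queue }()
	default:
		return nil, ErrPoolBusy
	}
	select { // queued: wait for a slot or for the caller to give up
	case s := <-p.idle:
		return s, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// Release returns a slot to the pool; health checks are omitted in this sketch.
func (p *Pool) Release(s *Slot) { p.idle <- s }

// burst fires n concurrent Acquires that never Release and reports how many
// got a slot, waited in the queue, or were rejected.
// It assumes n >= slots+queueDepth.
func burst(n, slots, queueDepth int) (got, queued, rejected int) {
	p := NewPool(slots, queueDepth)
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	results := make(chan error, n)
	for i := 0; i < n; i++ {
		go func() {
			_, err := p.Acquire(ctx)
			results <- err
		}()
	}
	// Slot holders and rejected callers return at once; queued callers block.
	for i := 0; i < n-queueDepth; i++ {
		if err := <-results; err == nil {
			got++
		} else if errors.Is(err, ErrPoolBusy) {
			rejected++
		}
	}
	cancel() // unblock the queued callers
	for i := 0; i < queueDepth; i++ {
		if errors.Is(<-results, context.Canceled) {
			queued++
		}
	}
	return got, queued, rejected
}

func main() {
	fmt.Println(burst(50, 5, 10)) // 5 10 35
}
```

Because `idle` has capacity N and only N slots exist, `Release` never blocks. A real pool would have to preserve that property when it replaces an unhealthy slot.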
Sizing
- Default 3–5 slots on a 4–8GB host. Measure RSS under 20-concurrent load; pick the largest N where 95th-percentile RSS stays under (host RAM − safety margin).
- Queue depth ≈ `slots × expected p95 render seconds`. Past that, the user is better off retrying than waiting: return 429 with a `Retry-After` based on observed queue drain rate.
- Per-render timeout 30s. Anything longer is a stuck page or an attack; kill it and free the slot.
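The queue-depth rule of thumb and the `Retry-After` estimate are both one-liners. The sketch below is illustrative only: the drain-rate model (queued renders divided by slots per average render time) is an assumption about what "observed queue drain rate" could mean, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// queueDepth applies the rule of thumb: slots × expected p95 render seconds.
func queueDepth(slots int, p95Seconds float64) int {
	return int(math.Ceil(float64(slots) * p95Seconds))
}

// retryAfterSeconds estimates how long the current queue takes to drain,
// assuming each slot finishes one render every avgRenderSeconds.
// It never returns less than 1, so clients always back off.
func retryAfterSeconds(queued, slots int, avgRenderSeconds float64) int {
	if slots <= 0 {
		return 1
	}
	secs := int(math.Ceil(float64(queued) * avgRenderSeconds / float64(slots)))
	if secs < 1 {
		return 1
	}
	return secs
}

func main() {
	fmt.Println(queueDepth(5, 2.0))            // 10
	fmt.Println(retryAfterSeconds(10, 5, 2.0)) // 4
}
```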
Health checks
A slot can go bad: page crashed, context leaked listeners, V8 heap fragmented. After release, sanity-check the slot:
- `chromedp.Evaluate` a simple `1+1`; it must return 2 within 1s.
- If the slot has done N renders (default 50–100), retire it and create a new one.
- On `ctx.Err() == context.DeadlineExceeded`, mark the slot dirty; the next Acquire that gets it should replace, not reuse.
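The three checks above combine into a single health predicate. A minimal sketch, assuming a hypothetical `Probe` hook in place of the real in-tab `1+1` evaluation and a retirement threshold of 50 renders; the probe must honour its context, otherwise the timeout cannot fire.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// maxRenders is an assumed retirement threshold (the text suggests 50–100).
const maxRenders = 50

type Slot struct {
	Renders int
	Dirty   bool
	// Probe stands in for evaluating 1+1 inside the tab. It must return
	// promptly once ctx is done.
	Probe func(ctx context.Context) (int, error)
}

// MarkResult records the outcome of one render on the slot.
func (s *Slot) MarkResult(err error) {
	s.Renders++
	if errors.Is(err, context.DeadlineExceeded) {
		s.Dirty = true // a timed-out render leaves the tab in an unknown state
	}
}

// Healthy reports whether the slot may go back into the idle channel.
func (s *Slot) Healthy(probeTimeout time.Duration) bool {
	if s.Dirty || s.Renders >= maxRenders {
		return false
	}
	ctx, cancel := context.WithTimeout(context.Background(), probeTimeout)
	defer cancel()
	got, err := s.Probe(ctx)
	return err == nil && got == 2
}

// demo checks a fresh slot, a timed-out slot, a worn-out slot and a slot
// whose probe hangs until the deadline.
func demo() [4]bool {
	ok := func(context.Context) (int, error) { return 2, nil }
	hung := func(ctx context.Context) (int, error) { <-ctx.Done(); return 0, ctx.Err() }

	fresh := &Slot{Probe: ok}
	timedOut := &Slot{Probe: ok}
	timedOut.MarkResult(context.DeadlineExceeded)
	worn := &Slot{Probe: ok, Renders: maxRenders}
	stuck := &Slot{Probe: hung}

	t := 50 * time.Millisecond // shortened from the 1s in the text for the demo
	return [4]bool{fresh.Healthy(t), timedOut.Healthy(t), worn.Healthy(t), stuck.Healthy(t)}
}

func main() {
	fmt.Println(demo()) // [true false false false]
}
```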
Speed levers (per render)
- Block ad/tracker domains at the network layer. Many target pages spend half their render time on doubleclick / GA / Hotjar. A 30-line blocklist cuts p50 latency 30–50%.
- Disable images for the metadata-only path (we already don’t use Chrome for url-intel/overview’s `/v1/metadata`, but `?images=false` is a useful screenshot flag).
- Pre-warm with `about:blank` so the first real navigation isn’t paying V8 init cost.
- `networkidle2` beats `networkidle0` for screenshots: many pages have long-poll connections that never go fully idle, but visual content is done.
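The blocklist lever comes down to a host-suffix match, called from whatever request-interception hook the renderer exposes. The sketch below shows only the matching; the three domains are illustrative entries taken from the trackers named above, and `isBlocked` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// blocked is an illustrative list; a real one would be loaded from config.
var blocked = []string{"doubleclick.net", "google-analytics.com", "hotjar.com"}

// isBlocked reports whether rawURL's host is a blocked domain or one of its
// subdomains. Unparseable URLs are not blocked here; the caller decides what
// to do with them.
func isBlocked(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range blocked {
		// Match on a label boundary so "notdoubleclick.net" is not caught.
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBlocked("https://stats.g.doubleclick.net/collect")) // true
	fmt.Println(isBlocked("https://example.com/index.html"))          // false
	fmt.Println(isBlocked("https://notdoubleclick.net/"))             // false
}
```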
Failure modes the pool prevents
| Without pool | With pool |
|---|---|
| Burst of 50 requests → 50 Chrome processes → OOM kill | 50 requests → 5 slots busy + 10 queued + 35 immediate 429s |
| One stuck render holds memory forever | 30s timeout kills it; the slot is freed and replaced before reuse |
| Slow renders pile up behind one another | Bounded queue means you fail fast, client retries with backoff |
| Cookie / storage contamination between users | Each render uses an isolated chromedp.NewContext, cleared on release |
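On the HTTP side, the first row of the table comes down to translating `ErrPoolBusy` into a 429. This is a sketch with assumptions labelled: the `renderer` interface, the `/v1/screenshot` path and the 504-on-timeout mapping are illustrative choices, not the service's actual API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// ErrPoolBusy means every slot is busy and the wait queue is full.
var ErrPoolBusy = errors.New("render pool busy")

// renderer is a hypothetical interface over "acquire a slot, render, release".
type renderer interface {
	Render(ctx context.Context, url string) ([]byte, error)
}

// renderHandler enforces the per-render timeout and maps pool errors to
// HTTP status codes.
func renderHandler(r renderer, timeout time.Duration, retryAfter string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		ctx, cancel := context.WithTimeout(req.Context(), timeout)
		defer cancel()
		png, err := r.Render(ctx, req.URL.Query().Get("url"))
		switch {
		case errors.Is(err, ErrPoolBusy):
			w.Header().Set("Retry-After", retryAfter)
			http.Error(w, "render pool busy", http.StatusTooManyRequests)
		case errors.Is(err, context.DeadlineExceeded):
			http.Error(w, "render timed out", http.StatusGatewayTimeout) // assumed mapping
		case err != nil:
			http.Error(w, "render failed", http.StatusInternalServerError)
		default:
			w.Header().Set("Content-Type", "image/png")
			w.Write(png)
		}
	})
}

// busy is a fake renderer whose pool is always saturated.
type busy struct{}

func (busy) Render(context.Context, string) ([]byte, error) { return nil, ErrPoolBusy }

// demo sends one request at a saturated pool and returns status and header.
func demo() (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/screenshot?url=https://example.com", nil)
	renderHandler(busy{}, 30*time.Second, "4").ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	fmt.Println(demo()) // 429 4
}
```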
Graceful shutdown
On SIGTERM, the service should:
- Stop accepting new requests (router returns 503).
- Drain in-flight renders: wait up to `min(maxRenderTimeout, 30s)`.
- Close all chromedp contexts, then the root browser.
- Exit.
In url-intel/overview’s `cmd/server/main.go`, this is the `srv.Shutdown(ctx)` path with the 30s deadline.
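The shutdown sequence can be sketched with the standard library alone. `serveUntil` and `closeBrowser` are hypothetical names; in production the context would typically come from `signal.NotifyContext` with `syscall.SIGTERM`, while the demo cancels it by hand so the example terminates. Note that `srv.Shutdown` closes the listener rather than answering 503, so a 503 for late requests would have to come from a layer in front of it.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveUntil serves on ln until ctx is done, then drains in-flight requests
// for up to drain and finally closes the browser.
func serveUntil(ctx context.Context, srv *http.Server, ln net.Listener,
	closeBrowser func(), drain time.Duration) error {

	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.Serve(ln) }()

	select {
	case err := <-serveErr:
		return err // the listener failed before any shutdown signal
	case <-ctx.Done():
	}

	drainCtx, cancel := context.WithTimeout(context.Background(), drain)
	defer cancel()
	err := srv.Shutdown(drainCtx) // stop accepting, wait for in-flight handlers
	closeBrowser()                // close tab contexts first, then the root browser
	return err
}

// demo runs the sequence against a local listener with an already-cancelled
// context, which stands in for SIGTERM having arrived.
func demo() (closed bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	err = serveUntil(ctx, &http.Server{}, ln, func() { closed = true }, 30*time.Second)
	return closed, err
}

func main() {
	fmt.Println(demo()) // true <nil>
}
```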
Alternatives considered
- Playwright instead of chromedp: Node-native, better ergonomics. We pick chromedp because the service is Go and `chromedp/headless-shell` is a ~100MB base image vs ~1GB for full Playwright + Chromium.
- A managed render service (Browserless, ScrapingBee): zero ops, but you pay per render and you’re a margin layer on someone else’s product. We’re selling URL rendering, so we own the renderer.
- Per-request docker spawn: clean isolation but cold start is fatal for a sync HTTP API. Maybe for an async job queue (out of scope for v1).
Where this matters in kulify
Currently only url-intel/overview. Future products in the mini-apps series that need rendering (e.g. an SEO/preview checker, a “render markdown to image” tool) reuse this package; the internal render package becomes shared infrastructure.
Related
- chromedp: the Go library this pool is built on
- ssrf-guard: Chrome’s network layer needs the same guard as your `net/http` client
- back-pressure: the general pattern this is a specialization of
- delivery-guarantees: why “fail-fast with 429” beats “buffer forever”
- url-intel/overview: first consumer of the pattern