chromedp

Go library for driving headless Chrome via the Chrome DevTools Protocol (CDP). The Go-native equivalent of Puppeteer/Playwright. No external dependencies beyond a Chromium binary; the official chromedp/headless-shell Docker base image bundles just the headless shell binary (~100MB image vs ~1GB for full Playwright + Chromium).

Used in url-intel/overview for the /v1/screenshot and /v1/pdf endpoints. Wired through a headless-browser-pool — never spawn ad-hoc contexts in handlers.

Why chromedp for this stack

Need	Why chromedp fits
Go-native renderer	No Node-to-Go bridge, no JSON-RPC layer above CDP
Tiny runtime image	`chromedp/headless-shell` is ~100MB; full Playwright + Chromium is ~1GB
Both screenshot and PDF	Single API surface (`page.CaptureScreenshot` + `page.PrintToPDF`)
Multiple concurrent contexts	`chromedp.NewContext(parentCtx)` opens a new tab in the shared browser cheaply (tabs share the browser profile unless given their own browser context)
Lifecycle tied to `context.Context`	Cancel context → tab closes; matches Go service shutdown semantics

Trade-off: chromedp is less ergonomic than Playwright (you write more CDP-style flows), and its docs are thinner. For a public service whose main job is “navigate + capture + return”, that’s acceptable.

Core API surface

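import (
    "context"
    "log"

    "github.com/chromedp/chromedp"
)

// One root browser per process (long-lived)
allocCtx, cancelAlloc := chromedp.NewExecAllocator(context.Background(),
    chromedp.NoSandbox,
    chromedp.DisableGPU,
    chromedp.Headless,
)
defer cancelAlloc()
rootCtx, cancelRoot := chromedp.NewContext(allocCtx)
defer cancelRoot()

// Start the browser once. Until the root context has been run, child
// contexts would launch their own browser instead of opening a tab in it.
if err := chromedp.Run(rootCtx); err != nil {
    log.Fatal(err)
}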

// One context per render (cheap; reuses the root browser)
tabCtx, cancelTab := chromedp.NewContext(rootCtx)
defer cancelTab()

// Run actions
var buf []byte
err := chromedp.Run(tabCtx,
    chromedp.Navigate(url),
    chromedp.WaitReady("body"),
    chromedp.FullScreenshot(&buf, 90), // quality below 100 is encoded as JPEG; pass 100 for PNG
)

chromedp.Run(ctx, actions...) is the workhorse. Each step is a chromedp.Action; group related steps into a chromedp.Tasks slice (itself an Action) and compose them like middleware.

Screenshot patterns

// Viewport screenshot
chromedp.CaptureScreenshot(&buf)

// Full-page screenshot at quality 90 (JPEG; quality 100 switches to PNG)
chromedp.FullScreenshot(&buf, 90)

// Custom viewport + element-only screenshot
chromedp.EmulateViewport(1280, 800),
chromedp.Screenshot("#main", &buf, chromedp.NodeVisible, chromedp.ByID),

For url-intel’s ?full_page=true&width=1280&format=png|webp:

Width sets viewport before navigate
FullScreenshot for full-page, CaptureScreenshot for viewport-only
WebP conversion: chromedp’s screenshot helpers only emit PNG or JPEG. Either decode the PNG and re-encode it with a WebP encoder (e.g. a libwebp Go binding), or call CDP’s Page.captureScreenshot { format: "webp" } directly via cdproto/page.CaptureScreenshot inside a chromedp.ActionFunc
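
A minimal sketch of validating those query parameters before they reach the renderer. ShotOptions, ParseShotOptions, the 1280 default width, and the 320 to 3840 bounds are assumptions for illustration, not url-intel’s actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// ShotOptions is a hypothetical, validated view of the screenshot query string.
type ShotOptions struct {
	FullPage bool
	Width    int
	Format   string // "png" or "webp"
}

// ParseShotOptions validates full_page, width and format from a raw query
// string and applies defaults. Default width and bounds are assumptions.
func ParseShotOptions(rawQuery string) (ShotOptions, error) {
	opts := ShotOptions{Width: 1280, Format: "png"}

	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return opts, fmt.Errorf("query: %w", err)
	}
	if v := q.Get("full_page"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return opts, fmt.Errorf("full_page: %w", err)
		}
		opts.FullPage = b
	}
	if v := q.Get("width"); v != "" {
		w, err := strconv.Atoi(v)
		if err != nil || w < 320 || w > 3840 {
			return opts, fmt.Errorf("width must be an integer in [320, 3840], got %q", v)
		}
		opts.Width = w
	}
	if v := q.Get("format"); v != "" {
		if v != "png" && v != "webp" {
			return opts, fmt.Errorf("format must be png or webp, got %q", v)
		}
		opts.Format = v
	}
	return opts, nil
}

func main() {
	opts, err := ParseShotOptions("full_page=true&width=1280&format=webp")
	fmt.Println(opts, err)
}
```

FullPage would then select FullScreenshot or CaptureScreenshot, and Width would feed EmulateViewport ahead of Navigate.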

PDF patterns

import "github.com/chromedp/cdproto/page"

var buf []byte
err := chromedp.Run(tabCtx,
    chromedp.Navigate(url),
    chromedp.WaitReady("body"),
    chromedp.ActionFunc(func(ctx context.Context) error {
        b, _, err := page.PrintToPDF().
            WithPrintBackground(true).
            WithLandscape(landscape).
            WithPaperWidth(8.27).      // A4 inches
            WithPaperHeight(11.69).
            Do(ctx)
        buf = b
        return err
    }),
)

page.PrintToPDF exposes the full Chrome print options: paper size, margins, landscape, print background, header/footer templates.
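
PrintToPDF takes paper dimensions in inches and falls back to US Letter (8.5 × 11 in) when none are set. A small sketch of deriving the values from ISO millimetre sizes; PaperInches is a hypothetical helper, not part of cdproto.

```go
package main

import (
	"fmt"
	"math"
)

// PaperInches converts a paper size in millimetres to inches, rounded to two
// decimals, ready to pass to WithPaperWidth / WithPaperHeight.
func PaperInches(widthMM, heightMM float64) (w, h float64) {
	const mmPerInch = 25.4
	round2 := func(v float64) float64 { return math.Round(v*100) / 100 }
	return round2(widthMM / mmPerInch), round2(heightMM / mmPerInch)
}

func main() {
	w, h := PaperInches(210, 297) // ISO A4
	fmt.Printf("A4:     %.2f x %.2f in\n", w, h)

	w, h = PaperInches(215.9, 279.4) // US Letter
	fmt.Printf("Letter: %.2f x %.2f in\n", w, h)
}
```

For A4 this reproduces the 8.27 and 11.69 used in the PDF snippet.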

Speed levers

Tactic	Effect
Block ad/tracker domains via `network.SetBlockedURLs`	30–50% p50 cut on typical news pages
`WaitReady` not `WaitVisible`	Skip layout-thrash waiting; “ready” is usually enough for a screenshot
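`networkAlmostIdle` over `networkIdle` (the closest CDP lifecycle events to Puppeteer’s `networkidle2` / `networkidle0`)	chromedp has no built-in network-idle wait, so you listen for the lifecycle event yourself; many pages hold long-poll or WebSocket connections open, so full idle may never fire and almost-idle is fine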
Disable images (for metadata-only paths)	Massive cut if you don’t need pixels
Pre-warm contexts on pool init	First navigate isn’t paying V8 init cost

Security: hook the network layer

The ssrf-guard in your http.Transport does nothing for chromedp — Chrome dials directly. Two options:

1. Network interception via the Fetch domain (fetch.Enable; network.SetRequestInterception is its deprecated predecessor). Each request raises a fetch.EventRequestPaused event; you validate the URL and answer with fetch.ContinueRequest or fetch.FailRequest.
2. HTTP proxy: run a tiny in-process proxy that enforces the SSRF guard and point chromedp at it (--proxy-server=127.0.0.1:NNNN).

Option 2 is simpler. Option 1 is faster (no extra hop). For url-intel v1, option 2 is the default; option 1 is a Phase 3 optimization.
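
A rough sketch of the proxy option using only the standard library. The allow/deny policy in AllowedAddr is an assumption standing in for the real ssrf-guard, and hop-by-hop header handling is omitted for brevity. The check runs in the dialer’s Control hook, so it sees the resolved IP rather than the hostname.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"os"
	"syscall"
	"time"
)

var errBlocked = errors.New("ssrf guard: destination not allowed")

// AllowedAddr reports whether a literal "ip:port" points at a public unicast
// address. The policy is an assumption; the real ssrf-guard may be stricter
// (for example, it may also block CGNAT or other reserved ranges).
func AllowedAddr(address string) bool {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false
	}
	return !(ip.IsLoopback() || ip.IsPrivate() || ip.IsUnspecified() ||
		ip.IsLinkLocalUnicast() || ip.IsMulticast())
}

// guardedDialer runs the check in Control, i.e. after DNS resolution and
// right before connect, so hostnames resolving to internal IPs fail too.
var guardedDialer = &net.Dialer{
	Timeout: 10 * time.Second,
	Control: func(_, address string, _ syscall.RawConn) error {
		if !AllowedAddr(address) {
			return errBlocked
		}
		return nil
	},
}

var transport = &http.Transport{DialContext: guardedDialer.DialContext}

// proxy handles CONNECT tunnels (HTTPS) and absolute-URI requests (HTTP).
func proxy(w http.ResponseWriter, r *http.Request) {
	if r.Method == http.MethodConnect {
		upstream, err := guardedDialer.DialContext(r.Context(), "tcp", r.Host)
		if err != nil { // blocked or unreachable
			http.Error(w, err.Error(), http.StatusForbidden)
			return
		}
		hj, ok := w.(http.Hijacker)
		if !ok {
			upstream.Close()
			http.Error(w, "hijacking unsupported", http.StatusInternalServerError)
			return
		}
		client, _, err := hj.Hijack()
		if err != nil {
			upstream.Close()
			return
		}
		fmt.Fprint(client, "HTTP/1.1 200 Connection Established\r\n\r\n")
		go func() { io.Copy(upstream, client); upstream.Close() }()
		go func() { io.Copy(client, upstream); client.Close() }()
		return
	}
	r.RequestURI = "" // clear the server-only field before reusing r as an outgoing request
	resp, err := transport.RoundTrip(r)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	for k, vs := range resp.Header {
		for _, v := range vs {
			w.Header().Add(k, v)
		}
	}
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body)
}

func main() {
	if len(os.Args) > 1 { // e.g. go run . 127.0.0.1:8118
		log.Fatal(http.ListenAndServe(os.Args[1], http.HandlerFunc(proxy)))
	}
	// Without an address argument, just print a few sample decisions.
	for _, a := range []string{"127.0.0.1:80", "169.254.169.254:80", "93.184.216.34:443"} {
		fmt.Printf("%-20s allowed=%v\n", a, AllowedAddr(a))
	}
}
```

On the chromedp side the allocator would get chromedp.ProxyServer("http://127.0.0.1:NNNN"). One thing worth verifying in your Chrome version: Chrome skips the proxy for loopback destinations by default, and the --proxy-bypass-list=<-loopback> flag is the documented way to remove that implicit bypass.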

Common gotchas

NoSandbox is required inside Docker unless you set up user-namespace remapping. --no-sandbox is acceptable for a stateless renderer because the threat it leaves open is “weird sites crashing Chrome”; “Chrome reads your filesystem” is contained by running as an unprivileged Linux user inside the container.
First navigation is slow if you don’t pre-warm. Always navigate to about:blank once when the context is created.
context.Canceled cascades from the parent — cancel the root browser context on shutdown and all tabs close cleanly.
Memory grows over time per context. Recycle tabs after N renders (50–100 is a good cap) — chromedp doesn’t reset V8 between navigations.
Headless shell is not full Chrome: no Flash, no PDF viewer, no extensions. Fine for screenshots/PDFs of normal pages; not for “render a PDF by opening someone else’s PDF” (use pdfium or unidoc for that).
chromedp/headless-shell does not include Chrome’s font set. Pages needing CJK / emoji render with squares unless you add a system font package in the runtime stage (fonts-noto, fonts-noto-color-emoji).
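
The recycle-after-N-renders gotcha can be sketched as a small wrapper around a pooled tab. Slot, NewSlot, and the open/shut hooks are hypothetical names; with chromedp, open would call chromedp.NewContext on the root context (plus the about:blank warm-up) and shut would call the tab’s cancel func.

```go
package main

import "fmt"

// Slot owns one tab and replaces it after limit renders. T stands in for
// whatever the pool stores per tab (e.g. a context plus its cancel func).
type Slot[T any] struct {
	tab     T
	renders int
	limit   int
	open    func() T
	shut    func(T)
}

// NewSlot opens the first tab immediately so the slot starts out warm.
func NewSlot[T any](limit int, open func() T, shut func(T)) *Slot[T] {
	return &Slot[T]{tab: open(), limit: limit, open: open, shut: shut}
}

// Render runs fn on the current tab, then recycles the tab if it has now
// served limit renders. Not safe for concurrent use; a pool is expected to
// hand each slot to one caller at a time.
func (s *Slot[T]) Render(fn func(T) error) error {
	err := fn(s.tab)
	s.renders++
	if s.renders >= s.limit {
		s.shut(s.tab)
		s.tab = s.open()
		s.renders = 0
	}
	return err
}

// demo runs n renders through a slot capped at limit and reports how many
// tabs were opened and closed along the way.
func demo(n, limit int) (opened, closed int) {
	slot := NewSlot(limit,
		func() int { opened++; return opened },
		func(int) { closed++ },
	)
	for i := 0; i < n; i++ {
		slot.Render(func(int) error { return nil })
	}
	return opened, closed
}

func main() {
	opened, closed := demo(7, 3)
	fmt.Printf("opened=%d closed=%d\n", opened, closed)
}
```

Recycling right after the Nth render, rather than before the next one, keeps the replacement cost off the request path as long as the slot is returned to the pool only once Render has finished.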

Pool pattern

See headless-browser-pool for the full design. The short version: one allocator, N pre-warmed contexts in a buffered channel, bounded wait queue, per-render timeout. Don’t call NewContext in a request handler.
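
A generic sketch of that short version, not the headless-browser-pool design itself. Pool, NewPool, Do, and ErrQueueFull are illustrative names; T stands in for a pre-warmed chromedp tab context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrQueueFull is returned when no tab is idle and the wait queue is at capacity.
var ErrQueueFull = errors.New("render pool: wait queue full")

// Pool keeps pre-warmed resources in a buffered channel and bounds the
// number of callers allowed to wait for one.
type Pool[T any] struct {
	idle    chan T
	waiters chan struct{}
}

// NewPool creates size resources up front via warm and allows at most
// maxWaiters callers to queue when all of them are busy.
func NewPool[T any](size, maxWaiters int, warm func(i int) T) *Pool[T] {
	p := &Pool[T]{idle: make(chan T, size), waiters: make(chan struct{}, maxWaiters)}
	for i := 0; i < size; i++ {
		p.idle <- warm(i)
	}
	return p
}

// Do borrows a resource, runs fn under a per-render timeout, and returns the
// resource to the pool afterwards.
func (p *Pool[T]) Do(ctx context.Context, timeout time.Duration, fn func(context.Context, T) error) error {
	var res T
	select {
	case res = <-p.idle: // fast path: a resource is free
	default:
		select {
		case p.waiters <- struct{}{}: // take a place in the bounded queue
		default:
			return ErrQueueFull
		}
		select {
		case res = <-p.idle:
			<-p.waiters
		case <-ctx.Done():
			<-p.waiters
			return ctx.Err()
		}
	}
	defer func() { p.idle <- res }()

	rctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	return fn(rctx, res)
}

// demo exhausts a one-tab pool with no wait queue: the nested call is
// rejected, and a call made after the tab is returned succeeds.
func demo() (nested, next error) {
	p := NewPool(1, 0, func(i int) string { return fmt.Sprintf("tab-%d", i) })
	noop := func(context.Context, string) error { return nil }
	p.Do(context.Background(), time.Second, func(ctx context.Context, _ string) error {
		nested = p.Do(ctx, time.Second, noop)
		return nil
	})
	next = p.Do(context.Background(), time.Second, noop)
	return nested, next
}

func main() {
	nested, next := demo()
	fmt.Println("nested:", nested, "next:", next)
}
```

With chromedp, the timeout context would need to be derived from the tab context (context.WithTimeout(tabCtx, timeout)) rather than from the request context, since chromedp.Run looks up the tab through the context it is given.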

Where else this could be used

Future product	Use
SEO/preview checker	Same render path, different post-processing (lighthouse-ish metrics)
“Render markdown to image” tool	Internal HTML template + chromedp screenshot — no extra deps
Demand-radar trend evidence	Screenshot a Trends page for the digest preview

Once url-intel’s render package exists, every weekend product that needs rendering reuses it as shared infrastructure.

Alternatives considered

Alternative	Why not for url-intel
Playwright (Go via `playwright-go`)	Larger image, slower spawn, more deps; ergonomics not enough win for stateless API
Puppeteer (Node)	Wrong language stack for this service
`rod` (Go)	Nicer API than chromedp, smaller community, less marketplace-tested
Managed services (Browserless, ScrapingBee)	We are the URL renderer — paying a margin layer doesn’t fit
`wkhtmltopdf`	Archived upstream, frozen on an old QtWebKit engine with no modern JS/CSS support, dead-end

headless-browser-pool — the pool pattern that wraps this
ssrf-guard — Chrome needs the guard at the network layer too
url-intel/overview — primary consumer
mini-apps/overview — render package becomes shared infra for the series