tool

chromedp

created 2026-06-07
go · chrome · headless · browser · chromedp · rendering · screenshots · pdf


Go library for driving headless Chrome via the Chrome DevTools Protocol (CDP). The Go-native equivalent of Puppeteer/Playwright. No external dependencies beyond a Chromium binary; the official chromedp/headless-shell Docker base image bundles just the headless shell binary (~100MB image vs ~1GB for full Playwright + Chromium).

Used in url-intel/overview for the /v1/screenshot and /v1/pdf endpoints. Wired through a headless-browser-pool — never spawn ad-hoc contexts in handlers.

Why chromedp for this stack

Need | Why chromedp fits
Go-native renderer | No Node-to-Go bridge, no JSON-RPC layer above CDP
Tiny runtime image | chromedp/headless-shell is ~100MB; full Playwright + Chromium is ~1GB
Both screenshot and PDF | Single API surface (page.CaptureScreenshot + page.PrintToPDF)
Multiple concurrent contexts | chromedp.NewContext(parentCtx) cheaply opens a new tab in the parent's browser (tabs share that browser's cookies and storage)
Lifecycle tied to context.Context | Cancel the context → the tab closes; matches Go service shutdown semantics

Trade-off: chromedp is less ergonomic than Playwright (you write more CDP-style flows), and its docs are thinner. For a public service whose main job is “navigate + capture + return”, that’s acceptable.

Core API surface

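import (
    "context"
    "log"

    "github.com/chromedp/chromedp"
)

// One root browser per process (long-lived). Start from chromedp's defaults
// (they already include Headless); passing only your own flags drops them.
opts := append(chromedp.DefaultExecAllocatorOptions[:],
    chromedp.NoSandbox,
    chromedp.DisableGPU,
)
allocCtx, cancelAlloc := chromedp.NewExecAllocator(context.Background(), opts...)
defer cancelAlloc()
rootCtx, cancelRoot := chromedp.NewContext(allocCtx)
defer cancelRoot()
// An empty Run starts the browser now; without it, the first child tab's
// Run would allocate (and own) a separate browser.
if err := chromedp.Run(rootCtx); err != nil {
    log.Fatal(err)
}

// One context per render (cheap: a new tab in the root browser)
tabCtx, cancelTab := chromedp.NewContext(rootCtx)
defer cancelTab()

// Run actions
var buf []byte
err := chromedp.Run(tabCtx,
    chromedp.Navigate(url), // url: the page to render
    chromedp.WaitReady("body"),
    chromedp.FullScreenshot(&buf, 90), // quality < 100 → JPEG
)
if err != nil {
    log.Printf("render %s: %v", url, err)
}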

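chromedp.Run(ctx, actions...) is the workhorse. Each step is a chromedp.Action (anything with a Do(context.Context) error method); chromedp.Tasks is a slice of actions that is itself an Action, so steps compose into reusable sequences. chromedp.ActionFunc wraps a plain function when you need a custom step.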

Screenshot patterns

// Viewport-only screenshot (PNG)
chromedp.CaptureScreenshot(&buf)

// Full-page screenshot: quality 100 yields PNG, anything lower yields JPEG at that quality
chromedp.FullScreenshot(&buf, 90)

// Custom viewport + element-only screenshot (both are arguments to one chromedp.Run)
chromedp.EmulateViewport(1280, 800),
chromedp.Screenshot("#main", &buf, chromedp.NodeVisible, chromedp.ByQuery),

For url-intel’s ?full_page=true&width=1280&format=png|webp:

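  • width sets the viewport (chromedp.EmulateViewport) before Navigate, so the page lays out at that width
  • FullScreenshot for full-page (pass quality 100 to get PNG), CaptureScreenshot for viewport-only
  • WebP conversion: chromedp's built-in screenshot helpers don't emit WebP. Either decode the PNG (image/png) and re-encode it with a WebP encoder (the Go standard library and golang.org/x/image only decode WebP, so this means a cgo binding to libwebp/cwebp), or ask Chrome for WebP directly by calling cdproto's page.CaptureScreenshot().WithFormat(page.CaptureScreenshotFormatWebp) inside a chromedp.ActionFunc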

PDF patterns

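import (
    "context"

    "github.com/chromedp/cdproto/page"
    "github.com/chromedp/chromedp"
)

var buf []byte
err := chromedp.Run(tabCtx,
    chromedp.Navigate(url),
    chromedp.WaitReady("body"),
    chromedp.ActionFunc(func(ctx context.Context) error {
        // PrintToPDF returns (data, stream handle, error); the stream
        // handle is only populated when a stream transfer mode is requested.
        b, _, err := page.PrintToPDF().
            WithPrintBackground(true).
            WithLandscape(landscape). // landscape: caller-supplied bool
            WithPaperWidth(8.27).     // A4 (210 × 297 mm) in inches
            WithPaperHeight(11.69).
            Do(ctx)
        if err != nil {
            return err
        }
        buf = b
        return nil
    }),
)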

page.PrintToPDF exposes Chrome's print options: paper size, margins, scale, page ranges, landscape, print background, and header/footer templates (which only render when WithDisplayHeaderFooter(true) is set).

Speed levers

Tactic | Effect
Block ad/tracker domains via network.SetBlockedURLs | 30–50% p50 cut on typical news pages
WaitReady, not WaitVisible | Skips polling for computed visibility; "ready" is usually enough for a screenshot
networkidle2 over networkidle0 (Puppeteer terms; chromedp has no built-in equivalent) | Many pages hold long-poll or WebSocket connections open, so idle-0 never fires; idle-2 is fine
Disable images (for metadata-only paths) | Large cut if you don't need pixels
Pre-warm contexts on pool init | The first navigation on a live request doesn't pay the V8 init cost

Security: hook the network layer

The ssrf-guard in your http.Transport does nothing for chromedp — Chrome dials directly. Two options:

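  1. Network interception via the Fetch domain: fetch.Enable, then handle each fetch.EventRequestPaused. Every request pauses until you validate its URL and call fetch.ContinueRequest or fetch.FailRequest. (network.SetRequestInterception is the older, deprecated way to do this.)
  2. HTTP proxy: run a tiny in-process proxy that enforces the SSRF guard and point Chrome at it (--proxy-server=127.0.0.1:NNNN). Also pass --proxy-bypass-list=<-loopback>; by default Chrome sends localhost requests around the proxy, which would skip the guard for exactly the addresses it exists to block.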

Option 2 is simpler. Option 1 is faster (no extra hop). For url-intel v1, option 2 is the default; option 1 is a Phase 3 optimization.

Common gotchas

  • NoSandbox is required inside Docker unless you set up user-namespace remapping. --no-sandbox is fine for a stateless renderer because the threat is “weird sites crashing Chrome”, not “Chrome reads your filesystem” (which the Linux user constraint + container handle).
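  • First navigation is slow if you don't pre-warm. Always navigate to about:blank once when the context is created. That first Run is also what attaches the tab, and chromedp's docs warn against putting a timeout on a context's first Run.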
  • context.Canceled cascades from the parent — cancel the root browser context on shutdown and all tabs close cleanly.
  • Memory grows over time per context. Recycle tabs after N renders (50–100 is a good cap) — chromedp doesn’t reset V8 between navigations.
  • Headless shell is not full Chrome: no PDF viewer, no extensions. Fine for screenshots/PDFs of normal pages; not for “render a PDF by opening someone else’s PDF” (use pdfium or unidoc for that).
  • chromedp/headless-shell does not include Chrome’s font set. Pages needing CJK / emoji render with squares unless you add a system font package in the runtime stage (fonts-noto, fonts-noto-color-emoji).

Pool pattern

See headless-browser-pool for the full design. The short version: one allocator, N pre-warmed contexts in a buffered channel, bounded wait queue, per-render timeout. Don’t NewContext in a request handler.

Where else this could be used

Future product | Use
SEO/preview checker | Same render path, different post-processing (lighthouse-ish metrics)
“Render markdown to image” tool | Internal HTML template + chromedp screenshot; no extra deps
Demand-radar trend evidence | Screenshot a Trends page for the digest preview

Once url-intel’s render package exists, every weekend product that needs rendering reuses it as shared infrastructure.

Alternatives considered

Alternative | Why not for url-intel
Playwright (Go via playwright-go) | Larger image, slower spawn, more deps; ergonomics not enough of a win for a stateless API
Puppeteer (Node) | Wrong language stack for this service
rod (Go) | Nicer API than chromedp, smaller community, less production-tested
Managed services (Browserless, ScrapingBee) | We are the URL renderer; paying a margin layer doesn't fit
wkhtmltopdf | Abandoned, built on an old QtWebKit engine with poor modern JS/CSS support; dead end