tool
chromedp
Go library for driving headless Chrome via the Chrome DevTools Protocol (CDP).
The Go-native equivalent of Puppeteer/Playwright. No external dependencies
beyond a Chromium binary; the official chromedp/headless-shell Docker base
image bundles just the headless shell binary (~100MB image vs ~1GB for full
Playwright + Chromium).
Used in url-intel/overview for the /v1/screenshot and /v1/pdf
endpoints. Wired through a headless-browser-pool — never spawn ad-hoc
contexts in handlers.
Why chromedp for this stack
| Need | Why chromedp fits |
|---|---|
| Go-native renderer | No Node-to-Go bridge, no JSON-RPC layer above CDP |
| Tiny runtime image | chromedp/headless-shell is ~100MB; full Playwright + Chromium is ~1GB |
| Both screenshot and PDF | Single API surface (page.CaptureScreenshot + page.PrintToPDF) |
| Multiple concurrent contexts | chromedp.NewContext(parentCtx) creates an isolated tab cheaply |
| Lifecycle tied to context.Context | Cancel context → tab closes; matches Go service shutdown semantics |
Trade-off: chromedp is less ergonomic than Playwright (you write more CDP-style flows), and its docs are thinner. For a public service whose main job is “navigate + capture + return”, that’s acceptable.
Core API surface
    import (
        "context"
        "log"

        "github.com/chromedp/chromedp"
    )

    // One root browser per process (long-lived)
    allocCtx, cancelAlloc := chromedp.NewExecAllocator(context.Background(),
        chromedp.NoSandbox,
        chromedp.DisableGPU,
        chromedp.Headless,
    )
    defer cancelAlloc()
    rootCtx, cancelRoot := chromedp.NewContext(allocCtx)
    defer cancelRoot()
    // Start the browser now; child contexts only share it once it is allocated
    if err := chromedp.Run(rootCtx); err != nil {
        log.Fatal(err)
    }

    // One context per render (cheap; opens a new tab in the root browser)
    tabCtx, cancelTab := chromedp.NewContext(rootCtx)
    defer cancelTab()

    // Run actions
    var buf []byte
    err := chromedp.Run(tabCtx,
        chromedp.Navigate(url),
        chromedp.WaitReady("body"),
        chromedp.FullScreenshot(&buf, 90), // quality below 100 yields JPEG; 100 yields PNG
    )
chromedp.Run(ctx, actions...) is the workhorse. Each action is a chromedp.Action
(an interface with a single Do(context.Context) error method); compose them like middleware.
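To make the middleware comparison concrete, here is a minimal sketch that runs without Chrome. The Action and ActionFunc types below are local stand-ins that mirror the shape of chromedp's types, and Logged and Run are hypothetical helpers written for this illustration, not part of the library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Action is a local stand-in with the same shape as chromedp.Action.
type Action interface {
	Do(ctx context.Context) error
}

// ActionFunc adapts a plain function to Action, in the style of chromedp.ActionFunc.
type ActionFunc func(ctx context.Context) error

func (f ActionFunc) Do(ctx context.Context) error { return f(ctx) }

// Logged is a hypothetical middleware-style wrapper: it records the step name,
// runs the wrapped action, and prefixes any error with that name.
func Logged(name string, log *[]string, a Action) Action {
	return ActionFunc(func(ctx context.Context) error {
		*log = append(*log, name)
		if err := a.Do(ctx); err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
		return nil
	})
}

// Run executes actions in order and stops at the first error.
func Run(ctx context.Context, actions ...Action) error {
	for _, a := range actions {
		if err := a.Do(ctx); err != nil {
			return err
		}
	}
	return nil
}

var errBoom = errors.New("boom")

// demo runs three wrapped steps where the second one fails.
func demo() ([]string, error) {
	var log []string
	ok := ActionFunc(func(context.Context) error { return nil })
	fail := ActionFunc(func(context.Context) error { return errBoom })
	err := Run(context.Background(),
		Logged("navigate", &log, ok),
		Logged("capture", &log, fail),
		Logged("never-runs", &log, ok),
	)
	return log, err
}

func main() {
	steps, err := demo()
	fmt.Println(steps, err)
}
```

Because a wrapper returns another Action, the same pattern can carry timing, retries, or per-step timeouts without touching the actions themselves.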
Screenshot patterns
// Viewport screenshot
chromedp.CaptureScreenshot(&buf)
// Full-page screenshot at quality 90 (JPEG; pass 100 to get PNG instead)
chromedp.FullScreenshot(&buf, 90)
// Custom viewport + element-only screenshot
chromedp.EmulateViewport(1280, 800),
chromedp.Screenshot("#main", &buf, chromedp.NodeVisible, chromedp.ByID),
For url-intel’s ?full_page=true&width=1280&format=png|webp:
- Width sets the viewport before navigate
- FullScreenshot for full-page, CaptureScreenshot for viewport-only
- WebP conversion: chromedp's screenshot helpers don’t emit WebP directly. Either decode the image/png output and re-encode it as WebP (cwebp Go binding), or ask CDP for it directly with Page.captureScreenshot { format: "webp" } via cdproto/page.CaptureScreenshot
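A minimal sketch of parsing those query parameters before any browser work starts. ShotOptions, ParseShotOptions, the defaults (viewport-only, 1280 px, png) and the width bounds are all assumptions made for this example, not url-intel's actual handler code.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// ShotOptions is a hypothetical options struct for the screenshot endpoint.
type ShotOptions struct {
	FullPage bool
	Width    int64  // viewport width in CSS pixels
	Format   string // "png" or "webp"
}

// ParseShotOptions reads full_page, width and format from a raw query string.
// Defaults and bounds are assumed values chosen for illustration.
func ParseShotOptions(rawQuery string) (ShotOptions, error) {
	opts := ShotOptions{Width: 1280, Format: "png"}
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return opts, fmt.Errorf("query: %w", err)
	}
	if v := q.Get("full_page"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return opts, fmt.Errorf("full_page: %w", err)
		}
		opts.FullPage = b
	}
	if v := q.Get("width"); v != "" {
		w, err := strconv.ParseInt(v, 10, 64)
		if err != nil || w < 320 || w > 3840 {
			return opts, fmt.Errorf("width must be an integer in [320, 3840], got %q", v)
		}
		opts.Width = w
	}
	if v := q.Get("format"); v != "" {
		if v != "png" && v != "webp" {
			return opts, fmt.Errorf("format must be png or webp, got %q", v)
		}
		opts.Format = v
	}
	return opts, nil
}

func main() {
	opts, err := ParseShotOptions("full_page=true&width=1280&format=webp")
	fmt.Println(opts, err)
}
```

Rejecting bad input here keeps invalid requests from ever occupying a pooled tab; the parsed Width would then feed chromedp.EmulateViewport ahead of the navigate step.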
PDF patterns
    import (
        "context"

        "github.com/chromedp/cdproto/page"
        "github.com/chromedp/chromedp"
    )

    var buf []byte
    err := chromedp.Run(tabCtx,
        chromedp.Navigate(url),
        chromedp.WaitReady("body"),
        chromedp.ActionFunc(func(ctx context.Context) error {
            // Do returns the PDF bytes, a stream handle (unused here), and an error
            b, _, err := page.PrintToPDF().
                WithPrintBackground(true).
                WithLandscape(landscape).
                WithPaperWidth(8.27). // A4, in inches
                WithPaperHeight(11.69).
                Do(ctx)
            if err != nil {
                return err
            }
            buf = b
            return nil
        }),
    )
page.PrintToPDF exposes the full Chrome print options: paper size, margins,
landscape, print background, header/footer templates.
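PrintToPDF takes paper dimensions in inches, which is where the 8.27 and 11.69 above come from (A4 is 210 × 297 mm). A small sketch of that conversion follows; PaperSize and FromMillimetres are hypothetical helper names introduced for this example.

```go
package main

import (
	"fmt"
	"math"
)

const mmPerInch = 25.4

// PaperSize holds page dimensions in inches, the unit PrintToPDF expects.
type PaperSize struct {
	WidthIn, HeightIn float64
}

// FromMillimetres converts a size given in millimetres to inches.
func FromMillimetres(widthMM, heightMM float64) PaperSize {
	return PaperSize{WidthIn: widthMM / mmPerInch, HeightIn: heightMM / mmPerInch}
}

// Rounded returns the size rounded to two decimal places.
func (p PaperSize) Rounded() PaperSize {
	round := func(v float64) float64 { return math.Round(v*100) / 100 }
	return PaperSize{WidthIn: round(p.WidthIn), HeightIn: round(p.HeightIn)}
}

func main() {
	a4 := FromMillimetres(210, 297).Rounded()
	letter := FromMillimetres(215.9, 279.4).Rounded()
	fmt.Printf("A4: %.2f x %.2f in\n", a4.WidthIn, a4.HeightIn)
	fmt.Printf("Letter: %.2f x %.2f in\n", letter.WidthIn, letter.HeightIn)
}
```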
Speed levers
| Tactic | Effect |
|---|---|
| Block ad/tracker domains via network.SetBlockedURLs | 30–50% p50 cut on typical news pages |
| WaitReady not WaitVisible | Skip layout-thrash waiting; “ready” is usually enough for a screenshot |
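| networkidle2 over networkidle0 (Puppeteer terms; in chromedp, track in-flight requests or page lifecycle events) | Many pages hold long-poll requests or WebSockets open; idle-0 never fires, idle-2 is fine |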
| Disable images (for metadata-only paths) | Massive cut if you don’t need pixels |
| Pre-warm contexts on pool init | First navigate isn’t paying V8 init cost |
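The idle-0 versus idle-2 row comes down to counting open requests and comparing against a threshold. The sketch below shows only that counting step: InflightCounter is a hypothetical type, the Started/Finished calls would in practice be driven by CDP network events, and the quiet-period timer that normally accompanies the check is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// InflightCounter tracks how many requests are currently open.
// Here Started and Finished are called by hand to simulate a page load.
type InflightCounter struct {
	mu   sync.Mutex
	open int
}

// Started records a request that has been sent.
func (c *InflightCounter) Started() {
	c.mu.Lock()
	c.open++
	c.mu.Unlock()
}

// Finished records a request that completed or failed.
func (c *InflightCounter) Finished() {
	c.mu.Lock()
	if c.open > 0 {
		c.open--
	}
	c.mu.Unlock()
}

// QuietAt reports whether at most `allowed` requests are open:
// allowed=0 models networkidle0, allowed=2 models networkidle2.
func (c *InflightCounter) QuietAt(allowed int) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.open <= allowed
}

// demo simulates a page whose resources finish while one long-poll stays open.
func demo() (idle0, idle2 bool) {
	var c InflightCounter
	for i := 0; i < 3; i++ {
		c.Started()
	}
	for i := 0; i < 3; i++ {
		c.Finished()
	}
	c.Started() // the long-poll request that never completes
	return c.QuietAt(0), c.QuietAt(2)
}

func main() {
	idle0, idle2 := demo()
	fmt.Println("idle-0 reached:", idle0, "idle-2 reached:", idle2)
}
```

With one connection permanently open, the zero threshold is never met while the threshold of two is, which is the behaviour the table describes.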
Security: hook the network layer
The ssrf-guard in your http.Transport does nothing for chromedp — Chrome
dials directly. Two options:
1. Network interception via the Fetch domain (which supersedes the deprecated network.SetRequestInterception). Each request raises an event; you validate the URL and either continueRequest or failRequest.
2. HTTP proxy: run a tiny in-process proxy that enforces the SSRF guard, and point chromedp at it (--proxy-server=127.0.0.1:NNNN).
Option 2 is simpler. Option 1 is faster (no extra hop). For url-intel v1, option 2 is the default; option 1 is a Phase 3 optimization.
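Whichever option is used, the check applied to each outgoing request looks roughly the same. The sketch below is one possible version of that check, not the actual ssrf-guard: CheckTarget and blockedIP are hypothetical names, and the set of blocked ranges is illustrative rather than exhaustive.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// blockedIP reports whether ip is in a range a public renderer should not reach.
// The list is illustrative; a production guard would likely cover more ranges.
func blockedIP(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsUnspecified() ||
		ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsMulticast()
}

// CheckTarget rejects non-HTTP(S) URLs and URLs whose host is, or resolves to,
// a blocked address. A real guard should also re-check the address at dial
// time, since a hostname can resolve differently between check and connect.
func CheckTarget(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q not allowed", u.Scheme)
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("missing host")
	}
	ips := []net.IP{net.ParseIP(host)}
	if ips[0] == nil {
		// Not an IP literal: resolve it and check every address returned.
		if ips, err = net.LookupIP(host); err != nil {
			return fmt.Errorf("resolve %q: %w", host, err)
		}
	}
	for _, ip := range ips {
		if blockedIP(ip) {
			return fmt.Errorf("address %s is blocked", ip)
		}
	}
	return nil
}

func main() {
	for _, target := range []string{
		"https://93.184.216.34/",
		"http://169.254.169.254/latest/meta-data",
		"http://127.0.0.1:8080/",
		"file:///etc/passwd",
	} {
		fmt.Printf("%-45s -> %v\n", target, CheckTarget(target))
	}
}
```

In the proxy variant this function would run inside the proxy's request handler; in the interception variant it would run in the event callback before the request is continued or failed.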
Common gotchas
- NoSandbox is required inside Docker unless you set up user-namespace remapping. --no-sandbox is fine for a stateless renderer because the threat is “weird sites crashing Chrome”, not “Chrome reads your filesystem” (which the Linux user constraint + container handle).
- First navigation is slow if you don’t pre-warm. Always navigate to about:blank once when the context is created.
- context.Canceled cascades from the parent: cancel the root browser context on shutdown and all tabs close cleanly.
- Memory grows over time per context. Recycle tabs after N renders (50–100 is a good cap), because chromedp doesn’t reset V8 between navigations.
- Headless shell is not full Chrome: no Flash, no PDF viewer, no extensions. Fine for screenshots/PDFs of normal pages; not for “render a PDF by opening someone else’s PDF” (use pdfium or unidoc for that).
- chromedp/headless-shell does not include Chrome’s font set. Pages needing CJK / emoji render with squares unless you add a system font package in the runtime stage (fonts-noto, fonts-noto-color-emoji).
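The "recycle tabs after N renders" advice can be sketched as a small wrapper that replaces its resource once a use budget is spent. Recycling and NewRecycling are hypothetical names; in the real service the create function would open a tab context and return its cancel function. The type is not safe for concurrent use and assumes one goroutine owns a slot at a time.

```go
package main

import "fmt"

// Recycling wraps a resource that is replaced after a fixed number of uses.
type Recycling[T any] struct {
	create   func() (T, func()) // returns a fresh resource and its close function
	max      int                // uses allowed per resource
	uses     int
	live     bool
	cur      T
	closeCur func()
}

// NewRecycling returns a wrapper that creates its first resource lazily.
func NewRecycling[T any](max int, create func() (T, func())) *Recycling[T] {
	return &Recycling[T]{create: create, max: max}
}

// Use runs fn with the current resource, replacing it first if the
// previous one has already served max uses.
func (r *Recycling[T]) Use(fn func(T)) {
	if !r.live || r.uses >= r.max {
		if r.live {
			r.closeCur() // release the worn-out resource before replacing it
		}
		r.cur, r.closeCur = r.create()
		r.live = true
		r.uses = 0
	}
	r.uses++
	fn(r.cur)
}

func main() {
	created, closed := 0, 0
	tab := NewRecycling(3, func() (string, func()) {
		created++
		return fmt.Sprintf("tab-%d", created), func() { closed++ }
	})
	for i := 0; i < 7; i++ {
		tab.Use(func(name string) { fmt.Println("render", i, "on", name) })
	}
	fmt.Println("created:", created, "closed:", closed)
}
```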
Pool pattern
See headless-browser-pool for the full design. The short version: one
allocator, N pre-warmed contexts in a buffered channel, bounded wait queue,
per-render timeout. Don’t call NewContext in a request handler.
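As a rough illustration of that short version, here is a generic pool built on a buffered channel with a bounded wait queue and a per-render timeout. It is a sketch under assumptions, not the headless-browser-pool design itself: Pool, NewPool, Render and ErrQueueFull are hypothetical names, and plain strings stand in for pre-warmed tab contexts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrQueueFull is returned when too many callers are already waiting.
var ErrQueueFull = errors.New("render queue full")

// Pool hands out pre-created items from a buffered channel.
type Pool[T any] struct {
	items   chan T
	waiters chan struct{} // capacity bounds how many callers may block
}

// NewPool fills the pool with items and allows up to maxWaiters blocked callers.
func NewPool[T any](items []T, maxWaiters int) *Pool[T] {
	p := &Pool[T]{
		items:   make(chan T, len(items)),
		waiters: make(chan struct{}, maxWaiters),
	}
	for _, it := range items {
		p.items <- it
	}
	return p
}

// Acquire returns an idle item, waiting until ctx is done if none is free.
// It fails fast with ErrQueueFull when the wait queue is already at capacity.
func (p *Pool[T]) Acquire(ctx context.Context) (T, error) {
	var zero T
	select {
	case it := <-p.items:
		return it, nil // fast path: an item is idle
	default:
	}
	select {
	case p.waiters <- struct{}{}:
		defer func() { <-p.waiters }() // leave the queue on return
	default:
		return zero, ErrQueueFull
	}
	select {
	case it := <-p.items:
		return it, nil
	case <-ctx.Done():
		return zero, ctx.Err()
	}
}

// Release returns an item to the pool.
func (p *Pool[T]) Release(it T) { p.items <- it }

// Render runs fn with a pooled item; timeout covers both waiting and rendering.
func (p *Pool[T]) Render(ctx context.Context, timeout time.Duration, fn func(context.Context, T) error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	it, err := p.Acquire(ctx)
	if err != nil {
		return err
	}
	defer p.Release(it)
	return fn(ctx, it)
}

// demo exercises the three outcomes: queue full, timeout while waiting, success.
func demo() (queueFull, timedOut, ok error) {
	ctx := context.Background()
	noop := func(context.Context, string) error { return nil }

	strict := NewPool([]string{"tab-1"}, 0) // no waiting allowed
	held, _ := strict.Acquire(ctx)
	queueFull = strict.Render(ctx, time.Second, noop)
	strict.Release(held)

	patient := NewPool([]string{"tab-1"}, 1) // one caller may wait
	held, _ = patient.Acquire(ctx)
	timedOut = patient.Render(ctx, 20*time.Millisecond, noop)
	patient.Release(held)

	ok = patient.Render(ctx, time.Second, noop)
	return queueFull, timedOut, ok
}

func main() {
	fmt.Println(demo())
}
```

Failing fast when the queue is full lets the handler return a 429-style response instead of piling up goroutines behind a saturated browser.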
Where else this could be used
| Future product | Use |
|---|---|
| SEO/preview checker | Same render path, different post-processing (lighthouse-ish metrics) |
| “Render markdown to image” tool | Internal HTML template + chromedp screenshot — no extra deps |
| Demand-radar trend evidence | Screenshot a Trends page for the digest preview |
Once url-intel’s render package exists, every weekend product that needs
rendering reuses it as shared infrastructure.
Alternatives considered
| Alternative | Why not for url-intel |
|---|---|
| Playwright (Go via playwright-go) | Larger image, slower spawn, more deps; the ergonomic gain is not enough of a win for a stateless API |
| Puppeteer (Node) | Wrong language stack for this service |
| rod (Go) | Nicer API than chromedp, smaller community, less marketplace-tested |
| Managed services (Browserless, ScrapingBee) | We are the URL renderer — paying a margin layer doesn’t fit |
| wkhtmltopdf | Abandoned; its outdated WebKit engine handles modern JS poorly; dead-end |
Related
- headless-browser-pool — the pool pattern that wraps this
- ssrf-guard — Chrome needs the guard at the network layer too
- url-intel/overview — primary consumer
- mini-apps/overview — render package becomes shared infra for the series