SSRF Guard

A Server-Side Request Forgery guard is the layer that sits between user-supplied URLs and any outbound HTTP your service makes. Without it, customers can probe your internal network using your own API as a proxy: ask for http://10.0.0.5/admin or http://169.254.169.254/latest/meta-data/iam/ and your service happily fetches them — from inside your VPC.

This is the non-negotiable layer in url-intel (and any future product that fetches user URLs).

The threat model

A user supplies a URL. Your service:

Resolves the hostname → IP
Opens a TCP connection
Sends HTTP, follows redirects
Returns the body (or a screenshot of it) to the user

Each step is an attack surface:

Step	Attack
Hostname → IP	DNS rebinding: TTL=0 record resolves to `1.2.3.4` during validation, then to `127.0.0.1` on the actual dial
IP dial	Direct private IP (`10.0.0.5`), loopback (`127.x`), link-local (`169.254.x`), IPv6 ULA (`fc00::/7`), cloud metadata (`169.254.169.254`, `fd00:ec2::254`)
Redirects	Public URL → 302 → private URL; only validating the first hop misses this
Schemes	`file:///etc/passwd`, `data:`, `gopher://` — used for in-process file reads and protocol smuggling

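The classic prize: the cloud metadata endpoint at 169.254.169.254 (AWS, GCP, Azure) can hand out temporary credentials for whatever role the instance has. One unguarded URL fetch can escalate to compromise of everything that role is allowed to touch, up to the whole cloud account.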

The guard, in layers

1. Pre-flight URL validation

✓ Parse with net/url
✓ Scheme ∈ {http, https}
✓ No userinfo (no `user:pass@` smuggling)
✓ Host present; non-ASCII (IDN) hosts rejected or normalized to punycode, so homographs cannot slip through
✓ Port (if present) ∈ an explicit allowlist such as {80,443,...}

If any check fails, return 400 without ever resolving the hostname.
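
The checks above could be sketched as follows. This is a minimal sketch under stated assumptions, not the url-intel implementation: the `allowedPorts` set, the error messages, and the `ValidateString` helper are hypothetical, and the IDN check simply rejects any non-ASCII host instead of normalizing it.

```go
package main

import (
	"errors"
	"net/url"
)

// allowedPorts is an assumed allowlist. The empty string means
// "no explicit port", i.e. the scheme default.
var allowedPorts = map[string]bool{"": true, "80": true, "443": true}

// Validate applies the pre-flight checks to a parsed URL.
// It never resolves the hostname.
func Validate(u *url.URL) error {
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("safeurl: scheme not allowed")
	}
	if u.User != nil {
		return errors.New("safeurl: userinfo not allowed")
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("safeurl: missing host")
	}
	// Conservative stand-in for IDN handling: reject any non-ASCII byte.
	for i := 0; i < len(host); i++ {
		if host[i] >= 0x80 {
			return errors.New("safeurl: non-ASCII host")
		}
	}
	if !allowedPorts[u.Port()] {
		return errors.New("safeurl: port not allowed")
	}
	return nil
}

// ValidateString parses raw with net/url and then validates it.
func ValidateString(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	return Validate(u)
}
```

A handler would call `ValidateString` on the user input and map any non-nil error to a 400. Note that an IP literal such as `http://10.0.0.5/` passes this stage by design; it is the dial-time check that rejects it.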

2. Custom dialer that re-resolves DNS and checks the IP

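The standard http.Transport will not protect you. Replace its DialContext with one that resolves the host itself, rejects any blocked IP, and dials the exact address it validated (isBlocked implements the block list in the next section; ErrBlockedAddress is a package-level sentinel error):

func guardedDialContext(ctx context.Context, network, addr string) (net.Conn, error) {
    host, port, err := net.SplitHostPort(addr)
    if err != nil {
        return nil, err
    }
    ips, err := net.DefaultResolver.LookupIPAddr(ctx, host)
    if err != nil {
        return nil, err
    }
    for _, ip := range ips {
        if isBlocked(ip.IP) { // fail if ANY resolved address is blocked
            return nil, ErrBlockedAddress
        }
    }
    return (&net.Dialer{}).DialContext(ctx, network, net.JoinHostPort(ips[0].IP.String(), port)) // IP literal: no second lookup
}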

Pin the dial to the specific IP you just validated — don’t let net.Dial re-resolve and pick a different one (the DNS rebinding window).

3. Block list (treat conservatively — allow nothing private)

Range	What it is
`127.0.0.0/8`, `::1/128`	Loopback
`10.0.0.0/8`	RFC1918 private
`172.16.0.0/12`	RFC1918 private
`192.168.0.0/16`	RFC1918 private
`100.64.0.0/10`	Carrier-grade NAT
`169.254.0.0/16`, `fe80::/10`	Link-local (includes AWS/GCP/Azure metadata)
`fd00:ec2::/32`	AWS IPv6 metadata
`fc00::/7`	IPv6 Unique Local Address
`::ffff:0:0/96`	IPv4-mapped IPv6 (catch `::ffff:10.0.0.5`)
`0.0.0.0/8`, `::/128`	“this network” / unspecified
`224.0.0.0/4`, `ff00::/8`	Multicast
`240.0.0.0/4`	Reserved future use

For IPv6, collapse the address to canonical form before checking, which includes unwrapping IPv4-mapped addresses to plain IPv4. ::ffff:127.0.0.1 is loopback wearing an IPv6 hat.

4. Re-validate every redirect

net/http’s CheckRedirect runs before each follow. Re-validate the destination URL through the same pre-flight + dialer guard. Cap total redirects (5 is plenty). The dial-time IP check catches anything the URL-level check misses.

5. Bounded resource limits

Per-request budget: total timeout (30s), max response bytes (cap before streaming further), max redirect count (5). SSRF often pairs with denial-of-service: a 100MB response to a tiny request is still an attack even if the target is public.

Why “just block private ranges” isn’t enough

DNS rebinding defeats URL-level checks if you don’t pin the dial IP.
IPv4-in-IPv6 (::ffff:10.0.0.5) is private — make sure your check sees it.
CNAME tricks: evil.com CNAME → internal.svc.cluster.local. Re-resolution catches this only if you check the resolved IP, not the hostname.
Redirect chains: public → public → public → private. If you only check the first hop, you’ve lost.
Cloud metadata is link-local, not RFC1918. Many homemade guards miss it because they only ban 10/8 + 192.168.

Test matrix (mandatory in url-intel)

safeurl is the one module that must have thorough table-driven tests:

loopback v4              http://127.0.0.1/         → reject
loopback v6              http://[::1]/             → reject
RFC1918 10/8             http://10.0.0.5/          → reject
RFC1918 172.16/12        http://172.20.1.1/        → reject
RFC1918 192.168/16       http://192.168.1.1/       → reject
link-local v4            http://169.254.1.1/       → reject
AWS metadata             http://169.254.169.254/   → reject
IPv6 ULA                 http://[fc00::1]/         → reject
IPv4-in-IPv6 loopback    http://[::ffff:127.0.0.1] → reject
file scheme              file:///etc/passwd        → reject
data scheme              data:text/plain,foo       → reject
gopher scheme            gopher://example.com/     → reject
redirect → private       302 to http://10.0.0.5    → reject
redirect chain           5 hops, last is private   → reject
DNS rebind               TTL=0 host flips after first lookup → reject (pin IP)
unicode host             http://еxample.com/       → 400 or normalize
huge response            10GB stream               → cap + reject
slow loris               1 byte/sec                → timeout

Implementation in Go (sketch)

import (
    "errors"
    "net/http"
    "time"
)

func NewSafeTransport() *http.Transport {
    return &http.Transport{
        DialContext:           guardedDialContext,
        TLSHandshakeTimeout:   5 * time.Second,
        ResponseHeaderTimeout: 10 * time.Second,
        DisableKeepAlives:     true, // simpler reasoning per request
    }
}

func NewSafeClient(timeout time.Duration) *http.Client {
    return &http.Client{
        Transport: NewSafeTransport(),
        Timeout:   timeout,
        CheckRedirect: func(req *http.Request, via []*http.Request) error {
            if len(via) >= 5 {
                return errors.New("safeurl: too many redirects")
            }
            return Validate(req.URL) // same pre-flight as the entry URL
        },
    }
}

chromedp (headless Chrome) needs its own equivalent: hook into the network domain via network.SetRequestInterception (which the DevTools Protocol marks as deprecated in favor of the Fetch domain) or set an HTTP proxy that itself enforces the guard. Letting Chrome dial freely defeats the whole layer.

When this matters

Any service that fetches user-supplied URLs:

URL preview / link-unfurl services (Slack, Discord, …)
Screenshot / PDF / scraping APIs (this is url-intel/overview)
OG-image fetchers for social cards
Webhook receivers that follow callback URLs
Importers (RSS readers, blog migrators)
AI agents with a “fetch URL” tool — easily missed because it’s “just a tool”

The guard is unnecessary only if every URL you fetch is one you control. The instant a user can influence the URL, you need it.

url-intel/overview — the canonical implementation site
headless-browser-pool — pairs with this: the renderer needs guarded dials too
marketplace-distribution — when you sell URL-fetching APIs, the guard is the product’s licence to operate; one news article about your service being used for internal probing and you’re delisted