Leaky URLs: Referrers, Scripts, and Unintended Disclosure

URLs are not just routing instructions. In modern web systems they function as a low-discipline data channel: easy to create, widely copied, routinely logged, and often propagated outside the context you thought you controlled.

The practical problem is not that “tracking exists” or that browsers are broken. The problem is more operational: if you place user-identifying or sensitive data in a URL path or query string, that data predictably escapes via standard browser behavior (notably the `Referer` header) and via in-page JavaScript access, including third-party scripts.

What this issue is (and is not)

This issue is often misframed as a debate about cookies, adtech, or user consent banners. Those topics overlap, but they are not the core.

The core is metadata leakage: information embedded in URLs being copied into places it was never meant to go. This includes direct identifiers (email address, phone number, and hashes of these), authentication material (reset tokens, magic links, session identifiers), and “sensitive by inference” content (search terms, record types, case categories, medical or legal topics).

A big part of the problem is a set of common misconceptions that lead developers to miss the risk entirely:

“It’s fine because it’s in the query string/path, not the page body.”
URLs are handled as metadata across the stack. They are more likely than page bodies to be logged, captured by monitoring, sent as referrers, and stored verbatim.

“HTTPS prevents others from seeing the URL.”
HTTPS protects the request in transit between endpoints. It does not prevent the browser from sharing URL data with other endpoints (via referrers or scripts), nor does it prevent logging at the endpoints, intermediaries, or tooling.

“`Referer` only sends the domain.”
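That is not a safe assumption. Referrer behavior depends on policy and context. Current major browsers default to `strict-origin-when-cross-origin` when no policy is set. That default sends only the origin on cross-origin requests, but it still sends the full URL on same-origin requests. A `Referrer-Policy` header, a meta tag, or a per-element `referrerpolicy` attribute can loosen it to `unsafe-url` or `no-referrer-when-downgrade`, and older browsers or embedded webviews may behave differently. You have to set and verify it.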

“Third-party JavaScript can’t see the page URL.”
If a script runs in your page’s origin context, it can read `window.location` like any other code. “Third-party” describes who served the script, not what privileges it has at runtime.

“We redact server logs, so we’re covered.”
Most disclosures happen without passing through your application server at all: referrer propagation to third-party subresources, client-side analytics, tag managers, and browser-based exfiltration. Redacting *your* logs may be necessary, but it is rarely sufficient.

URLs as a data channel: what actually travels

A URL has multiple components, and they do not behave the same way:

Path and query are sent to the server in the HTTP request and routinely appear in server, CDN, and WAF logs. In `/users/123?email=a@b.com`, `/users/123` is the path and `?email=a@b.com` is the query. Both can also reach third parties through referrers, in-page scripts, and any tooling that copies request URLs.
Fragment (`#token=…`) is generally not sent to the server as part of HTTP, but it is visible to JavaScript and can be captured client-side (including by third-party scripts). Single-page apps (SPAs) sometimes put state in fragments; that reduces some server-side leakage but does not eliminate client-side leakage.

The operational takeaway is simple: assume anything in the URL will be widely copied, either by deliberate instrumentation or by default plumbing.
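You can see the split between what reaches the server and what stays in the browser with Python's standard `urllib.parse` module. This is a minimal sketch, not a model of every browser edge case. The host `app.example` and the `token` fragment are hypothetical values built around the example above.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts that leak in different ways (simplified)."""
    parts = urlsplit(url)
    # Path + query form the request target sent on the HTTP request line.
    request_target = parts.path or "/"
    if parts.query:
        request_target += "?" + parts.query
    return {
        "sent_to_server": request_target,         # lands in server/CDN/WAF logs
        "query_params": parse_qs(parts.query),    # decoded key/value pairs
        "browser_only_fragment": parts.fragment,  # not sent over HTTP, readable by JS
    }


info = describe_url("https://app.example/users/123?email=a@b.com#token=abc123")
print(info["sent_to_server"])         # /users/123?email=a@b.com
print(info["query_params"])           # {'email': ['a@b.com']}
print(info["browser_only_fragment"])  # token=abc123
```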

Mechanism 1: How referrers leak full URLs

When a browser makes a request, it often includes a `Referer` header indicating the URL of the page that initiated the request. This occurs in two common cases:

Navigation: the user clicks a link from Page A to Page B. Page B may receive a referrer telling it where the user came from.
Subresource requests: Page A loads an image, script, stylesheet, font, iframe, or beacon from another endpoint. That endpoint may receive a referrer revealing the full URL of Page A.

This is not exotic behavior. It is core web functionality and widely relied on for debugging, analytics, attribution, and security controls. What portion of the URL is sent depends on the applicable Referrer Policy, which can be set via the `Referrer-Policy` HTTP header, a `<meta name="referrer">` tag, or element attributes such as `referrerpolicy` and `rel="noreferrer"`. Common policies include:

`no-referrer` (send nothing)
`origin` (send only scheme + host + port)
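`strict-origin` (send only the origin, and nothing when an HTTPS page requests an HTTP URL)
`strict-origin-when-cross-origin` (send the full URL on same-origin requests, only the origin cross-origin, and nothing on HTTPS-to-HTTP downgrades; this is the default in current major browsers when no policy is set)
`no-referrer-when-downgrade` and `unsafe-url` (send the full URL cross-origin; `unsafe-url` does so even on downgrades)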
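To make the differences concrete, here is a simplified Python model of the `Referer` value a browser would send from a page to another URL. It is an illustrative sketch, not a spec implementation. It treats a "downgrade" as simply an `https` page requesting an `http` URL, covers only a handful of common policies, and uses hypothetical hosts (`shop.example`, `pixel.vendor.example`). Browsers strip the fragment and any credentials from the referrer, and the sketch mirrors that.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Default in current major browsers when the page sets no policy.
BROWSER_DEFAULT = "strict-origin-when-cross-origin"


def _origin(url: str) -> tuple:
    p = urlsplit(url)
    return (p.scheme, p.hostname, p.port or DEFAULT_PORTS.get(p.scheme))


def _strip(url: str, origin_only: bool) -> str:
    """Drop credentials and fragment; optionally drop path and query too."""
    p = urlsplit(url)
    netloc = p.hostname or ""
    if p.port and p.port != DEFAULT_PORTS.get(p.scheme):
        netloc += f":{p.port}"
    if origin_only:
        return urlunsplit((p.scheme, netloc, "/", "", ""))
    return urlunsplit((p.scheme, netloc, p.path or "/", p.query, ""))


def referer_value(policy: str, page_url: str, target_url: str) -> Optional[str]:
    """Simplified model of the Referer sent from page_url to target_url."""
    policy = policy or BROWSER_DEFAULT
    full = _strip(page_url, origin_only=False)
    origin_only = _strip(page_url, origin_only=True)
    same_origin = _origin(page_url) == _origin(target_url)
    # Simplification: a downgrade is an https page requesting an http URL.
    downgrade = (urlsplit(page_url).scheme == "https"
                 and urlsplit(target_url).scheme == "http")

    if policy == "no-referrer":
        return None
    if policy == "origin":
        return origin_only
    if policy == "strict-origin":
        return None if downgrade else origin_only
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full
        return None if downgrade else origin_only
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full
    if policy == "unsafe-url":
        return full
    raise ValueError(f"policy not modeled here: {policy}")


page = "https://shop.example/account/reset?token=abc123#step2"
vendor_pixel = "https://pixel.vendor.example/p.gif"
own_script = "https://shop.example/static/app.js"

for name in ["no-referrer", "strict-origin-when-cross-origin", "unsafe-url"]:
    print(name, "->", referer_value(name, page, vendor_pixel))
# no-referrer -> None
# strict-origin-when-cross-origin -> https://shop.example/
# unsafe-url -> https://shop.example/account/reset?token=abc123

# With no policy set, same-origin subresources still get the full URL.
print(referer_value("", page, own_script))
# https://shop.example/account/reset?token=abc123
```

The last line is the easiest one to overlook: even the browser default hands the full URL, token included, to every same-origin request, and from there to your own CDN and server logs.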

Mechanism 2: JavaScript can read the URL—and send it elsewhere

A separate mechanism is more direct: JavaScript running in the page can read the full URL, including path and query, using standard APIs:

`window.location.href`
`window.location.search`
`window.location.pathname`
`document.URL`
`window.location.hash` (the fragment, if any)
`new URL(window.location.href).searchParams.get("token")`

If the script can read it, it can transmit it. Exfiltration does not require sophisticated techniques; normal web APIs suffice.
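In the browser, exfiltration takes only a line or two of JavaScript, for example reading `window.location.href` and passing it to an image request or `navigator.sendBeacon`. The Python sketch below models the same data flow rather than the browser API. A hypothetical tag copies the page URL into a request to a hypothetical collector endpoint, and the receiving side recovers the token and email with ordinary query-string parsing. The `page` and `event` parameter names are assumptions, not any particular vendor's format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical values for illustration only.
PAGE_URL = "https://app.example/account/reset?token=abc123&email=jane%40example.com"
COLLECTOR = "https://collector.vendor.example/collect"


def build_beacon(page_url: str, event: str) -> str:
    """Mimic a generic analytics tag: copy the current location
    (window.location.href in a browser) into a request to the vendor."""
    return COLLECTOR + "?" + urlencode({"event": event, "page": page_url})


beacon = build_beacon(PAGE_URL, "page_view")

# On the vendor side, recovering the page URL and its secrets is just parsing.
received_page = parse_qs(urlsplit(beacon).query)["page"][0]
leaked = parse_qs(urlsplit(received_page).query)
print(leaked["token"][0])  # abc123
print(leaked["email"][0])  # jane@example.com
```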

The governance wrinkle is that many organizations treat “third-party scripts” as if they were isolated. Often they are not.

A script served by a vendor but executed in your origin inherits your origin’s privileges. It can read the URL and DOM (the browser’s live object tree of the page) unless you have deliberately constrained it.
A widget embedded as a cross-origin iframe is meaningfully different. Same-Origin Policy generally prevents that iframe from reading the parent page’s URL or DOM. That architectural distinction is one of the few reliable technical levers that developers have.

This is why “we trust the vendor” becomes irrelevant. The technical fact is that embedding vendor code directly into your origin expands the number of parties and systems capable of accessing URL-contained data—whether intended or not.
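The script-versus-iframe distinction comes down to which origin the vendor code executes in. The sketch below is a deliberately simplified Python model of that rule, using hypothetical hosts. It ignores details such as sandboxed iframes without `allow-same-origin`, which get an opaque origin. A cross-origin iframe still receives a referrer for its own request and can read `document.referrer`, so the referrer policy still governs how much of the parent URL it sees.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """An origin is (scheme, host, port); all three must match."""
    p = urlsplit(url)
    return (p.scheme, p.hostname, p.port or DEFAULT_PORTS.get(p.scheme))


def runs_as(embed: str, page_url: str, vendor_url: str) -> tuple:
    """Simplified model of which origin vendor code executes in."""
    if embed == "script":  # <script src=vendor_url>: executes as the embedding page
        return origin(page_url)
    if embed == "iframe":  # <iframe src=vendor_url>: executes as the vendor's origin
        return origin(vendor_url)
    raise ValueError(f"unknown embedding: {embed}")


def can_read_page_location(embed: str, page_url: str, vendor_url: str) -> bool:
    # Same-Origin Policy: direct access to the page's location/DOM needs a matching origin.
    return runs_as(embed, page_url, vendor_url) == origin(page_url)


page = "https://www.example.com/cases/42?type=bankruptcy"
vendor = "https://widgets.vendor.example/chat"

print(can_read_page_location("script", page, vendor))  # True
print(can_read_page_location("iframe", page, vendor))  # False
print(origin("https://example.com") == origin("https://www.example.com"))  # False
```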

Why URL leakage becomes a legal and compliance issue

Most privacy and security regimes converge on a few baseline concepts: data minimization, purpose limitation, appropriate security controls, and accountability for disclosures to third parties.

URL leakage intersects those concepts in several ways.

URLs routinely contain personal data (or become personal data in context)

An email in the path (`/users/jane.doe@example.com`) is obviously identifying. Hashed or encoded values (for example, a SHA-256 hash of an email address, or Base64-encoded account data) are less obvious to the naked eye but can be just as identifying.
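Hashing does not make an identifier anonymous. The same input always yields the same output, so a hashed email works as a stable cross-site key, and anyone holding a list of candidate emails can reverse it by recomputation. Here is a minimal Python sketch. It assumes the common (but not universal) convention of trimming and lowercasing before hashing with SHA-256.

```python
import hashlib


def hashed_email(email: str) -> str:
    # Assumed normalization: trim whitespace, lowercase, then SHA-256.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


a = hashed_email("Jane.Doe@example.com")
b = hashed_email("  jane.doe@example.com ")
print(a == b)  # True: same person, same value wherever it is computed

# Reversal by recomputation: hash a candidate list and look the value up.
candidates = ["john@example.com", "jane.doe@example.com"]
lookup = {hashed_email(c): c for c in candidates}
print(lookup[a])  # jane.doe@example.com
```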

Search terms and record descriptors can also be sensitive by inference. A query like `?q=bankruptcy+lawyer` may not name a person, but in many contexts it is still personal data because it relates to an identifiable user/session and reveals something about them.

Referrer propagation and collection by scripts can turn “internal metadata” into a third-party disclosure

Even when the vendor is “trusted,” disclosures can still create compliance obligations: updating records of disclosures, obtaining specific consent where the context requires it, ensuring appropriate contractual controls, honoring data subject rights where applicable, and aligning processing with stated purposes.

Secrets in URLs create security obligations that don’t map neatly to “PII”

Reset tokens, magic links, and session identifiers are not always “personal data” in the privacy sense, but they are sensitive security material. Once logged broadly, they can create account takeover risk and incident-response complexity.

A frequent failure mode is treating these as “temporary” and therefore harmless. In practice, temporary secrets become durable when stored in logs with long retention.

Controls that reduce risk (and what they cost)

Risk reduction is mostly determined upstream: what you choose to put in URLs, and what code you allow to execute in your origin. Headers and redaction help, but they are compensating controls.

Design rule: keep sensitive data out of URLs by default

This is the highest-leverage control.

Do not place emails, names, account numbers, case descriptors, or health/legal terms in query strings or path segments.
Avoid tokens or session identifiers in URLs. Prefer cookies or headers with appropriate flags and lifetimes.
If you need an identifier in a URL, make it opaque and non-derivable, and confirm that downstream systems do not treat it as a join key for unrelated purposes.
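Here is one way to satisfy these rules, sketched in Python under stated assumptions. URLs carry a random reference from `secrets.token_urlsafe` that maps to the real account only on the server. Session state travels in a cookie with `HttpOnly`, `Secure`, and `SameSite` set. The in-memory dictionary, the `/accounts/` path, and the one-hour lifetime are hypothetical. A real system would persist references with an expiry and scope each one to a single purpose so it does not become a join key.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

# Hypothetical in-memory mapping; a real system would persist it with an expiry.
_refs: Dict[str, int] = {}


def issue_ref(account_id: int) -> str:
    """Random, non-derivable reference for use in URL paths."""
    ref = secrets.token_urlsafe(16)  # 16 random bytes -> 22 URL-safe characters
    _refs[ref] = account_id
    return ref


def resolve_ref(ref: str) -> Optional[int]:
    return _refs.get(ref)


def session_cookie(session_id: str) -> str:
    """Carry session state in a cookie rather than in the URL."""
    c = SimpleCookie()
    c["session"] = session_id
    c["session"]["httponly"] = True  # not readable from page JavaScript
    c["session"]["secure"] = True    # sent over HTTPS only
    c["session"]["samesite"] = "Lax"
    c["session"]["path"] = "/"
    c["session"]["max-age"] = 3600   # short lifetime, in seconds
    return c.output(header="Set-Cookie:")


ref = issue_ref(123456)
print(f"/accounts/{ref}")  # nothing in the URL reveals the account number
print(resolve_ref(ref))    # 123456
print(session_cookie(secrets.token_urlsafe(32)))
```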

You will still leak *some* metadata (e.g., which endpoint was visited), but you reduce the impact materially.

Referrer controls: set policy explicitly and test it

Set `Referrer-Policy` intentionally based on functional requirements. Many organizations can use a strict default and carve out exceptions where needed.

Also consider link-level controls (`rel="noreferrer"`) for high-risk outbound flows. Do not assume these are globally applied; verify them in rendered HTML and in real browser traffic.
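You can set a site-wide default in one place. The sketch below is a small Python WSGI middleware, a hypothetical helper that is not part of any framework. It adds a `Referrer-Policy` header only when the route has not already set one, which leaves room for per-route exceptions. The default value shown is an assumption; choose it from your functional requirements, and still confirm the header in real browser traffic.

```python
from typing import Callable


def referrer_policy_middleware(app: Callable,
                               default_policy: str = "strict-origin-when-cross-origin") -> Callable:
    """Add an explicit Referrer-Policy unless the route already set one."""
    def wrapped(environ, start_response):
        def start_response_with_policy(status, headers, exc_info=None):
            if not any(name.lower() == "referrer-policy" for name, _ in headers):
                headers = list(headers) + [("Referrer-Policy", default_policy)]
            return start_response(status, headers, exc_info)
        return app(environ, start_response_with_policy)
    return wrapped


# Tiny demo apps: one relies on the default, one sets its own policy.
def page_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>hello</p>"]


def reset_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Referrer-Policy", "no-referrer")])
    return [b"<p>reset</p>"]


def headers_for(app) -> dict:
    """Call a WSGI app with a fake start_response and capture its headers."""
    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        captured.update(dict(headers))
        return lambda data: None

    app({}, fake_start_response)
    return captured


print(headers_for(referrer_policy_middleware(page_app))["Referrer-Policy"])
# strict-origin-when-cross-origin
print(headers_for(referrer_policy_middleware(reset_app))["Referrer-Policy"])
# no-referrer
```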

Third-party script governance: reduce privileges, not just vendors

Inventory and minimize third-party JavaScript that runs in your origin. Where feasible:

Prefer server-side integrations for analytics/telemetry that do not require exposing full URLs client-side.
Use CSP to constrain where scripts can load from and where data can be sent (`script-src`, `connect-src`, `img-src`). CSP will not stop a script from reading `window.location`, but it can limit where that data can be transmitted.
For higher-risk functionality, use cross-origin sandboxed iframes so the vendor does not execute in your origin context.
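As an illustration of the CSP point, here is a Python helper that serializes a Content-Security-Policy header. The vendor hosts are hypothetical. `connect-src` governs destinations for `fetch`, XHR, WebSockets, and `navigator.sendBeacon`. `img-src` governs image requests, which are a common beacon channel. CSP does not cover every channel (top-level navigations, for example). The sandbox restrictions for a vendor iframe come from the iframe's own `sandbox` attribute, not from this header.

```python
from typing import Dict, List


def build_csp(directives: Dict[str, List[str]]) -> str:
    """Serialize directives into a Content-Security-Policy header value."""
    return "; ".join(f"{name} {' '.join(sources)}" for name, sources in directives.items())


# Hypothetical vendor hosts; replace with your own inventory.
policy = build_csp({
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://tags.vendor.example"],       # where code may load from
    "connect-src": ["'self'", "https://collector.vendor.example"],  # fetch/XHR/WebSocket/sendBeacon
    "img-src": ["'self'"],                                          # blocks third-party pixels
    "frame-src": ["https://widgets.vendor.example"],                # vendor widgets in iframes
})
headers = [("Content-Security-Policy", policy)]
print(policy)
```

Deploying the same value first under the `Content-Security-Policy-Report-Only` header lets you observe violations before you enforce it.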

Treat tag managers as an integral part of your compliance program for enforcing opt-outs and managing change. They are effectively a runtime code deployment mechanism. Google Tag Manager is by far the most common and has powerful capabilities for enforcing opt-outs and shaping data transmissions.

Detection and assurance: preventing regressions

Because this risk is created by ordinary engineering changes (a new query parameter, a new vendor script, a redirect), one-time remediation decays quickly.

Practical assurance mechanisms include:

Automated static scanning of codebases and templates for URL construction patterns and forbidden parameters.
Network traffic analysis that inspects outbound requests to confirm referrer behavior and detect URL leakage.
Log scanning for high-risk patterns (emails, tokens, known identifiers) across ingress logs, CDN logs, and analytics payloads.
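Log scanning can start very small. The Python sketch below flags and redacts URLs using a hypothetical deny-list of parameter names and a rough email regex. Both are assumptions to tune, and the regex will produce false positives and misses. The sketch handles full URLs and bare request targets such as `/users/...`.

```python
import re
from typing import List
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

# Hypothetical deny-list of parameter names; tune to your own codebase.
SENSITIVE_PARAMS = {"email", "token", "reset_token", "session", "sid", "code"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_url(url: str) -> List[str]:
    """Return findings for one logged URL or request target."""
    parts = urlsplit(url)
    findings = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in SENSITIVE_PARAMS:
            findings.append(f"sensitive-param:{key}")
        elif EMAIL_RE.search(value):
            findings.append(f"email-in-param:{key}")
    if EMAIL_RE.search(unquote(parts.path)):
        findings.append("email-in-path")
    return findings


def redact_url(url: str) -> str:
    """Replace flagged values before the URL is stored or forwarded."""
    parts = urlsplit(url)
    query = [
        (k, "REDACTED" if k.lower() in SENSITIVE_PARAMS or EMAIL_RE.search(v) else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Simplification: the path is emitted in decoded form.
    path = EMAIL_RE.sub("REDACTED", unquote(parts.path))
    # The fragment is dropped entirely, since it may carry client-side tokens.
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


line = "/users/jane.doe@example.com/settings?token=abc123&page=2"
print(scan_url(line))    # ['sensitive-param:token', 'email-in-path']
print(redact_url(line))  # /users/REDACTED/settings?token=REDACTED&page=2
```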

Expect false positives and gaps. The point is not perfect detection but early warning when governance assumptions diverge from runtime behavior.

Observations that hold up under scrutiny

URLs should be treated as “public-ish metadata,” not a private data store.

If sensitive data in a URL would create a problem when copied into a log file, it does not belong in the URL.

Referrer leakage is not a bug, and it is not wiretapping; it is the web working as designed.

The correct response is explicit referrer policy and careful URL design, not surprise.

Third-party scripts are a privilege decision, not just a procurement decision.

If a vendor’s script runs in your origin, it can generally read the URL. Contracts do not change that runtime fact.

The most useful organizational question is not “is this compliant?” but “where will this URL end up?”

If you cannot answer that across referrers, scripts, logs, analytics, and vendor endpoints, you are operating on assumptions. Assumptions are exactly what this failure mode exploits.

Note: This piece was developed through an iterative human–AI workflow: the arguments, framing, and conclusions are the author’s, with AI assistance used for structured analysis and refinement.