Reveal URLs — Architecture

This document describes how the Reveal URLs extension is structured and how its core logic works, for contributors and technical readers. It cites real files and symbols throughout (in the form path symbol, for example packages/core/src/hostMismatch.ts hostMismatch) so any claim can be checked against the source. For the user-facing description see the manual; for the manual verification steps see the manual test plan.

Overview

Reveal URLs is a single MV3 WebExtension built for several engines — Chrome, Edge, Opera, Firefox and Thunderbird — from one shared, pure code core. (Safari is deferred: it has a manifest and is buildable when named explicitly, but is excluded from the default build and from CI.) The extension reveals each link's real destination next to the link in a rendered email, and flags a link whose visible text names a different registrable domain from its href — the classic phishing tell.

The repository is a pnpm/TypeScript monorepo (pnpm-workspace.yaml, package.json with "private": true). Bundling is a single esbuild-driven script, tooling/build.mjs, which compiles the shared packages and each engine's thin entry points into a loadable dist/<target>/ directory. webextension-polyfill is bundled into every script rather than relied on as a runtime global, so the same source runs unchanged across Chromium and Gecko.

Two native email add-ons extend the same detection core to surfaces the WebExtension cannot reach: an Outlook Add-in (Office.js task pane, reaching Outlook on web/Windows/Mac/iOS/Android and Outlook.com) and a Gmail Add-on (Apps Script CardService, reaching the Gmail web/Android/iOS apps). Both reuse only the PURE analysis: a new packages/core/src/findings.ts module (analyseAnchors/analyseHtml, a platform-neutral Finding[] model) that reuses the existing hostMismatch logic and tldts registrable-domain derivation.

Only the host adapter (how the body is read and parsed) and the presentation differ per surface. The Outlook task pane parses the body with its own DOMParser and runs client-side. The Gmail add-on parses with node-html-parser and runs server-side on Google's Apps Script V8 infrastructure (free hosting, since Google already holds the mail; deployed locally and interactively via clasp).

Because neither framework can mutate the rendered, read-mode message DOM (Office's setAsync/prependAsync are compose-only; CardService renders cards, not message HTML), both add-ons present a panel/card of findings, not inline annotation. The existing linkProcessor.ts DOM-mutation path and REVEAL_URLS_CSS are NOT reused by either add-on. A relative href is resolved only against a trustworthy in-email <base href> and otherwise skipped, never against the mailbox/provider origin (which would fabricate a destination).

The codebase separates concerns sharply, along the boundaries laid out in the repository layout below.

Repository layout

packages/core — the pure core

Every module here imports no extension API (browser.*/chrome.*/messenger.*) and uses no string-to-markup DOM sink (innerHTML/insertAdjacentHTML). That contract is mechanically enforced by packages/core/test/purity.test.ts, which globs every src/**/*.ts, strips comment bodies via its own stripComments (so the docblocks that legitimately NAME the forbidden tokens do not trip the guard) and then asserts that neither EXTENSION_API_PATTERN nor UNSAFE_DOM_PATTERN matches any module. The public surface is re-exported from packages/core/src/index.ts. The only runtime dependency is tldts (for public-suffix-aware domain comparison).

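The modules are covered in the sections below; those cited in this document include findings.ts, linkProcessor.ts, hostMismatch.ts, styles.ts, contrast.ts, config.ts and matchPattern.ts.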

packages/webext — the browser-API wrappers

Modules here MAY use browser.* and are the only place the WebExtension APIs are touched. The public surface is re-exported from packages/webext/src/index.ts. The modules are: content.ts (content lifecycle), contentRegistration.ts (dynamic registration), storage.ts (config persistence), background.ts (on-install behaviour), toolbarAction.ts (toggle + badge), messageDisplay.ts and messageDisplayBackground.ts (Thunderbird), and options/options.ts (the settings page controller).

extensions/<engine> — thin entry points and manifests

Each engine carries its own manifest.json and icons/. Chrome, Firefox and Thunderbird additionally ship a src/ of entry points; Edge, Opera and Safari carry only a manifest and icons and REUSE Chrome's src/ (declared by the build descriptor — see below). The entries are deliberately tiny: for example extensions/chrome/src/content.ts is just void createContentController().bootstrap();, and extensions/chrome/src/background.ts calls registerBackground, registerContentReconciliation and registerToolbarToggle. Firefox's background entry is identical to Chrome's; Thunderbird's (extensions/thunderbird/src/background.ts) calls registerBackground and registerMessageDisplay instead (it has no content scripts). extensions/_template is a copy-ready Chromium scaffold, not a build target.

tooling, features, and the smoke tests

tooling holds build.mjs, version.mjs and icons.mjs. features holds the Gherkin scenarios, their step definitions and the faithful test doubles in features/support/webext.ts and features/support/world.ts. Each engine also carries an extensions/<engine>/test/manifest.smoke.test.mjs (_template's is template.smoke.test.mjs); these are the authoritative manifest contract (see "Per-engine build & the manifest contract" below).

The reveal pipeline

The content path begins at an engine entry such as extensions/chrome/src/content.ts, which calls createContentController().bootstrap() from packages/webext/src/content.ts.

Bootstrap

createContentController bootstrap runs once per frame:

  1. It claims the frame synchronously, before any await, via claimBootstrap, which reads/sets the BOOTSTRAP_MARKER flag on the frame's window. A static built-in content_scripts entry and an overlapping dynamic user script can both inject content.js into the same frame; the shared isolated-world window makes this the correct once-only guard, and claiming before any await means two near-simultaneous injections cannot both pass it.
  2. It loads the persisted config FIRST via loadConfigWithRetry (BOOTSTRAP_CONFIG_RETRIES extra attempts on a transient rejection), holding the marker across the retries so an overlapping injection that already no-op'd is not stranded. Only once the read succeeds does it inject the stylesheet (injectStyles, which assigns REVEAL_URLS_CSS via textContent), so a failed read appends no <style> and a retry cannot duplicate it.
  3. If config load fails on every attempt, releaseBootstrap rolls the marker back (only here, before annotation starts) so a later re-injection can retry; once start has run the marker must persist, because a fresh annotator would not own the prior pass's nodes.
  4. When config.enabled it calls start, and it always registers an onConfigChanged listener that re-applies config (covering the disabled→enabled transition).
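
The four steps above can be sketched as follows. This is a minimal illustration, not the code in packages/webext/src/content.ts: the marker name, the retry count, the FrameWindow type and the load/start callbacks are all hypothetical stand-ins, and the stylesheet injection and config listener are left out.

```typescript
// Hypothetical values: the real BOOTSTRAP_MARKER and BOOTSTRAP_CONFIG_RETRIES
// are not shown in this document.
const DEMO_MARKER = "__revealUrlsBootstrapped";
const DEMO_RETRIES = 2;

type FrameWindow = Record<string, unknown>;

/** Claim the frame; returns false when another injection already holds it. */
function claimBootstrap(win: FrameWindow): boolean {
  if (win[DEMO_MARKER] === true) return false;
  win[DEMO_MARKER] = true;
  return true;
}

/** Roll the claim back so a later re-injection can try again. */
function releaseBootstrap(win: FrameWindow): void {
  delete win[DEMO_MARKER];
}

/** Try the read once, then up to `retries` more times on rejection. */
async function loadConfigWithRetry<T>(load: () => Promise<T>, retries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await load();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

async function bootstrap<T>(
  win: FrameWindow,
  load: () => Promise<T>,
  start: (config: T) => void,
): Promise<"started" | "skipped" | "released"> {
  // The claim runs before the first await, so two near-simultaneous
  // injections cannot both get past this line.
  if (!claimBootstrap(win)) return "skipped";
  let config: T;
  try {
    config = await loadConfigWithRetry(load, DEMO_RETRIES);
  } catch {
    // Released only here, before any annotation has started.
    releaseBootstrap(win);
    return "released";
  }
  // From this point the marker persists, whatever start() does.
  start(config);
  return "started";
}
```

Keeping the release inside the config-load catch, and calling start outside it, mirrors the rule that the marker may only be rolled back before annotation begins.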

Resolving the active site rule

start (and applyConfig) calls resolveSiteRule to decide whether — and where — to annotate. resolveSiteRule first tries the running document's OWN location via resolveForHref, which filters config.sites to the enabled rules and defers to core's selectMostSpecific so the most specific match wins (a specific built-in beats a broad user wildcard). When the own location matches nothing AND the document's own origin is opaque or inherited — gated by hasOpaqueOrigin, true only for about:blank/about:srcdoc (inherited via INHERITED_ORIGIN_URLS) or blob:/data: (OPAQUE_ORIGIN_SCHEMES) — it falls back, in order, to the TOP frame (readTopHref, the built-in Proton message-iframe case), the OPENER window (readOpenerHref, the Outlook pop-out reading window opened via window.open("about:blank")) and finally document.referrer (readReferrer). Every external read is guarded against cross-origin access (which throws) and contributes a candidate only when readable. An ORDINARY unmatched document is authoritative and never inherits a rule from ambient context, leaving the controller inert.
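
A rough sketch of that fallback order, under simplifying assumptions: the rule matcher is passed in as a plain predicate, the three external reads are passed as an ordered list of reader functions, and resolveRuleHref is a hypothetical helper name rather than a symbol from the source.

```typescript
// Values taken from the description above; the exact shape of the real
// constants is an assumption.
const INHERITED_ORIGIN_URLS = new Set(["about:blank", "about:srcdoc"]);
const OPAQUE_ORIGIN_SCHEMES = new Set(["blob:", "data:"]);

function hasOpaqueOrigin(href: string): boolean {
  if (INHERITED_ORIGIN_URLS.has(href)) return true;
  try {
    return OPAQUE_ORIGIN_SCHEMES.has(new URL(href).protocol);
  } catch {
    return false; // an unparseable location is treated as ordinary
  }
}

type HrefReader = () => string | null;

/** Returns the first href a rule matches, or null when the frame stays inert. */
function resolveRuleHref(
  ownHref: string,
  matches: (href: string) => boolean,
  fallbacks: readonly HrefReader[], // in order: top frame, opener, referrer
): string | null {
  if (matches(ownHref)) return ownHref;
  // An ordinary unmatched document is authoritative: no ambient inheritance.
  if (!hasOpaqueOrigin(ownHref)) return null;
  for (const read of fallbacks) {
    let candidate: string | null = null;
    try {
      candidate = read(); // a cross-origin read may throw
    } catch {
      candidate = null;
    }
    if (candidate !== null && matches(candidate)) return candidate;
  }
  return null;
}
```

For example, an about:blank pop-out whose opener is readable resolves through the opener, while an ordinary page on an unmatched host resolves to null even if its referrer would have matched.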

Scoping to the content root and observing mutations

start records the matched rule's contentRoot selector, creates the annotator via createAnnotator(config), and processes each matched root from selectRoots(doc, contentRoot) (a querySelectorAll wrapped in try/catch so an invalid selector fails safe to []). It then observes doc.body (a stable container) with OBSERVER_OPTIONS ({ childList: true, subtree: true }).

The observer feeds processMutations, which gates every mutation on the content root: it re-processes a mutation target only when it is an Element within a root (isWithinRoot, via closest), and routes each added Element through processAddedNode, which annotates a node that is inside a root, IS a root, or CONTAINS one (a lazy SPA mount). So in-root churn and a sibling message body mounting later are both caught, while app chrome outside the roots is left untouched. A null annotator (the extension is stopped) is a no-op.
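
The three cases for an added node (inside a root, is a root, contains a root) reduce to two selector calls. The sketch below uses a minimal structural interface instead of the DOM Element type so it runs outside a browser; touchesContentRoot is a hypothetical name, not the real processAddedNode.

```typescript
// Structural stand-in for the two Element methods used here.
interface ElementLike {
  closest(selector: string): ElementLike | null;
  querySelector(selector: string): ElementLike | null;
}

/** True when the node is inside a root, is a root, or contains one. */
function touchesContentRoot(node: ElementLike, contentRoot: string): boolean {
  try {
    // closest() also matches the node itself, which covers the "IS a root" case.
    if (node.closest(contentRoot) !== null) return true;
    // A lazily mounted wrapper may contain a root further down.
    return node.querySelector(contentRoot) !== null;
  } catch {
    return false; // an invalid selector fails safe
  }
}
```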

applyConfig reconciles a live config change: it tears down when no rule matches or the enabled flag is off; restarts (revert + re-observe) when the resolved contentRoot changed; and otherwise reflows in place — annotator.setConfig then annotator.process over each matched root — without restarting the observer.

Annotating a single anchor

The annotator lives in packages/core/src/linkProcessor.ts. createAnnotator returns an Annotator over closure state the page cannot read: a strong Map keyed by anchor identity, a per-annotator data-ru token from generateToken (crypto.getRandomValues), and the current config. Its process(root) method:

  1. Prunes registry entries whose anchor is no longer connected.
  2. Collects candidate anchors with collectAnchors (descendant a[href] plus the root itself when it is an a[href], de-duplicated).
  3. Resolves each anchor's destination with resolveAnnotatableUrl, which returns null (skip) for an empty/in-page href, an unparseable href, any scheme other than http:/https:, or a host equal to or under an ignoreHosts entry; it resolves relative hrefs against baseURI so the result is absolute.
  4. Takes the idempotent fast path when isAnnotationIntact confirms the render fingerprint is unchanged and every created node is still connected and in position; otherwise reverts the stale annotation and re-renders.
  5. Computes mismatch with hostMismatch on the clean anchor text, honours showOnlyOnMismatch (leaving honest links untouched), and calls annotateAnchor.
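
Step 3 is the main filter, so a small sketch may help. It assumes ignoreHosts entries are already bare lowercase hostnames and that "in-page" means an href starting with #; the real resolveAnnotatableUrl may apply further checks not described here.

```typescript
/** True when `host` equals `entry` or is a sub-domain of it. */
function isUnderHost(host: string, entry: string): boolean {
  return host === entry || host.endsWith("." + entry);
}

function resolveAnnotatableUrl(
  rawHref: string,
  baseURI: string,
  ignoreHosts: readonly string[],
): string | null {
  const href = rawHref.trim();
  if (href === "" || href.startsWith("#")) return null; // empty or in-page
  let url: URL;
  try {
    url = new URL(href, baseURI); // relative hrefs become absolute here
  } catch {
    return null; // unparseable
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  const host = url.hostname;
  if (ignoreHosts.some((entry) => isUnderHost(host, entry))) return null;
  return url.href;
}
```

Note that the sub-domain test requires a leading dot, so an ignore entry for ignored.example does not also silence notignored.example.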

annotateAnchor renders SAFE-DOM only. In "inline" mode it prepends a <span class="reveal-urls-url"> whose URL is set via textContent (prefixed with an arrow and a U+00A0 non-breaking space, truncated by truncateUrl) followed by a <br>, leaving the link's own children untouched; the connected span then gets a contrast backdrop via applyContrastBackdrop. In "title" mode it saves the original title (TitleSave), overwrites it with the URL — the only write to the anchor — and, on an emphasised mismatch, appends a sibling <span class="reveal-urls-warn"> badge bearing REVEAL_URLS_WARN_LABEL. Every created node is tagged data-ru="<token>" (never the anchor), and each entry records its configFingerprint so a render-config change forces a reflow. Colours and font overrides are applied through typed element.style.* properties (applyColour/applyFontOverrides), never interpolated into a CSS string. revertOwned removes created nodes by reference and restores the saved title; revertAll does this for every owned anchor and clears the registry.

truncateUrl operates over Unicode CODE POINTS ([...displayHref]), so a cut never splits a surrogate pair; it preserves the scheme-and-host origin and shortens the rest with a trailing URL_ELLIPSIS.
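
A minimal sketch of code-point truncation. The ellipsis character, the rule that one code point of the budget is reserved for it, and the reading of "preserves the origin" as "never cuts inside the origin" are assumptions made for illustration.

```typescript
const URL_ELLIPSIS = "…"; // assumed value

function truncateUrl(displayHref: string, maxLength: number): string {
  const points = [...displayHref]; // one entry per Unicode code point
  if (points.length <= maxLength) return displayHref;
  let originLength = 0;
  try {
    const origin = new URL(displayHref).origin;
    if (origin !== "null" && displayHref.startsWith(origin)) {
      originLength = [...origin].length;
    }
  } catch {
    // Not an absolute URL: nothing to preserve, truncate plainly.
  }
  // Keep at least the whole origin, otherwise leave room for the ellipsis.
  const keep = Math.max(originLength, maxLength - 1);
  return points.slice(0, keep).join("") + URL_ELLIPSIS;
}
```

Slicing the code-point array rather than the string is what keeps an emoji or other astral character whole at the cut.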

Host mismatch

packages/core/src/hostMismatch.ts hostMismatch compares the link's visible text against its href by registrable domain, not raw hostname, so an honest sub-domain is not flagged while a look-alike is. extractHostCandidates splits the text and reduces each token via hostCandidate (which requires a dot and a tldts-recognised ICANN public suffix, rejecting ordinary dotted text like e.g). Both the href host and each candidate are reduced to their registrable domain with tldts.getDomain, and ANY candidate whose domain differs from the href's yields a mismatch — so naming the real malicious host alongside a lure host cannot suppress the warning. It never throws.
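
The comparison can be illustrated without tldts. In the sketch below, demoRegistrableDomain is a deliberately tiny stand-in for tldts.getDomain backed by a four-entry suffix list; it exists only so the example is self-contained and must not be mistaken for real public-suffix handling. The tokenising rules are likewise simplified.

```typescript
// Stand-in for tldts.getDomain: a hard-coded suffix list, for illustration only.
const DEMO_SUFFIXES = new Set(["com", "org", "net", "co.uk"]);

function demoRegistrableDomain(host: string): string | null {
  const labels = host.toLowerCase().split(".").filter((label) => label !== "");
  // Ascending i tries the longest suffix first; the domain is suffix + one label.
  for (let i = 1; i < labels.length; i++) {
    if (DEMO_SUFFIXES.has(labels.slice(i).join("."))) {
      return labels.slice(i - 1).join(".");
    }
  }
  return null;
}

/** Tokens of the visible text that look like hosts with a known suffix. */
function extractHostCandidates(text: string): string[] {
  const candidates: string[] = [];
  for (const token of text.split(/\s+/)) {
    const host = token
      .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "") // drop a leading scheme
      .replace(/[/?#].*$/, "") // drop path, query and fragment
      .replace(/^[^a-z0-9]+|[^a-z0-9]+$/gi, ""); // trim surrounding punctuation
    if (host.includes(".") && demoRegistrableDomain(host) !== null) {
      candidates.push(host);
    }
  }
  return candidates;
}

function domainOfHref(href: string): string | null {
  try {
    return demoRegistrableDomain(new URL(href).hostname);
  } catch {
    return null;
  }
}

/** True when ANY host named in the text differs from the href's domain. */
function hostMismatch(text: string, href: string): boolean {
  const hrefDomain = domainOfHref(href);
  if (hrefDomain === null) return false;
  return extractHostCandidates(text).some(
    (candidate) => demoRegistrableDomain(candidate) !== hrefDomain,
  );
}
```

Using some rather than every is the point of the "ANY candidate" rule: text that names both the real destination and a lure host still mismatches.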

Styles and contrast

packages/core/src/styles.ts holds the injectable REVEAL_URLS_CSS (fixed layout, weight and shape; colours are NEVER interpolated into it) and the runtime applyColour/applyFontOverrides/applyContrastBackdrop helpers. applyContrastBackdrop reads the element's resolved colour and the first opaque ancestor background through the standard getComputedStyle (sourced from element.ownerDocument?.defaultView, so it is frame-safe and stubbable) and, when packages/core/src/contrast.ts needsWhiteBackdrop reports the text fails WCAG AA (CONTRAST_THRESHOLD) against that background AND white genuinely helps, sets style.backgroundColor = "white". contrast.ts provides parseColour, relativeLuminance and contrastRatio; it parses the rgb()/rgba()/hex forms getComputedStyle returns and treats a fully transparent colour as "not found".
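
The contrast check follows the standard WCAG 2.x relative-luminance and contrast-ratio formulas. The sketch below assumes a threshold of 4.5 (WCAG AA for normal-size text) and reads "white genuinely helps" as "contrast against white is strictly higher"; both are assumptions, and the parser covers only the comma-separated rgb()/rgba() and six-digit hex forms.

```typescript
type Rgb = [number, number, number];

const CONTRAST_THRESHOLD = 4.5; // assumed: WCAG AA for normal-size text
const WHITE: Rgb = [255, 255, 255];

/** Parses rgb()/rgba()/#rrggbb; a fully transparent colour counts as not found. */
function parseColour(value: string): Rgb | null {
  const v = value.trim();
  const rgb = /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*([\d.]+)\s*)?\)$/.exec(v);
  if (rgb !== null) {
    if (rgb[4] !== undefined && Number(rgb[4]) === 0) return null;
    return [Number(rgb[1]), Number(rgb[2]), Number(rgb[3])];
  }
  if (/^#[0-9a-f]{6}$/i.test(v)) {
    return [
      parseInt(v.slice(1, 3), 16),
      parseInt(v.slice(3, 5), 16),
      parseInt(v.slice(5, 7), 16),
    ];
  }
  return null;
}

function relativeLuminance([r, g, b]: Rgb): number {
  const linear = (channel: number): number => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

function needsWhiteBackdrop(text: Rgb, background: Rgb): boolean {
  const current = contrastRatio(text, background);
  return current < CONTRAST_THRESHOLD && contrastRatio(text, WHITE) > current;
}
```

Dark blue text on a black page fails the threshold and gains from a white backdrop; yellow text on a white page also fails, but white cannot improve it, so no backdrop is applied.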

Configurable hosts & content scoping

A SiteRule (packages/core/src/config.ts) says WHERE annotation runs (match, allFrames) and WHICH container scopes it (contentRoot), plus enabled and a builtin flag. The shipped DEFAULT_SITES cover Gmail, Proton (with allFrames) and both Outlook hosts. The Config aggregates the global toggles, colours, font overrides and the sites array.

The match-pattern engine

packages/core/src/matchPattern.ts holds a deliberately RESTRICTIVE add-time grammar MATCH_PATTERN (http/https only, optional leading *. host wildcard, glob path; no port, no * scheme, no <all_urls>). matchesPattern matches a URL against a validated pattern, delegating the path to pathGlobMatches (each * matches any character sequence); it fails safe to false. parsePatternParts splits a validated pattern into host/path/scheme/wildcardHost.
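
An approximation of such a grammar and matcher is sketched below. The regular expression is a guess at the shape described above, not the real MATCH_PATTERN. Two behaviours are borrowed from the general WebExtension match-pattern convention and are assumptions about this engine: a *.host wildcard also matches the bare host, and the path glob is tested against the URL path plus query string.

```typescript
// Assumed approximation: http/https, optional "*." wildcard, no port, glob path.
const DEMO_MATCH_PATTERN = /^(https?):\/\/(\*\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)*)(\/.*)$/i;

/** Each * in the glob matches any character sequence; all else is literal. */
function pathGlobMatches(glob: string, path: string): boolean {
  const literals = glob
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp("^" + literals.join(".*") + "$").test(path);
}

function matchesPattern(pattern: string, url: string): boolean {
  const parts = DEMO_MATCH_PATTERN.exec(pattern);
  if (parts === null) return false; // outside the restrictive grammar
  const [, scheme = "", wildcard, host = "", path = "/"] = parts;
  try {
    const target = new URL(url);
    if (target.protocol !== scheme.toLowerCase() + ":") return false;
    const wanted = host.toLowerCase();
    const actual = target.hostname;
    const hostOk =
      wildcard !== undefined
        ? actual === wanted || actual.endsWith("." + wanted)
        : actual === wanted;
    return hostOk && pathGlobMatches(path, target.pathname + target.search);
  } catch {
    return false; // fail safe on an unparseable URL
  }
}
```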

Overlapping rules resolve by "most specific wins": compareSiteSpecificity ranks by (1) exact host over wildcard host, (2) longer literal host, (3) more-literal path (literalPathLength), then (4) a deterministic ASCII tiebreak; selectMostSpecific returns the most specific enabled rule matching a URL.
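
The four-level ranking can be written as an ordinary sort comparator. In this sketch the pattern parts are supplied pre-parsed on a hypothetical RankedRule shape, and the direction of the final tiebreak (ascending code-unit order of the match string) is an assumption.

```typescript
interface RankedRule {
  match: string;
  host: string;
  path: string;
  wildcardHost: boolean;
}

/** Number of literal (non-*) characters in the path glob. */
function literalPathLength(path: string): number {
  return path.replace(/\*/g, "").length;
}

/** Negative when `a` is more specific than `b`, so sorting puts it first. */
function compareSiteSpecificity(a: RankedRule, b: RankedRule): number {
  // (1) exact host beats wildcard host
  if (a.wildcardHost !== b.wildcardHost) return a.wildcardHost ? 1 : -1;
  // (2) longer literal host
  if (a.host.length !== b.host.length) return b.host.length - a.host.length;
  // (3) more-literal path
  const pathDelta = literalPathLength(b.path) - literalPathLength(a.path);
  if (pathDelta !== 0) return pathDelta;
  // (4) deterministic tiebreak on the match string
  return a.match < b.match ? -1 : a.match > b.match ? 1 : 0;
}

function selectMostSpecific(candidates: readonly RankedRule[]): RankedRule | undefined {
  return [...candidates].sort(compareSiteSpecificity)[0];
}
```

Here selectMostSpecific takes rules that are already known to be enabled and to match the URL; the filtering step is left out to keep the comparator in focus.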

Granted-origin coverage is a SEPARATE, deliberately BROADER matcher. permissions.getAll() may report grants in the full WebExtension grammar (<all_urls>, *://*/*, https://*/*, *://*.host/*, https://host/*) that the restrictive MATCH_PATTERN would reject. parseGrantedOrigin parses those into GrantedOriginParts, and originCovers reports whether a granted origin covers a rule's match (e.g. a broad https://*/* grant genuinely covers every https rule). matchAllowsOriginFallback reports whether a rule's path is exactly ORIGIN_FALLBACK_PATH (/*) — the safe intersection of the Chromium and Gecko/Firefox 128 constraints on the dynamic opaque-origin fallback.

Dynamic registration

packages/webext/src/contentRegistration.ts registers one dynamic content script per user-added rule. Built-ins are served by the static content_scripts entries and are NEVER dynamically registered. desiredContentScripts filters config.sites to non-builtin, enabled rules whose origin isOriginGranted (delegating to originCovers), maps each to a RegisteredContentScript with a stable id (contentScriptId, an FNV-1a hash of the match so the id uses only the API-safe character set), and sets matchOriginAsFallback: true ONLY when matchAllowsOriginFallback(rule.match) holds (a /* path). Gating the flag this way prevents one non-conforming-path rule from causing the whole batch registration to be rejected; such a rule registers without pop-out coverage by design.
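
A sketch of the id derivation and the gated fallback flag. The 32-bit FNV-1a routine is the standard algorithm (hashing UTF-16 code units, which equals byte-wise FNV-1a for ASCII input); the reveal-urls- id prefix, the DemoScript shape and the toContentScript helper are hypothetical.

```typescript
/** 32-bit FNV-1a over UTF-16 code units. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0;
}

// The prefix is a made-up example; only the hashing idea comes from the text.
function contentScriptId(match: string): string {
  return "reveal-urls-" + fnv1a(match).toString(16).padStart(8, "0");
}

const ORIGIN_FALLBACK_PATH = "/*";

function matchAllowsOriginFallback(match: string): boolean {
  const parts = /^https?:\/\/[^/]+(\/.*)$/.exec(match);
  return parts !== null && parts[1] === ORIGIN_FALLBACK_PATH;
}

interface DemoScript {
  id: string;
  matches: string[];
  js: string[];
  matchOriginAsFallback?: boolean;
}

function toContentScript(match: string): DemoScript {
  const script: DemoScript = {
    id: contentScriptId(match),
    matches: [match],
    js: ["content.js"],
  };
  // Set the flag only for a "/*" path so one rule cannot sink the whole batch.
  if (matchAllowsOriginFallback(match)) script.matchOriginAsFallback = true;
  return script;
}
```

Because the id is a pure function of the match string, the same rule always maps to the same registration id across restarts.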

reconcileContentScriptsOnce reads the granted origins, the stored config and the live registrations, then converges by a PROPERTY-AWARE diff: sameRegistration projects both the desired descriptor and the live read-back onto a normalised form (applying each field's real WebExtension default — allFrames/matchOriginAsFallback default false, persistAcrossSessions defaults TRUE, runAt defaults document_idle) so a freshly-registered script compares EQUAL to the descriptor that produced it and no re-registration loop can arise. Because there is no updateContentScripts, a changed script is unregistered then re-registered. reconcileContentScripts wraps this with an in-flight promise (inFlight) so two near-simultaneous triggers (a config change AND permissions.onAdded both firing on a site add) are serialised rather than racing. registerContentReconciliation wires the triggers (permissions.onAdded/onRemoved, onConfigChanged) and converges once on startup; it is imported only by the Chrome and Firefox background entries.
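
The property-aware diff amounts to filling in defaults on both sides before comparing. The sketch below uses the defaults stated above; the ScriptDescriptor shape is a simplified stand-in for the real registration type, treating matches as order-insensitive is an assumption, and the promise-chaining serialiser at the end is one possible way to realise an in-flight guard rather than a description of the actual inFlight logic.

```typescript
type RunAt = "document_start" | "document_end" | "document_idle";

interface ScriptDescriptor {
  id: string;
  matches: string[];
  js: string[];
  allFrames?: boolean;
  matchOriginAsFallback?: boolean;
  persistAcrossSessions?: boolean;
  runAt?: RunAt;
}

/** Project a descriptor onto a fully defaulted form with a fixed key order. */
function normaliseScript(script: ScriptDescriptor): Required<ScriptDescriptor> {
  return {
    id: script.id,
    matches: [...script.matches].sort(),
    js: [...script.js],
    allFrames: script.allFrames ?? false,
    matchOriginAsFallback: script.matchOriginAsFallback ?? false,
    persistAcrossSessions: script.persistAcrossSessions ?? true,
    runAt: script.runAt ?? "document_idle",
  };
}

function sameRegistration(desired: ScriptDescriptor, live: ScriptDescriptor): boolean {
  return JSON.stringify(normaliseScript(desired)) === JSON.stringify(normaliseScript(live));
}

// One way to serialise overlapping triggers: chain each run onto the last.
let inFlight: Promise<void> = Promise.resolve();

function reconcileSerialised(once: () => Promise<void>): Promise<void> {
  const run = inFlight.then(once, once);
  inFlight = run.catch(() => undefined); // keep the chain alive after a failure
  return run;
}
```

Without the defaulting step, a live read-back that spells out persistAcrossSessions: true would never equal a desired descriptor that omits it, and every reconcile would re-register the script.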

The opt-in permission flow

The static manifests declare optional_host_permissions (http://*/*, https://*/*) so a user can grant an arbitrary additional web host at runtime. packages/webext/src/options/options.ts addSite validates the WHOLE candidate rule (match AND content-root) through normaliseSiteRule BEFORE requesting any permission, derives the host origin with matchOrigin, and calls browser.permissions.request from within the Add button's click gesture; only on grant does it append a builtin:false row and persist through the canonical write path.

Config, storage & the canonical pattern

The single validation funnel

normaliseConfig in packages/core/src/config.ts is the one canonical read path: it coerces and bounds every field, falling back to the default on anything invalid, and never throws. It is built from per-field validators — normaliseBoolean, normaliseRenderMode, normaliseMaxLength, normaliseIgnoreHosts (reducing each entry to a bare punycode hostname via bareHostname), normaliseMatchColour/normaliseMismatchColour (gated by safeColour.ts isSafeColour), normaliseCssSize, normaliseFontWeight, normaliseContentRoot (bounded by CONTENT_ROOT_PATTERN/CONTENT_ROOT_MAX_LENGTH), normaliseMatchPattern (gated by MATCH_PATTERN) and normaliseSites. normaliseSites drops rejects, de-duplicates by match, FORCES builtin: true on any rule whose match equals a built-in's (closing the exact-match-shadows-built-in / double-inject vector), re-seeds any missing built-in (so a tampered store cannot drop a core provider) and sorts alphanumerically by match for a canonical order. configFingerprint hashes only the render-affecting fields (excluding enabled, ignoreHosts and sites, which drive start/stop and skip rather than reflow) via fnv1a.
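
The funnel pattern (per-field validators feeding one total function) is sketched below on a three-field DemoConfig. The field subset, default values and length bounds are invented for illustration, and clamping an out-of-range number rather than replacing it with the default is an assumption.

```typescript
interface DemoConfig {
  enabled: boolean;
  renderMode: "inline" | "title";
  maxLength: number;
}

// Hypothetical defaults and bounds; the real values live in config.ts.
const DEMO_DEFAULTS: DemoConfig = { enabled: true, renderMode: "inline", maxLength: 80 };
const MIN_LENGTH = 20;
const MAX_LENGTH = 500;

function normaliseBoolean(value: unknown, fallback: boolean): boolean {
  return typeof value === "boolean" ? value : fallback;
}

function normaliseRenderMode(
  value: unknown,
  fallback: DemoConfig["renderMode"],
): DemoConfig["renderMode"] {
  return value === "inline" || value === "title" ? value : fallback;
}

function normaliseMaxLength(value: unknown, fallback: number): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return fallback;
  return Math.min(MAX_LENGTH, Math.max(MIN_LENGTH, Math.round(value)));
}

/** Total function: any input yields a valid DemoConfig, and it never throws. */
function normaliseConfig(raw: unknown): DemoConfig {
  try {
    const source: Record<string, unknown> =
      typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};
    return {
      enabled: normaliseBoolean(source.enabled, DEMO_DEFAULTS.enabled),
      renderMode: normaliseRenderMode(source.renderMode, DEMO_DEFAULTS.renderMode),
      maxLength: normaliseMaxLength(source.maxLength, DEMO_DEFAULTS.maxLength),
    };
  } catch {
    return { ...DEMO_DEFAULTS }; // e.g. a hostile getter on the stored object
  }
}
```

Because every reader goes through this one function, callers can rely on the returned shape without re-validating individual fields.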

Storage

packages/webext/src/storage.ts wraps the WebExtension storage API. configArea uses browser.storage.sync when available (so settings roam) and falls back to browser.storage.local (e.g. on Thunderbird); configAreaName reports which is active. getConfig, setConfig and onConfigChanged all funnel their raw value through normaliseConfig, so a malformed or tampered store can never hand an invalid Config to a caller. onConfigChanged additionally ignores events from the INACTIVE storage area and changes to unrelated keys.

The two add-ons reuse the same read -> normaliseConfig -> return funnel over their own host stores. The Outlook task pane's extensions/outlook/src/roamingStorage.ts wraps Office.context.roamingSettings (synchronous get/set, committed with saveAsync); the Gmail add-on's packages/gmail/src/propertiesStorage.ts wraps PropertiesService.getUserProperties() (per-user, roaming across that user's devices, JSON-encoded under one key, and needing NO extra OAuth scope — never the shared ScriptProperties). Both expose getConfig/setConfig plus the card-relevant getIgnoreHosts/getHighlightMismatch accessors and guard an unavailable host surface with a dedicated error. The Gmail homepage trigger renders a CardService settings card from getConfig, and the onSaveSettings form-submit handler merges the parsed ignoreHosts/highlightMismatch onto the current config and persists it through setConfig; the contextual trigger threads ignoreHosts into the adapter and highlightMismatch into the card builder, reading defensively so a storage failure falls back to defaults rather than breaking message analysis.

The canonical-config + targeted-update pattern

Three write paths read the canonical config, change exactly one thing and write it back — never committing unsaved form edits or re-rendering the form:

The toolbar action's own toggleEnabled (packages/webext/src/toolbarAction.ts) follows the same pattern from the background side.

Per-engine build & the manifest contract

tooling/build.mjs drives a single esbuild build for every WebExtension target. The TARGETS descriptor names each target's SOURCE TARGET: Chrome, Firefox and Thunderbird ship their own src/; Edge, Opera and Safari declare chrome as their source; the Outlook (Office.js) add-in is its own source. ACTIVE_TARGETS (Chrome, Edge, Firefox, Opera and Thunderbird; Safari and Outlook are excluded) is what --all/--package build. Safari and Outlook build only when named explicitly, and the Gmail (Apps Script) add-on is a separate esbuild bundle (make build-gmail) outside the --all loop.

That Gmail bundle targets Apps Script's V8 runtime, which has neither ES modules nor a native URL: it is built as ESM and then has its trailing export {…} statement stripped (so the trigger functions stay top-level globals Apps Script can invoke), and it bundles a small URL polyfill (installed only when URL is absent) so the shared core's new URL(...) resolves links server-side.

buildTarget cleans dist/<target>/, copies the per-target manifest.json and icons/ verbatim, bundles each source script as IIFE (the injected INJECTED_SCRIPTS content.ts/messageDisplay.ts cannot be ES modules, and backgrounds are classic workers/event pages too) and the shared options page as ESM, and copies options.html/options.css from packages/webext/src/options/. webextension-polyfill is bundled into every script.

tooling/version.mjs stamps versions: it takes MAJOR.MINOR from the root package.json and appends an auto-incrementing BUILD number (read from the highest existing third component across the target manifests) into every target manifest, keeping them in lockstep. Most targets carry a single manifest.json; the Outlook add-in carries two versioned manifests (manifest.json and manifest.xml) and both are stamped, and the _template scaffold is stamped in lockstep too. The root package.json is kept at the same version, but stays valid semver (a four-part version is allowed in a manifest, never in package.json). tooling/icons.mjs (run via make icons) rasterises the single vector source assets/icon.svg into each target's icons/icon48.png and icons/icon128.png, trying cairosvg, then inkscape, then ImageMagick.
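
The version arithmetic can be sketched in a few lines. The real tool is a Node .mjs script; this TypeScript version is only an illustration, nextVersion is a hypothetical name, and incrementing the highest existing build number by exactly one is an assumption.

```typescript
/**
 * MAJOR.MINOR comes from the root version; BUILD is one more than the highest
 * third component found across the existing manifest versions.
 */
function nextVersion(rootVersion: string, manifestVersions: readonly string[]): string {
  const [major = "0", minor = "0"] = rootVersion.split(".");
  const builds = manifestVersions
    .map((version) => Number(version.split(".")[2]))
    .filter((build) => Number.isInteger(build) && build >= 0);
  const highest = builds.length > 0 ? Math.max(...builds) : 0;
  return `${major}.${minor}.${highest + 1}`;
}
```

Reading the build number from the highest manifest, rather than from any single one, is what lets every target be stamped with the same value even if one manifest had fallen behind.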

The manifest contract — the exact required fields per engine — is the single source of truth in each extensions/<engine>/test/manifest.smoke.test.mjs (e.g. extensions/chrome/test/manifest.smoke.test.mjs's assertManifestShape, which also builds the target and confirms every referenced file resolves in the dist). Per the project's de-duplication convention, the field-by-field shape is NOT re-enumerated here; refer to those smoke tests and to the "Manifest contract" note in the manual test plan.

Internationalisation (i18n)

Reveal URLs is localised into ten languages besides English (Danish, German, Spanish, Finnish, French, Italian, Dutch, Norwegian, Polish, Swedish). The set is single-sourced in packages/core/src/locales.json (SUPPORTED_LOCALES, native names and the default), which both the extension bundles (via the resolved JSON import) and the plain-Node website build read. Every non-English string is machine-translated and pending human review; the provenance marker is per format (the _locales use each message's description, the website chrome dictionaries record it in site/i18n/README.md, and translated doc sources carry a first-line HTML comment).

There are three independent localisation surfaces, deliberately not cross-wired, and each is split along a build-time / runtime line.

Mail-client "Active sites" suppression (AD-3)

A mail-client target (Thunderbird today) already sees every rendered message, so the per-host "Active sites" editor is redundant there and is removed. The decision is carried by a documented per-target descriptor flag, mailClient: true, on tooling/build.mjs's TARGETS.thunderbird — never a hardcoded target name in build or UI logic — and is implemented in two halves:

Security & privacy posture

Testing