URL validation — permissive, not strict
Source: markstack/src/routes/bookmarks.ts Category: Pattern — input handling
URL validation (permissive): for user-entered URLs (bookmarks, profile links, “paste a URL here” forms), validate loosely. Check that the string parses as a URL; don’t check that the site responds; don’t require the user to type a scheme. Over-strict validation has caused every bookmark tool since Delicious to be slightly wrong.
What it is
Minimal validation:
- String is non-empty
- String parses with `new URL()` after adding a default scheme if missing
- Hostname is present (not `new URL('http:///')`)
- Scheme is in an allowlist (http, https, ftp, possibly mailto/tel/magnet for specialized cases)
Skip: reachability check, DNS lookup, trailing-slash normalization, case folding.
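One interaction between the checks above is worth spelling out: `mailto:`, `tel:` and `magnet:` URLs parse with an empty hostname, so an app that adds them to the allowlist cannot also require a hostname for them. The sketch below is one way to split the allowlist; the names `HOST_SCHEMES`, `HOSTLESS_SCHEMES` and `schemeCheck` are hypothetical, not taken from markstack.

```typescript
// Hypothetical split of the allowlist: schemes that carry a hostname
// versus "specialized" schemes that never have one.
const HOST_SCHEMES = new Set(['http:', 'https:', 'ftp:']);
const HOSTLESS_SCHEMES = new Set(['mailto:', 'tel:', 'magnet:']);

type SchemeResult = 'ok' | 'scheme not allowed' | 'missing hostname' | 'not a URL';

function schemeCheck(raw: string): SchemeResult {
  let u: URL;
  try {
    u = new URL(raw);
  } catch {
    return 'not a URL';
  }
  if (HOST_SCHEMES.has(u.protocol)) {
    // Defensive: spec-compliant parsers already refuse an empty host
    // for http/https/ftp, so this branch should rarely be reached.
    return u.hostname ? 'ok' : 'missing hostname';
  }
  // Hostless schemes skip the hostname requirement entirely.
  if (HOSTLESS_SCHEMES.has(u.protocol)) return 'ok';
  return 'scheme not allowed';
}
```

With this split, `schemeCheck('mailto:a@example.com')` passes while `schemeCheck('javascript:alert(1)')` is still rejected by the allowlist.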
Why it exists
The problem: strict validation rejects valid URLs:
- `github.com/foo`: valid as a scheme-less URL; users paste these constantly
- `https://site.with.odd-port.example:8080/deep/path?utm=true#frag`: valid, just complex
- `magnet:?xt=urn:btih:...`: valid, not http
- `https://does-not-exist-right-now.example/`: valid; the site might be down temporarily
Every time a URL tool rejects a valid URL, a user curses the tool.
Permissive validation accepts and stores. The worst case: a bad URL sits in your DB and produces a broken link when clicked. That’s user-visible and fixable. Blocking on submit is user-hostile and silently destroys intent.
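To make the contrast concrete, here is a small sketch that runs the four example inputs through a strict regex and through a parse-only check. The `STRICT` pattern is a hypothetical stand-in for the kind of validator this pattern argues against, and `parsesPermissively` is an illustrative helper, not the markstack implementation.

```typescript
// Hypothetical strict validator: requires http(s), a dotted hostname, no port.
const STRICT = /^https?:\/\/[a-z0-9.-]+\.[a-z]{2,}(\/\S*)?$/i;

// Permissive check: add a default scheme when none is present, then parse.
function parsesPermissively(input: string): boolean {
  const withScheme = /^[a-z][a-z0-9+.-]*:/i.test(input) ? input : `https://${input}`;
  try {
    new URL(withScheme);
    return true;
  } catch {
    return false;
  }
}

const samples: string[] = [
  'github.com/foo',
  'https://site.with.odd-port.example:8080/deep/path?utm=true#frag',
  'magnet:?xt=urn:btih:...',
  'https://does-not-exist-right-now.example/',
];

// One row per sample: what the strict regex says vs. what parsing says.
const results = samples.map((input) => ({
  input,
  strict: STRICT.test(input),
  permissive: parsesPermissively(input),
}));
```

The strict regex rejects the first three inputs; the fourth passes it and would only fail a reachability check. All four parse. Parsing alone does not mean acceptance: the `magnet:` input still has to be in the scheme allowlist.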
Implementation
    function validateURL(input: string): { ok: true; url: string } | { ok: false; error: string } {
      const trimmed = input.trim();
      if (!trimmed) return { ok: false, error: 'required' };

      // Add a scheme if missing (https is the default guess used here)
      const withScheme = /^[a-z][a-z0-9+.-]*:/i.test(trimmed) ? trimmed : `https://${trimmed}`;

      try {
        const u = new URL(withScheme);
        if (!u.hostname) return { ok: false, error: 'missing hostname' };

        // Anything outside the allowlist (javascript:, data:, ...) is rejected
        const allowedSchemes = ['http:', 'https:', 'ftp:'];
        if (!allowedSchemes.includes(u.protocol)) {
          return { ok: false, error: `scheme ${u.protocol} not allowed` };
        }

        // Return the serialized form so callers store what URL produced
        return { ok: true, url: u.toString() };
      } catch {
        return { ok: false, error: 'not a URL' };
      }
    }

Store `u.toString()`: that’s the canonicalized form that `URL` produces.
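One edge case in the scheme test above: scheme-less input with a port, such as `localhost:3000` or `example.com:8080/path`, also matches the `scheme:` pattern. `new URL('localhost:3000')` then parses it as scheme `localhost:` with an empty hostname, and the input is rejected as missing a hostname. A possible refinement, sketched under the assumption that `host:digits` should be read as host and port (the helper name `addDefaultScheme` is hypothetical):

```typescript
// Hypothetical helper: decide whether input already carries a scheme.
function addDefaultScheme(input: string): string {
  const trimmed = input.trim();
  // "scheme://..." always counts as having a scheme.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed)) return trimmed;
  // "host:8080" or "host:8080/path": a colon followed only by digits
  // (then /, ?, # or end of input) is read as a port, not a scheme.
  if (/^[a-z0-9.-]+:\d+(\/|\?|#|$)/i.test(trimmed)) return `https://${trimmed}`;
  // Other "scheme:" forms (mailto:, magnet:, javascript:) keep their scheme
  // so the allowlist can accept or reject them later.
  if (/^[a-z][a-z0-9+.-]*:/i.test(trimmed)) return trimmed;
  // No scheme at all: default to https.
  return `https://${trimmed}`;
}

// The behavior being worked around: parsed as scheme "localhost:", no host.
const bare = new URL('localhost:3000');
const fixed = new URL(addDefaultScheme('localhost:3000'));
```

The trade-off is that `tel:5551234` is ambiguous with `host:port` and this sketch sides with the port reading, so an app that allows `tel:` would need to check its known schemes first.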
How it’s used
- markstack: POST /bookmarks runs user input through this, stores the normalized URL
- Cairn: admin’s “project link” input uses a simpler version (snippets/normalize-link)
- Pattern generalizes to any app storing user-supplied URLs
Gotchas
- Don’t require http(s)://. Users paste `github.com/foo` all the time. Add the scheme for them.
- Don’t strip trailing slashes. `example.com/foo/` and `example.com/foo` are technically different URLs (depending on the server’s config). Treat them as the user typed.
- Don’t case-fold paths. Hostname is case-insensitive; path is case-sensitive in HTTP. Lowercasing paths breaks legitimate URLs.
- Don’t reject based on TLD. A URL that worked yesterday can work tomorrow; a URL on an obscure TLD is still a valid URL.
- Don’t check reachability at submit time. Slow, flaky, and the right answer is usually “we don’t know yet”. Run a periodic background check for broken links if you care.
- Do encode special characters. `new URL(...).toString()` percent-encodes non-ASCII and special chars consistently. Store the encoded form.
- Length cap. 2 KB is a reasonable upper bound for URLs: enough for very complex URLs, and it prevents accidental DB abuse.
- XSS by URL. `javascript:alert(1)` is a valid-looking URL. The scheme allowlist rejects it before it’s rendered as an `href`.
- `data:` URLs are a vector. Data URLs can be arbitrarily large and carry embedded HTML/scripts. Not in the allowlist unless you specifically need them.
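Several of these gotchas can be checked directly against what `new URL(...).toString()` produces. The sketch below is illustrative only: `canonicalize` and `MAX_URL_LENGTH` are hypothetical names, and the cap is applied to the serialized string as one possible reading of the 2 KB bound.

```typescript
const MAX_URL_LENGTH = 2048; // hypothetical constant for the 2 KB cap

// Returns the serialized URL, or null if it is rejected.
function canonicalize(raw: string): string | null {
  let u: URL;
  try {
    u = new URL(raw);
  } catch {
    return null;
  }
  // Scheme allowlist: javascript: and data: never get through.
  if (!['http:', 'https:', 'ftp:'].includes(u.protocol)) return null;
  const out = u.toString();
  // Length cap applied to the form that would be stored.
  if (out.length > MAX_URL_LENGTH) return null;
  return out;
}

// Hostname is lowercased; path case and trailing slash are left alone.
const mixedCase = canonicalize('https://Example.COM/Foo/');
// No trailing slash is added to a non-empty path.
const noSlash = canonicalize('https://example.com/foo');
// Non-ASCII path characters are percent-encoded as UTF-8.
const encoded = canonicalize('https://example.com/café');
// Rejected by the allowlist.
const script = canonicalize('javascript:alert(1)');
const dataUrl = canonicalize('data:text/html,<script>alert(1)</script>');
// Rejected by the length cap.
const tooLong = canonicalize('https://example.com/' + 'a'.repeat(3000));
```

One detail to be aware of: a bare origin such as `https://example.com` serializes with a root path (`https://example.com/`), which is the parser's own normalization rather than trailing-slash stripping or adding on a user path.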
See also
- projects/markstack
- patterns/pagination-cursor-vs-offset: the other “lenient defaults” pattern markstack uses