If you haven't gotten it yet,
there's wild JavaScript in there (it's all tame though);
read the source, Luke. This is from what I've been saving on my pad.
Script Tag
First off, the SCRIPT tag: good, you stomped it.
Checking for just JavaScript is bad. There are scripting
languages other than JavaScript, including:
- JScript (blecch)
- VBScript (shudder)
- PerlScript (the horror). This is (of course?) actually of far
  greater concern here than on the internet at large.
This document is less than authoritative concerning removal
of VBScript and JScript. They'd require separate research,
and I would not be surprised if there are many undocumented
features concerning them.
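Since checking the language is the weak point, one option is to drop every SCRIPT element whatever it claims to be. Here is a minimal sketch in Python using the standard html.parser module; the names ScriptStripper and strip_scripts are made up for illustration, and comments and declarations are simply dropped because the sketch does not override those handlers.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit markup while dropping every SCRIPT element and its body."""

    def __init__(self):
        # convert_charrefs=False keeps entities in text as they were written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":  # any language or type attribute at all
            self.in_script = True
        elif not self.in_script:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and not self.in_script:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_script:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.in_script:
            self.out.append(f"&#{name};")


def strip_scripts(markup):
    """Hypothetical helper: return markup with all SCRIPT elements removed."""
    parser = ScriptStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

An unclosed SCRIPT tag swallows the rest of the input in this sketch, which errs on the safe side.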
JavaScript Protocols
The mocha: and javascript: protocols allow inline JavaScript in anchors.
Mozilla has dropped support for mocha:.
Example
make mine a latte
cup o' mud
For those interested, this is actually why I have JavaScript
enabled. My personal toolbar is full of this stuff; they are
called "bookmarklets".
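A rough way to test an href for these protocols, sketched in Python. It assumes the value has already had its entities decoded, and it strips whitespace and control characters first because browsers tend to ignore them inside the scheme. SCRIPT_SCHEMES, url_scheme and is_script_url are names invented for this example; an allow-list of known-good schemes would usually be the safer design.

```python
import re

# The two schemes named above; hypothetical constant for this sketch.
SCRIPT_SCHEMES = {"javascript", "mocha"}


def url_scheme(url):
    """Return the lower-cased scheme of url, or None if it has none."""
    # Remove whitespace and control characters before looking for the scheme.
    cleaned = re.sub(r"[\x00-\x20]+", "", url)
    match = re.match(r"([A-Za-z][A-Za-z0-9+.-]*):", cleaned)
    return match.group(1).lower() if match else None


def is_script_url(url):
    """True when the URL uses one of the script-carrying schemes."""
    return url_scheme(url) in SCRIPT_SCHEMES
```

For instance, is_script_url(" JaVa\tScRiPt:alert(1)") is true, while a plain http: link or a relative path is not flagged.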
JavaScript entities
&{}, a form of inline JavaScript, not commonly used.
This is probably NN only and it seems like support has
been dropped for this in recent 4.x builds.
Example
Entities
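For the &{...}; form, a regular expression is probably enough. This is a sketch in Python under the assumption that the expression contains no nested braces; the non-greedy match stops at the first closing brace, so anything left over is inert text. JS_ENTITY and strip_js_entities are hypothetical names.

```python
import re

# "&{" up to the first "}", with the trailing ";" treated as optional.
JS_ENTITY = re.compile(r"&\{.*?\};?", re.DOTALL)


def strip_js_entities(markup):
    """Remove &{...}; JavaScript entities from markup."""
    return JS_ENTITY.sub("", markup)
```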
JavaScript Attributes
onClick, onSubmit etc. for any tag which is allowed through.
Or, for extra safety, as a brother has been so kind as to
demonstrate, remove them no matter what.
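One way to "remove them no matter what", sketched in Python with html.parser: rebuild every start tag and skip any attribute whose name begins with "on". That prefix test is an assumption of mine and is stricter than a fixed list of handler names. HandlerStripper and strip_handlers are invented names, and comments are dropped because the sketch does not re-emit them.

```python
from html import escape
from html.parser import HTMLParser


class HandlerStripper(HTMLParser):
    """Re-emit markup without any on* event-handler attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag, attrs, closing=""):
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # parser lower-cases attribute names
                continue
            # Attribute values arrive decoded, so escape them again.
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_handlers(markup):
    """Hypothetical helper: return markup with on* attributes removed."""
    parser = HandlerStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```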
HTML entities for ASCII printable characters
These should be replaced with the characters they code for,
with the entities for &, < and > carefully examined and
excluded, of course.
NOTE: one should not be strict about requiring the ';'
as browsers are flaky on this. This should be done as the
first step of cleansing.
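A minimal Python sketch of that first step, handling numeric references only (decimal and hex). The semicolon is optional in the pattern, and the references for &, < and > are left encoded. CHAR_REF, KEEP_ENCODED and decode_printable_refs are names made up for the example; named entities such as &quot; would need a separate table.

```python
import re

# Decimal (&#106;) or hex (&#x6A;) references; the ";" is optional.
CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));?")

# Characters that must stay encoded.
KEEP_ENCODED = {"&", "<", ">"}


def decode_printable_refs(text):
    """Decode numeric references to printable ASCII, except & < and >."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0x20 <= code <= 0x7E and chr(code) not in KEEP_ENCODED:
            return chr(code)
        return match.group(0)  # leave everything else untouched

    return CHAR_REF.sub(replace, text)
```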
Examples
- Gotta love your SGML entities.
- J'accuse
This works in Netscape Communicator 4.79 and K-Meleon 0.6. UPDATE: Mozilla 1.6 too
- Variation on a theme
- J'accuse
This does not work in Netscape Communicator 4.79 or K-Meleon 0.6
I would expect it to work in some browsers.
- Chaining/Stacking I (you could of course do them in the opposite order).
- J'accuse
This does not work in Netscape Communicator 4.79 or K-Meleon 0.6
I think that if URL-encoded works in a browser then at least one chaining would follow.
- Recursion I.
- J'accuse
This does not work in Netscape Communicator 4.79 or K-Meleon 0.6
This really ought not work anywhere.
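Since it is hard to know which chaining a given browser honours, a cautious filter can keep decoding until nothing changes and run its checks on the result. A sketch in Python using the standard html.unescape and urllib.parse.unquote; fully_decode and max_rounds are hypothetical names, and the decoded string is meant for inspection only, not for output, because it also decodes the entities for & < and >.

```python
from html import unescape
from urllib.parse import unquote


def fully_decode(value, max_rounds=5):
    """Undo stacked entity and URL encoding until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(unescape(value))
        if decoded == value:
            break
        value = decoded
    return value
```

Both stacking orders come out the same way: an entity that decodes to a percent sign and a URL-encoded entity each end up as the plain scheme name.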
Data Protocol
No example here ;-), see the RFC below and think MIME-type.
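Thinking MIME-type: per RFC 2397 the media type of a data: URL sits between the scheme and the first comma, and defaults to text/plain when omitted. A small Python sketch that pulls it out so it can be checked; the allow-list SAFE_DATA_TYPES and both function names are assumptions for illustration.

```python
# Hypothetical allow-list; choose the types your site really needs.
SAFE_DATA_TYPES = {"image/gif", "image/jpeg", "image/png"}


def data_url_media_type(url):
    """Return the media type of a data: URL, or None for any other URL."""
    head = url.strip().partition(",")[0]
    if not head.lower().startswith("data:"):
        return None
    media_type = head[len("data:"):].split(";", 1)[0].strip().lower()
    # RFC 2397: an omitted media type means text/plain.
    return media_type or "text/plain"


def is_acceptable_data_url(url):
    """True only for data: URLs whose media type is on the allow-list."""
    return data_url_media_type(url) in SAFE_DATA_TYPES
```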
UPDATE -- META
Something that is not itself directly a threat is
<meta http-equiv="Content-Script-Type" content="text/javascript">.
However, removal of it could be prudent. If this META tag is used
to set the preferred scripting language for the page, then once it is removed
any scripts on the page MAY become invalid (assuming the browser cannot
auto-detect the type; this is most likely for installed
extensions such as PerlScript and TCL).
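Spotting that META tag is straightforward. A Python sketch with html.parser that collects whatever default script type a page declares, so the caller can decide whether to strip the tag; ScriptTypeFinder and declared_script_types are hypothetical names.

```python
from html.parser import HTMLParser


class ScriptTypeFinder(HTMLParser):
    """Collect the content of every Content-Script-Type META tag."""

    def __init__(self):
        super().__init__()
        self.script_types = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute names are lower-cased by the parser; values are not.
        if (attributes.get("http-equiv") or "").lower() == "content-script-type":
            self.script_types.append(attributes.get("content"))


def declared_script_types(markup):
    """Return the declared default script types, in document order."""
    finder = ScriptTypeFinder()
    finder.feed(markup)
    finder.close()
    return finder.script_types
```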
Further Reading
Here are some related sources that are definitely worth a once-over
- Mmmm Entities
- RFC 2397
- Stuff you probably didn't know you could do
- Or if you really want to get in deep
(though they seem to have ditched much of the older documentation)
- More than you wanted to know
--
perl -pe "s/\b;([st])/'\1/mg"