in reply to Tag-Stripper is Insecure

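The chief problem is HTML written thus:

    <<ILLEGAL_TAG>ILLEGAL_TAG>...<</ILLEGAL_TAG>/ILLEGAL_TAG>

A stripper that deletes disallowed tags in a single pass removes the inner <ILLEGAL_TAG> and </ILLEGAL_TAG>. The leftover brackets then join up into exactly the tags it was supposed to remove.
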
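To see why doubled brackets are dangerous, here is a minimal Perl sketch of a naive one-pass stripper run against the fragment above. The naive_strip sub and its </?\w+> pattern are hypothetical, chosen only to illustrate the attack. They are not the code from the parent node.

```perl
use strict;
use warnings;

# Hypothetical naive stripper: deletes anything shaped like <name> or
# </name> in a single left-to-right pass, then stops.
sub naive_strip {
    my ($html) = @_;
    $html =~ s{</?\w+>}{}g;
    return $html;
}

my $attack = '<<ILLEGAL_TAG>ILLEGAL_TAG>...<</ILLEGAL_TAG>/ILLEGAL_TAG>';
print naive_strip($attack), "\n";   # prints <ILLEGAL_TAG>...</ILLEGAL_TAG>
```

The first "<" can't start a match because the next character isn't a word character, so only the inner tag is deleted. The outer "<" and the trailing "ILLEGAL_TAG>" are left to meet, producing a live tag.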
I have code that converts a user-supplied chunk of text into renderable HTML, allowing only a restricted subset of tags. Instead of converting the string in place, I pick pieces off the front incrementally, doing something that looks like:

    emit($1), next if m/\G([^<>&]+)/gc;   # text
    emit($1), next if m/\G(&\w+;)/gc;     # entities
    emit("&lt;"), next if m/\G<(?=<)/gc;  # "<" followed by "<"
    # handle potentially valid tags here
    emit("&lt;"), next if m/\G</gc;
    emit("&gt;"), next if m/\G>/gc;
    emit("&amp;"), next if m/\G&/gc;      # bare "&"

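The first RE handles text. The second RE handles entities. The third RE prevents a sneak attack: a "<" that is immediately followed by another "<" is escaped on the spot, so no tag RE ever gets a chance to start matching at a doubled bracket. The remaining REs escape any "<" or ">" that didn't begin an allowed tag. For the fragment above, every angle bracket ends up as an entity:

    &lt;&lt;ILLEGAL_TAG&gt;ILLEGAL_TAG&gt;...&lt;&lt;/ILLEGAL_TAG&gt;/ILLEGAL_TAG&gt;

and the browser displays that as harmless literal text.

And yeah, yeah, my hands should be swatted for rolling my own HTML parsing code, but it was written several years ago, before there were as many options, and it works well for what I use it for.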
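Filled out into something runnable, the scanner might look like the sketch below. The render_user_html name, the %ALLOWED whitelist, and the attribute-free tag pattern are assumptions for illustration only; the post doesn't say which tags it allows or how its tag REs are written. emit() is replaced by appending to a string, and a bare "&" gets its own rule so that every character is matched by something.

```perl
use strict;
use warnings;

# Hypothetical whitelist: the original post doesn't list its allowed tags.
my %ALLOWED = map { $_ => 1 } qw(b i em strong p br);

# Scan the input from the front with \G and /gc, appending the safe form
# of each token to $out (this stands in for the post's emit()).
sub render_user_html {
    my ($text) = @_;
    my $out = '';
    pos($text) = 0;    # start scanning at the beginning
    while (pos($text) < length $text) {
        # 1. Plain text: a run with no "<", ">" or "&".
        $out .= $1, next if $text =~ m/\G([^<>&]+)/gc;
        # 2. Named entities such as &amp; pass through unchanged.
        $out .= $1, next if $text =~ m/\G(&\w+;)/gc;
        # 3. A "<" directly followed by another "<" is always escaped.
        $out .= '&lt;', next if $text =~ m/\G<(?=<)/gc;
        # Hypothetical tag rule: <name> or </name>, no attributes.
        if ($text =~ m/\G<(\/?)(\w+)>/gc) {
            my ($slash, $name) = ($1, $2);
            $out .= $ALLOWED{lc $name}
                ? "<$slash" . lc($name) . '>'   # allowed: emit a real tag
                : "&lt;$slash$name&gt;";        # not allowed: show as text
            next;
        }
        # Anything else that could be markup is escaped.
        $out .= '&lt;',  next if $text =~ m/\G</gc;
        $out .= '&gt;',  next if $text =~ m/\G>/gc;
        $out .= '&amp;', next if $text =~ m/\G&/gc;   # bare "&", not an entity
    }
    return $out;
}

my $attack = '<<ILLEGAL_TAG>ILLEGAL_TAG>...<</ILLEGAL_TAG>/ILLEGAL_TAG>';
print render_user_html($attack), "\n";
# &lt;&lt;ILLEGAL_TAG&gt;ILLEGAL_TAG&gt;...&lt;&lt;/ILLEGAL_TAG&gt;/ILLEGAL_TAG&gt;
print render_user_html('<b>bold</b> & <script>'), "\n";
# <b>bold</b> &amp; &lt;script&gt;
```

Because every possible character is claimed by one of the rules, each trip through the loop advances pos() and the loop always terminates. The /c modifier keeps pos() in place when a match fails, so the next rule tries from the same spot.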