in reply to 3Re: Back to acceptable untainted characters
in thread Back to acceptable untainted characters

$data =~ s/</&lt;/g;

If you're going to do that, you may just about as well do it all the way:

use HTML::Entities; $datum=encode_entities($datum);

Though I admit your solution is probably faster and will probably still get the job done.
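To make the difference concrete, here is a small sketch comparing a substitution that escapes only `<` with encode_entities(). The sample string and the variable names are made up for illustration, and it assumes HTML::Entities is installed (it is not a core module).

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical sample input, not taken from the thread.
my $input = '<a href="x">Tom & Jerry</a>';

# Escape only the left angle bracket.
(my $lt_only = $input) =~ s/</&lt;/g;

# Escape everything HTML::Entities considers unsafe by default.
my $encoded = encode_entities($input);

print "$lt_only\n";   # &lt;a href="x">Tom & Jerry&lt;/a>
print "$encoded\n";   # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

The first form is enough to stop a tag from opening, but it leaves `&`, `>` and the quotes alone, which matters if the data ever ends up inside an attribute value.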

The problem arises when you do in fact need to allow some HTML through. Another poster suggested deciding on a list of permissible tags and stripping all others. I agree with that as far as it goes, but you also want to strip certain attributes (notably, any whose name starts with 'on', case-insensitively) regardless of what tag carries them. If you're concerned about the sort of games that can cause browsers to hang, crash, or just plain not show the page, you probably also want to reject (or encode entities on) anything that doesn't meet some minimal standard of structure. It's relatively easy to check well-formedness, though if you want to allow legacy HTML 4 and earlier you have to do a little more work. At minimum, though, you probably don't want to allow any tag to be closed that wasn't opened, and you almost certainly want to be sure that any table-related tags that are opened are also closed.

This starts to get messy, and personally I've gone with the approach of putting the burden on the person who is submitting the HTML: if it's not well-formed, I pass it through encode_entities(), warn them that I've done so, and provide a link to an explanation of what well-formedness means and why it's useful. Because browsers automatically decode entities, even in the values of form elements, they can then directly edit their content to fix it up, and if they get it well-formed on the next submission it'll go into the database as-is. Whether you can take this approach will depend somewhat on how much burden of quality you're willing to place on the people writing the HTML in question.
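A rough sketch of that workflow might look like the following. The names tags_balanced, strip_event_attributes and prepare_submission are invented for this example, and the regexes are deliberately naive (they will trip over a `>` inside an attribute value, and can mangle quoted text that merely looks like an on-attribute), so treat it as an illustration of the idea and not as a sanitizer. HTML::Entities is loaded only on the fallback path and is assumed to be installed.

```perl
use strict;
use warnings;

# Void elements never take a closing tag, so they are not tracked.
my %VOID = map { $_ => 1 }
    qw(area base br col embed hr img input link meta source track wbr);

# Hypothetical helper: returns 1 if every non-void tag is closed in order
# and nothing is closed that was not opened, otherwise 0.
sub tags_balanced {
    my ($html) = @_;
    my @open;
    while ($html =~ m{<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>}g) {
        my ($closing, $name, $self_closed) = ($1, lc $2, $3);
        next if $VOID{$name} || $self_closed;
        if ($closing) {
            return 0 unless @open && pop(@open) eq $name;
        }
        else {
            push @open, $name;
        }
    }
    return @open ? 0 : 1;
}

# Hypothetical helper: drops attributes whose name starts with "on"
# (case-insensitive) from every opening tag, whatever the tag is.
sub strip_event_attributes {
    my ($html) = @_;
    $html =~ s{(<[A-Za-z][^>]*>)}{
        my $tag = $1;
        $tag =~ s/\s+on\w*\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)//gi;
        $tag;
    }ge;
    return $html;
}

# Returns (text, was_encoded). Balanced input is kept, minus on* attributes;
# anything else is entity-encoded so the submitter can fix it and resubmit.
sub prepare_submission {
    my ($html) = @_;
    return (strip_event_attributes($html), 0) if tags_balanced($html);
    require HTML::Entities;
    return (HTML::Entities::encode_entities($html), 1);
}
```

When the second return value is true, that is the cue to show the warning and the link to the explanation. Filtering against a list of permitted tags is left out here; a real parser such as HTML::Parser would be a sturdier base for both steps than regexes.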


$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/