$data =~ s/</&lt;/g;

If you're going to do that, you may just about as well do it all the way:

use HTML::Entities;
$data = encode_entities($data);   # also encodes &, >, quotes, and control/high-bit characters

Though I admit your solution is probably faster and will still probably get the job done.
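
For comparison, here is a core-Perl sketch of what "all the way" covers for the characters that are special in markup, without pulling in HTML::Entities. The helper name escape_html is hypothetical, and it is not a drop-in replacement for encode_entities(), which by default also encodes control and high-bit characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Entities): escapes the five
# characters that are special in HTML text and attribute values.
# Unlike encode_entities() with its defaults, it leaves control and
# non-ASCII characters untouched.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

sub escape_html {
    my ($text) = @_;
    # A single pass, so the "&" of a freshly inserted entity is never re-escaped.
    (my $escaped = $text) =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $escaped;
}

my $data = q{<b onclick="x()">Tom & Jerry's</b>};
print escape_html($data), "\n";
# prints: &lt;b onclick=&quot;x()&quot;&gt;Tom &amp; Jerry&#39;s&lt;/b&gt;
```

The s/</&lt;/g version above handles only the first of these five characters.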

The problem arises when you do in fact need to let some HTML through. Another poster suggested deciding on a list of permissible tags and stripping all others. I agree with that as far as it goes, but you also want to strip certain attributes regardless of which tag carries them, notably any attribute whose name begins with "on" (onclick, onload, and so on), matched case-insensitively.

If you're concerned about the sort of tricks that can make browsers hang, crash, or simply not display the page, you probably also want to reject (or run encode_entities on) anything that doesn't meet some minimal standard of structure. Checking well-formedness is relatively easy, though if you want to allow legacy HTML 4 and earlier, which permit some end tags to be omitted, you have to do a little more work. At minimum, you probably don't want to allow any tag to be closed that wasn't opened, and you almost certainly want to make sure that any table-related tags that are opened are also closed.

This starts to get messy, so personally I've gone with the approach of putting the burden on the person submitting the HTML: if it isn't well-formed, I pass it through encode_entities(), warn them that I've done so, and provide a link to an explanation of what well-formedness means and why it's useful. Because browsers automatically decode entities, even in the values of form elements, the submitter can then edit the content directly to fix it, and if it's well-formed on the next submission it goes into the database as-is. Whether you can take this approach depends on how much of the burden of quality you're willing to place on the people writing the HTML in question.
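
A rough sketch of the tag whitelist combined with stripping on* attributes might look like the following. It uses core Perl only; %ALLOWED and the helper names are hypothetical, and the regex-based tokenizing will be fooled by things like a ">" inside a quoted attribute value. Treat it as an illustration of the idea, not a finished sanitizer; a real filter would be better built on a proper tokenizer such as HTML::Parser.

```perl
use strict;
use warnings;

# Hypothetical whitelist of permitted tags; adjust to taste.
my %ALLOWED = map { $_ => 1 }
    qw(a b blockquote br code em i li ol p pre strong ul);

# Escape markup characters, but leave existing entity references
# (&amp; &#91; &#x5d; ...) alone so they are not double-encoded.
sub escape_text {
    my ($s) = @_;
    $s =~ s/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Rebuild one tag: disallowed or unparseable tags become visible text,
# and any attribute whose name starts with "on" (any case) is dropped.
sub filter_tag {
    my ($tag) = @_;
    my ($close, $name, $attrs, $self) =
        $tag =~ m{^<(/?)([A-Za-z][A-Za-z0-9]*)(.*?)(/?)>$}s
        or return escape_text($tag);
    $name = lc $name;
    return escape_text($tag) unless $ALLOWED{$name};
    return "</$name>" if $close;

    my @kept;
    while ($attrs =~ m{([A-Za-z_:][-\w:.]*)\s*(?:=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?}g) {
        my ($attr, $value) = (lc $1, $2 // $3 // $4 // '');
        next if $attr =~ /^on/;    # onclick, onLoad, ONMOUSEOVER, ...
        push @kept, qq{$attr="} . escape_text($value) . q{"};
    }
    return '<' . join(' ', $name, @kept) . ($self ? ' />' : '>');
}

# Split the input into tags and the text between them, filtering each piece.
sub filter_html {
    my ($html) = @_;
    return join '', map { /^</ ? filter_tag($_) : escape_text($_) }
        split /(<[^<>]*>)/, $html;
}

print filter_html(q{<P onClick="evil()" class="x">Tom & Jerry<script>alert(1)</script></P>}), "\n";
# prints: <p class="x">Tom &amp; Jerry&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Disallowed tags are encoded rather than silently deleted, so the submitter can see what was rejected.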

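The structure check and the encode-and-warn fallback can be sketched with a simple stack: push each opening tag, require every closing tag to match the top of the stack, and fail if anything is still open at the end. This takes the strict, XML-style view, so it rejects HTML 4 that omits optional end tags. %VOID, is_well_formed, encode_minimal, and prepare_submission are hypothetical names, and encode_minimal merely stands in for encode_entities() so the sketch runs without CPAN modules.

```perl
use strict;
use warnings;

# Elements that never take an end tag (hypothetical list; extend as needed).
my %VOID = map { $_ => 1 } qw(area br col hr img input wbr);

# True if no tag is closed without having been opened, tags close in
# the order they were opened, and nothing (e.g. a <table>) is left open.
sub is_well_formed {
    my ($html) = @_;
    my @open;    # stack of currently open element names
    while ($html =~ m{<(/?)([A-Za-z][A-Za-z0-9]*)\b[^<>]*?(/?)>}g) {
        my ($closing, $name, $self_closing) = ($1, lc $2, $3);
        next if $VOID{$name} || $self_closing;
        if ($closing) {
            # A stray or mis-nested close tag fails immediately.
            return 0 unless @open && $open[-1] eq $name;
            pop @open;
        }
        else {
            push @open, $name;
        }
    }
    return @open ? 0 : 1;
}

# Minimal stand-in for HTML::Entities::encode_entities().
sub encode_minimal {
    my ($s) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$ent{$1}/g;
    return $s;
}

# Hypothetical handler: well-formed input is stored as-is; anything
# else is encoded and returned together with a warning for the submitter.
sub prepare_submission {
    my ($html) = @_;
    return ($html, undef) if is_well_formed($html);
    return (encode_minimal($html),
            'Your HTML was not well-formed, so it is shown as plain text. '
          . 'Please fix it and resubmit.');
}

my ($stored, $warning) = prepare_submission('<table><tr><td>cell</td></tr>');
print "$stored\n";
print "$warning\n" if defined $warning;
```

When the encoded text is placed back into a textarea for editing, the browser decodes the entities, so the submitter sees their original markup and can correct it.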

$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/

In reply to encoding entities (Re: Back to acceptable untainted characters) by jonadab
in thread Back to acceptable untainted characters by bradcathey
