rezoraith has asked for the wisdom of the Perl Monks concerning the following question:

I am but a grasshopper in the ways of the Perl and also a grasshopper in the ways of this board. This is my first time posting, so please enlighten me.
This is the problem: I recently made a song database at

http://supremesovereign.netfirms.com/cgi-bin/cddata.pl

but when you click on "add song", you can add HTML and JavaScript and all that other goodness to it. Is there a link to a script with some parsing techniques to stop this sort of rampant abuse? Any help would be MUCH appreciated.

Replies are listed 'Best First'.
(jeffa) Re: Stopping the abuse
by jeffa (Bishop) on May 30, 2002 at 05:38 UTC
    Well, the easiest way is to 'literalize' the left bracket. Say that you have captured your user's post into the variable $posted_html. Simply substitute all occurrences of < with &lt;
    $posted_html =~ s/</&lt;/g;
    This will disable ALL rendering of HTML tags. It also has the side effect of displaying what the user tried to submit. You could also try to strip out the tags, but this is really a fine art. What if you want to allow some tags like <b> and <u> but disable others like <a> and <script>? Then your code will need to be sophisticated. Incidentally, this is what the code in Why I like functional programming addresses.
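To make the "allow some tags, disable others" idea concrete, here is a minimal sketch that is not taken from any post in this thread. It assumes you only want bare tags with no attributes, and the sub names escape_html and allow_simple_tags are made up for illustration. It escapes everything first, then restores only the whitelisted tags, so anything carrying an attribute (href, onclick and so on) stays escaped.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
# The ampersand goes first so the entities produced by the later
# substitutions are not escaped a second time.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&#39;/g;
    return $text;
}

# Hypothetical whitelist helper: escape everything, then restore only bare
# opening and closing tags (no attributes) whose names are in @allowed.
sub allow_simple_tags {
    my ($text, @allowed) = @_;
    my $safe  = escape_html($text);
    my $names = join '|', map { quotemeta } @allowed;
    $safe =~ s{&lt;(/?)($names)&gt;}{<$1$2>}gi if length $names;
    return $safe;
}

my $posted_html = q{<b>Title</b> <a href="#" onclick="alert('x')">boo</a>};
print allow_simple_tags($posted_html, qw(b u)), "\n";
```

With qw(b u) as the whitelist, the bold tags survive and the link is shown as literal text. This is deliberately cruder than a real parser: it cannot allow a tag with some attributes and not others, which is where a parser-based filter earns its keep.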

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Hi, I'm the author. It worked, thanks. Something so simple, I'd never have expected it. Haha.

      That is neat, but you'd also need to s/// ascii codes like:

      \x3Ca href="#" onclick="alert('a ha')">boo\x3C/a>

      and no doubt lots of other tricks. It's generally better to strip everything out than to try and keep up with the kids, i've found.

      update: completely wrong, as jeffa was tactful enough to point out privately. the translation of the ascii character happens in perl, not in the browser. i tested with a qq|| string and didn't look at the html source. slap.
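A small sketch of the difference described in that update, assuming form input reaches the script as plain characters (the same as a single-quoted string) rather than as Perl source. The sub name literalize is made up for illustration; it wraps the substitution suggested above.

```perl
use strict;
use warnings;

# The substitution from the reply above, wrapped in a sub for reuse.
sub literalize {
    my ($text) = @_;
    $text =~ s/</&lt;/g;
    return $text;
}

# In a double-quoted (qq) string Perl itself turns \x3C into '<' ...
my $interpolated = qq|\x3Ca href="#">boo\x3C/a>|;

# ... while a single-quoted (q) string keeps the backslash, 'x', '3' and 'C'
# as four literal characters, which is how typed-in form data would arrive.
my $literal = q|\x3Ca href="#">boo\x3C/a>|;

print literalize($interpolated), "\n";   # the '<' characters are escaped
print literalize($literal), "\n";        # nothing to escape, text is unchanged
```

In the first case the substitution sees a real < and escapes it; in the second there is no < at all, and the browser just shows the backslash sequence as text.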

Re: Stopping the abuse
by Zaxo (Archbishop) on May 30, 2002 at 05:19 UTC

    It's not a script, it's a process. Start with perlsec and Ovid's CGI Course. Super Search here for 'security', 'taint', 'cgi', javascript, etc for particular cases.

    What particular security problems are you having?

    After Compline,
    Zaxo

Re: Stopping the abuse
by thpfft (Chaplain) on May 30, 2002 at 12:41 UTC

    my (blush) HTML::TagFilter is written with this in mind. it's just a subclass of HTML::Parser, and as always there are many other ways to do it. Many here would probably recommend building a screen of your own on HTML::TokeParser. Ymmv.

    TagFilter is very young (0.08 or so), and pulls in quite a lot of cpan with it (all the cleverness is in the Parser), but it's meant to be very easy to use. to eliminate all html from input, you would just need to do this:

    use HTML::TagFilter;
    my $dirty_text = ...;    # something from input
    my $tf = HTML::TagFilter->new();
    $tf->allow_tags();
    my $clean_text = $tf->filter($dirty_text);

    and to allow formatting html but disallow images and links (and strip out javascript from other tags):

    use HTML::TagFilter;
    my $dirty_text = ...;    # something from input
    my $tf = HTML::TagFilter->new();
    $tf->deny_tags({
        img => { all => [] },
        a   => { all => [] },
    });
    my $clean_text = $tf->filter($dirty_text);

    (images and links are let through by default, so a couple of rules must be added to eliminate them).

    and so on. but getting this right requires a lot of attention to browser behaviour. it's amazing what range of things explorer will interpret as an instruction, for example, and i'm always trying to keep up. best to be very restrictive to begin with and only relax the restrictions with great care.

    ps. new version soon, honest.

Re: Stopping the abuse
by abstracts (Hermit) on May 30, 2002 at 14:27 UTC
    There are a number of things that you can do in addition to stripping HTML tags.
    • Limit the length of the fields to an appropriate number of characters so that people don't SPAM SPAM ...
    • Don't display what people submit instantly, so that they don't get the instant satisfaction of SPAM SPAM SPAMing your page. You can do some moderation.
    • Require user registration (I really don't know how that fits into your needs or your users').
    • If you're the only one who would add records, you can use some authentication scheme (.htaccess files if you're using Apache).
    Also, your script is not handling multi-line comments properly (I added one with 3 comment lines and each appeared separately). Are you using the CGI module to get your params?
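The length limit and the multi-line problem could be handled together with something like the sketch below. It is only a guess at the setup: it assumes records are stored one per line, which nobody in this thread has confirmed, and clean_field and the limit of 80 are made-up names and numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise one submitted field and cap its length.
sub clean_field {
    my ($value, $max_len) = @_;
    $value = '' unless defined $value;
    $value =~ s/\s+/ /g;       # fold line breaks and runs of whitespace into one space
    $value =~ s/^ | $//g;      # trim the leading and trailing space left behind
    return substr($value, 0, $max_len);
}

my $comment = "first line\r\nsecond line\n   third line  ";
print clean_field($comment, 80), "\n";
```

The value you pass in would come from wherever you read your params. The result still has to be HTML-escaped before it is printed; this only deals with length and line breaks.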

    Hope this helps.

Re: Stopping the abuse
by Anonymous Monk on May 30, 2002 at 05:29 UTC
    My exact problem is this: if you would please take the time to go to my site, there is a link at the top to "add songs". In there you can write your own title, artist, blah blah blah. But you can also do stuff like make the title a link to a picture, so that the picture appears in the table on the front page, or use JavaScript to make pop-up windows, or call other scripts in my directory, and so on and so forth. I would like to know how to stop that. Thanks a lot.