Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have a web form that accepts text input (like a message) from the user. The input will be saved into a database and later retrieved for display on a web page as and when required. Which of the following characters need special treatment so that they do not pose a security risk?

; , ? ! :' \ - * _ ^ ( ) % @ $ " # = < > & [ ]
What's the right way to handle data that's destined for an SQL database and which will be accessed later for display on a web page?

Thank you so much.

Replies are listed 'Best First'.
Re: Perl cgi question
by tachyon (Chancellor) on Apr 11, 2004 at 12:01 UTC

    You (your code) pose the security risk, not the characters per se :-) If all you do is insert data into a database, then retrieve and display it, the MAJOR issue on the way in is quoting the data in the SQL on insertion, which is handled by the DBI quote method or (better) placeholders. On the display side you need to escape the < > & " chars as well as deal with whitespace/newlines.
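
    As a minimal sketch of the display side, assuming plain text (no HTML allowed) and a hypothetical helper name html_for_display, the escaping and newline handling could look like this. It uses only core Perl; it is an illustration, not the code this site runs.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the four characters that matter in HTML
# text and attribute values, then make the user's line breaks visible.
sub html_for_display {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

    # One pass over the string, so the '&' of a freshly written entity
    # is never escaped a second time.
    $text =~ s/([&<>"])/$entity{$1}/g;

    # Turn each newline (with or without a carriage return) into <br>.
    $text =~ s/\r?\n/<br>\n/g;
    return $text;
}

print html_for_display(qq{<b>"Tom & Jerry"</b>\nline two}), "\n";
```

    Doing all four replacements in a single substitution avoids the classic ordering bug where '&' is escaped after the other characters and mangles the entities already produced.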

    Most of those chars are really only dangerous when passed to a shell. You forgot "\000" which is the embodiment of evil. Use #!/usr/bin/perl -wT to set taint mode and perl will die rather than let tainted (user-supplied) data reach anything it considers dangerous, such as a shell command or a file opened for writing. See also perlsec.
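
    For illustration, here is a sketch of the usual untainting idiom described in perlsec: under taint mode, the only way to clear the taint flag is to extract the part you trust through a regex capture. The helper name untaint_name and the 32-character limit are assumptions made up for this example. The script has no -T on its first line so that it runs either way; to see taint checks in action, run it as perl -T script.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $value taken from a regex capture
# (which is what clears the taint flag under -T), or nothing at all if
# $value contains anything other than 1 to 32 word characters.
sub untaint_name {
    my ($value) = @_;
    return unless defined $value;
    if ($value =~ /\A(\w{1,32})\z/) {
        return $1;
    }
    return;
}

for my $candidate ('report_2004', 'x; rm -rf /') {
    my $safe = untaint_name($candidate);
    print defined($safe) ? "ok: $safe\n" : "rejected: $candidate\n";
}
```

    The pattern is anchored with \A and \z so that the whole value has to match; an unanchored pattern would happily capture the harmless-looking part of a hostile string.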

    The other issue is what you want to store. If this is user input, do you just want TEXT or are you going to allow HTML? If you are allowing HTML, what are you going to do about JAVASCRIPT? If you are going to filter, the time to do it is once on insertion rather than every time on display.

    There is a wealth of data dealing with this on this site. Super Search for 'db placeholders' and 'escape html' and 'html to text' for lots of useful threads.

    Taking a random stab, it seems like you are perhaps considering writing a system where users can post data that gets stored in a DB and then displayed. Dare I say there are 101+ implementations of this concept on the web. You are, for example, looking at one right now. You might be better off modifying an existing solution, perhaps even a Wiki, than rolling it all yourself.

    cheers

    tachyon

      Great thanks, tachyon!

      What is "\000" and how does it pose a security risk? Do you filter it out like this:
      $str = "\000"; $str =~ s/\000//g;

        \000, 0x00, \0, 00000000b, or %00 when URL-encoded: all of these are the null byte. It is the string terminator in C and can have interesting effects in certain circumstances, for example when a string is handed on to C code that stops reading at the null. Google for perl null byte hack or similar. Your substitution will do the job, although tr/\0//d is faster FWIW.
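
        A small sketch of stripping null bytes with tr, using a made-up input string. Note that the escape only becomes a real null byte inside double quotes; a single-quoted '\000' is four ordinary characters (a backslash and three zeros).

```perl
use strict;
use warnings;

# Single quotes do not interpret \000, double quotes do.
print "single-quoted length: ", length('\000'), "\n";
print "double-quoted length: ", length("\000"), "\n";

# Made-up input with an embedded null byte.
my $input = "report.txt\0.jpg";

# Copy the string, then delete every null byte from the copy.
(my $clean = $input) =~ tr/\0//d;
my $removed = length($input) - length($clean);

print "removed $removed null byte(s); result is '$clean'\n";
```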

        cheers

        tachyon

Re: Perl cgi question
by bradcathey (Prior) on Apr 11, 2004 at 13:08 UTC
    tachyon, as usual, is right on. I have learned, from hanging out at the monastery, that it is indeed what you DO with what you get from the user that is key. I asked a similar question back in July, as a novice.

    Just to whet your appetite, here's an example of using placeholders when INSERTing or UPDATEing a MySQL db (and yes, you will find many examples with a Super Search).
    # $dbh is an already-connected DBI handle; each ? is filled in by execute
    $sth = $dbh->prepare("INSERT INTO testimonial VALUES (?,?,?,?,?)")
        or die "prepare: $DBI::errstr";
    $sth->execute('', $name, $email, $testimonial, '')
        or die "execute: $DBI::errstr";

    —Brad
    "A little yeast leavens the whole dough."
Re: Perl cgi question
by cLive ;-) (Prior) on Apr 11, 2004 at 12:40 UTC
    At some point, either before you store it or when you retrieve it, run it through escapeHTML in CGI.pm.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;

    my $q = CGI->new();
    my $flaky_input = $q->param('some_textfield');
    my $safe_to_output = $q->escapeHTML($flaky_input);
    cLive ;-)
Re: Perl cgi question
by eXile (Priest) on Apr 11, 2004 at 16:17 UTC
    Hi,
    In general it is better to specify exactly what you want to accept than to deny only specific characters and accept everything else.

    Silly example: Suppose tomorrow somebody discovers a security hole in the webserver you are using that allows him/her to do all sorts of evil things if a displayed page contains the BELL character (ASCII char 007).
    Normally nobody would need to fill in web forms with this character, but your webserver will be vulnerable and people will try to exploit this vulnerability (since you are only denying the characters in the list you gave us and didn't think of the BELL character).
    Had you only allowed [\w\s-], for instance, your webserver would not be vulnerable.
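
    A minimal sketch of that allow-list idea, assuming the accepted set is word characters, whitespace and the hyphen (written with the hyphen last so it is taken literally). The helper name rejected_chars is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return each distinct character of $text that is
# NOT a word character, whitespace or a hyphen. An empty list means the
# input passes the allow-list.
sub rejected_chars {
    my ($text) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $text =~ /([^\w\s-])/g;
}

# "\a" is the BELL character (ASCII 007) from the example above.
my @bad = rejected_chars("hello world\a");
if (@bad) {
    print "rejected: ", join(' ', map { sprintf('0x%02X', ord($_)) } @bad), "\n";
}
else {
    print "input accepted\n";
}
```

    Reporting which characters were refused, rather than silently stripping them, makes it easier to notice when the allow-list is too strict for legitimate input.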