bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

I have an application where I don't want to use the -T option, but I still want to catch tainted or bad characters.

Which exactly are bad characters? You know, characters I don't want to allow users to enter into an HTML form that is processed by my Perl scripts? I've seen ([;<>*'$!#: but blocking all of those does limit what users can enter in a textarea.

So, is there a definitive list? Thanks.

Re: Tainted or bad characters
by The Mad Hatter (Priest) on Jul 28, 2003 at 18:13 UTC

    There aren't lists I know of, but the usual practice is to have a default deny...that is, strip out anything that you don't explicitly allow. So something like:

    my $text = $q->param('text');
    # Removes any character that ISN'T a digit, word, or space character
    $text =~ s/[^\d\w\s]+//g;

    In the end though, what characters to allow/disallow all depends on how you are using the data later. Maybe you could explain this more?

      Just to expand on TMH's post: passing a string containing ";" to a shell is a bad idea, but taking that same string and writing it to a file that is used as a FAQ is not. "Tainted" means different things for different outputs. Imagine someone being able to push cascade deletes into a SQL statement, or a ":" into data that is to be written to an /etc/passwd file. Perl's view of tainted data is anything that comes from the end user and has not been checked to verify the string. Real-life tainted data is data that has not been checked to verify "good" behavior in its destination.

      -Waswas
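
To make the "different outputs" point concrete, here is a minimal sketch of per-destination checks. The function names and the exact rules are hypothetical and only illustrate the idea; a real application would tune each rule to its own destination.

```perl
use strict;
use warnings;

# Hypothetical per-destination checks; names and rules are illustrative only.

# Shell argument: allow only a conservative whitelist of characters.
sub ok_for_shell_arg {
    my ($s) = @_;
    return $s =~ /\A[\w.\-]+\z/ ? 1 : 0;
}

# One field of a colon-delimited, line-oriented file such as /etc/passwd:
# the delimiter and line breaks must not appear.
sub ok_for_passwd_field {
    my ($s) = @_;
    return $s =~ /[:\r\n]/ ? 0 : 1;
}

# Plain text appended to a FAQ file: nothing here gets executed,
# so this sketch accepts any defined string.
sub ok_for_faq_text {
    my ($s) = @_;
    return defined $s ? 1 : 0;
}

my $input = 'notes.txt; rm -rf /';
printf "%-8s %s\n", 'shell:',  ok_for_shell_arg($input)    ? 'ok' : 'rejected';
printf "%-8s %s\n", 'passwd:', ok_for_passwd_field($input) ? 'ok' : 'rejected';
printf "%-8s %s\n", 'faq:',    ok_for_faq_text($input)     ? 'ok' : 'rejected';
```

The same string is rejected for the shell but accepted for the other two destinations, which is why no single list of "bad characters" can be definitive.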
        This was most helpful. I was looking for a magic bullet but, as always, it's more complicated than I originally thought. To clarify, I use the data to send out as an email or store in a MySQL db for later display or manipulation. I love the quick help I get here at the monastery.
Re: Tainted or bad characters
by diotalevi (Canon) on Jul 28, 2003 at 18:27 UTC

    You should probably acquaint yourself with CGI's escapeHTML() function. I'm guessing that you intend to accept user data, store it somewhere and then later redisplay it to a web browser without allowing the original text to do "bad things". The idea is, before sending the data as HTML to the browser, encode it for HTML and that'll take all your "bad stuff" and render it harmless.
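
As a rough sketch of what "encode it for HTML" means, the snippet below uses a hand-rolled stand-in named escape_html (a hypothetical helper, not CGI's implementation) so that it runs without CGI.pm installed. With CGI.pm available, escapeHTML() is the function to reach for instead.

```perl
use strict;
use warnings;

# Hand-rolled stand-in so the sketch runs without CGI.pm installed.
# It is an assumption for illustration, not the CGI.pm source.
my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

sub escape_html {
    my ($text) = @_;
    # Single pass, so an already-replaced "&" is never escaped twice.
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Store the user's text as entered; encode only when writing it into HTML.
my $stored = q{<script>alert("hi")</script> Tom & Jerry};
print escape_html($stored), "\n";
```

Nothing is stripped here: the user's text survives intact, and the browser simply displays the markup characters instead of interpreting them.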

    This also raises another issue: are you storing this data somewhere? You may have similar problems there as well. Consider the common SQL injection attack.
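
For the storage side, one common defence is DBI placeholders. The sketch below assumes $dbh is an already connected DBI database handle; the store_comment helper and the comments table with its body column are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: $dbh is assumed to be a connected DBI database handle,
# and the "comments" table with a "body" column is invented for this sketch.
sub store_comment {
    my ($dbh, $text) = @_;

    # The ? placeholder keeps the user's text out of the SQL string itself;
    # it is handed to the driver as data, quotes and semicolons included.
    my $sth = $dbh->prepare('INSERT INTO comments (body) VALUES (?)');
    return $sth->execute($text);
}

# What to avoid: interpolating user text straight into the statement.
my $text   = q{nice site'); DELETE FROM comments; --};
my $unsafe = "INSERT INTO comments (body) VALUES ('$text')";
print "$unsafe\n";    # the user's quote ends the string literal early
```

With the placeholder form there is no need to strip quotes or semicolons from a textarea just to keep the database safe.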

Re: Tainted or bad characters
by kutsu (Priest) on Jul 28, 2003 at 19:19 UTC

    It's not what you disallow (as you might miss something), it's what you allow. Change the [0-9...] character class to whatever you allow:

    die "some error\n" if $foo =~ /[^0-9a-zA-Z]/;
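
A variation on the same allow-list idea, as a sketch: anchor the pattern and capture the match, so the caller only ever uses the validated copy. The clean_alnum name is hypothetical. Unlike the one-liner above, this version also rejects an empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the validated value, or undef if anything
# outside the allowed set is present. Adjust the character class to taste.
sub clean_alnum {
    my ($value) = @_;
    return undef unless defined $value;

    # \A and \z anchor the whole string; \z (unlike $) does not
    # tolerate a trailing newline.
    return $value =~ /\A([0-9a-zA-Z]+)\z/ ? $1 : undef;
}

for my $candidate ('abc123', 'abc 123', '') {
    my $clean = clean_alnum($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected: '$candidate'\n";
}
```

If the script were ever run under -T, a regex capture like this is also how Perl untaints a value, so the same check would serve both purposes.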

    "Pain is weakness leaving the body, I find myself in pain everyday" -me