Wally Hartshorn has asked for the wisdom of the Perl Monks concerning the following question:
Recently I wrote a small Perl program to allow people to submit information that would be stored in a MySQL database. Not long after it was put into production, I was told that it wasn't accepting one person's input. I got a copy of the input, tried it out, and quickly saw what the problem was.
The input form has a text box where the submitter is allowed to enter free-form comments. In this particular case, the user was entering multiple paragraphs. The little untainting routine I had written for the program neglected to allow \r and \n in the input. So I added those to the regex as allowed characters and tested it out.
That allowed the multiple paragraphs, but the input, because it had been copied and pasted from a Microsoft Word document, contained some special characters -- single open quotes, single close quotes, emdashes, etc. My regex hadn't taken those into account. So I decided to add those to the regex as allowed characters.
As I was doing this, I figured I ought to add the other likely special characters -- copyright symbols, trademark symbols, ellipses, etc. Then I began to wonder:
What characters am I really supposed to be allowing/excluding?
I'm not taking their input and passing it to a system() command for execution. I'm just taking it and passing it to MySQL (actually Class::DBI) to be stored in the database. I know that there are ways to exploit this type of situation to run other database commands, but I don't know how those work, so I don't know what I need to prevent.
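For the database side of the question, the usual defence is not character filtering but bound parameters. Here is a minimal sketch, assuming a plain DBI handle rather than Class::DBI; the sub name `save_comment`, the table `comments`, and the column `body` are hypothetical and not part of the program described above.

```perl
use strict;
use warnings;

# Hypothetical helper: $dbh is assumed to be a connected DBI database handle.
# The table name "comments" and column "body" are made up for illustration.
sub save_comment {
    my ($dbh, $comment) = @_;

    # The statement text contains only a ? placeholder. The value is handed
    # to the driver separately, which binds or quotes it, so quotes and
    # semicolons inside $comment are stored as data instead of changing
    # the shape of the SQL statement.
    my $sth = $dbh->prepare('INSERT INTO comments (body) VALUES (?)');
    return $sth->execute($comment);
}
```

With this pattern, a comment such as `it's; DROP TABLE x` is stored verbatim. The risky alternative is interpolating the value directly into the SQL string.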
When Perl gurus are asked "how do I untaint stuff", they generally answer with "it depends". I understand that, but it seems like there ought to be some common ways of untainting input data in common situations -- e.g. "do this before sending something to a MySQL database" and "do this before using something as an email address".
Are there any such standard methods or do I really have to reinvent the wheel (after spending time researching each particular road) every single time? Failing that, could someone at least tell me whether the following is exceptionally stupid for data that will go into a MySQL text field (doing all database access via Class::DBI)?
    # Keep only the following characters
    #
    # [:print:] printable characters
    # \n        end-of-line characters
    # \x85      MS Word ellipses
    # \x91      MS Word single opening quote
    # \x92      MS Word single closing quote
    # \x93      MS Word double opening quote
    # \x94      MS Word double closing quote
    # \x96      MS Word endash
    # \x97      MS Word emdash
    # \x99      MS Word trademark symbol
    # \xA7      MS Word section symbol
    # \xA9      MS Word copyright symbol
    # \xAE      MS Word registered symbol
    $freeformtext =~ s/ [^[:print:]\n\x85\x91\x92\x93\x94\x96\x97\x99\xA7\xA9\xAE] //gx;
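One point worth noting about the substitution above: under taint mode (-T), s/// removes characters but does not clear the taint flag. perlsec documents regex captures as the way to obtain untainted data. A minimal sketch of that idiom follows, assuming the same allow-list. The sub name `untaint_freeform` is hypothetical, and it uses the explicit range \x20-\x7E instead of [:print:] (an assumption made here so that the result does not depend on locale settings). It also allows \r, since browsers commonly send CRLF line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of $text if every character
# is on the allow-list, or undef if any other character is present.
sub untaint_freeform {
    my ($text) = @_;
    return unless defined $text;

    # \x20-\x7E covers printable ASCII; the remaining code points are the
    # Windows-1252 / Latin-1 characters listed in the post, plus \r and \n.
    if ($text =~ /\A([\x20-\x7E\r\n\x85\x91-\x94\x96\x97\x99\xA7\xA9\xAE]*)\z/) {
        return $1;    # captured text is considered untainted under -T
    }
    return;           # reject rather than silently strip
}

my $clean = untaint_freeform("First paragraph.\r\n\r\nSecond \x97 with a dash.");
print defined $clean ? "accepted\n" : "rejected\n";
```

This version rejects input containing unexpected characters instead of deleting them, which is a design choice: rejecting makes the problem visible to the submitter, while stripping (as in the original snippet) quietly alters what they wrote.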
Thanks in advance (as I prepare to be told "it depends"). :-)
Update: Replaced code snippet with something that's actually valid (although not necessarily correct). :-)
Wally Hartshorn
(Plug: Visit JavaJunkies, PerlMonks for Java)
Replies are listed 'Best First'.

Re: Common untainting methods?
by Abigail-II (Bishop) on Nov 25, 2003 at 23:09 UTC

Re: Common untainting methods?
by tachyon (Chancellor) on Nov 25, 2003 at 23:58 UTC

Re: Common untainting methods?
by sgifford (Prior) on Nov 26, 2003 at 06:58 UTC

Re: Common untainting methods?
by Anonymous Monk on Nov 26, 2003 at 04:13 UTC
    by sgifford (Prior) on Nov 26, 2003 at 06:39 UTC
    by Anonymous Monk on Nov 26, 2003 at 17:06 UTC