Recently I wrote a small Perl program to allow people to submit information that would be stored in a MySQL database. Not long after it was put into production, I was told that it wasn't accepting one person's input. I got a copy of the input, tried it out, and quickly saw what the problem was.

The input form has a text box where the submitter is allowed to enter free-form comments. In this particular case, the user was entering multiple paragraphs. The little untainting routine I had written for the program neglected to allow \r and \n in the input. So I added those to the regex as allowed characters and tested it out.
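Under taint mode (`perl -T`), Perl only launders data that comes out of a regex capture group, so a whitelist routine of this kind typically looks something like the minimal sketch below. The function name `untaint_freeform` and the exact character class are illustrative assumptions, not the original program's code. Note that `\r` and `\n` have to be listed explicitly. Without them, any multi-paragraph comment fails to match and the whole value is rejected, which is the failure described above.

```perl
use strict;
use warnings;

# Hypothetical whitelist untaint for multi-line free-form text.
# Allowed: printable ASCII (space through tilde), tab, CR and LF.
# Anything else causes the whole value to be rejected (returns undef).
sub untaint_freeform {
    my ($text) = @_;
    return unless defined $text;
    # Only a successful match with a capture group untaints in Perl;
    # $1 is the untainted copy of the input.
    if ($text =~ /\A([\t\r\n\x20-\x7E]*)\z/) {
        return $1;
    }
    return;
}

my $two_paragraphs = "First paragraph.\r\n\r\nSecond paragraph.";
my $clean = untaint_freeform($two_paragraphs);
print defined $clean ? "accepted\n" : "rejected\n";   # prints "accepted"
```

Dropping `\r\n` from the character class makes the same two-paragraph string come back as `undef`.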

That allowed the multiple paragraphs, but because the input had been copied and pasted from a Microsoft Word document, it also contained some special characters: single opening quotes, single closing quotes, em dashes, etc. My regex hadn't taken those into account, so I decided to add them to the regex as allowed characters too.

While I was at it, I figured I ought to add the other likely special characters as well (copyright symbols, trademark symbols, ellipses, etc.). But as I did so, I began to wonder:

What characters am I really supposed to be allowing/excluding?

I'm not taking their input and passing it to a system() command for execution. I'm just taking it and passing it to MySQL (actually Class::DBI) for insertion into the database. I know that this kind of situation can be exploited to run other database commands (SQL injection), but I don't know how those attacks work, so I don't know what I need to prevent.
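For reference, the usual defense against SQL injection is not a character filter but placeholders: the SQL text contains `?` and the user's data is sent separately, so a quote in the data can never end a string literal early. The pure-Perl sketch below contrasts the two styles without touching a database; both helper names are hypothetical. With a real DBI handle, the placeholder pair would be passed as `$dbh->do($sql, undef, @bind)`. My understanding is that Class::DBI already builds its INSERT and UPDATE statements with placeholders, but that is worth confirming against its documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: builds an INSERT by pasting user text into the SQL.
# This is the pattern that makes SQL injection possible; even an innocent
# apostrophe (as in "O'Brien") breaks the statement.
sub naive_insert_sql {
    my ($comment) = @_;
    return "INSERT INTO comments (body) VALUES ('$comment')";
}

# Hypothetical placeholder-style helper: the SQL text never contains the
# user's data; the data travels alongside it as a bind value.
# With DBI this would be used as: $dbh->do($sql, undef, @bind)
sub placeholder_insert {
    my ($comment) = @_;
    my $sql = 'INSERT INTO comments (body) VALUES (?)';
    return ($sql, $comment);
}

my $hostile = q{x'); DROP TABLE comments; --};

# The quote in the input closes the string literal, so the statement
# text now contains a second, attacker-chosen command.
print naive_insert_sql($hostile), "\n";

# The SQL text is identical no matter what the user typed.
my ($sql, @bind) = placeholder_insert($hostile);
print "$sql\n";
```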

When Perl gurus are asked "how do I untaint stuff", they generally answer with "it depends". I understand that, but it seems like there ought to be some common ways of untainting input data in common situations -- e.g. "do this before sending something to a MySQL database" and "do this before using something as an email address".
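As one example of a situation-specific check, a conservative untaint for an email address might look like the sketch below. The function name and the pattern are assumptions for illustration, and the pattern deliberately rejects some addresses that the email RFCs allow. It anchors with `\z` rather than `$`, because `$` also matches before a trailing newline, and a newline smuggled into an address is how extra mail headers get injected. For the validation part, the CPAN module Email::Valid is a more thorough alternative.

```perl
use strict;
use warnings;

# Hypothetical, deliberately conservative untaint for an email address.
# Accepts only local@domain with a restricted character set and at least
# one dot in the domain. NOT a complete RFC-compliant validator.
sub untaint_email {
    my ($addr) = @_;
    return unless defined $addr;
    # \z (not $) so that a trailing "\n" cannot slip through.
    if ($addr =~ /\A([A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\z/) {
        return $1;    # captured, therefore untainted under -T
    }
    return;
}

for my $candidate ('wally@example.com', "victim\@example.com\nBcc: spam\@example.org") {
    my $ok = untaint_email($candidate);
    print defined $ok ? "ok: $ok\n" : "rejected\n";
}
```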

Are there any such standard methods, or do I really have to reinvent the wheel (after spending time researching each particular road) every single time? Failing that, could someone at least tell me whether the following is exceptionally stupid for data that will go into a MySQL text field (doing all database access via Class::DBI)?

    # Keep only the following characters:
    #
    #   [:print:]  printable characters
    #   \n         end-of-line characters
    #   \x85       MS Word ellipses
    #   \x91       MS Word single opening quote
    #   \x92       MS Word single closing quote
    #   \x93       MS Word double opening quote
    #   \x94       MS Word double closing quote
    #   \x96       MS Word endash
    #   \x97       MS Word emdash
    #   \x99       MS Word trademark symbol
    #   \xA7       MS Word section symbol
    #   \xA9       MS Word copyright symbol
    #   \xAE       MS Word registered symbol
    $freeformtext =~ s/ [^[:print:]\n\x85\x91\x92\x93\x94\x96\x97\x99\xA7\xA9\xAE] //gx;
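One thing the snippet above does not do is untaint. perlsec documents that, apart from using a value as a hash key, the only way to launder tainted data is to reference subpatterns from a regex match, so a value edited with `s///` stays tainted. A minimal sketch of the same filter followed by a laundering capture is below; the wrapper name `clean_freeform` is hypothetical. Capturing an unrestricted `(.*)` is normally a way to defeat taint checking, so it is only reasonable here because the substitution has already removed everything outside the whitelist.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the character filter from the post, followed by
# the capture that actually untaints the result under -T.
sub clean_freeform {
    my ($text) = @_;
    return unless defined $text;

    # Delete everything except printable ASCII, "\n", and the
    # Windows-1252 punctuation listed in the post. Note that "\r" and
    # tabs are not in [:print:], so they are removed as well.
    $text =~ s/[^[:print:]\n\x85\x91\x92\x93\x94\x96\x97\x99\xA7\xA9\xAE]//g;

    # Launder: /s lets . match newlines, so this captures the whole
    # (already filtered) string into an untainted copy.
    my ($untainted) = $text =~ /\A(.*)\z/s;
    return $untainted;
}

my $pasted = "Smith\x92s report\r\n\r\nSee \xA7 4.";
my $clean  = clean_freeform($pasted);   # CRs removed, \x92 and \xA7 kept
```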

Thanks in advance (as I prepare to be told "it depends"). :-)

Update: Replaced code snippet with something that's actually valid (although not necessarily correct). :-)

Wally Hartshorn

(Plug: Visit JavaJunkies, PerlMonks for Java)


In reply to Common untainting methods? by Wally Hartshorn
