in reply to Back to acceptable untainted characters

Don't untaint all your data. That defeats the purpose of running in taint mode. Leave the data tainted unless taint mode stops you from doing something you need to do, and then untaint (carefully) only the values you need to use that way. In fact, if you use a regex to parse fields out of something, you should mark the extracted fields as tainted unless your regex was carefully constructed to make sure they're safe.

The whole point of taint mode is to alert you when you're doing something potentially unsafe. At that point, you want to check the datum in question specifically in terms of the operation you're performing, to make it safe for that operation. For example, if you're calling system() with a command that will be interpreted by a shell, you want to strip shell metacharacters. But you don't need to strip shell metacharacters when you send an email.
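As a rough illustration of untainting one value for one specific use, here is a minimal sketch. The helper name untaint_filename and the character whitelist are my own assumptions, not anything standard, and it whitelists known-good characters rather than stripping metacharacters, which is usually the easier check to get right.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of $name if it looks like
# a plain file name, or nothing (undef in scalar context) otherwise.
sub untaint_filename {
    my ($name) = @_;
    return unless defined $name;

    # Whitelist: letters, digits, underscore, dot, hyphen, at most 64
    # characters, and no leading dot or hyphen. \z (not $) so that a
    # trailing newline is rejected too.
    if ($name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]{0,63})\z/) {
        return $1;    # a captured group is untainted
    }
    return;
}

# Sample inputs, made up for the demonstration.
for my $candidate ('report.txt', 'notes; rm -rf /', '../etc/passwd') {
    my $safe = untaint_filename($candidate);
    if (defined $safe) {
        print "accepted: $safe\n";
    }
    else {
        print "rejected: $candidate\n";
    }
}
```

Only the value that actually has to reach the shell goes through the check; everything else stays tainted.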

MySQL can store anything safely, if you use ? placeholders in the SQL and pass the values in the execute() call. However, you need to think about what you're going to do with the data when they come out of MySQL. If you don't check them before you put them in, you should mark them as tainted when you take them out.
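A minimal sketch of the placeholder approach, assuming $dbh is an already-connected DBI database handle. The helper names and the comments table with its columns are made up for the example. DBI also documents a TaintOut attribute that marks fetched data as tainted when running under taint mode; whether you want it depends on your setup.

```perl
use strict;
use warnings;

# Hypothetical helper: stores one row. The values never become part of
# the SQL text, so quotes and other odd characters in them are harmless.
sub store_comment {
    my ($dbh, $author, $body) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO comments (author, body) VALUES (?, ?)'
    );
    return $sth->execute($author, $body);
}

# Hypothetical helper: fetches rows for one author. What comes back is
# only as trustworthy as what went in, so treat it as untrusted unless
# it was checked before it was stored.
sub fetch_comments {
    my ($dbh, $author) = @_;
    return $dbh->selectall_arrayref(
        'SELECT author, body FROM comments WHERE author = ?',
        undef,      # no statement attributes
        $author,    # bind value for the placeholder
    );
}
```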

As far as content going to the browser: decide whether it's plain text or HTML. If it's text, just encode the entities and have done. This is easy (there is a module for it on CPAN) and as safe as is necessary for ordinary purposes. If it's HTML, you'll want to check it for certain dangerous things, like scripts, and personally I also like to minimally parse it (basically just check for well-formedness), and if it's not well-formed revert to treating it like plain text (i.e., encode entities). This will annoy people who like to write old-style HTML with <p> tags between (instead of around) paragraphs, but it will also prevent any number of easy-to-make stupid mistakes, like forgetting to close off a table (which causes huge problems for older browsers).
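For the plain-text case, a hand-rolled sketch of entity encoding looks something like this. The name escape_html is my own; in real code the CPAN module HTML::Entities and its encode_entities function would be the usual choice (I'm assuming that is the sort of module meant above).

```perl
use strict;
use warnings;

# The five characters that matter in HTML text and attribute values.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper: returns a copy of $text that a browser will
# display literally instead of interpreting as markup.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $text;
}

print escape_html(q{<script>alert("hi")</script> & more}), "\n";
# prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; more
```

The same function serves as the fallback when submitted HTML fails the well-formedness check.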

For email: if you're sending as text/plain, which you should be, I wouldn't worry about it too much. There are tricks that can be played to make Outlook think something is an attachment even though the headers don't say so, but people who use Outlook are going to get viruses regularly anyway, so don't sweat it.


$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/

Re: Re: Back to acceptable untainted characters
by bunnyman (Hermit) on Sep 08, 2003 at 19:22 UTC

    In fact, if you use a regex to parse fields out of something, you should mark the extracted fields as tainted unless your regex was carefully constructed to make sure they're safe.

    How does one mark a variable as tainted? I did not realize the program had any way to control it directly.

      How does one mark a variable as tainted?
      use Taint (); Taint::taint($untrustedvalue);

      For example, if you use a regex to parse the key-value fields out of a query string and reverse the CGI encoding, you should mark the resulting data as tainted. (The "use CGI or die" advocates will tell you that you shouldn't be writing your own function for that anyway, but that's another debate for another thread.)
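
      A rough sketch of that, without the Taint module: appending a zero-length tainted string re-taints a value, because any expression containing tainted data is itself tainted. The names parse_query and $TAINT are made up, and I'm assuming that $^X or the %ENV values are tainted under -T. Without -T the code still runs, it just doesn't taint anything.

      ```perl
      use strict;
      use warnings;
      use Scalar::Util qw(tainted);

      # Zero-length string that carries taint when running under -T
      # (assumption: $^X or the %ENV values are tainted); otherwise it
      # is simply an empty string.
      my $TAINT = substr(join('', $^X, values %ENV), 0, 0);

      # Hypothetical helper: parses key=value pairs out of a query string.
      sub parse_query {
          my ($query) = @_;
          my %field;
          # This regex accepts almost anything, so the untainted captures
          # $1 and $2 are NOT safe. Pairs without '=' are skipped.
          while ($query =~ /([^&;=]+)=([^&;]*)/g) {
              my ($key, $value) = ($1, $2);
              for ($key, $value) {
                  tr/+/ /;                               # '+' means space
                  s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # undo %XX encoding
              }
              # Hash keys never carry taint, so only the value is re-tainted.
              $field{$key} = $value . $TAINT;
          }
          return %field;
      }

      my %f = parse_query('name=J%2E+Doe&x=1');
      for my $key (sort keys %f) {
          printf "%s => %s (tainted: %s)\n",
              $key, $f{$key}, tainted($f{$key}) ? 'yes' : 'no';
      }
      ```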



        That is not a standard module. It is not pure-Perl code either. (It uses XS to reach into the interpreter guts and set the tainted bit.)