CGI::Taintless - a request for comments

The problem

In Is force_untaint in HTML::Template overkill? I asked for the help of the monastery as to whether a particular form of detainting was worthwhile. Having established that it is, it seems to be that there is a gap in current tools here. I now have a proof of concept, and I wish the further help of the monastery in weighing its worth. I am starting a new thread to improve clarity. To recap the problem, it is a natural paradigm with HTML::Template to have a template like:
<html> <head><title>test.tmpl</title></head> <body> <TMPL_VAR NAME="form"> <TMPL_VAR NAME="hidden"> <submit/> </form> </body> </html>
We will then use the CGI object to fill in the template variables. The issue is that the values are derived from the CGI input parameters and so are tainted. I think the use of HTML::Template here makes the issues slightly harder to see so instead I will work off this test script.
#!/usr/bin/perl -wT # test3a.pl use CGI; use strict; use Test::Simple tests => 11; use Scalar::Util qw(tainted); my $c = CGI->new(); pr($c->header()); pr($c->start_html()); pr("print world"); pr($c->param()); pr($c->param("blah")); pr($c->hidden(-name=>"blah")); pr($c->start_form()); pr($c->self_url()); pr($c->submit()); pr($c->end_form()); pr($c->end_html()); sub pr { my @thingy = shift; foreach my $t (@thingy) { ok(!tainted($t), $t); } }
If you now run perl -T test3a.pl it runs without error. But if you run perl -T test3a.pl blah=hello you get several errors as follows:

C:\Users\SilasTheMonk\Downloads\Documents\paranoia>perl -T test3a.pl b +lah=hello 1..11 ok 1 - Content-Type: text/html; charset=ISO-8859-1 # # ok 2 - <!DOCTYPE html # PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" # "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> # <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en +-US"> # <head> # <title>Untitled Document</title> # <meta http-equiv="Content-Type" content="text/html; charset=iso-8859 +-1" /> # </head> # <body> # ok 3 - print world not ok 4 - blah # Failed test 'blah' # at test3a.pl line 23. not ok 5 - hello # Failed test 'hello' # at test3a.pl line 23. not ok 6 - <input type="hidden" name="blah" value="hello" /> # Failed test '<input type="hidden" name="blah" value="hello" />' # at test3a.pl line 23. not ok 7 - <form method="post" action="http://localhost?blah=hello" en +ctype= tipart/form-data"> # # Failed test '<form method="post" action="http://localhost?blah=hel +lo" en e="multipart/form-data"> # ' # at test3a.pl line 23. not ok 8 - http://localhost?blah=hello # Failed test 'http://localhost?blah=hello' # at test3a.pl line 23. ok 9 - <input type="submit" name=".submit" /> ok 10 - </form> ok 11 - # </body> # </html> # Looks like you failed 5 tests of 11.

What I think we should have is an easy way of detainting the outputs from the CGI module. The detainting needs to be easy and reliable so that it will actually be used. Often it will be best if it is customized to the particular web-site. The CGI::Untaint module is some help but fails in the following respects: Where the CGI::Untaint does help is that it provides a library of detainting recipes, and some of the targets are tricky so it is better to have it done once correctly.

My proposal

package CGI::Taintless; use vars qw(@ISA); sub new { my $class = shift; # inherit from a CGI hash-based object, to which we default many C +GI operations my $self = shift; # must be something conforming to CGI interface my $uclass = ref($self); @ISA = ($uclass); die "I am going in circles" if $uclass eq $class; $self->{__Taintless_taint_handlers} = shift || {}; # must be a par +am => taint handler mapping my $max_param_len = shift || 10; $self->{__Taintless_param_check} = "^\(\[\\w\\\_\]\{1\,$max_param_ +len\}\)\$"; bless $self, $class; return $self; } sub param { my $self = shift; if (scalar(@_) == 0) { # must only allow alphanumeric parameters for which we have ta +int handlers my @params = $self->SUPER::param(); my @filtered = (); foreach my $p (@params) { if ($self->get_re($p)) { push @filtered, $1 if $p =~ /$self->{__Taintless_param +_check}/; } } return @filtered; } elsif (scalar(@_) == 1) { # will be tainted my $p = shift; my $v = $self->SUPER::param($p); return undef unless defined($v); # need this line to deal with + the .cgifields parameter my $re = $self->get_re($p) || die "Cannot find taint handler f +or $p"; return $1 if $v =~ /$re/; die "$p does not pass taint check" } else { $self->SUPER::param(@_); } } sub get_re { my $self = shift; my $p = shift; if (exists $self->{__Taintless_taint_handlers}->{$p}) { return $self->{__Taintless_taint_handlers}->{$p}; } elsif (exists $self->{__Taintless_taint_handlers}->{-DEFAULT_HANDL +ER}) { return $self->{__Taintless_taint_handlers}->{-DEFAULT_HANDLER} +; } return undef; } 1

I tried this out with the following script:

#!/usr/bin/perl -wT # test3.pl use CGI; use Test::Simple tests => 11; use Scalar::Util qw(tainted); use lib qw(...........); # set this as appropriate. use CGI::Taintless; my $q = CGI->new(); my $c = CGI::Taintless->new($q, {blah=>'^([helo]+)$'}); pr($c->header()); pr($c->start_html()); pr("print world"); pr($c->param()); pr($c->param("blah")); pr($c->hidden(-name=>"blah")); pr($c->start_form()); pr($c->self_url()); pr($c->submit()); pr($c->end_form()); pr($c->end_html()); sub pr { my @thingy = shift; foreach my $t (@thingy) { ok(!tainted($t), $t); } }

Then all tests are passed.

My questions

Replies are listed 'Best First'.
Re: RFC: CGI::Taintless
by moritz (Cardinal) on Sep 23, 2008 at 07:35 UTC
    $self->{__Taintless_param_check} = "^\(\[\\w\\\_\]\{1\,$max_param_len\}\)\$";

    I think it's hard to get the number of backslashes right. Maybe you want to use qr{...} instead? Also $ in a regex allows a trailing newline before the end of the string - is this what you want? If not, use \z instead.

    Somehow I think that CGI::Taintless isn't a very good name at all. Your focus should be on validating CGI params, with the side effect of untainting them. Maybe something along the lines of CGI::RegexValidate might be more appropriate as a name?

    Note that passing unvalidated CGI params to the output is a security risk, because it allows cross-site-scripting - unless you escape it. So the tests in your sample program fail for a good reason.

    Usually I take the approach of trying to validate as little as possible, because I don't know how to validate. If there's a field for the real name of a user, how do you validate that? And if you do it, are you sure that all possible names in all human languages (of which there are quite many) are actually accepted? How do you test that?

    Instead I use placeholders for my DB queries, and set the default_escape option of HTML::Template (or HTML::Template::Compiled) to html, and care as little as possible about the contents of the fields.

    So for me this module makes only very little sense, but of course I can't speak for everyone, and I can well imagine that other perl hackers might find it useful.

      Somehow I think that CGI::Taintless isn't a very good name at all. Your focus should be on validating CGI params, with the side effect of untainting them. Maybe something along the lines of CGI::RegexValidate might be more appropriate as a name?

      I personally believe that validating with regexen is fine and all, but the OP may want to extend his module to a more generic validating one allowing e.g. subs, arrayrefs (possibly to be matched against ~~) and whatever. Then I do know that actual untainting is performed by means of regexen, but then indeed it could be done so anyway, if those other "objects" match. Then he could name his module CGI::Validate::Untaint.

      PS: [OT] I lost track of the ~~'s return value in list context issue, which you brought up to p5p: how did it end?

      --
      If you can't understand the incipit, then please check the IPB Campaign.
        I lost track of the ~~'s return value in list context issue, which you brought up to p5p: how did it end?

        The answer in short was "we don't know if it's a bug or a feature because we don't know what ~~ should really do". (Not literally, though).

        The fine p5p hackers long wanted to review smartmatch semantics, and this question encouraged them to actually do it. aristotle is now investigating how the Perl 6 smartmatch semantics changed since the perl 5 version was implemented, and plans to give p5p some feedback. Then they (or we) will try to decide what the best semantics for Perl 5 are.

Re: RFC: CGI::Taintless
by Anonymous Monk on Sep 23, 2008 at 09:18 UTC
      Lots to think about there. Thanks guys.

      Update 12 Oct 2008

      I have been busy with other things recently - and those busy things are making this issue ever more pressing.

      I can find no module which allows me to choose cheaply between leveraging the wisdom of others and customizing something to my special circumstances, and also functions as a drop in replacement for CGI.pm. As such I think I am very likely to take this further.