CGI::Taintless - a request for comments

The problem

In "Is force_untaint in HTML::Template overkill?" I asked the monastery whether a particular form of detainting was worthwhile. Having established that it is, it seems to me that there is a gap in the current tools here. I now have a proof of concept, and I would like the monastery's further help in weighing its worth. I am starting a new thread for the sake of clarity.

To recap the problem: a natural paradigm with HTML::Template is to have a template like:
<html>
<head><title>test.tmpl</title></head>
<body>
<TMPL_VAR NAME="form">
<TMPL_VAR NAME="hidden">
<submit/>
</form>
</body>
</html>
We then use the CGI object to fill in the template variables. The issue is that the values are derived from the CGI input parameters and so are tainted. I think the use of HTML::Template makes the issue slightly harder to see, so instead I will work from this test script:
#!/usr/bin/perl -wT
# test3a.pl
use CGI;
use strict;
use Test::Simple tests => 11;
use Scalar::Util qw(tainted);
my $c = CGI->new();
pr($c->header());
pr($c->start_html());
pr("print world");
pr($c->param());
pr($c->param("blah"));
pr($c->hidden(-name=>"blah"));
pr($c->start_form());
pr($c->self_url());
pr($c->submit());
pr($c->end_form());
pr($c->end_html());

sub pr {
    my @thingy = shift;
    foreach my $t (@thingy) {
        ok(!tainted($t), $t);
    }
}
If you now run perl -T test3a.pl, every test passes. But if you run perl -T test3a.pl blah=hello, several tests fail, as follows:

C:\Users\SilasTheMonk\Downloads\Documents\paranoia>perl -T test3a.pl blah=hello
1..11
ok 1 - Content-Type: text/html; charset=ISO-8859-1
#
#
ok 2 - <!DOCTYPE html
#       PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
#        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
# <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
# <head>
# <title>Untitled Document</title>
# <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
# </head>
# <body>
#
ok 3 - print world
not ok 4 - blah
#   Failed test 'blah'
#   at test3a.pl line 23.
not ok 5 - hello
#   Failed test 'hello'
#   at test3a.pl line 23.
not ok 6 - <input type="hidden" name="blah" value="hello" />
#   Failed test '<input type="hidden" name="blah" value="hello" />'
#   at test3a.pl line 23.
not ok 7 - <form method="post" action="http://localhost?blah=hello" enctype="multipart/form-data">
#
#   Failed test '<form method="post" action="http://localhost?blah=hello" enctype="multipart/form-data">
# '
#   at test3a.pl line 23.
not ok 8 - http://localhost?blah=hello
#   Failed test 'http://localhost?blah=hello'
#   at test3a.pl line 23.
ok 9 - <input type="submit" name=".submit" />
ok 10 - </form>
ok 11 -
# </body>
# </html>
# Looks like you failed 5 tests of 11.

What I think we should have is an easy way of detainting the outputs of the CGI module. The detainting needs to be easy and reliable so that it will actually be used, and often it will be best if it is customized to the particular web site.

The CGI::Untaint module is some help but falls short in several respects. Where CGI::Untaint does help is in providing a library of detainting recipes; some of the targets are tricky, so it is better to have them done once, correctly.
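To make the idea of a site-specific recipe library concrete, here is a minimal sketch in plain Perl. It does not use or imitate the CGI::Untaint API; the %recipe table, the recipe names and the untaint function are all hypothetical names chosen for illustration. It only shows the standard technique of untainting through a regex capture.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical per-site recipes (names and limits are assumptions).
# Each pattern must capture the clean value in $1.
my %recipe = (
    word    => qr/^(\w{1,32})\z/,
    integer => qr/^(-?\d{1,9})\z/,
);

# Return the captured (and therefore untainted) value,
# or undef if the input does not match the recipe.
sub untaint {
    my ($kind, $value) = @_;
    my $re = $recipe{$kind} or die "No recipe for '$kind'";
    return undef unless defined $value;
    return $value =~ $re ? $1 : undef;
}

# Command-line arguments are tainted when the script runs under perl -T.
my $raw   = defined $ARGV[0] ? $ARGV[0] : 'hello';
my $clean = untaint(word => $raw);

print defined $clean ? "clean: $clean\n" : "rejected\n";
print "still tainted\n" if defined $clean && tainted($clean);
```

Run it with perl -T and an argument to see the effect; without -T nothing is ever tainted, so only the accept/reject behaviour is visible. Returning undef on a mismatch is one design choice; dying, as the proposal below does, is the stricter alternative.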

My proposal

package CGI::Taintless;
use vars qw(@ISA);

sub new {
    my $class = shift;
    # inherit from a CGI hash-based object, to which we default many CGI operations
    my $self = shift; # must be something conforming to CGI interface
    my $uclass = ref($self);
    @ISA = ($uclass);
    die "I am going in circles" if $uclass eq $class;
    $self->{__Taintless_taint_handlers} = shift || {}; # must be a param => taint handler mapping
    my $max_param_len = shift || 10;
    $self->{__Taintless_param_check} = "^\(\[\\w\\\_\]\{1\,$max_param_len\}\)\$";
    bless $self, $class;
    return $self;
}

sub param {
    my $self = shift;
    if (scalar(@_) == 0) {
        # must only allow alphanumeric parameters for which we have taint handlers
        my @params = $self->SUPER::param();
        my @filtered = ();
        foreach my $p (@params) {
            if ($self->get_re($p)) {
                push @filtered, $1 if $p =~ /$self->{__Taintless_param_check}/;
            }
        }
        return @filtered;
    }
    elsif (scalar(@_) == 1) {
        # will be tainted
        my $p = shift;
        my $v = $self->SUPER::param($p);
        return undef unless defined($v); # need this line to deal with the .cgifields parameter
        my $re = $self->get_re($p) || die "Cannot find taint handler for $p";
        return $1 if $v =~ /$re/;
        die "$p does not pass taint check"
    }
    else {
        $self->SUPER::param(@_);
    }
}

sub get_re {
    my $self = shift;
    my $p = shift;
    if (exists $self->{__Taintless_taint_handlers}->{$p}) {
        return $self->{__Taintless_taint_handlers}->{$p};
    }
    elsif (exists $self->{__Taintless_taint_handlers}->{-DEFAULT_HANDLER}) {
        return $self->{__Taintless_taint_handlers}->{-DEFAULT_HANDLER};
    }
    return undef;
}

1

I tried this out with the following script:

#!/usr/bin/perl -wT
# test3.pl
use CGI;
use Test::Simple tests => 11;
use Scalar::Util qw(tainted);
use lib qw(...........); # set this as appropriate.
use CGI::Taintless;

my $q = CGI->new();
my $c = CGI::Taintless->new($q, {blah=>'^([helo]+)$'});
pr($c->header());
pr($c->start_html());
pr("print world");
pr($c->param());
pr($c->param("blah"));
pr($c->hidden(-name=>"blah"));
pr($c->start_form());
pr($c->self_url());
pr($c->submit());
pr($c->end_form());
pr($c->end_html());

sub pr {
    my @thingy = shift;
    foreach my $t (@thingy) {
        ok(!tainted($t), $t);
    }
}

With this script, all the tests pass.
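The test script only registers a handler for blah, so the -DEFAULT_HANDLER fallback in get_re is never exercised. As a rough, standalone sketch of how that lookup order behaves (the %handlers contents, the default pattern and the find_handler name are assumptions made for illustration, not part of the module):

```perl
use strict;
use warnings;

# Hypothetical handler table, in the same shape that is passed
# as the second argument to CGI::Taintless->new().
my %handlers = (
    blah             => '^([helo]+)$',
    -DEFAULT_HANDLER => '^(\w{1,20})$',
);

# Mirrors the lookup order of get_re: the specific handler first,
# then the default handler, otherwise undef.
sub find_handler {
    my ($table, $param) = @_;
    return $table->{$param}           if exists $table->{$param};
    return $table->{-DEFAULT_HANDLER} if exists $table->{-DEFAULT_HANDLER};
    return undef;
}

for my $pair ([blah => 'hello'], [other => 'abc_123'], [other => 'no spaces']) {
    my ($name, $value) = @$pair;
    my $re = find_handler(\%handlers, $name);
    if ($value =~ /$re/) {
        print "$name: accepted '$1'\n";
    }
    else {
        print "$name: rejected\n";
    }
}
```

A permissive default handler trades safety for convenience: any parameter without its own handler is accepted as long as it matches the default pattern, so a site may prefer to leave the default out and let unknown parameters fail.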

My questions


In reply to RFC: CGI::Taintless by SilasTheMonk
