CGI::Taintless - a request for comments

The problem

In "Is force_untaint in HTML::Template overkill?" I asked the monastery whether a particular form of detainting was worthwhile. Having established that it is, it seems to me that there is a gap in the current tools. I now have a proof of concept, and I would like the monastery's further help in weighing its worth. I am starting a new thread for clarity. To recap the problem: a natural paradigm with HTML::Template is to have a template like:

    <html>
        <head><title>test.tmpl</title></head>
        <body>
            <TMPL_VAR NAME="form">
            <TMPL_VAR NAME="hidden">
            <submit/>
            </form>
        </body>
    </html>

We then use the CGI object to fill in the template variables. The issue is that the values are derived from the CGI input parameters and so are tainted. I think the use of HTML::Template makes the issue slightly harder to see, so instead I will work from this test script.

#!/usr/bin/perl -wT
# test3a.pl
use CGI;
use strict;
use Test::Simple tests => 11;
use Scalar::Util qw(tainted);
my $c = CGI->new();

pr($c->header());
pr($c->start_html());
pr("print world");
pr($c->param());
pr($c->param("blah"));
pr($c->hidden(-name=>"blah"));
pr($c->start_form());
pr($c->self_url());
pr($c->submit());
pr($c->end_form());
pr($c->end_html());

sub pr {
    my @thingy = shift;
    foreach my $t (@thingy) {
        ok(!tainted($t), $t);
    }
}

If you now run perl -T test3a.pl, every test passes. But if you run perl -T test3a.pl blah=hello, five of the tests fail, as follows:

C:\Users\SilasTheMonk\Downloads\Documents\paranoia>perl -T test3a.pl blah=hello
1..11
ok 1 - Content-Type: text/html; charset=ISO-8859-1
#
#
ok 2 - <!DOCTYPE html
#       PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
#        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
# <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
# <head>
# <title>Untitled Document</title>
# <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
# </head>
# <body>
#
ok 3 - print world
not ok 4 - blah
#   Failed test 'blah'
#   at test3a.pl line 23.
not ok 5 - hello
#   Failed test 'hello'
#   at test3a.pl line 23.
not ok 6 - <input type="hidden" name="blah" value="hello"  />
#   Failed test '<input type="hidden" name="blah" value="hello"  />'
#   at test3a.pl line 23.
not ok 7 - <form method="post" action="http://localhost?blah=hello" enctype="multipart/form-data">
#
#   Failed test '<form method="post" action="http://localhost?blah=hello" enctype="multipart/form-data">
# '
#   at test3a.pl line 23.
not ok 8 - http://localhost?blah=hello
#   Failed test 'http://localhost?blah=hello'
#   at test3a.pl line 23.
ok 9 - <input type="submit" name=".submit" />
ok 10 - </form>
ok 11 -
# </body>
# </html>
# Looks like you failed 5 tests of 11.
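The failures above come down to one rule of taint mode, which I think is easier to see in a minimal sketch that leaves CGI out altogether: any string built from tainted data is itself tainted, and only text captured by a regular expression comes back clean. The helper name detaint_word and the fallback value 'hello' are hypothetical, chosen only for illustration. Note that tainted() reports "no" for everything unless the script is run with perl -T and given an argument.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: return the captured (and therefore untainted) text
# if $value consists only of word characters, otherwise undef.
sub detaint_word {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A(\w+)\z/ ? $1 : undef;
}

# Under -T everything in @ARGV is tainted; the fallback literal is not.
my $raw   = @ARGV ? $ARGV[0] : 'hello';
my $clean = detaint_word($raw);

# Interpolating a tainted value taints the whole resulting string.
my $from_raw   = qq{<input type="hidden" name="blah" value="$raw" />};
my $from_clean = defined $clean
    ? qq{<input type="hidden" name="blah" value="$clean" />}
    : '';

for my $pair ([raw => $raw], [from_raw => $from_raw], [from_clean => $from_clean]) {
    my ($label, $string) = @$pair;
    printf "%-10s tainted: %s\n", $label, tainted($string) ? 'yes' : 'no';
}
```

This is why cleaning the value I fetch myself is not enough: the hidden field, the form tag and the URL are all assembled inside CGI from the raw value.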

What I think we should have is an easy way of detainting the outputs from the CGI module. The detainting needs to be easy and reliable so that it will actually be used. Often it will be best if it is customized to the particular web-site. The CGI::Untaint module is some help but fails in the following respects:

Quite a few moving parts are required to customize a regular expression using CGI::Untaint. You need to create a module inheriting from, say, CGI::Untaint::object and configure INCLUDE_PATH to point at your local repository of such modules.
Fetching the parameters, detainting them and using them is all very well, but my problem with the templating example lies elsewhere. When you call a function like start_form, the CGI module fetches the parameters itself and DOES NOT DETAINT them. The output from a function like start_form or self_url is much harder to parse than a single parameter, and CGI::Untaint does not obviously help with this.

Where CGI::Untaint does help is in providing a library of detainting recipes. Some of the targets are tricky, so it is better to have each one done once, correctly.
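To make the idea of a recipe library concrete without subclassing anything, here is a minimal sketch using a plain hash of named patterns. The names %RECIPE and apply_recipe, and the three patterns themselves, are my own illustrative assumptions. They are not part of CGI::Untaint, and a real site would want to tune them.

```perl
use strict;
use warnings;

# Hypothetical recipe table: each pattern must capture the accepted text in $1.
my %RECIPE = (
    integer => qr/\A(-?\d{1,10})\z/,
    word    => qr/\A(\w{1,64})\z/,
    hexcode => qr/\A([0-9a-fA-F]{1,32})\z/,
);

# Return the captured (untainted) value, or undef if the recipe is unknown
# or the value does not match it.
sub apply_recipe {
    my ($name, $value) = @_;
    my $re = $RECIPE{$name};
    return undef unless defined $re && defined $value;
    return $value =~ $re ? $1 : undef;
}

for my $case ([integer => '42'], [integer => '4 2'], [word => 'hello']) {
    my ($name, $value) = @$case;
    my $result = apply_recipe($name, $value);
    printf "%-8s %-8s => %s\n", $name, "'$value'",
        defined $result ? $result : 'rejected';
}
```

The attraction of a flat table like this is that adding a site-specific recipe is one line, with no extra module and no search path to configure.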

My proposal

package CGI::Taintless;

use strict;
use warnings;
use vars qw(@ISA);

sub new {
    my $class = shift;

    # inherit from a CGI hash-based object, to which we default many CGI operations
    my $self = shift; # must be something conforming to CGI interface
    my $uclass = ref($self);
    @ISA = ($uclass);

    die "I am going in circles" if $uclass eq $class;

    # must be a param => taint handler mapping
    $self->{__Taintless_taint_handlers} = shift || {};
    my $max_param_len = shift || 10;
    # parameter names: 1 to $max_param_len word characters (\w already includes "_")
    $self->{__Taintless_param_check} = "^(\\w{1,$max_param_len})\$";

    bless $self, $class;
    return $self;
}

sub param {
    my $self = shift;
    if (scalar(@_) == 0) {
        # must only allow alphanumeric parameters for which we have taint handlers
        my @params = $self->SUPER::param();
        my @filtered = ();
        foreach my $p (@params) {
            if ($self->get_re($p)) {
                push @filtered, $1 if $p =~ /$self->{__Taintless_param_check}/;
            }
        }
        return @filtered;
    }
    elsif (scalar(@_) == 1) { # will be tainted
        my $p = shift;
        my $v = $self->SUPER::param($p);
        return undef unless defined($v); # need this line to deal with the .cgifields parameter
        my $re = $self->get_re($p) || die "Cannot find taint handler for $p";
        return $1 if $v =~ /$re/;
        die "$p does not pass taint check";
    }
    else {
        # two or more arguments (e.g. setting a value): pass through unchanged
        return $self->SUPER::param(@_);
    }
}

sub get_re {
    my $self = shift;
    my $p = shift;
    if (exists $self->{__Taintless_taint_handlers}->{$p}) {
        return $self->{__Taintless_taint_handlers}->{$p};
    }
    elsif (exists $self->{__Taintless_taint_handlers}->{-DEFAULT_HANDLER}) {
        return $self->{__Taintless_taint_handlers}->{-DEFAULT_HANDLER};
    }
    return undef;
}

1;

I tried this out with the following script:

#!/usr/bin/perl -wT
# test3.pl
use CGI;
use Test::Simple tests => 11;
use Scalar::Util qw(tainted);
use lib qw(...........); # set this as appropriate.
use CGI::Taintless;
my $q = CGI->new();
my $c = CGI::Taintless->new($q, {blah=>'^([helo]+)$'});

pr($c->header());
pr($c->start_html());
pr("print world");
pr($c->param());
pr($c->param("blah"));
pr($c->hidden(-name=>"blah"));
pr($c->start_form());
pr($c->self_url());
pr($c->submit());
pr($c->end_form());
pr($c->end_html());

sub pr {
    my @thingy = shift;
    foreach my $t (@thingy) {
        ok(!tainted($t), $t);
    }
}

With the wrapper in place, all the tests pass.
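One consequence of the constructor's name check is worth spelling out. With the default limit of 10, the string built in new is equivalent to ^(\w{1,10})$, so param() called with no arguments silently omits any parameter whose name is longer than that or contains anything other than word characters. The sketch below rebuilds the pattern on its own to show which names survive. The helper name_check_re and the sample names are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the pattern assembled in CGI::Taintless::new.
sub name_check_re {
    my ($max_param_len) = @_;
    $max_param_len ||= 10;    # same default as the constructor
    return qr/^(\w{1,$max_param_len})$/;
}

my $re = name_check_re();
for my $name (qw(blah user_id a_much_longer_name bad-name)) {
    printf "%-20s %s\n", $name, $name =~ $re ? "kept as '$1'" : 'dropped';
}
```

Anyone with longer parameter names would need to pass a larger limit as the third argument to new.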

My questions

Do Monks agree that this is a good approach and if not why not?
Can anyone think of any case or situation that will not be covered?
What would be the best way of supporting common detainting requirements?

In reply to RFC: CGI::Taintless by SilasTheMonk
