Code Cleanup challenge!

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm validating user input through forms via CGI using the following script I wrote:

#!/usr/bin/perl -wT

use strict;

use CGI;

my $q = new CGI;

# keys are param names
# values are regexes matching allowed content and max allowed length
my %required_params  = ( 'name' => ['(\w+)', '255'] );
my %optional_params  = ();
my %validated_params = ();


# ensure all required params have been received, validate them
for my $param (keys %required_params) {
    
    unless ($q->param($param)) {
        error("Missing parameter $param\n");
    }
    
    my $regex = $required_params{$param};
    if ($q->param($param) =~ /$regex/) {
        $validated_params{$param}[0] = $q->param($param);
    } else {
        error("Invalid structure for parameter: $param");
    }
    
    unless (length($q->param($param)) <= $required_params{$param}[1]) 
+{
        error("Parameter: $param is too long\n");
    }        
        

}

# validate all optional params
for my $optional_param (keys %optional_params) {

    my $regex = $optional_params{$optional_param}[0];
    if ($q->param($optional_param) =~ /$regex/) {
        $validated_params{$optional_param} = $q->param($optional_param
+);
    } else {
        error("Invalid structure for parameter: $optional_param");
    }
    
    unless (length($q->param($optional_param)) <= $optional_params{$op
+tional_param}[1]) {
        error("Parameter: $optional_param is too long\n");
    }    
    
}


output_page();



sub error {

    # overkill, but allows flexibility in the future

    my $error = shift;
    
    print "Error: $error";
    
    exit();
    
}


sub output_page {
    
    # change to templating system for anything more than a few lines

print <<EOF;

<html>
<head>
<title>Thanks!</title>
</head>
<body>
<p>Thank you for your input.</p>
<p><a href="index.html">Return home</a></p>
</body>
</html>        
    
EOF

    exit();    


}
[download]

I'd really like to clean this code up but I'm not quite sure how. The one thing that comes to mind is to split each validation step up into subs. So I'd have a validate_length, validate_content, and exists sub of some kind. I'm not sure that would solve the problem though. I also don't know how I'd go about validating the optional and required parameters in the same loop.

I'd appreciate any suggestions you have :)

Edit by tye, added READMORE, moved from Meditations

Comment on Code Cleanup challenge! Download Code

Replies are listed 'Best First'.
Re: Code Cleanup challenge! by dws (Chancellor) on Aug 10, 2003 at 20:08 UTC
I would focus on getting the code correct before I worried too much about getting it cleaned up. `my $regex = $required_params{$param};` [download] isn't doing what you expect it to. `$regex` ends up holding an anonymous array. You need something like `my $regex = $required_params{$param}[0];` [download] (You correctly use `$required_params{$param}[1]` later to pick off the size, which leads me to suspect that you're posting partially debugged code.) Even then, the regex you're using looks suspect, unless you intend to allow a string that has at least one "word" character. Also, a match sets `$1`, which there's no evidence of the coding using. My suggestion: Get this working first. Write unit tests if you have to. Then clean it up. You can try to worry about correctness and cleanliness at the same time, but my experience is that it'll take you 50% more time (or more) if you go that route.	[reply] [d/l] [select]
Re: Code Cleanup challenge! by Zaxo (Archbishop) on Aug 10, 2003 at 17:17 UTC
Often it's best to fail if a match succeeds. In your example, you try to verify that data is all word characters. The test, `$string =~ /\W/` will be false only if $string is all word characters. That is a much simpler approach, testing for what you won't accept. After Compline, Zaxo	[reply] [d/l]
Re: Re: Code Cleanup challenge! by sauoq (Abbot) on Aug 10, 2003 at 21:44 UTC
That is a much simpler approach, testing for what you won't accept. In most cases, that's not true. The usual rule of thumb is to test for what you will accept, and it's a good one for several reasons... It's usually less error prone. Not in a simple case like the OP's where perl provides both `\w` and the negated class `\W`, but in cases where you want to accept a more complex class like `[\w.~%^+=-]` negating the class isn't so easy. Adding something to the class of what you will accept becomes difficult when you are trying to do it backward. For example, what if the OP wants to add a dash to the characters he will accept. You can't just add something to `\W`, you have to manually change it to a negated class like `[^\w-]`. And all the while, your code gets harder to read. It goes from "fail if I match a non-word character" to "fail if I match a character that is not a word character or a dash." Trying to turn that into a positive, the way we usually like to think, makes that "succeed if I do not match a character that is not a word character or a dash" and that requires a mental flip with a half-twist. Another good reason to specify what you will accept rather than what you won't: untainting tainted data. In order to untaint, you need to capture what you want to keep anyway. -sauoq "My two cents aren't worth a dime.";	[reply]
Re^3: Code Cleanup challenge! by Aristotle (Chancellor) on Aug 11, 2003 at 21:32 UTC
In order to untaint, you need to capture what you want to keep anyway. For the sake of being argumentative, no, you don't have to. This is Perl though, you're free to shoot yourself in the foot. You could f.ex `s///` everything you want to disallow, then capture `/(.)/s`. But obviously, the untaint mechanism was chosen to encourage "allow what you will accept" logic. Makeshifts last the longest.*	[reply]
Re: Re^3: Code Cleanup challenge! by sauoq (Abbot) on Aug 11, 2003 at 21:38 UTC
Re: Code Cleanup challenge! by monktim (Friar) on Aug 11, 2003 at 13:49 UTC
Here's another suggestion. You can use Getopt and POD (plain ond documentation). All of your command line options are stored in a single hash for easy lookup. use strict; use warnings; use Getopt::Long; use Pod::Usage; # Process and validate the command line options in @ARGV. my %options; my $parse = new Getopt::Long::Parser; if (!($parse->getoptions('infile=s', \($options{infile}), 'outfile=s', \($options{outfile}), 'help', \($options{help})))) { help(); } if (not defined($options{infile})) { help(); } sub help { pod2usage(0); } __END__ =head1 NAME A.pl - Creates A file =head1 SYNOPSIS A.pl [options] Options: --infile=[file] --help =over 3 =item B<--infile -i=[file]> A file that is used for input, wildcards are allowed =item B<--outfile -o=[file]> Optional parameter for the output file =item B<--help -h -?> Print a brief help message and exits. =head1 DESCRIPTION B<This program> creates a A files. =cut [download]	[reply] [d/l]
Re: Code Cleanup challenge! by kesterkester (Hermit) on Aug 12, 2003 at 17:33 UTC
As long as the options are all being stored in one hash anyhow, I think the reference-to-options-hash usage of Getopts::Long is easier to read: `[snip] # Process and validate the command line options in @ARGV. my %options; GetOptions ( \%options, qw{ infile=s outfile=s help } ) or help(); [snip]` [download]	[reply] [d/l]
Re: Code Cleanup challenge! by Anonymous Monk on Aug 10, 2003 at 19:30 UTC
Hi, I've got this all solved now. I split everything up into subs and created one hash for all permitted parameters and then had an array of required parameters. Now it's beautiful as Perl should be :)	[reply]
Re: Code Cleanup challenge! by Willard B. Trophy (Hermit) on Aug 11, 2003 at 18:33 UTC
At risk of furthering my XP slide, are you sure you want to reinvent the form validation wheel when there are modules such as Data::FormValidator (or for CGI::Application fans, CGI::Application::ValidateRM) around? Blame my missionary zeal on discovering how neat CGI::Application is -- especially the way that the same code can be run as vanilla CGI, or under mod_perl with no modification. -- bowling trophy thieves, die!	[reply]