Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have almost finished a web script to dump a load of data taken from my website, to a file. I just want some inteligent monks to check it over, and possibly provide an alternative solution to the 'if' statments i have in the 'Check For Bad Input Section'.
#!/usr/bin/perl -T use strict; use warnings; use CGI; ######################## # Set System Variables # ######################## my $w = new CGI; my $redir = new CGI; my $ip = $ENV{REMOTE_ADDR} || 'N/A'; my @errors = (); my $file = time.int rand 1_000_000; ###################### # Set Variable Specs # ###################### my %spec = ( fullname => qr/^ [-.\w\s] {3,30} \z/x, username => qr/^ [A-Za-z] [-.\w]{2,16} \z/x, mothers => qr/^ [-.\w\s] {3,30} \z/x, email => qr/^ [\w\.-]+ \@ (?:[a-z\d-]+\.)+ [a-z\d]+ \z/ix, conditions => qr/^Yes\z/, ); my %fields; ####################### # Check For Bad Input # ####################### foreach( keys %spec ){ my $v = $w->param($_) or do { push @errors => "$_ is not defined.\n"; next }; $v =~ $spec{$_} or do { push @errors => "$_ contains invalid input.\n"; next }; $fields{$_}=$v } if (getpwnam $fields{username}) { push @errors => "username already taken.\n"}; if ($fields{username} =~ m/^administrator|accounts|support|postmaster| +webmaster|spam-admin| |technical|billing|sales|misuse|mail|virus-admin|manager|usenet|ho +stmaster$/) { push @errors => "username already taken.\n"}; print_error( @errors ) if @errors; ################################### # Log User Details For Processing # ################################### open LOGFILE,'>>',"/var/log/tmp/$file" or die "$file opening failed: $ +!"; print LOGFILE join(',',@fields{qw/fullname username mothers email/},$i +p),"\n"; close LOGFILE; ########################## # Display Completed Page # ########################## print $redir->redirect('http://www.foo.com'); ###################### # Display Any Errors # ###################### sub print_error { print $w->header('text/plain'), @_; exit; }

edited: Tue Jul 8 18:36:58 2003 by jeffa - (title change: was Almost Perfect Code?)

Replies are listed 'Best First'.
Re: Code review: validation regexes
by Abigail-II (Bishop) on Jul 08, 2003 at 07:17 UTC
    I don't think your regexes are right. Assuming that the 'fullname' and 'mother' check names, I wonder why you allow names to contain underscores and digits, but don't allow them to start with an accented letter? Also, the publisher of the Camel and the Lama wouldn't pass the test. Furthermore, the regex for email addresses is just plain wrong. It allows illegal addresses (say foo@...9), but worse, it will reject legal addresses, like $perl$@abigail.nl.

    I don't know when you are adding usernames to the system, but you have a problem. You reject usernames that are already in use, but you aren't checking the queue. Say, foo isn't taken yet, dozens of people could submit foo, until the first foo gets added to the system. What will you do with the others, who thought their submission was fine?

    Abigail

      Hi there, what would you suggest for the regexes? I am not very good at them!

      The second script picks up the files every 5 minutes, and will send out an e-mail if the username has already been taken. However, if there is a way to check for these duplicates, then please show me.

      Is there an easier way of writing this script? I need users to be added to the system via a site (that has restricted access), but am worried about the security of the whole thing, and that is why i have opted to do this the 'cron' way.
        Hi there, what would you suggest for the regexes?

        That's not the right question to ask. That's like responding to someone saying I don't think granite is the right material to make lifesavers from with what kind of stone would you suggest?

        For email validation, I would not use a regexp. RFC(2)822 is authorative. I'd use a module that follows the specs from the RFC.

        As for names, I don't think there's a standard. But if you're going to accept digits in names, I would want accented letters and ' as well.

        Abigail

Re: Code review: validation regexes
by nysus (Parson) on Jul 08, 2003 at 18:11 UTC
    This RE is hosed:
    m/^administrator|accounts|support|postmaster|webmaster|spam-admin| |technical|billing|sales|misuse|mail|virus-admin|manager|usenet|ho +stmaster$/
    You need to group the alternations with parentheses and hack off one of the duplicate bars in the middle of the RE. Here is what you are looking for:
    m/^(administrator|accounts|support|postmaster|webmaster|spam-admin|tec +hnical|billing|sales|misuse|mail|virus-admin|manager|usenet|hostmaste +r)$/

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot Bishop Pontiff";
    $nysus = $PM . $MCF;
    Click here if you love Perl Monks

      to append to that idea, the actual regex should be
      m/^(?:administrator|etc|yadda|blah)/;
      since you are not using $1, dont make perl go through the trouble of storing it to return it. the ?: says look for this but dont hang on to it.

      MMMMM... Chocolaty Perl Goodness.....
        I thought about putting that in but decided against it.

        I mean, unless the code is being used millions of times per second, does it really matter? When I was a beginner programmer, I constantly fretted that my code wasn't as efficient as it could be. Now I think I'm beginning to figure out that wasting my mental energy over these kinds of issues is a lot worse than wasting a half dozen or so computer clock cycles. For the kinds of jobs Perl is good for, and unless you are stuck with some a 286 microprocessor, it probably isn't worth the trouble of writing super-efficient code---especially when it makes the code more difficult for newbies to understand.

        $PM = "Perl Monk's";
        $MCF = "Most Clueless Friar Abbot Bishop Pontiff";
        $nysus = $PM . $MCF;
        Click here if you love Perl Monks