footpad has asked for the wisdom of the Perl Monks concerning the following question:

The apprentice, after again struggling for quite some time, caves...

Consider, if you will:

#! /usr/bin/perl -wT use strict; use CGI qw( :standard ); my $testvalue = ''; printForm(); if ( param() ) { br(); $testvalue = param( 'TestValue' ); my $errors = ""; $errors .= paramCheck( "Test Value", $testvalue, 1, 30, '/\W/', "letters and numbers" ); if ( $errors ) { print p( "Failed" ), "<UL>$errors</UL>" } else { print p( "Passed...wow, what a surprise." ); } } exit 1; sub paramCheck # Returns "" if no error or description of problem(s). { my $lbl = shift; # Field name for error messages my $val = shift; # Value received from the CGI query my $min = shift; # Minimun length; Use >0 for required fields my $max = shift; # Maximum allowed length my $pat = shift; # Regex to validate the data against. my $msg = shift; # Explanation added to error if Regex fails. my $err = ""; # Holds error message my $len = length( $val ); if ( $min ) # <> 0 { if ( $len < $min ) { $err .= wes( "$lbl cannot be blank." ); } } elsif ( $len > $max ) { my $s = $len == 1 ? "" : "s"; $err .= wes( "$lbl is currently $len character$s; " . "; it can only be $max character$s long." ); } else { unless ( $val =~ $pat ) { $err .= wes( "$lbl contains invalid characters; " . "it can only hold $msg." ); } } return $err; } sub printForm { print header(), start_html( -title => 'Test' ); print start_form( "post", "/cgi-bin/valtest.cgi", "application/x-www-form-urlencoded" ); print p( 'Enter a value:' ), textfield( 'TestValue', $testvalue, '40', '30' ); print br(), submit( "Try it" ), '&nbsp;&nbsp;', reset( "Reset" ); print end_form(); print end_html(); } sub wes # Wrap an Error String with desired tags. { my $input = shift; return "<LI>" . $input . "</LI>"; }

The idea, of course, is to devise a routine that handles the bulk of my CGI parameter validation. Unfortunately, I'm not seeing the results I expect with the regular expression in this example. Specifically, it always claims to match, even though I submit values such as "~", "/", "\", and so on (characters I'm specifically trying not to accept--hence the \W.)

My petitions are:

--f

P.S. Possibly relevant details: Perl v5.005_03, CGI v2.753, BSD 4.2.

Replies are listed 'Best First'.
Re: Dynamic Regex?
by Kanji (Parson) on May 20, 2001 at 08:04 UTC

    Your never make it to the regex test because $min (at least in your code) is a true value and your sanity checks aren't cumulative but mutually exclusive thanks to the if ( $min ) structure.

    Perhaps you meant to roll the two ifs into one so that you can fall through to the other tests?

    if ( $min && $len < $min ) # <> 0 { $err .= wes( "$lbl cannot be blank." ); }

    Also, if you're going to pass a regex like that, it either needs to be done as qr/\W/, sans the slashes, or be evaled so that the slashes aren't treated literally.

    You may want to rethink your regex, logic, and error message as they seem to be at odds with another: it should be \w instead of \W (or an if instead of unless), and there prolly ought to be some anchors so that bogus input with a single valid match doesn't pass.

    Finally, on reinventing the wheel ... have you looked at HTML::FormValidator? :)

        --k.


Re: Dynamic Regex?
by Albannach (Monsignor) on May 20, 2001 at 08:07 UTC
    It looks like in your example you're saying "unless $val contains one non-alphanumeric, report an error". You could change the unless to if, or replace =~ with !~.

    I also find your sub arguments a bit confusing. Your 5th is a pattern that you don't want to match, and your 6th is a word description of the pattern you do want to match. This may be leading you to your binary confusion (I get that all the time ;-).

    Update: ...and I didn't catch the problem with passing the regex in the first place (/me slaps self and goes to bed)

    --
    I'd like to be able to assign to an luser

Re: Dynamic Regex?
by tachyon (Chancellor) on May 20, 2001 at 10:42 UTC
    The most obvious problem is your chcking logic. If you want to sequentially check several things you need: sub checkit { if {} if {} if {} return; } not this (which is what you have) sub checkit { if {} elsif {} else {} return; } Your sub exits the if/elsif/else statement if $min is non zero and also $len < $min. BTW: Why you use two ifs here instead of an and to eval these two conditions is beyond me. This is fixed below: sub paramCheck # Returns "" if no error or description of problem(s). { my $lbl = shift; # Field name for error messages my $val = shift; # Value received from the CGI query my $min = shift; # Minimun length; Use >0 for required fields my $max = shift; # Maximum allowed length my $pat = shift; # Regex to validate the data against. my $msg = shift; # Explanation added to error if Regex fails. my $err = ""; # Holds error message my $len = length( $val ); if ( $min and $len < $min ) { { $err .= wes( "$lbl cannot be blank." ); } } if ( $len > $max ) { my $s = $len == 1 ? "" : "s"; $err .= wes( "$lbl is currently $len character$s; " . "; it can only be $max character$s long." ); } if ( $val !~ $pat ) { $err .= wes( "$lbl contains invalid characters; " . "it can only hold $msg." ); } return $err; } I assume you are assigning $pat as so (it won't work otherwise): $pat = m/foo/; Run this if you are unsure of what I am getting at: $pat = "m/foo/"; print "foo1\n" if "foo" =~ $pat; $pat = m/foo/; print "foo2\n" if "foo" =~ $pat; $pat = qr/foo/; print "foo3\n" if "foo" =~ m/$pat/; Passing $pat into a sub is no problem using either of the second two methods above. The first does not work period as you will see if you run it. happy coding tachyon
      tachyon compares three methods of assigning regexes to a variable:
      $pat = "m/foo/"; print "foo1\n" if "foo" =~ $pat; $pat = m/foo/; print "foo2\n" if "foo" =~ $pat; $pat = qr/foo/; print "foo3\n" if "foo" =~ m/$pat/;
      Passing $pat into a sub is no problem using either of the second two methods above. The first does not work period as you will see if you run it.
      The second does not work either. It is equivalent to:
      $pat = ($_ =~ m/foo/); print "foo2\n" if "foo" =~ $pat;
      $pat will be either 1 or the null string, depending on whether $_ matches /foo/. You will see this if you run it and actually examine the value of $pat.
        Dang, you are correct! I never write $foo =~ $pat so as the seeker of wisdom needed to be using this construct I *quickly* checked it to see if it worked using the code posted. As you rightly point out although this appeared to work in my quick test if I had also tested this: $pat = m/foo/; print "big snafu, egg on face, whoops\n" if "bar" =~ $pat; then my error would have stared me in the face! Thanks for pointing out the error, I hate to post wrong advice and usually run a quick test first. Ah well, another day at the monestry, another snippet of wisdom on the road to enlightenment. As a result the original code will need a qr/$pat/ to work. head bowed, duly chastened :-( tachyon PS this is what you need to get it to work. Trust me ;-) $pat = qr/some pattern here/; ... ... ... sub paramCheck # Returns "" if no error or description of problem(s). { ... my $pat = shift; ... ... if ( $val !~ m/$pat/ ) { $err .= wes( "$lbl contains invalid characters; " . "it can only hold $msg." ); } return $err; }
Validating/cleaning input with tr
by Anonymous Monk on May 21, 2001 at 13:06 UTC
    Why not use 'tr' to get rid of junk in the input ?

    This will kill everything except o,k,_,c,h,a,r,s in $my_input:

    $my_input =~ tr/ok_chars//dc;

    If you want to check and not 'clean' $my_input, you can run the above on a copy of $my_input and then compare it to the 'clean' version.

    I suppose this can be used with U and C tr options for unicode support.


    FuzzyCow