bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

Learning Perl the right way, part IX:

Task: untaint and validate user input using subroutines in a separate module. Note: Because I eventually want to render the form with HTML::Template, listing the errors, I'm keeping track of the error messages for a <tmpl var> in my form.tmpl template file. So, after I get the data, I pass it, a literal for the error listing, and an array element for the actual message.

Monasterical questions: basically, is this code representative of "good practices" or efficient coding? Specifically, is my scoping, use of variables, passing of data, logic, and structure okay? Other comments?
use strict; require "Validate.pm"; use CGI qw(:standard); new CGI; my ($name, $phone, $email, $date, @errmsg, $errmsgs); $name = param('name'); &val_alpha($name, "Last Name", $errmsg[0]); $phone = param('phone'); &val_phone($phone, "Phone Number", $errmsg[1]); $email = param('email'); &val_email($email, "E-mail Address", $errmsg[2]); $date = param('date'); &val_date($date, "Birth Date", $errmsg[3]); &error_page(@errmsg); print "Errors:\n$errmsgs\n"; #handle errors sub error_page { for (@_) { if ($_) { $errmsgs .= "$_\n" } } } $template = HTML::Template->new(filename => "form.tmpl"); $template->param(errors => $errmsgs); #eventually process and write to the database __END__
And for the module:
#Validate.pm sub val_alpha { if ($_[0] =~ /^([A-Za-z -]*)$/) { $_[0] = $1; } else { $_[2] = "Invalid character(s) in $_[1]"; } } sub val_phone { if ($_[0] =~ /^[\(]?(\d{3})[\)\.\-]?(\d{3})[\)\.\-]?(\d{4})$/) { $_[0] = "$1-$2-$3"; } else { $_[2] = "Invalid $_[1]"; } } sub val_date { if ($_[0] =~ /^(\d{2})-(\d{2})-(\d{4})$/) { $_[0] = "$1-$2-$3"; } else { $_[2] = "Invalid $_[1]"; } } sub val_email { if ($_[0] =~ /^([\w\.\-]{3,})@([\w\.\-]{3,})\.([A-Z]{2,3})$/i) { $_[0] = "$1\@$2\.$3"; } else { $_[2] = "Invalid $_[1]"; } } 1;
Thanks all in advance.

—Brad
"A little yeast leavens the whole dough."

Replies are listed 'Best First'.
Re: Am I passing and testing user data correctly?
by chromatic (Archbishop) on Nov 21, 2003 at 04:08 UTC

    This is an improvement over last time. That said, I'd change several things.

    First, I don't care for the style of parameter passing that operates on elements of @_ directly. I find this much clearer:

    sub email { my ($class, $value) = @_; return unless $value =~ /^([\w\.\-]{3,})@([\w\.\-]{3,})\.([A-Z]{2, +3})$/i); return "$1\@$2\.$3"; }

    This has the advantage of separating error handling from validation; you get to decide what to do at the caller. That tends to make your designs simpler and more flexible. As I suggested in my previous comments, you can say:

    my $email = Validate->email( param( 'email' ) ); push @errors, "Missing or invalid e-mail address '$email'\n" unless $e +mail;

    If you think of classes and methods as nouns and verbs and put together your programs as if you were describing them in English, you'll find that they can be pretty maintainable.

    You really don't need a function to format error messages if you push the responsibility for formatting them back to the caller.

    I dislike the use of the ampersand for function calls as there's no reason to use it. (You're not invoking a sub reference nor do you need to bypass prototypes. Save yourself a sigil and skip it.)

    I don't see any reason to use require instead of use for Validate. It's shorter. Also, you can use the nicely descriptive class method syntax if you use a package statement within the Validate package.

      Thanks chromatic--invaluable example. I tried it and worked like a charm. And I get what you've done, with the exception of the $class variable. Why is it there, where does it get it's value, and what does it return? Also, it seems like the $email in the error string really will never have a value...right?


      —Brad
      "A little yeast leavens the whole dough."

        Good catch on the $email variable in the example. It's pretty useless in the error message. :)

        As for $class, the Validate->email() call is actually a class method call. You can read it as "call the email method of the Validate class".

        Due to the way Perl's OO works, the invocant (the noun: a class or object) you're asking to perform the verb (the method) is passed as the first argument to the method. In an object method call, you'd get the object as the first p arameter. In a class method call, you get the class name.

        perloot and perlboot have more details.

Re: Am I passing and testing user data correctly?
by Zaxo (Archbishop) on Nov 21, 2003 at 04:00 UTC

    This is much improved from your previous post. A few points, and one design thing. Design first.

    • Your validation functions do not all guarantee a result of the form you want for CGI validation, and they do not have a design that lets you demand non-zero length, first and last name, etc. They are good for detainting, but validation is skimped.
    • You don't need the & sigil before the function calls, and the parens are optional as well. Neither is an important point in your code, but the sigil can alter the behavior of the function.
    • Your phone, email and date validators are a too simple for the real world, but I take this to be from simplification for the node. Presumably you will use modules like Email::Valid and others for the real thing.
    • I like to put a definite true, like  1; at the end of a sub which modifies its arguments.
    • Added: You can get the same error message array, @errmsg, by supplying it to each validator call as $errmsg[@errmsg]. No need to keep track of numbers then.

    After Compline,
    Zaxo

      Thanks Zaxo for the kind words:
      I like to put a definite true, like 1; at the end of a sub which modifies its arguments.
      Do you mean this for all subs? Or just the .pm?

      —Brad
      "A little yeast leavens the whole dough."

        It's a useful convention in perl that operator-like subs (and many others) should return true if they are successful. The only one of your validation functions which might not is sub val_alpha, which successfully detaints an empty string and returns it. I make it a habit to avoid that automatically so I don't have to think about it :)

        It is only a convention, but a profoundly useful one. Perl does nothing to demand you follow it.

        After Compline,
        Zaxo

Re: Am I passing and testing user data correctly?
by davido (Cardinal) on Nov 21, 2003 at 03:26 UTC
    In validating alpha for the purpose of accepting a name, you might want to permit the ' character, as in O'Leary. And while you're at it, don't forget the possibility of someone having an accent of some form over one of the vowel's in their name. `'^~, etc. Ferdnand Magellen's name, in Portuguese is Fernando Magalhae~s (but with the tilde over the 'e').

    Your date format is somewhat limiting. It requires that the date come in with one particular (Americanized) format. While you'll still need to properly check it for taintedness, it might be helpful for your HTML to also help the user to get it right by providing drop-downs for the date instead of freehand input. Date::Manip may help you to be able to accept a broader range of date formats.

    Your phone validation is ok for the 50 states. It breaks down for international numbers, or local numbers for other countries.

    Your email validation is errant. You can't validate an email address with a regexp. You can't even guarantee that it is syntactically sound, without inadvertantly rejecting some syntactically correct addresses. Sorry. Friedl has an example that does a pretty good job at the end of his book, Mastering Regular Expressions (the Owls book). But he even goes on to explain in that book that his example isn't 100% reliable. For email, I think one of the best approaches is to carefully keep that address away from the shell, and in a safe way, send a verification email to the address in question, requesting a reply with a particular hard-to-guess code in it.

    PS: You're getting onto the right track. Do realize however that there is a difference between validating and untainting. If you can keep an email address away from the shell, and you can validate it through an actual email exchange with the user, you're most of the way there.


    Dave


    "If I had my life to live over again, I'd be a plumber." -- Albert Einstein