Speedfreak has asked for the wisdom of the Perl Monks concerning the following question:

I have a minor problem which I'm guessing could be done with a regex. I am being sent a parameter from a CGI form that I need to check is valid.

What I need to check is that:

The string is exactly 4 characters long, not one more, not one less. It must only contain numbers and letters. Also, there is a high chance the string will arrive in lowercase and I need to convert it to uppercase before my script can use it.

I need to redirect the user to an error page if the string was invalid rather than continue running the script.

Can anyone help me with this one?

- Jed

Replies are listed 'Best First'.
Re: String validation
by chromatic (Archbishop) on Mar 24, 2000 at 21:19 UTC
    One option:
    if ($string =~ m!(^[a-z0-9]{4})$!) {
        $string =~ tr/a-z/A-Z/;
        # do something
    } else {
        # do some error
    }
    This looks for the start of the string, then exactly four characters drawn from the class a through z (lowercase) or 0 through 9, storing them, and then the end of the string.

    If found, it uppercases it. (I use tr/// because it's faster than a substitution, and because leading digits may cause a warning about modifying a constant. uc is another option.)
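A minimal sketch of the check described above, wrapped in a sub so the valid and invalid cases can be tried side by side. The name check_code and the sample inputs are hypothetical, not from the thread. One assumption worth flagging: the sketch anchors with \A and \z rather than ^ and $, because without /m a bare $ also matches just before a trailing newline, so "ab12\n" would otherwise pass.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): returns the uppercased
# value, or nothing if the input is not exactly four characters from
# a-z or 0-9.
sub check_code {
    my ($string) = @_;
    return unless defined $string;
    # \A and \z anchor at the true start and end of the string;
    # a bare $ would also accept a value ending in a newline.
    return unless $string =~ /\A[a-z0-9]{4}\z/;
    return uc $string;
}

# Sample inputs chosen for illustration only.
for my $input ('ab12', 'ab1', 'ab123', 'ab-2', "ab12\n") {
    my $code = check_code($input);
    (my $shown = $input) =~ s/\n/\\n/g;   # make the newline visible
    print "$shown: ", (defined $code ? "ok, $code" : "invalid"), "\n";
}
```

Returning the uppercased value (instead of modifying the argument in place) keeps the caller's original string available for an error message.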

Re: String validation
by btrott (Parson) on Mar 25, 2000 at 02:22 UTC
    Shouldn't that regex also match upper-case letters? Your regex will say that the string "A32g" is bad, but according to the problem description, the original poster wanted to match *any* letters, including (presumably) upper-case. So that string should match. So I would think you could just change that code to:
    if ($string =~ m!(^[A-Za-z0-9]{4})$!) {
        $string =~ tr/a-z/A-Z/;
        # or $string = uc $string;
        # do something
    } else {
        # do some error
    }
    I did a bit of benchmarking of uc vs. tr, and the results were pretty similar:
    Benchmark: timing 1000000 iterations of tr, uc...
            tr: 10 secs ( 7.49 usr 0.00 sys = 7.49 cpu)
            uc:  9 secs ( 6.60 usr 0.00 sys = 6.60 cpu)
    So they should be pretty much interchangeable, in terms of performance.
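The script behind those numbers was not posted, so the following is only a sketch of how such a comparison could be set up with the core Benchmark module. The sample value and the shape of the two subs are assumptions; timings will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my $input = 'ab12';   # sample value, chosen for illustration

# Each sub copies the input first, so both candidates do the same
# amount of setup work on every iteration.
my %candidates = (
    'tr' => sub { my $s = $input; $s =~ tr/a-z/A-Z/; $s },
    'uc' => sub { my $s = $input; $s = uc $s;        $s },
);

# Run each candidate the same number of times and print the timings.
timethese(1_000_000, \%candidates);
```

Both subs return the converted string, which makes it easy to confirm they agree before comparing their speed.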
      Let's make it a little more readable, then:
      if ($string =~ m!^(\w{4})$!) {
          $string =~ tr/a-z/A-Z/;
          # or $string = uc $string;
          # do something
      } else {
          # do some error
      }
      Since the OP wants four alphanumerics (and they are merely *likely* to arrive in lowercase), we'll use \w.

      The reason I suggested using tr/// instead of uc is that, in my testing, uc() failed on a scalar starting with a digit. Now it seems to work correctly. Let's go back to:

      if ($string =~ s!^(\w{4})$!uc($1)!e) {
          # do something
      } else {
          # do some error
          # print "Location: $error_url"
      }
        Do you mean \w? \d just matches digits. I thought of using \w, but \w is alphanumerics plus '_', and I didn't know if the OP wanted '_'.
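To make the underscore point concrete, here is a small sketch comparing \w with an explicit character class. The helper names is_word4 and is_alnum4 and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 if the whole string is four
# characters from the given class, 0 otherwise.
sub is_word4  { return $_[0] =~ /^\w{4}$/          ? 1 : 0 }
sub is_alnum4 { return $_[0] =~ /^[A-Za-z0-9]{4}$/ ? 1 : 0 }

# 'ab_1' contains an underscore; 'ab12' does not.
for my $input ('ab_1', 'ab12') {
    printf "%-5s \\w: %d   [A-Za-z0-9]: %d\n",
        $input, is_word4($input), is_alnum4($input);
}
```

The explicit class rejects the underscore, while \w accepts it, so the choice depends on whether '_' counts as valid input.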
Re: String validation
by turnstep (Parson) on Mar 30, 2000 at 03:50 UTC

    If they do NOT want the underscore, you should technically still try to use \w anyway, because 'use locale' may be in effect:

    if ($string !~ /_/ && $string =~ s!^(\w{4})$!uc($1)!e) {