bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

qq wisely pointed out a problem with a validation script I am writing for untainting purposes. The value *must* contain at least one number, but *may* also contain periods and hyphens. I *solved* it, but it looks messy. Can it be written more concisely?

#!/usr/bin/perl -w
use strict;

my $number = "..2314-1234123.";
if ($number =~ /^((?:[0-9\.-]*)(?:[0-9]+)(?:[0-9\.-]*))$/) {
    print "$1\n";
}

Added after posting: In summary, it's a regular expression that requires at least one character from one character class, but may also contain characters from another class.

Thanks, monks.

Update: Thanks all for participating. This was a great learning experience. If Tiger Woods were only a Perlmonk..... It seems like there is Perl and then there are regular expressions. So far Abigail-II has the lead with 16 and a reg ex that even I can understand. Now, about these 2000 lines of code I have....
Update 2: I spoke too soon, I believe Abigail-II's 2 solutions need the capturing parens to return $1--so the race is a bit tighter.

—Brad
"A little yeast leavens the whole dough."

Replies are listed 'Best First'.
Re: Golf this reg ex
by davido (Cardinal) on Apr 16, 2004 at 02:04 UTC
    First, you don't have to backwhack (escape) the '.' character when it appears inside of a character class.

    Next, [0-9] is the same as \d.

    Third, the non-capturing parens are not helping you.

    Boil that down and you get:

    print "$1\n" if $number =~ m/ ^( [\d.-]* \d [\d.-]* )$ /x;

    Oh, forgot to mention; the /x modifier helps to keep things clean and tidy looking.

    If you really want it golfed, how about this:

    $number =~ /^([\d.-]*\d[\d.-]*)$/ and print "$1\n";


    Dave

      /^([\d.-]*\d[\d.-]*)$/ looks so innocent. It is however a very inefficient regex. Because you give Perl lots of ways of matching the lone \d, it can take a relatively long time for Perl to determine there is a failure. Dropping the \d from the first character class makes a huge difference:

      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark qw /cmpthese/;

      our $re1 = qr /^[\d.-]*\d[\d.-]*$/;
      our $re2 = qr /^[.-]*\d[\d.-]*$/;

      our @strs = <DATA>;
      our (@d, @a);

      foreach (@strs) {
          die if /$re1/ xor /$re2/
      }

      cmpthese -1 => {
          davido  => 'my @a = map {/$re1/} @strs',
          abigail => 'my @a = map {/$re2/} @strs',
      }

      __DATA__
      --1--2--3--4--5--6--7--8--9--0--1--2--3--4--5--6--7--8--9--0--a--2--3--4--5--

                  Rate  davido abigail
      davido   23578/s      --    -88%
      abigail 196495/s    733%      --

      Abigail

Re: Golf this reg ex
by Abigail-II (Bishop) on Apr 16, 2004 at 09:44 UTC
    The value *must* contain at least one number, but *may* also contain periods and hyphens.
    My first reaction is: /\d/ (4 chars). This matches any string that contains at least one number. The string may also contain periods and hyphens. And any other character for that matter - your requirement doesn't say those characters are forbidden.

    But I guess the hidden requirement is that it doesn't contain anything else. You can test for that with two regexes: /\d/&&!/[^-.\d]/ (16 chars). Or, if you want to do it in one regex: /^[-.]*\d[-\d.]*$/ (18 chars).
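
    A minimal sketch, not part of the original reply, that runs the two-regex and single-regex forms above side by side. The sub names two_regex and one_regex and the sample strings are hypothetical, chosen only for illustration.

    ```perl
    use strict;
    use warnings;

    # Two-regex form: has at least one digit, and nothing outside [-.\d].
    sub two_regex {
        my ($s) = @_;
        return ( $s =~ /\d/ && $s !~ /[^-.\d]/ ) ? 1 : 0;
    }

    # Single-regex form: optional [-.] prefix, one digit, then any of [-\d.].
    sub one_regex {
        my ($s) = @_;
        return ( $s =~ /^[-.]*\d[-\d.]*$/ ) ? 1 : 0;
    }

    # Hypothetical sample inputs (not taken from the thread).
    for my $s ( "..2314-1234123.", "...", "-1d", "7", "" ) {
        printf "%-18s two=%d one=%d\n", "'$s'", two_regex($s), one_regex($s);
    }
    ```

    On newline-free input the two forms should agree. They differ on a string such as "7\n": $ also matches just before a trailing newline, so the single-regex form accepts it while the two-regex form rejects it. Anchoring with \z instead of $ would remove that difference, which may matter when untainting.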

    Abigail

Re: Golf this reg ex
by Enlil (Parson) on Apr 16, 2004 at 02:34 UTC
    The value *must* contain at least one number, but *may* also contain periods and hyphens.
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (<DATA>) {
        print /^(?=.*\d)[\d.-]+$/ ? "THIS LINE WORKS: $_"
                                  : "THIS LINE DOESN'T: $_";
    }
    __DATA__
    .........---------........
    ...............1..........
    ...----1.................
    -----------1-------------
    .........................
    -------------------------
    ...................-1d...
    --------------------2---.
    .-.-.-.-.3983943959359395
    342094242.---------------

    -enlil

Re: Golf this reg ex
by duff (Parson) on Apr 16, 2004 at 03:03 UTC
    $number = "..2341-24234233.";
    if ($number =~ /\d/ && $number =~ /^([\d.-]*)$/) {
        print $1;
    }

    :-)

Re: Golf this reg ex
by japhy (Canon) on Apr 16, 2004 at 05:18 UTC
    Here's my go:
    # assuming $_ has the string
    #234567890123456 => 16 characters
    /\d/*/^[\d.-]+$/

    Update:

    #23456789012345678 => 18 characters
    /\d/*/^([\d.-]+)$/

    It'll return true, and if it does, $1 will have the untainted text.
    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
      japhy, thanks, but I'm not sure I understand the /* and the fact that you can have what appear to be compounded reg exes. Anyway, I can't get it to work in:

      my $number = "..2314-1234123.";
      if ($number =~ /\d/*/^([\d.-]+)$/ ) {
          print "$1\n";
      }

      If you have the time, a simple explanation would be educational. TIA

      —Brad
      "A little yeast leavens the whole dough."
        perl -MO=Deparse
        my $number = '..2314-1234123.';
        if ($number =~ /\d/ * /^([\d.-]+)$/) {
            print "$1\n";
        }

        The original trick is multiplying the first regex result (as a scalar) by the second regex result (as a scalar). That is, 1*1 is true, but 0*1 is false.

        However, since only the first regex is bound to $number in your example (=~ binds more tightly than *), the code doesn't actually work. The second regex searches $_ for matches instead.
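
        A minimal sketch of one way to make the multiplication trick see the right string, assuming you are happy to alias $_ to $number for the duration of the test. The sub name check is hypothetical and exists only so the result can be inspected.

        ```perl
        use strict;
        use warnings;

        # Returns the captured text if the string has a digit and consists
        # only of digits, periods and hyphens; returns undef otherwise.
        sub check {
            my ($number) = @_;
            for ($number) {    # aliases $_ to $number, so both regexes search it
                # Each match yields a true or false scalar; the product is
                # true only when both matches succeed.
                return $1 if /\d/ * /^([\d.-]+)$/;
            }
            return;
        }

        my $ok = check("..2314-1234123.");
        print "$ok\n" if defined $ok;
        ```

        The other option is to bind both regexes explicitly, as in ($number =~ /\d/) * ($number =~ /^([\d.-]+)$/), which is longer but does not depend on $_.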

        --
        [ e d @ h a l l e y . c c ]

Re: Golf this reg ex
by pbeckingham (Parson) on Apr 16, 2004 at 01:56 UTC

    Not really shorter, but easier to read...

    #! /usr/bin/perl -w
    use strict;

    my $number = "..2314-1234123.";
    my $stuff = qr/[0-9.-]/;
    if ($number =~ /^($stuff*\d+$stuff*)$/) {
        print "$1\n";
    }