THRAK has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to write a match that logically states: "Unless the string begins with a number that is 5 or more digits long and does not contain any non-digits, it is not valid." I've been using this small test loop:
#!/usr/bin/perl -w
use strict;

my @stuff = qw(1234 a12345 12345 12345z);

foreach my $num (@stuff) {
    unless ($num =~ /^\d{5,}/) {
        print "Not Valid: $num\n";
    } else {
        print "Valid: $num\n";
    }
}
As written, this produces:
Not Valid: 1234
Not Valid: a12345
Valid: 12345
Valid: 12345z

I'm trying to invalidate that last item. I've tried a couple of things, most notably:
/^\d{5,}\D+/     # invalidates all except "12345z"
/^\d{5,}|\D+/    # only "1234" invalid
I know this shouldn't be difficult, but I've hit a mental roadblock so any help would be appreciated.

-THRAK

Re: Regex Pattern Problem
by danger (Priest) on Feb 28, 2001 at 02:49 UTC

    How about simply adding the $ anchor to your current regex? (5 or more digits and then the end of the string):

    unless ($num =~ /^\d{5,}$/) {

    # and more generally, to combine a couple of conditions:
    unless ($num =~ /^\d{5,}/ and $num !~ /\D/) {
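    A minimal sketch checking that the two conditions above agree on the strings from this thread. The sub names valid_anchored and valid_combined are hypothetical, and the extra sample 123456 is added only for illustration; the comparison is limited to these inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Form 1: anchored at both ends, so the whole string must be 5+ digits.
sub valid_anchored { my ($s) = @_; return $s =~ /^\d{5,}$/ ? 1 : 0 }

# Form 2: starts with 5+ digits AND contains no non-digit anywhere.
sub valid_combined {
    my ($s) = @_;
    return (($s =~ /^\d{5,}/ and $s !~ /\D/) ? 1 : 0);
}

for my $num (qw(1234 a12345 12345 12345z 123456)) {
    printf "%-7s anchored=%d combined=%d\n",
        $num, valid_anchored($num), valid_combined($num);
}
```

    Both report only 12345 and 123456 as valid here.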
Re: Regex Pattern Problem
by arturo (Vicar) on Feb 28, 2001 at 02:53 UTC

    Well, the first one's a forehead-smacker: of the strings, the only one that *SATISFIES* your first attempt at a fix (and hence is marked as valid) IS 12345z ("unless the string is five or more digits followed by one or more non-digits, mark it as invalid").

    The second one says "match something that has five or more digits at the beginning *OR* that contains (anywhere) one or more non-digits." That is, the caret doesn't bind the \D+ to the beginning of the string. You'd have to use parentheses (perhaps non-capturing ones) for that, e.g.:

    /^(?:\d{5,}|\D+)/
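    A small sketch of the two failed attempts next to the grouped version, to make the precedence point concrete. The sub names are hypothetical, and "12x" is an extra sample chosen because it is where the ungrouped and grouped patterns disagree. Note that the grouped pattern still accepts 12345z, so it illustrates anchoring, not a fix for the original problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First attempt: 5+ leading digits followed by 1+ non-digits.
# Only a string like "12345z" satisfies it.
sub first_try { my ($s) = @_; return $s =~ /^\d{5,}\D+/ ? 1 : 0 }

# Second attempt: alternation (|) has the lowest precedence, so ^ anchors
# only the first branch: "5+ digits at the start" OR "non-digits anywhere".
sub ungrouped { my ($s) = @_; return $s =~ /^\d{5,}|\D+/ ? 1 : 0 }

# A non-capturing group (?:...) makes ^ apply to both branches.
sub grouped   { my ($s) = @_; return $s =~ /^(?:\d{5,}|\D+)/ ? 1 : 0 }

# "12x" is a hypothetical extra sample where ungrouped and grouped disagree.
for my $s (qw(1234 a12345 12345 12345z 12x)) {
    printf "%-7s first=%d ungrouped=%d grouped=%d\n",
        $s, first_try($s), ungrouped($s), grouped($s);
}
```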

    I'd suggest changing the logic slightly, and I'll use the nifty ternary operator too:

    foreach (@stuff) {
        my $status = ( /^\d{5,}$/ ) ? 'valid' : 'invalid';
        print "$_ is $status\n";
    }

    HTH

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

Re: Regex Pattern Problem
by ZZamboni (Curate) on Feb 28, 2001 at 02:53 UTC
    I think your logical statement can be restated as "a line made only of digits, at least 5 digits long", and in that case the regular expression you need is
    /^\d{5,}$/
    You could add a \s* right before the $ if you want to allow for spaces at the end.

    Or, if you want to allow any mix of digits and whitespace after the first five digits, you could use:

    /^\d{5,}[\d\s]*$/
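    A quick sketch comparing the three patterns from this reply. The sub names and sample strings are hypothetical, picked so that each pattern accepts something the previous one rejects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strictly digits, at least five of them.
sub digits_only    { my ($s) = @_; return $s =~ /^\d{5,}$/ ? 1 : 0 }
# Five or more digits, optionally followed by trailing whitespace.
sub trailing_space { my ($s) = @_; return $s =~ /^\d{5,}\s*$/ ? 1 : 0 }
# Five or more leading digits, then any mix of digits and whitespace.
sub digits_spaces  { my ($s) = @_; return $s =~ /^\d{5,}[\d\s]*$/ ? 1 : 0 }

for my $s ('12345', '12345  ', '12345 678', '12345z', '123 45678') {
    printf "%-11s digits_only=%d trailing_space=%d digits_spaces=%d\n",
        "'$s'", digits_only($s), trailing_space($s), digits_spaces($s);
}
```

    All three still require five digits at the very start, so '123 45678' is rejected by every one of them.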

    --ZZamboni, aka Duke Dong

Re: Regex Pattern Problem
by myocom (Deacon) on Feb 28, 2001 at 02:48 UTC

    After a little playing around, I came up with this:

    #!/usr/bin/perl -w
    use strict;

    my @stuff = qw(1234 a12345 12345 12345z 123456);

    foreach my $num (@stuff) {
        unless ($num =~ /^\d{5,}$/) {
            print "Not Valid: $num\n";
        } else {
            print "Valid: $num\n";
        }
    }

    Note the $ anchor there...this should match any string of digits that's at least 5 digits long, and it will fail for anything with non-digits in it.

    EDIT: Removed the extra crap before the $ anchor.
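    One caveat worth knowing about the $ anchor: in Perl, $ matches at the end of the string or just before a final newline, so an unchomped line such as "12345\n" passes /^\d{5,}$/. The \z anchor matches only at the absolute end. A minimal sketch, with hypothetical sub names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $ matches at the very end OR just before a final "\n".
sub with_dollar { my ($s) = @_; return $s =~ /^\d{5,}$/ ? 1 : 0 }
# \z matches only at the absolute end of the string.
sub with_z      { my ($s) = @_; return $s =~ /^\d{5,}\z/ ? 1 : 0 }

for my $s ("12345", "12345\n", "12345z") {
    (my $shown = $s) =~ s/\n/\\n/;    # make the newline visible when printing
    printf "%-8s \$=%d \\z=%d\n", $shown, with_dollar($s), with_z($s);
}
```

    This matters mainly when the strings come from a file or STDIN and haven't been passed through chomp.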

      Duh! Thank you. The /^\d{5,}$/ expression works just fine. As I said in my original post, I knew this should be fairly obvious but my brain was stuck in neutral. This meets the criteria of "match any string that consists of 5 or more digits and does not contain any other characters". There are lots of other good answers to variations of the problem that may prove useful in the future.
      -THRAK
(boo) Re: Regex Pattern Problem
by boo_radley (Parson) on Feb 28, 2001 at 02:50 UTC
    unless ($num =~ /^\d{5,}/ && $num !~/\w/)
    should do you.
Re: Regex Pattern Problem
by Yoda (Sexton) on Feb 28, 2001 at 07:03 UTC
    I may be wrong, since I am kind of new at this, but I believe there is one more problem with your regex. If you are trying to match 5 and only 5 digits, you need to drop the comma as well. Page 68 of the Nutshell book says that {n,} matches at least n times. You need to use {5} to match 5 and only 5 digits.

    unless ($num =~ /^\d{5}$/) {


    Even with both anchors, /^\d{5,}$/ will also match a line made up entirely of digits that is longer than 5 digits, such as 123456.
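    A short sketch contrasting the two quantifiers on the same inputs; the sub names are hypothetical and 123456 is an added sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# {5}  : exactly five digits.
sub exactly_five { my ($s) = @_; return $s =~ /^\d{5}$/  ? 1 : 0 }
# {5,} : five or more digits (no upper bound).
sub five_or_more { my ($s) = @_; return $s =~ /^\d{5,}$/ ? 1 : 0 }

for my $s (qw(1234 12345 123456 12345z)) {
    printf "%-7s {5}=%d {5,}=%d\n", $s, exactly_five($s), five_or_more($s);
}
```

    Only 123456 separates them; the original question asks for the {5,} behavior.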

    Update: After re-reading your problem, I see that you do want 5 or more. I was too excited about regexes; they are the topic of discussion this week in a class I am taking.

    Yoda
Re: Regex Pattern Problem
by aardvark (Pilgrim) on Feb 28, 2001 at 03:30 UTC
    If you want to create a new array from the old one, you may want to try:
    my @good_stuff = grep (/^\d{5,}$/, @stuff);
    You may also want to look at Item 12, "Use foreach, map and grep as appropriate", in Effective Perl Programming.
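    A minimal sketch extending the grep idea to collect both the accepted and the rejected items, assuming the thread's test data plus 123456; @bad_stuff is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @stuff = qw(1234 a12345 12345 12345z 123456);

# grep keeps the elements for which the block returns true.
my @good_stuff = grep {  /^\d{5,}$/ } @stuff;
# Negating the match collects the rejects in a second pass.
my @bad_stuff  = grep { !/^\d{5,}$/ } @stuff;

print "Valid:     @good_stuff\n";
print "Not valid: @bad_stuff\n";
```

    This walks the list twice; for a short list that is usually simpler than a foreach loop that pushes into two arrays.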

    Get Strong Together!!
Re: Regex Pattern Problem
by sierrathedog04 (Hermit) on Feb 28, 2001 at 17:30 UTC
    boo_radley's solution is intriguing:
    unless ($num =~ /^\d{5,}/ && $num !~/\w/)
    According to Robert's Perl Tutorial, "The \w construct actually means 'word' - equivalent to a-zA-Z_0-9".

    Because \w includes digits, $num !~ /\w/ is false for any string containing a digit, so it looks as if this condition might reject all numbers. I tried out the following code here:

    use CGI qw(:standard);   # provides p(), which wraps its argument in <p> tags

    my $num = "123456";
    unless ($num =~ /^\d{5,}/ && $num !~ /\w/) {
        print p("Sorry, $num does not match the regex");
    } else {
        print p("$num matched");
    }
    and sure enough the number 123456 did not match the regex.
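    A minimal sketch of the difference, swapping \w for \D (any non-digit) as in danger's combined condition. The sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# boo_radley's condition: !~ /\w/ is false whenever the string contains
# a digit, because \w includes 0-9, so every digit string is rejected.
sub with_w { my ($s) = @_; return (($s =~ /^\d{5,}/ && $s !~ /\w/) ? 1 : 0) }

# Using \D (any non-digit) instead gives the intended test.
sub with_D { my ($s) = @_; return (($s =~ /^\d{5,}/ && $s !~ /\D/) ? 1 : 0) }

for my $s (qw(123456 12345z)) {
    printf "%-7s \\w-version=%d \\D-version=%d\n", $s, with_w($s), with_D($s);
}
```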