Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How would I match the following patterns:
(5 to 8 digits) or (3 alpha chars 1 hyphen 4 chars)
e.g. 2312112 or abc-erte
I've tried a few permutations including:

m/[\d]{5,8}|[:alpha:]{3}[-]{1}[:alpha:]{4}/g
m/[\d]{5,8}|[\D]{3}[-]{1}[\D]{4}/g
but they don't seem to work :(
Any clues? TIA

Replies are listed 'Best First'.
Re: reg exp
by fruiture (Curate) on Sep 30, 2002 at 09:28 UTC

    The first place to go is `man perlre`. You should also get rid of all those [ and ] characters where they're redundant.

    for( qw{ 2312112 abc-erte } ){
        print "$_ ",
            m<^(?:
                \d{5,8}   # 5 to 8 digits
                |         # OR
                \w{3}     # 3 alpha chars
                          # (maybe you want to replace \w)
                -         # one hyphen
                .{4}      # 4 chars
            )$>x ? 'matches' : 'doesn\'t match',
            "\n";
    }

    update: typos at end of regexp

    --
    http://fruiture.de
Re: reg exp
by Preceptor (Deacon) on Sep 30, 2002 at 09:38 UTC
    OK, quick hack of your code gives me this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @test = ( 123, 12345, 12345678, "abc-abc", "abc-abcd" );
    foreach my $match ( @test ) {
        if ( $match =~ m/\d{5,8}|\w{3}-\w{4}/ ) {
            print "$match match\n";
        }
        else {
            print "$match doesn't match\n";
        }
    }
    Which seems to work.
    (Note that I have used \w instead of \D. See the manpage for what the difference is).
    But like the man said in the previous post: perldoc perlre.
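One caveat about the unanchored pattern above: it succeeds as soon as any substring fits, so a nine-digit string still matches. A minimal sketch of the difference follows, assuming the whole string is supposed to fit the pattern, as the anchored regex in the earlier reply does. The helper name `is_valid_id` and the extra test strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only if the WHOLE string is 5 to 8 digits,
# or three word characters, a hyphen, and four word characters.
sub is_valid_id {
    my ($s) = @_;
    return $s =~ /\A(?:\d{5,8}|\w{3}-\w{4})\z/ ? 1 : 0;
}

for my $candidate (qw(123 12345 12345678 123456789 abc-abc abc-abcd xabc-abcdx)) {
    # Unanchored: any matching substring is enough.
    my $loose  = $candidate =~ /\d{5,8}|\w{3}-\w{4}/ ? 'match' : 'no match';
    # Anchored: the entire string must fit one of the alternatives.
    my $strict = is_valid_id($candidate) ? 'match' : 'no match';
    print "$candidate: unanchored=$loose anchored=$strict\n";
}
```

The `(?: ... )` group matters here: without it, `\A` would bind only to the first alternative and `\z` only to the second.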

    --
    It's not pessimism if there is a worse option, it's not paranoia when they are and it's not cynicism when you're right.
Re: reg exp
by Juerd (Abbot) on Sep 30, 2002 at 10:36 UTC

    m/[\d]{5,8}|[:alpha:]{3}[-]{1}[:alpha:]{4}/g

    I think you should use [[:alpha:]], not [:alpha:]. I'm not sure, though.
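To illustrate the point: as far as I know, a bare [:alpha:] is parsed as an ordinary bracketed class containing the characters ':', 'a', 'l', 'p' and 'h', which is why "abc-erte" fails against the original pattern. A minimal sketch with the POSIX class written inside brackets follows. It assumes the four characters after the hyphen should also be alphabetic (as in the original attempt) and that the whole string must match. The name `matches_spec` is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [[:alpha:]] is the POSIX class used INSIDE a bracketed character class.
# Assumption: both groups around the hyphen are alphabetic, and the
# pattern must cover the entire string.
my $re = qr/\A(?:\d{5,8}|[[:alpha:]]{3}-[[:alpha:]]{4})\z/;

# Hypothetical helper wrapping the match.
sub matches_spec {
    my ($s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s (qw(2312112 abc-erte abc-er1e 1234)) {
    printf "%-10s %s\n", $s, matches_spec($s) ? 'matches' : "doesn't match";
}
```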

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      No, you got that right. Be unsure no longer. The named POSIX classes are nice for keeping things clear, especially when you consider character sets and Unicode. [A-Za-z0-9] is sufficient for US-ASCII, but it's wordy and easier to mistype than [[:alnum:]]. The nice thing is that if you start moving beyond US-ASCII, you've already taken care of some of the work up front. It's also more in line with what I gather will be the expected idiom for Perl 6. The idea here is that it is infeasible for you to spell out what counts as a "word" character for whatever character set you're using in your regular expression.
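A small sketch of the difference beyond US-ASCII, using POSIX classes such as [[:alpha:]] and [[:alnum:]] against hand-written ranges. The sub names are invented for illustration, and the Greek letter is written as an escape so the script does not depend on the source file's encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers comparing a hand-written ASCII range with the POSIX class.
sub ascii_alpha { return $_[0] =~ /\A[A-Za-z]\z/   ? 1 : 0 }
sub posix_alpha { return $_[0] =~ /\A[[:alpha:]]\z/ ? 1 : 0 }

my $omega = "\x{3c9}";    # GREEK SMALL LETTER OMEGA, a letter outside US-ASCII

print "ASCII range accepts omega: ", ascii_alpha($omega) ? "yes" : "no", "\n";
print "[[:alpha:]] accepts omega: ", posix_alpha($omega) ? "yes" : "no", "\n";

# Note that [[:alnum:]] does not include the underscore, while \w does.
print "[[:alnum:]] accepts '_': ", ('_' =~ /\A[[:alnum:]]\z/ ? "yes" : "no"), "\n";
print "\\w accepts '_': ",          ('_' =~ /\A\w\z/          ? "yes" : "no"), "\n";
```

Whether accepting non-ASCII letters is desirable depends on the data. If the identifiers are strictly ASCII, an explicit range (or the /a modifier on newer perls) may be the safer choice.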