stallion has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have written a regex to match some tags. The issue is that when I try to match "-" (HYPHEN) in the tag, "_" (UNDERSCORE) is also getting matched. The regex is written below:

    $Prefix='DOC_';
    if ($line =~ /$Prefix[a-zA-Z]*[0-9]*-*[0-9a-z]{3}/)

the tags are

DOC_001_123

DOC_002_214

DOC_001-548

DOC_001-987

I want only the last two tags, the ones with the -, but I'm getting every tag. Where is the issue in the above regex? Thanks.

Re: Regex MATCH
by kcott (Archbishop) on Sep 14, 2012 at 08:37 UTC

    G'day stallion,

    Your problem is that * matches zero or more times. For DOC_001_123:

    1. $Prefix matches DOC_
    2. [a-zA-Z]* matches zero times
    3. [0-9]* matches zero times
    4. -* matches zero times
    5. [0-9a-z]{3} matches 001
    6. no further matching in regexp: /$Prefix[a-zA-Z]*[0-9]*-*[0-9a-z]{3}/ successfully matches DOC_001_123
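
To make the steps above visible, here is a minimal sketch that wraps each piece of the question's pattern in its own capture group and prints what each one consumed. The sub name `pieces` and the `\Q...\E` around the prefix are my additions for illustration, not part of the original code.

```perl
use strict;
use warnings;

my $prefix = 'DOC_';

# Same pattern as in the question, with each piece captured separately.
# \Q...\E only guards against metacharacters in the prefix (an addition).
sub pieces {
    my ($tag) = @_;
    return $tag =~ /\Q$prefix\E([a-zA-Z]*)([0-9]*)(-*)([0-9a-z]{3})/;
}

for my $tag (qw(DOC_001_123 DOC_001-548)) {
    my @p = pieces($tag);
    print "$tag: letters='$p[0]' digits='$p[1]' hyphens='$p[2]' last3='$p[3]'\n";
}
```

For DOC_001_123 the digit and hyphen groups come back empty and the last group holds 001, which is how the pattern succeeds without ever seeing a hyphen. For DOC_001-548 the groups hold 001, - and 548.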

    That may be enough for you to fix your code. If not, please post a more representative sample of your tags: given the [a-zA-Z]* in your regexp, I assume some tags are more complicated than those posted here.

    -- Ken

Re: Regex MATCH
by 2teez (Vicar) on Sep 14, 2012 at 08:44 UTC
    hi,

    If I understand your question, you want to match the lines that have a HYPHEN towards the end. If so, the script below does that:

        use warnings;
        use strict;

        while ( defined( my $line = <DATA> ) ) {
            chomp $line;
            print $line, $/ if $line =~ m/\-[0-9a-z]{3}/;
        }

        __DATA__
        DOC_001_123
        DOC_002_214
        DOC_001-548
        DOC_001-987

    Output

        DOC_001-548
        DOC_001-987

    UPDATE:
    You can see exactly how your regex matches by putting
        use re 'debug';
    in your script, or you can get an explanation of a regular expression from YAPE::Regex::Explain.
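
As a rough sketch of the `use re 'debug'` suggestion: the pragma is lexically scoped, so it can be limited to the one regex you want traced. The helper name `matches_with_debug` is hypothetical, and the prefix is written out literally here to keep the example short. The trace itself is written to STDERR.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the question's pattern with regex debugging on.
sub matches_with_debug {
    my ($tag) = @_;
    use re 'debug';    # lexically scoped: only regexes compiled in this block are traced
    return $tag =~ /DOC_[a-zA-Z]*[0-9]*-*[0-9a-z]{3}/ ? 1 : 0;
}

print "matched\n" if matches_with_debug('DOC_001_123');
```

The trace shows the compiled program and each step the engine tries, which makes the zero-length matches of the starred pieces easy to spot.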

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Regex MATCH
by Utilitarian (Vicar) on Sep 14, 2012 at 08:14 UTC
    With the tags you provide, wouldn't something like the following be more discerning?
        Utilitarian@busybox ~$cat tmp/tmp.pl
        #!/usr/bin/perl
        use strict;
        use warnings;
        my $prefix='DOC_';
        for (<DATA>){
            chomp(my $line=$_);
            print "$line\n" if ($line =~ /^$prefix[0-9]{3}(_|-)[0-9]{3}$/);
        }
        __DATA__
        DOC_001_123
        DOC_002_214
        DOC_001-548
        DOC_001-987
        Utilitarian@busybox ~$perl tmp/tmp.pl
        DOC_001_123
        DOC_002_214
        DOC_001-548
    EDIT

    bart below is correct: to constrain the matches to hyphens, the regex above should have read /^$prefix[0-9]{3}-[0-9]{3}$/
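
A minimal sketch of that hyphen-only version, run against the four tags from the question. The sub name `hyphen_tags` and the `\Q...\E` around the prefix are assumptions added for illustration.

```perl
use strict;
use warnings;

my $prefix = 'DOC_';
my @tags   = qw(DOC_001_123 DOC_002_214 DOC_001-548 DOC_001-987);

# Anchored at both ends, with a literal hyphen between the two 3-digit fields
sub hyphen_tags {
    return grep { /^\Q$prefix\E[0-9]{3}-[0-9]{3}$/ } @_;
}

print "$_\n" for hyphen_tags(@tags);
```

With nothing optional left in the pattern, the underscore tags have no way to match, so only DOC_001-548 and DOC_001-987 are printed.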

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
      You've got it backwards. His regex matches, but he doesn't want it to match.
Re: Regex MATCH
by BillKSmith (Monsignor) on Sep 14, 2012 at 14:31 UTC

    Others have already pointed out the problem with the asterisks in your regex. I suspect that you intend to match the middle field if it contains no digits (letters only) or no letters (digits only, the case in all your examples), as well as letters followed by digits, but not if the field is empty.

    I assume that the asterisk after the hyphen is a mistake. You probably intend to require exactly one hyphen as a field separator. Here is an implementation of my assumptions.

        use strict;
        use warnings;
        use Readonly;

        Readonly::Scalar my $PREFIX => qr/DOC_/;
        Readonly::Scalar my $FILTER => qr/
            $PREFIX
            (?: [A-Za-z]+\d* | [A-Za-z]*\d+ )
            -
            [0-9a-z]{3}
        /x;

        my @tags = qw( DOC_001_123 DOC_002_214 DOC_001-548 DOC_001-987 );
        my @dash_tags = grep {/$FILTER/} @tags;
        print "@dash_tags\n";
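
A quick way to check those assumptions is to try the same filter on tags with letters or an empty middle field. This is only a sketch: apart from DOC_001-548 and DOC_001_123, the tags below are hypothetical, and plain lexicals stand in for Readonly so that it runs without the CPAN module.

```perl
use strict;
use warnings;

# Same filter as above, written with plain lexicals instead of Readonly
my $prefix = qr/DOC_/;
my $filter = qr/ $prefix (?: [A-Za-z]+\d* | [A-Za-z]*\d+ ) - [0-9a-z]{3} /x;

# Hypothetical helper: 1 if the tag passes the filter, 0 otherwise
sub keeps {
    my ($tag) = @_;
    return $tag =~ $filter ? 1 : 0;
}

# DOC_AB12-xyz, DOC_ABC-001 and DOC_-123 are made-up tags for illustration
for my $tag (qw(DOC_001-548 DOC_AB12-xyz DOC_ABC-001 DOC_-123 DOC_001_123)) {
    printf "%-12s %s\n", $tag, keeps($tag) ? 'kept' : 'rejected';
}
```

The first three tags are kept. DOC_-123 is rejected because the middle field is empty, and DOC_001_123 is rejected because the separator is not a hyphen.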
    Bill