pip9ball has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Can someone explain to me why the following expression will still match?

#!/usr/bin/perl
my $string = "<*2>H<2:0>,I<3:0>,<*2>P<4:0:2>";
if ($string =~ /^<\*(\d+)>(\S+)<(\d+):(\d+):(\d+)>$/) {
    print "Matched\n";
}

I have the ^ and $ anchors in place, so I would expect a match if and only if the whole string has this form:

my $string = "<*2>P<4:0:2>";

How can I modify my expression so that only the latter will be matched?

Thanks!

Replies are listed 'Best First'.
Re: why does this match still?
by Marshall (Canon) on Jun 04, 2009 at 00:14 UTC
    When debugging something like this, it is often helpful to print out what is being matched. See the code below.

    I would suggest reading about "greedy" regexes. Perl regex quantifiers are "greedy", meaning that they match as much as possible while still allowing the rest of the regex to match. In this case, I "toned down" the \S+ pattern to match one and only one \S character by using \S{1}. This could perhaps be \w{1} or \[a-zA-Z]{1}. I'm not sure which variations you are looking for.

    Update: the "\" was not needed before [a-zA-Z] (thanks to AnomalousMonk for spotting this typo!). The Perl \w, \s, \d and their capitalized versions are so powerful that I seldom use a character set!

    my $string_old = "<*2>H<2:0>,I<3:0>,<*2>P<4:0:2>";
    my $string     = "<*2>P<4:0:2>";

    if (my @matches = ($string =~ /^<\*(\d+)>(\S{1})<(\d+):(\d+):(\d+)>$/)) {
        print "Matched ", join(" ", @matches), "\n";
    }

    # printouts
    # original regex matched:
    #   Matched 2 H<2:0>,I<3:0>,<*2>P 4 0 2
    # new regex matches (old one did this too):
    #   Matched 2 P 4 0 2
    Update: I thought I'd add that this could have been just \S instead of \S{1}, but I wanted to show the general pattern for, say, \S{2} or whatever. The general form is {min,max}, so \S{1,3} would match 1, 2 or 3 \S characters.
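For anyone who wants to try the quantifiers side by side, here is a minimal sketch. The helper name captures and the variable names are made up for illustration; only the two strings and the first two patterns come from the posts above, and the \S{1,3} variant is just the {min,max} form applied to the same pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $long  = "<*2>H<2:0>,I<3:0>,<*2>P<4:0:2>";
my $short = "<*2>P<4:0:2>";

# Hypothetical helper: returns the captured groups in list context,
# or an empty list when the pattern does not match.
sub captures {
    my ($string, $regex) = @_;
    return $string =~ $regex;
}

my $greedy   = qr/^<\*(\d+)>(\S+)<(\d+):(\d+):(\d+)>$/;     # original pattern
my $one_char = qr/^<\*(\d+)>(\S{1})<(\d+):(\d+):(\d+)>$/;   # exactly one \S
my $up_to_3  = qr/^<\*(\d+)>(\S{1,3})<(\d+):(\d+):(\d+)>$/; # one to three \S

for my $case ( [ 'greedy',   $greedy ],
               [ 'one_char', $one_char ],
               [ 'up_to_3',  $up_to_3 ] ) {
    my ($label, $regex) = @$case;
    for my $string ($long, $short) {
        my @found = captures($string, $regex);
        print "$label on $string: ",
              (@found ? "Matched @found" : "no match"), "\n";
    }
}
```

Only the greedy \S+ version matches the long string, because \S+ is free to swallow everything up to the last <4:0:2>. Note that \S{1,3} still accepts any non-whitespace characters, so it only helps if the name is known to be short.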
      Awesome reply...thanks for the help!
      This is a nice trick on how to print what is being matched.
Re: why does this match still?
by shmem (Chancellor) on Jun 03, 2009 at 22:52 UTC

    The pattern (\S+) means "capture one or more non-whitespace chars", so that captures H<2:0>,I<3:0>,<*2>P - you might want e.g.

    if ($string =~ /^<\*(\d+)>([,\w]+)<(\d+):(\d+):(\d+)>$/) { print "Matched\n"; }

    But without more context, it's not possible to determine the most suitable regexp.
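One possible way to tighten the capture, sketched here under the assumption that the name between <*N> and <a:b:c> never contains "<", ">", a comma or whitespace (the thread does not confirm this), is a negated character class. The helper name is_single_item is invented for the example. The sketch also shows that merely switching to a non-greedy \S+? does not help: with both anchors in place, the lazy quantifier keeps expanding until the whole string matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lazy quantifier: still matches the long string, because the anchors
# force \S+? to grow until the final <4:0:2> lines up with the end.
my $lazy = qr/^<\*(\d+)>(\S+?)<(\d+):(\d+):(\d+)>$/;

# Negated class (assumption: names contain no '<', '>', ',' or whitespace),
# so the capture cannot run across item boundaries.
my $negated = qr/^<\*(\d+)>([^<>,\s]+)<(\d+):(\d+):(\d+)>$/;

# Hypothetical helper: true only when the whole string is a single item.
sub is_single_item {
    my ($string) = @_;
    return $string =~ $negated ? 1 : 0;
}

my $long  = "<*2>H<2:0>,I<3:0>,<*2>P<4:0:2>";
my $short = "<*2>P<4:0:2>";

print "lazy on long:     ", ($long  =~ $lazy ? "Matched" : "no match"), "\n";
print "negated on long:  ", (is_single_item($long)  ? "Matched" : "no match"), "\n";
print "negated on short: ", (is_single_item($short) ? "Matched" : "no match"), "\n";
```

If names can legitimately contain other punctuation, the character class would need adjusting to fit the real data.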