Strange regex-behaviour

strat has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I've just got some code to enhance, and have found the following lines (shortened):

C:\work\perl\Net\ldap\replicationClient\backup>perl
my $line = "Time: 174657\n";
# line has to be in this format: xxx*:*  (xxx are letters, * are any signs)
if ($line =~ /^([a-zA-Z_)[3,].*?):(.*)/) {
  print "$1: $2\n";
}
^Z
Time:  174657

I can't understand why this pattern matches $line: a repetition count is given with {3,}, while [...] is a character class, i.e. a set of alternatives.

Is [3,] under certain circumstances really the same as {3,}, or is Perl really that clever?
I expected anything but a match (even a syntax error because of the first [...).
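To make the difference between [3,] and {3,} concrete, here is a small sketch that runs the pattern as found next to a guess at the intended one. The "intended" pattern is an assumption (three or more letters or underscores, then anything, then a colon); the original author never confirmed it, and the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern as found in the script: [a-zA-Z_)[3,] is ONE character class
# that happens to contain ')', '[', '3' and ','.
my $as_found = qr/^([a-zA-Z_)[3,].*?):(.*)/;

# A GUESS at the intended pattern (assumption, not the author's code):
# {3,} is a quantifier, "three or more" of [a-zA-Z_], then anything, then ':'.
my $guessed = qr/^([a-zA-Z_]{3,}.*?):(.*)/;

for my $line ("Time: 174657", "Ti: 174657", "3,: 174657") {
    my $found = $line =~ $as_found ? "match" : "no match";
    my $guess = $line =~ $guessed  ? "match" : "no match";
    printf "%-14s as found: %-8s guessed: %s\n", "'$line'", $found, $guess;
}
```

With the quantifier, "Ti" (only two letters) and "3," (not letters at all) are rejected, while the class version accepts all three lines.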

Best regards,
perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"


Re: Strange regex-behaviour by broquaint (Abbot) on Mar 11, 2002 at 17:17 UTC
> I can't understand why this pattern matches $line

Because the character class `a-zA-Z_)[3,` matches the single letter `T`, and the non-greedy `.*?` then matches `ime` and stops at the `:`.

> is perl really so clever?

Not in this case, but generally yes ;-)

HTH

broquaint
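One way to see this split is to give the character class its own capture group. This is a minimal sketch using the same pattern with one extra pair of parentheses; the labels in the output are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "Time: 174657\n";

# Same pattern, but the character class is wrapped in its own group,
# so we can see how little of "Time" it actually consumes.
if ($line =~ /^([a-zA-Z_)[3,])(.*?):(.*)/) {
    print "class matched: '$1'\n";   # the single character 'T'
    print ".*? matched:   '$2'\n";   # the rest of the word, 'ime'
    print "after colon:   '$3'\n";   # ' 174657' ('.' does not match the \n)
}
```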
Re: Strange regex-behaviour by seattlejohn (Deacon) on Mar 11, 2002 at 17:23 UTC
You're basically matching by luck. I believe Perl is interpreting the regex as follows:

/^([character-class].*?):(.*)/   # where character-class == a-zA-Z_)[3,

That is, the '[3,' is being interpreted as part of the character class (whose intended close bracket is missing), not as some kind of quantifier on the class. Of course, this isn't as strict a matching pattern as you were expecting!
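A quick way to see how loose the pattern is: feed it lines that look nothing like "letters, anything, colon" but happen to start with one of the odd class members. The sample strings below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = qr/^([a-zA-Z_)[3,].*?):(.*)/;

# None of these look like "three letters, anything, colon", yet each starts
# with a member of the class [a-zA-Z_)[3,] and contains a colon.
for my $line ("3: x", ",,: y", ")oops: z", "[: w", "A: 1") {
    if ($line =~ $pattern) {
        print "'$line' matches, \$1 = '$1'\n";
    }
}

# A line starting with a character outside the class ('9') does not match.
print "'9: x' does not match\n" unless "9: x" =~ $pattern;
```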
Re: Strange regex-behaviour by erikharrison (Deacon) on Mar 11, 2002 at 21:07 UTC
Perl is not really thinking `[3,]` is `{3,}`. Let's step through the regex and see why it matches.

/^               # Match at the beginning of the string
  (              # Begin $1
  [a-zA-Z_)[3,]  # A character class
  .*             # Any number of characters
  ?              # Minimal match the .*
  )              # End $1
  :              # Match a colon
  (              # Begin $2
  .*             # Match until end of line
  )/x            # End $2 and the match

Alright: the large character class will match one character that is an upper- or lower-case letter, an underscore, a close paren, an open bracket, a 3, or a comma. The open bracket does not confuse Perl into thinking we have a new character class (this can be tested by replacing 'Time' with '[ime'). This huge class matches the upper-case 'T' in $line. The dot star matches the 'ime' up to the colon. In this specific case, the minimal-match construct doesn't do anything (test by deleting it). All this stores the word 'Time' in $1. The colon is matched but not saved. We begin saving again after the colon and just match until the end of the line: that's ' 174657' (including the leading space), which gets stored in $2. Then we just print those things out.

As for forming a regex which actually matches the format you want, I don't quite understand what that format is (your comment is unclear to me).

Hope this helps.

Cheers,
Erik
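The two tests suggested above can be run directly. This sketch also adds a line with several colons (a made-up input) to show the case where deleting the '?' does change the result: `.*?` stops at the first colon, greedy `.*` backtracks only as far as the last one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $minimal = qr/^([a-zA-Z_)[3,].*?):(.*)/;   # with the '?' (non-greedy)
my $greedy  = qr/^([a-zA-Z_)[3,].*):(.*)/;    # '?' deleted (greedy)

# '[' is just another member of the class, so this still matches.
print "'[ime' line matches\n" if "[ime: 174657" =~ $minimal;

# With only one colon, the '?' makes no difference.
for my $re ($minimal, $greedy) {
    my ($k, $v) = "Time: 174657" =~ $re;
    print "one colon:   \$1='$k' \$2='$v'\n";
}

# With several colons it does: .*? stops at the first, .* at the last.
for my $re ($minimal, $greedy) {
    my ($k, $v) = "Time: 17:46:57" =~ $re;
    print "many colons: \$1='$k' \$2='$v'\n";
}
```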
Re: Strange regex-behaviour by strat (Canon) on Mar 12, 2002 at 10:58 UTC
Thank you very much for your explanations; now I understand the order in which the brackets are parsed. I'm just trying to enhance a script not written by me, and all I'm doing at the moment is tracking down some nice errors and the consequences of fixing them (does this man really want to say what he's saying??? sometimes yes, sometimes no :-)... This code was one of the most beautiful examples I found (the description is totally different from the code). I was totally puzzled by the first ), tried this code, and its output puzzled me even more. But I will see if I can change it.

Best regards,
perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"