I'd read Dominus' Perl Regular Expression Mastery slides, slide 51 in particular (it may have changed by the time you read this). I wanted to write this for two reasons: to describe some output from re 'debug' so that people use it more, and to note that the issue implied by the slide is no longer an issue (it was one before perl 5.6.1). I also think people should look at the output of re 'debug' and -MO=Concise more often.
The slide indicates that for the regex X*, if X is nontrivial then the * is implemented using a curly quantifier like {m,n}. In fact, it is done with {0,32767} (and X+ with {1,32767}). This leads to the next question: wouldn't this mean that * is limited to a maximum of 32767 matches? It would, except that 32767 is now special and means 'infinity'. So the highest number you can type in a curly quantifier is 32766. The effect is that the two expressions compile to nearly identical programs, and that one difference in the quantifier makes all the difference: {1,32766} stops after 32766 repetitions while + keeps going. Boring? Eh, maybe. It was interesting to me anyway, and since I'd gone to some trouble, I thought I'd share.
These are simple expressions so you should be able to follow the nodes through. ^ is MBOL/BOL, $ is MEOL/EOL, there's the EXACT "\n" and the (?: ... )+ which could have been a PLUS, CURLYN, CURLYM, or CURLYX. Take the /m off of the regex and see how that changes the MBOL and MEOL nodes. Interesting stuff.
$ perl -le '("\n"x64_000)=~/\A(?:${\q[^$\n]}){1,32766}/m;print length +$&'
32766
Compiling REx `\A(?:^$\n){1,32766}'
size 10 first at 2
   1: SBOL(2)
   2: CURLYM[0] {1,32766}(10)
   4:   MBOL(5)
   5:   MEOL(6)
   6:   EXACT < >(8)
   8:   SUCCEED(0)
   9:   NOTHING(10)
  10: END(0)
anchored ` ' at 0 (checking anchored noscan) anchored(SBOL) minlen 1

$ perl -le '("\n"x64_000)=~/\A(?:${\q[^$\n]})+/m;print length $&'
64000
Compiling REx `\A(?:^$\n)+'
size 10 first at 2
   1: SBOL(2)
   2: CURLYM[0] {1,32767}(10)
REG_INFTY----------^
   4:   MBOL(5)
   5:   MEOL(6)
   6:   EXACT < >(8)
   8:   SUCCEED(0)
   9:   NOTHING(10)
  10: END(0)
anchored ` ' at 0 (checking anchored noscan) anchored(SBOL) minlen 1
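The /m experiment suggested earlier can be sketched like this. It is only a sketch, assuming a perl on your PATH; the helper name `nodes` is made up here, and the grep pattern assumes that node lines in the program dump start with a number followed by a colon, which is how they appear in the dumps above. Node names differ somewhat between perl versions, so no expected output is shown.

```shell
# List only the node lines ("  N: NAME(...)") from the re 'debug' program dump.
# re 'debug' writes to stderr, hence 2>&1.
nodes() {
    perl -Mre=debug -e "$1" 2>&1 | grep -E '^ *[0-9]+:'
}

echo 'with /m:'
nodes '"\n" =~ /\A(?:${\q[^$\n]})+/m'

echo 'without /m:'
nodes '"\n" =~ /\A(?:${\q[^$\n]})+/'
```

Comparing the two listings shows which anchor nodes ^ and $ compile to with and without the modifier.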
In reply to re 'debug' misleads with curlym/curlyx (a minor thing) by diotalevi