I'd read Dominus' Perl Regular Expression Mastery slides, in particular slide 51 (it may have changed by the time you read this). I wanted to write this for two reasons: to describe some output from re 'debug' so people use it more, and to note that the issue implied by the slide is no longer an issue (since before perl 5.6.1). More generally, I think people should look at the output of re 'debug' and -MO=Concise more often.

The slide indicates that for the regex X*, if X is nontrivial then the * is implemented using a curly quantifier like {m,n}. In fact, X* is compiled as {0,32767} (and X+, used in the examples below, as {1,32767}). This leads to the next problem: wouldn't this mean that * is limited to 32767 matches? Yes, except that 32767 (REG_INFTY) is now special and means 'infinity'. So the highest number you can type in a curly quantifier is 32766. The effect is that while the two expressions below compile to nearly identical programs, that one difference in the quantifier makes all the difference: {1,32766} stops after 32766 matches, while + goes on to match all 64000 newlines. Boring? Eh, maybe. It was interesting to me anyway, and since I'd gone to some trouble, I thought I'd share.
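The limit is a build-time constant and newer perls have raised it, so rather than trusting 32766 you can ask your own perl. The sketch below is my own illustration, not something from the slides: it binary-searches for the largest explicit upper bound that compiles, relying only on the fact that a too-large bound is a fatal "Quantifier in {,} bigger than ..." error. The helper name quantifier_ok and the search ceiling of 2**31 are arbitrary choices.

```perl
use strict;
use warnings;

# True if a pattern with an explicit upper bound of $n compiles.
# A too-large bound dies at regex compile time; eval turns that into false.
sub quantifier_ok {
    my ($n) = @_;
    return eval { my $re = qr/a{1,$n}/; 1 } ? 1 : 0;
}

my $cap = 2**31;    # arbitrary search ceiling (an assumption)
my $max;            # largest bound that compiles; stays undef if $cap compiles

if (!quantifier_ok($cap)) {
    # Invariant: quantifier_ok($lo) is true and quantifier_ok($hi) is false.
    my ($lo, $hi) = (1, $cap);
    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        if (quantifier_ok($mid)) { $lo = $mid } else { $hi = $mid }
    }
    $max = $lo;
}

print defined $max
    ? "largest explicit upper bound: $max\n"
    : "no limit found up to $cap\n";
```

On the perl the post was written against, this should report 32766; a newer perl will likely report something larger.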

These are simple expressions, so you should be able to follow the nodes through: \A is SBOL, ^ is MBOL/BOL, $ is MEOL/EOL, there's the EXACT "\n", and the (?: ... )+ could have been a PLUS, CURLYN, CURLYM, or CURLYX (here it is a CURLYM). Take the /m off the regex and see how that changes the MBOL and MEOL nodes. Interesting stuff.
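For trying the /m experiment repeatedly, here is a rough sketch that compiles a pattern in a child perl under -Mre=debug and captures the dump. The dump goes to STDERR, hence IPC::Open3 with the child's STDERR merged into its STDOUT. The helper name regex_program and the tiny ^foo$ pattern are illustrative assumptions. Exact node names vary by perl version (newer perls may print SBOL/SEOL rather than BOL/EOL without /m), so it only looks for MBOL and MEOL.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);

# Compile qr{$src}$flags in a child perl under re 'debug' and return
# everything the child printed (the debug dump is written to STDERR).
sub regex_program {
    my ($src, $flags) = @_;
    my $code = 'qr{' . $src . '}' . $flags;
    # CHLD_ERR is false, so the child's STDERR shares the $out handle.
    my $pid = open3(my $in, my $out, undef, $^X, '-Mre=debug', '-e', $code);
    close $in;
    my $text = do { local $/; <$out> };
    waitpid $pid, 0;
    return $text;
}

my $with_m    = regex_program('^foo$', 'm');
my $without_m = regex_program('^foo$', '');

print "--- with /m ---\n$with_m\n";
print "--- without /m ---\n$without_m\n";
```

Single quotes around '^foo$' keep the $ from being interpolated in this script; in the child it is the last character of the pattern, so it stays an anchor there too.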

$ perl -le '("\n"x64_000)=~/\A(?:${\q[^$\n]}){1,32766}/m;print length +$&'
32766
Compiling REx `\A(?:^$\n){1,32766}'
size 10 first at 2
   1: SBOL(2)
   2: CURLYM[0] {1,32766}(10)
   4:   MBOL(5)
   5:   MEOL(6)
   6:   EXACT <\n>(8)
   8:   SUCCEED(0)
   9: NOTHING(10)
  10: END(0)
anchored `\n' at 0 (checking anchored noscan) anchored(SBOL) minlen 1

$ perl -le '("\n"x64_000)=~/\A(?:${\q[^$\n]})+/m;print length $&'
64000
Compiling REx `\A(?:^$\n)+'
size 10 first at 2
   1: SBOL(2)
   2: CURLYM[0] {1,32767}(10)
      REG_INFTY------^
   4:   MBOL(5)
   5:   MEOL(6)
   6:   EXACT <\n>(8)
   8:   SUCCEED(0)
   9: NOTHING(10)
  10: END(0)
anchored `\n' at 0 (checking anchored noscan) anchored(SBOL) minlen 1
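The same two matches can be run as a standalone script. This is a sketch rather than the original one-liners: instead of the ${\q[...]} trick, it keeps the subpattern in a single-quoted string so "$\" is never read as the output record separator, then interpolates it (interpolated text is not re-interpolated, so $ stays an anchor and \n a regex newline). It assumes a perl where {1,32766} is still a legal quantifier. The + count to compare against is the 64000 reported above.

```perl
use strict;
use warnings;

# Single quotes: no $\ interpolation. Inside the regex, $ is an anchor
# and \n matches a newline, exactly like the ${\q[^$\n]} trick.
my $body = '^$\n';
my $text = "\n" x 64_000;

# Explicit upper bound: the largest value the post says may be typed.
my ($bounded) = $text =~ /\A((?:$body){1,32766})/m;

# Plain +: compiled with the REG_INFTY upper bound, treated as unlimited.
my ($plus) = $text =~ /\A((?:$body)+)/m;

printf "{1,32766}: %d\n", length $bounded;
printf "+:         %d\n", length $plus;
```

Each iteration consumes exactly one "\n" (every position is both a line start and a line end under /m), so the length of the match equals the number of iterations.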

In reply to re 'debug' misleads with curlym/curlyx (a minor thing) by diotalevi
