A single character can make a huge difference in how large a match your regex is able to make (it likely also makes your regex more efficient):

    use re 'debug';
    my $braces = qr/(?<braces>\{ ([^\{\}]* | (?&braces))*? \} )/x;
    #                                    ^
    Compiling REx "(?<braces>\{ ([^\{\}]* | (?&braces))*? \} )"
    Final program:
       1: OPEN1 'braces' (3)
       3: EXACT <{> (5)
       5: MINMOD (6)
       6: CURLYX[0] {0,32767} (30)
       8: OPEN2 (10)
      10: BRANCH (23)
      11: STAR (27)
      12: ANYOF[\0-z|~-\377][{unicode_all}] (0)
      23: BRANCH (FAIL)
      24: GOSUB1[-23] (27)
      27: CLOSE2 (29)
      29: WHILEM[2/1] (0)
      30: NOTHING (31)
      31: EXACT <}> (33)
      33: CLOSE1 'braces' (35)
      35: END (0)
    anchored "{" at 0 floating "}" at 1..2147483647 (checking floating) minlen 2
    Freeing REx: "(?<braces>\{ ([^\{\}]* | (?&braces))*? \} )"

You still have an instance of {0,32767}, but now it means you can match up to 32K instances of braces separated by runs of non-brace characters, where each run can be huge (note how the new * was compiled to STAR, with no 32K limit mentioned, rather than to CURLYX).

Your prior regex would have to recurse into the 'braces' construct for every single new character matched. Since it could only do that up to 32K times, you could only match about 32K characters of text.

Update: I should have added a + rather than a * (with *, [^\{\}]* can also match the empty string).
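A minimal sketch of the pattern with the + from the update applied, assuming Perl 5.10 or later (needed for named captures and (?&name) recursion). It checks that nested groups still match and that a single 100,000-character run of non-brace text, far beyond 32K, still matches. The test strings are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from above with the update applied: [^\{\}]+ instead of
# [^\{\}]*, so every iteration of the outer (...)*? consumes at least one
# character. A whole run of non-brace text is eaten by the simple + node
# in one iteration, so the {0,32767} loop count is not what limits it.
my $braces = qr/(?<braces>\{ ([^\{\}]+ | (?&braces))*? \} )/x;

# Nested groups separated by short runs (illustrative input).
my $nested = 'pre {a{b}c{d{e}}f} post';
if ($nested =~ $braces) {
    print "$+{braces}\n";    # prints {a{b}c{d{e}}f}
}

# One group whose body is a single 100,000-character run (illustrative).
my $long = '{' . ('x' x 100_000) . '}';
if ($long =~ /\A$braces\z/) {
    print length($+{braces}), "\n";    # prints 100002
}
```

Interpolating $braces into /\A...\z/ keeps its /x flag, because a qr// object carries its own modifiers with it.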

- tye        


In reply to Re^2: Perl regex limitations (*) by tye
in thread Perl regex limitations by jonneve
