Re^2: Perl regex limitations (*)

A single character can make a huge difference in how large a match your regex can make (it likely also makes your regex more efficient):

use re 'debug';
my $braces = qr/(?<braces>\{ ([^\{\}]* | (?&braces))*? \} )/x;
#                                    ^

Compiling REx "(?<braces>\{ ([^\{\}]* | (?&braces))*? \} )"
Final program:
   1: OPEN1 'braces' (3)
   3:   EXACT <{> (5)
   5:   MINMOD (6)
   6:   CURLYX[0] {0,32767} (30)
   8:     OPEN2 (10)
  10:       BRANCH (23)
  11:         STAR (27)
  12:           ANYOF[\0-z|~-\377][{unicode_all}] (0)
  23:       BRANCH (FAIL)
  24:         GOSUB1[-23] (27)
  27:     CLOSE2 (29)
  29:   WHILEM[2/1] (0)
  30:   NOTHING (31)
  31:   EXACT <}> (33)
  33: CLOSE1 'braces' (35)
  35: END (0)
anchored "{" at 0 floating "}" at 1..2147483647 (checking floating) minlen 2
Freeing REx: "(?<braces>\{ ([^\{\}]* | (?&braces))*? \} )"

You still have an instance of {0,32767}, but now it means you can match 32K instances of braces separated by non-brace runs, where each run can be huge. (Note how the new * was translated to STAR, with no 32K limit mentioned, and not to CURLYX.)

Your prior regex would have to recurse into the 'braces' construct for every single new character matched. Since it would only do that up to 32K times, you could only match about 32KB of text.

Update: I should have added a + not a *.
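
A minimal sketch of the regex with that correction applied, assuming the only change is a + in place of the added *. The test string and the 100_000 run length are made up for illustration, and exact limits may vary between Perl versions.

```perl
use strict;
use warnings;

# Same pattern as above, but with '+' instead of '*' after the character
# class (assumed to be the change the update refers to). With '+', each
# pass through the alternation must consume at least one character.
my $braces = qr/(?<braces>\{ ([^\{\}]+ | (?&braces))*? \} )/x;

# Hypothetical input: one long non-brace run followed by a nested group.
my $text = '{' . ('x' x 100_000) . '{inner}' . '}';

if ($text =~ /\A$braces\z/) {
    # @- and @+ hold the start and end offsets of the whole match.
    printf "matched %d characters\n", $+[0] - $-[0];
} else {
    print "no match\n";
}
```

The long run is consumed by a single pass through the character class, so the outer quantifier only iterates twice here (once for the run, once for the nested group).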

- tye
