I can think of various reasons:
- recursive patterns are not optimized for going deep, but rather meant for parsing structures like code or data with at most a few hundred nesting levels
- as with subs, every recursion means storing a certain amount of data on a stack
- branching doesn't happen that often when parsing a grammar, because there are only a few opening "brackets" that lead deeper
- "normal" regexes are optimized by heuristics that take shortcuts, e.g. .* is not naively implemented by going step by step
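For comparison, here is the kind of job recursive patterns are typically written for: matching balanced parentheses. This is a minimal sketch modeled on the balanced-parens idiom from perlre; the variable name $balanced and the test strings are made up for illustration. Since (?1) re-enters capture group 1 once per opening bracket, the recursion depth equals the nesting depth, not the length of the string.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical example: a typical use of a recursive pattern.
# (?1) re-enters capture group 1, so each nested "(" costs one recursion
# level; plain text runs are eaten by the possessive [^()]++ without recursing.
my $balanced = qr{
    ^
    (                           # group 1: one balanced (...) unit
        \(
        (?: [^()]++ | (?1) )*   # non-bracket runs, or a nested unit
        \)
    )
    $
}x;

for my $s ('(a(b)c)', '((()))', '(a(b c)', '(' x 50 . ')' x 50) {
    say "$s: ", ($s =~ $balanced ? 'balanced' : 'not balanced');
}
```

Even the 50-deep string only needs 50 recursion levels, which is the "hundreds at most" range recursive patterns are designed around.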
My suggestion: use re 'debug' to see what's happening.
In particular, I'd try to make sure that both cases really involve the same amount of backtracking.
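A minimal sketch of how that comparison could look, assuming a tiny input so the trace stays readable (the trace grows quickly with the string length). use re 'debug' sends its compile and execution trace to STDERR; note that, unlike other re directives, it is not lexically scoped. The variable names are just for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use re 'debug';    # trace compilation and matching on STDERR

my $str = 'a' x 5;    # keep it tiny, the trace is verbose

# The recursive pattern from the question: one char, then recurse or stop.
$str =~ / (. (?: (?1) | ) ) /x or die "no match";
my $rec_len = length $1;

# The "normal" pattern, which the optimizer can handle with shortcuts.
$str =~ /(.*)/ or die "no match";
my $plain_len = length $1;

say "recursive: $rec_len, plain: $plain_len";
```

Comparing the two traces step by step shows how many states each match pushes and whether any backtracking happens at all.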
Update:
FWIW: my OS kills the process when I attempt 1e7 recursions.
DB<3> ("a"x 1e6) =~ / (. (?: (?1) | ) ) /x; say length $1
1000000
DB<4> ("a"x 1e7) =~ /(.*)/; say length $1
10000000
DB<5> ("a"x 1e7) =~ / (. (?: (?1) | ) ) /x; say length $1
Killed
My guess: the process runs out of memory, since every recursion level keeps its own state on the regex engine's backtracking stack.
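To reproduce the comparison outside the debugger without getting killed, something like the following sketch could be used. It assumes that sizes up to 1e5 stay well below the limit on a typical machine (1e6 still worked above); the %captured hash is a made-up name, and timings depend on the perl build and OS, so no numbers are claimed here.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second timestamps

my %captured;    # hypothetical: n => [recursive length, plain length]

for my $n (1e3, 1e4, 1e5) {
    my $str = 'a' x $n;

    # Recursive pattern: one recursion level per character.
    my $t0 = time;
    $str =~ / (. (?: (?1) | ) ) /x or die "no match";
    my $rec_len = length $1;
    my $t_rec   = time - $t0;

    # Plain pattern: no recursion at all.
    $t0 = time;
    $str =~ /(.*)/ or die "no match";
    my $plain_len = length $1;
    my $t_plain   = time - $t0;

    $captured{$n} = [ $rec_len, $plain_len ];
    printf "n=%-7d recursive: %d chars in %.4fs   plain: %d chars in %.4fs\n",
        $n, $rec_len, $t_rec, $plain_len, $t_plain;
}
```

Watching the process size (e.g. with top) while increasing n should show whether memory really grows linearly with the recursion depth.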