Hello,
Are there limitations to the regex matching capabilities in Perl? I came across a case today where a string wouldn't match, and I couldn't figure out why. I finally tried removing part of the source file, and then it matched! So it seems as though the size of the source file and/or the size of the match is limited somehow. The match would have been about 50 KB.
The expression was the following:
s/class \s+ (?<class_name> \w+) \s*
(\: (\s* \w*)? \s* (?<ancestor> \w+))? \s*
$braces
//x
where $braces was defined as:
$braces = qr/(?<braces>\{ ([^\{\}] | (?&braces))*? \} )/x;
It matched correctly if I simplified it to the following:
s/class \s+ (?<class_name> \w+) \s*
(\: (\s* \w*)? \s* (?<ancestor> \w+))? \s*
\{ (?<class_body> [\s\S]+ ) \} \s*;
//xm
That wasn't acceptable, however: by removing the recursion, I was no longer able to match the braces correctly, so it failed if there were two sets of braces following each other (it greedily matched right up to the last closing brace).
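To make the difference concrete, here is a small self-contained sketch on a toy input. The sample string is made up and the class header is shortened; only the two brace-matching parts are taken from the expressions above. Named captures and (?&braces) need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Recursive brace pattern, as defined above.
my $braces = qr/(?<braces>\{ ([^\{\}] | (?&braces))*? \} )/x;

# Made-up input: two classes, the first with a nested brace pair.
my $src = 'class Foo { int a; { } }; class Bar { int b; };';

# Recursive version: stops at the brace that closes the first class.
my $balanced = $src =~ /class \s+ \w+ \s* $braces/x
    ? $+{braces} : undef;

# Simplified version: [\s\S]+ is greedy, so it runs to the last "};".
my $greedy = $src =~ /class \s+ \w+ \s* \{ (?<class_body> [\s\S]+ ) \} \s*;/x
    ? $+{class_body} : undef;

print "balanced: $balanced\n";
print "greedy:   $greedy\n";
```

On this input the recursive version captures only the body of Foo, while the simplified one swallows everything up to the closing brace of Bar.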
So basically, my question is what the limitations are that caused this problem. I realize that I'm pushing things to the limit and that at this point, it's probably better, just from a performance point of view, to split things up into smaller expressions by doing a bit of basic parsing by hand first (that's what I ended up doing, and it works fine). I did waste a good 2 hours trying to figure out why the regex wasn't working though, so I'd kind of like to know for next time what the limitations are...
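For anyone wondering what "basic parsing by hand" can look like, here is a minimal sketch of the idea, not the exact code I ended up with. The helper name find_matching_brace is hypothetical, and it assumes that braces never appear inside strings or comments in the source being scanned.

```perl
use strict;
use warnings;

# Hypothetical helper: given a string and the offset of an opening brace,
# return the offset of the matching closing brace, or -1 if unbalanced.
# Assumption: braces inside string literals or comments are NOT handled.
sub find_matching_brace {
    my ($text, $open) = @_;
    my $depth = 0;
    pos($text) = $open;                   # start scanning at the opening brace
    while ($text =~ /([{}])/g) {          # visit every brace in turn
        $depth += $1 eq '{' ? 1 : -1;     # track the nesting depth
        return pos($text) - 1 if $depth == 0;
    }
    return -1;                            # ran out of input before balancing
}

# Made-up sample input.
my $src   = 'class Foo { a { b } } class Bar { c }';
my $open  = index($src, '{');
my $close = find_matching_brace($src, $open);
my $body  = substr($src, $open + 1, $close - $open - 1);
print "body: [$body]\n";
```

Once the body has been cut out this way, the remaining pieces (class name, ancestor) can be matched with small, non-recursive expressions.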
Thanks in advance,
Jonathan