in reply to Regexes eating too much RAM
In general, the recipe is to eliminate all capture groups that operate on your large string. Beyond that, you can try to cut things off as you process them. Perl keeps a marker for where a string begins, so if you're conscientious, you can convince perl to just advance that pointer.
if (/\G<([^<>]*)>/gc) { flush_name(); $state = 'TEXT'; print OUT $1; next; }
This is wasteful. When it matches, perl makes a copy of $_ to an internal buffer so $1 can refer back to it. Eliminate the capturing parentheses and use substr() with @- and @+ to get at what $1 would have contained. The documentation for @- in perlvar is a good reference for you right now.
Efficient.
if (/\G<[^<>]*>/gc) { flush_name(); $state = 'TEXT'; print OUT substr $_, $-[0] + 1, $+[0] - $-[0] - 2; substr $_, 0, $+[0], ''; next; }
You'll notice how I used 4-arg substr to directly replace the first part of the string with nothing, which cuts off everything processed so far.
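If you want to try the @- / @+ trick outside the parser, here is a minimal standalone sketch. take_tag is a made-up helper for illustration, not something from the original program, and it anchors with \A instead of \G so that it doesn't depend on pos(). The inner text is the whole match minus the two delimiters, so its length is the match length minus 2.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): if the buffer
# starts with <...>, remove that tag from the buffer and return the text
# between the angle brackets. Returns undef if there is no tag up front.
sub take_tag {
    my ($buf_ref) = @_;

    # No capturing parentheses: the offsets of the whole match are
    # available in $-[0] (start) and $+[0] (end).
    return unless $$buf_ref =~ /\A<[^<>]*>/;

    # What $1 would have held: skip the '<', stop before the '>'.
    my $inner = substr $$buf_ref, $-[0] + 1, $+[0] - $-[0] - 2;

    # 4-arg substr: replace the processed prefix with the empty string.
    substr $$buf_ref, 0, $+[0], '';

    return $inner;
}

my $buf  = '<name>rest of the data';
my $name = take_tag(\$buf);
print "tag: $name\n";
print "left: $buf\n";
```

Passing the buffer by reference matters here: handing the string itself to the sub and copying it out of @_ would reintroduce the very copy you are trying to avoid.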
⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊