Why is while(m//g){} so much slower than s///eg and what can I do about it?

kaatunut has asked for the wisdom of the Perl Monks concerning the following question:

I won't go into details here, but I want to iterate through every occurrence of a certain regexp in a string and mutilate it in various ways. Now, as long as the mutilation is limited to the area matched by the regexp, I can just do

$hop =~ s/r(eg)e(xp)/mutilate($1,$2)/eg;

or something like that. Cool. But s///eg runs out of shots pretty quickly: what if I want to mutilate areas before or after the match, for example, to kill the area between two matches? What if I want the mutilated result to go through the s///eg mangle too? So, I figure, I'll go like:

while ($hop =~ m/r(eg)e(xp)/g) {
    # handle stuff
    # to replace: substr($hop, length $`, length $&) = $replace;
    #   (and think about the slowness of $`, $& and all)
    # to rewind: pos($hop) = $location;
}

Now, I don't think we need many benchmarks to see that the second version is slow as hell. What can I do about that? I realize that using $` and $& and calculating their lengths is incredibly slow, and I realize s/// has been optimized to do just that, but that doesn't help me a bit. Is there a way for the s/// RHS to touch the output string while it is being built (I don't see any problems that could cause)? Is there a way to know the location of the LHS match in the s/// RHS (again, I bet s/// already has this internally; did they just not bother telling me)? Any other suggestions?

P.S. Yes, this is related to my last Seek of Perl Wisdom. tilly, the approach you suggested I interpreted as way 2. If you didn't mean that, elaborate.

Re: Why is while(m//g){} so much slower than s///eg and what can I do about it? by japhy (Canon) on Nov 18, 2000 at 21:52 UTC
Perl 5.6 offers the `@-` and `@+` arrays; see the 'perlvar' documentation. You can use

$prematch  = substr($string, 0, $-[0]);
$match     = substr($string, $-[0], $+[0] - $-[0]);
$postmatch = substr($string, $+[0]);
$paren_1   = substr($string, $-[1], $+[1] - $-[1]);
# ...

This may help you in your efforts.

`japhy` -- Perl and Regex Hacker
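As a rough sketch of how those offsets might slot into the loop from the question, here is the while (m//g) version with `$-[0]` and `$+[0]` standing in for `length $``  and `length $&`. The `mutilate()` body and the sample string are made up for illustration; only the regexp comes from the original post. The second half suggests that the same arrays can also be read on the replacement side of s///e, which would answer the "where did the LHS match" question, assuming Perl 5.6 or later.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's mutilate(); it just upper-cases.
sub mutilate { my ($first, $second) = @_; return uc "$first$second" }

my $hop = "a regexp, then another regexp.";   # made-up sample data

while ($hop =~ m/r(eg)e(xp)/g) {
    # Offsets of the whole match, saved before the string is changed.
    my ($start, $end) = ($-[0], $+[0]);
    my $replace = mutilate($1, $2);

    # Overwrite the matched region in place, without touching $` or $&.
    substr($hop, $start, $end - $start) = $replace;

    # Modifying the target string can reset the search position,
    # so set it explicitly to just past the replacement.
    pos($hop) = $start + length $replace;
}

print "$hop\n";

# The offsets are also visible inside the replacement side of s///e.
# Here each "o" is replaced by its offset in the original string.
my $marked = "foo boo";
$marked =~ s/(o)/$-[0]/ge;
print "$marked\n";
```

To rewind instead of moving forward, `pos($hop)` can be set to any earlier offset, for example `$start`, so that the replacement text is scanned again. That needs care: if the replacement can itself match, the loop may never end.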
Re (tilly) 1: Why is while(m//g){} so much slower than s///eg and what can I do about it? by tilly (Archbishop) on Nov 18, 2000 at 22:16 UTC
I assume that you are referring to Metatag processing (overlapping regions)?

Now first of all I would take what was the general first recommendation quite seriously - solve this in multiple passes. I would also simplify your logic - do you really need the nested logic?

But if you want to proceed with the second, I would avoid ever using $` etc. If you look closely I did that with Why I like functional programming which I pointed you at before. Using those special variables slows down all REs, and you are going to use a lot of them. Instead I would arrange (as I did) to pass through the string caching everything just once.

Even so a single pass with s///; is going to be much faster than a single pass with this more sophisticated algorithm for a whole ton of reasons. For instance it is looping in C, you are in Perl, so there is a factor of 10 difference right there. (Which is one of the reasons that so many suggested that you go with KISS until you don't have a choice.)

But when the logic goes beyond what multiple passes can readily handle, or if you are making enough passes that don't do much, then the performance difference will reverse. This is often true. Do something sophisticated and you take an immediate performance hit - but then scale to more complex logic better.
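One possible reading of "pass through the string caching everything just once" is sketched below. It is not the code from the functional programming post, just a guess at the general shape: walk the string once with m//g, collect output fragments in an array, and join them at the end, so the source string is never edited mid-scan and $` is never needed. `mutilate()`, `@pieces`, `$last` and the sample string are illustrative names and data, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's mutilate(); it just upper-cases.
sub mutilate { my ($first, $second) = @_; return uc "$first$second" }

my $hop = "a regexp, then another regexp.";   # made-up sample data

my @pieces;      # output fragments, joined once at the end
my $last = 0;    # offset just past the previous match

while ($hop =~ m/r(eg)e(xp)/g) {
    # Text between the previous match and this one. It could be altered
    # or dropped here, which is the part plain s///eg does not offer.
    push @pieces, substr($hop, $last, $-[0] - $last);

    # The match itself, after mutilation.
    push @pieces, mutilate($1, $2);

    $last = $+[0];
}

# Trailing text after the final match.
push @pieces, substr($hop, $last);

my $result = join '', @pieces;
print "$result\n";
```

Because `$hop` is left untouched, the search position never needs repairing. The cost is the one described above: the loop runs in Perl rather than in C, so for simple substitutions a plain s///eg pass will most likely still win.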