It's the localization of the glob that somehow messes things up (actually, it appears to happen when Perl tries to restore it upon exiting the block). It doesn't really matter though. The principle is exactly the same as your "dio3" code, just more convoluted.

My intuition says the crucial bottleneck is the code assertion in the pattern, rather than the math. "buk3" beats "dio3" mostly, and the former certainly does more math than the latter.

One thing that occurred to me is that when you have 4 substitutions in a row, you can skip the next 4 characters, since they'll all have been replaced. I can't think of a simple way to account for this fact in code, though. Without a working single-pattern solution it would require a nested loop, which is most probably not worth the effort. And if we had a working single-pattern solution, there probably wouldn't be any manual optimization that could beat just letting the regex engine do its job.

Update: thinking about this gave me an idea.

#!/usr/bin/perl -wl
use strict;

my @ari;
my $str = "17341234173412341734123417341234";

$ari[2] = $str;
substr($ari[2], $_*4, 8) =~ s[(.)(?=...\1)|(?<=(.)...)\2]{ defined $1 ? "A" : "B" }eg
    for 0 .. length($ari[2])/4;

print $ari[2];

__END__
A7AAB2BBA7AAB2BBA7AAB2BBA7AAB2BB

(Strange variable names left intact for easy addition to your benchmark.) It is consistently 4% faster than "buk3" on my machine.
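To make the difference visible, here is a small sketch that runs the same substitution once over the whole string and once over 8-character windows advanced in steps of 4, as in the code above. The variable names $global and $windowed are mine, not from the benchmark, and the explanation in the comments is my reading of why the windowed form behaves differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "17341234173412341734123417341234";

# One global pass over the whole string. Every match is found against the
# original text, so a character can match even if its partner was replaced.
(my $global = $str) =~
    s[(.)(?=...\1)|(?<=(.)...)\2]{ defined $1 ? "A" : "B" }eg;

# Windowed pass: each 8-character window is rewritten in place before the
# next window (starting 4 characters later) is examined.
my $windowed = $str;
for my $i (0 .. length($windowed) / 4) {
    substr($windowed, $i * 4, 8) =~
        s[(.)(?=...\1)|(?<=(.)...)\2]{ defined $1 ? "A" : "B" }eg;
}

print "global:   $global\n";
print "windowed: $windowed\n";
```

Because substr is used as an lvalue, each window sees the replacements made by the previous one, which is what stops an already-replaced character from being paired a second time.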

All that said and done, unless I had a real performance problem with this task, I'd use diotalevi's initial simpleminded approach in production. It works correctly and is far easier to read than any of the alternatives.

Makeshifts last the longest.

Re: Re^4: Replace zero-width grouping?
by BrowserUk (Patriarch) on May 09, 2003 at 07:24 UTC

    Nice one++. I like the step back you took. The OP's original 2 pass idea applied to a limited range that avoids its problem and Whamo! Fewer, bigger chunks and more of the work done by the regex engine. Neat. In most of the variations I tried it was 30% to 36% quicker than the next fastest. The only time it loses out is on a very sparse string, but that's inevitable.

    As I believe was once said to Oscar Wilde, "I wish I'd thought of that" :)

    You're right on the code block assertion too -- I wondered what the right term for that was -- it is the biggest hit on performance of those solutions.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      I'm not sure "code block assertion" is the proper name, actually. (Maybe the Camel has something to say on the matter; perlre doesn't.) I just made up a term that made clear what I was talking about.

      Re "wish I'd thought of that": :) Remember the strategy for optimizing Perl code is to keep the execution of as much of an algorithm's logic as possible in the perl binary. The GRT is an impressive demonstration of this principle.
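For readers who haven't met the GRT (Guttman-Rosler Transform), here is a minimal sketch of the idea, assuming a made-up task of sorting words by length. The sort key is packed into the front of each string so that the built-in string sort does all the comparing, with no Perl-level comparison block. The word list and the choice of key are purely illustrative.

```perl
use strict;
use warnings;

my @words = qw(pear fig banana kiwi);

my @sorted =
    map  { substr($_, 4) }                  # strip the 4-byte key again
    sort                                    # default string sort, no comparator block
    map  { pack("N", length($_)) . $_ }     # prefix: length as a big-endian 32-bit integer
    @words;

print "@sorted\n";    # fig kiwi pear banana
```

The big-endian packing matters: it makes byte-wise string order agree with numeric order of the lengths, and ties fall through to the word itself.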

      Makeshifts last the longest.

        "Assertion" is the wrong word here because, unless you do gymnastics, the code block has no effect on whether the expression succeeds or fails. On the rare occasion that I want to use Perl code in an assertion (and this never happens for "real" code), I have to use the eval block in a conditional and then use zero-width assertions to simulate a true-false.

        /(?(?{ perl code goes here })   # Conditional construct; the Perl code asserts something
             (?=)                       # empty positive assertion.
           |
             (?!)                       # empty negative assertion
         )/x
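A concrete instance of that construct might look like the following sketch. The sub name is_octet and the "value below 256" test are hypothetical, chosen only to give the code block something to assert.

```perl
use strict;
use warnings;

# Returns 1 if $text is all digits AND its numeric value is below 256.
sub is_octet {
    my ($text) = @_;
    return $text =~ /
        \A (\d+) \z              # the whole string must be digits
        (?(?{ $1 < 256 })        # code block used as the condition
            (?=)                 # condition true:  empty lookahead, always succeeds
          |
            (?!)                 # condition false: empty negative lookahead, always fails
        )
    /x ? 1 : 0;
}

print is_octet($_) ? "ok   " : "fail ", $_, "\n" for qw(0 255 256 abc);
```

Note that `(?(` has to be written without whitespace between the two parentheses, even under /x.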

        History has it that Mr Wilde's response was, "You will, Harvey, you will." So, in this case, I guess I should simply say, "I will" :)

        In C  assert(true); could still be classed as an "assertion". The fact that the assertion is always true doesn't change that. perlre says

        This zero-width assertion evaluate any embedded Perl code. It always succeeds,

        so 'code block assertion' as a phrase to describe (?{ ... }) makes a certain amount of sense, to me at least.

        And the (??{ ... }) is described as a "postponed regular subexpression".
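A short sketch of the difference between the two constructs, under the assumption of a toy "length-prefixed string" format (one digit followed by that many characters). The names $code_block_matched and is_length_prefixed are made up for the illustration.

```perl
use strict;
use warnings;

# (?{ ... }) always succeeds: the block evaluates to 0 here, yet the match
# still goes through, because the value is not used as a truth test.
my $code_block_matched = "abc" =~ /b(?{ 0 })/ ? 1 : 0;

# (??{ ... }) is different: the block's return value is used as a pattern
# and matched at that point. Here it builds as many dots as the digit says.
sub is_length_prefixed {
    my ($text) = @_;
    return $text =~ /\A(\d)(??{ '.' x $1 })\z/ ? 1 : 0;
}

print "code block matched: $code_block_matched\n";
print "$_: ", is_length_prefixed($_), "\n" for qw(3abc 3ab 2xy);
```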

        Both are a bit wordy, but it would be nice to have terms for them, rather than needing to constantly use the notation? Maybe CBA and PP-RE?? Just a thought.

