in reply to Re^4: Replace zero-width grouping? (faster!)
in thread Replace zero-width grouping?

Nice one++. I like the step back you took: the OP's original two-pass idea, applied to a limited range that avoids its problem, and whammo! Fewer, bigger chunks, and more of the work done by the regex engine. Neat. In most of the variations I tried it was 30% to 36% quicker than the next fastest. The only time it loses out is on a very sparse string, but that's inevitable.

D:\Perl\test>256024 -N=-1 -LEN=10 -SPARSE=10 -CHECK
        Rate  dio1  dio2  buk1  buk2  ari2 enlil  dio3  buk3
dio1  35.6/s    --  -39%  -62%  -62%  -64%  -80%  -81%  -82%
dio2  58.2/s   63%    --  -38%  -38%  -41%  -67%  -69%  -71%
buk1  93.2/s  162%   60%    --   -0%   -5%  -48%  -50%  -54%
buk2  93.2/s  162%   60%    0%    --   -5%  -48%  -50%  -54%
ari2  97.9/s  175%   68%    5%    5%    --  -45%  -48%  -51%
enlil  178/s  400%  206%   91%   91%   82%    --   -5%  -11%
dio3   188/s  427%  222%  101%  101%   92%    5%    --   -7%
buk3   201/s  464%  245%  115%  115%  105%   13%    7%    --
Ari1:Test not performed
Ari2:A7AAB2BB....____....____....____....____....____....____.
buk1:a7aab2bb....____....____....____....____....____....____.
buk2:a7aab2bb....____....____....____....____....____....____.
buk3:a7aab2bb....____....____....____....____....____....____.
dio1:a7aab2bb....____....____....____....____....____....____.
dio2:a7aab2bb....____....____....____....____....____....____.
dio3:a7aab2bb....____....____....____....____....____....____.
enll:a7aab2bb....____....____....____....____....____....____.

D:\Perl\test>256024 -N=-1 -LEN=10
        Rate  dio2  dio1 enlil  dio3  buk3  buk2  buk1  ari2
dio2   219/s    --   -5%  -25%  -25%  -26%  -38%  -39%  -54%
dio1   230/s    5%    --  -21%  -21%  -22%  -35%  -35%  -51%
enlil  290/s   32%   26%    --   -1%   -2%  -18%  -19%  -39%
dio3   292/s   33%   27%    1%    --   -1%  -18%  -18%  -38%
buk3   296/s   35%   29%    2%    1%    --  -17%  -17%  -37%
buk2   355/s   62%   54%   22%   22%   20%    --   -0%  -25%
buk1   356/s   63%   55%   23%   22%   20%    0%    --  -25%
ari2   473/s  116%  105%   63%   62%   60%   33%   33%    --

As I believe was once said to Oscar Wilde, "I wish I'd thought of that" :)

You're right on the code block assertion too (I wondered what the right term for that was): it is the biggest hit on performance of those solutions.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

Re^6: Replace zero-width grouping? (optimizing perl code)
by Aristotle (Chancellor) on May 09, 2003 at 09:55 UTC

    I'm not sure "code block assertion" is the proper name, actually. (Maybe the Camel has something to say on the matter; perlre doesn't.) I just made up a term that made clear what I was talking about.

    Re "wish I'd thought of that": :) Remember, the strategy for optimizing Perl code is to keep the execution of as much of an algorithm's logic as possible inside the perl binary. The GRT (Guttman-Rosler Transform) is an impressive demonstration of this principle.
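
    As a minimal sketch of that principle, here is a GRT-style sort. The records, their "name:score" layout, and the sort order (score descending, then name) are all made up for illustration. The point is that the sort key is packed into a fixed-width string prefix, so sort can run with its built-in string comparison and never calls back into a Perl block.

```perl
use strict;
use warnings;

# Hypothetical data for illustration: "name:score" records.
my @records = ('carol:7', 'alice:42', 'bob:7', 'dave:100');

my @sorted =
    map  { substr $_, 4 }                  # 3. strip the 4-byte key again
    sort                                   # 2. default string sort, no comparison block
    map  {                                 # 1. prefix each record with a packed key
        my ($name, $score) = split /:/;
        # Big-endian 32-bit integer; subtracting from the maximum
        # turns ascending string order into descending score order.
        pack('N', 0xFFFFFFFF - $score) . $_;
    }
    @records;

print "$_\n" for @sorted;
```

    Ties on the packed key fall through to the rest of the string, which is how the records with equal scores end up ordered by name.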

    Makeshifts last the longest.

      "Assertion" is the wrong word here because, unless you do gymnastics, the code block has no effect on whether the expression succeeds or fails. On the rare occasion that I want to use perl code in an assertion (and this never happens for "real" code), I have to use the eval block in a conditional and then use zero-width assertions to simulate a true-false.

      /(?                           # Use the conditional construct
         (?{ perl code goes here})  # Perl code that will assert something
         (?=)                       # empty positive assertion.
         |
         (?!)                       # empty negative assertion
       )/x
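
      A runnable sketch of the same idea, assuming a made-up rule (accept a string of digits only when its value is even) purely for illustration. Note that the condition has to follow "(?" directly, so the code block is written as (?(?{ ... }) with no whitespace in between.

```perl
use strict;
use warnings;

# Hypothetical rule for illustration: the whole string must be an even number.
my $even_number = qr/
    \A (\d+) \z
    (?(?{ $1 % 2 == 0 })    # code block used as the condition
        (?=)                # true:  empty lookahead, always matches
      |
        (?!)                # false: empty negative lookahead, always fails
    )
/x;

for my $candidate (qw(124 13 21)) {
    printf "%s: %s\n", $candidate,
        $candidate =~ $even_number ? 'accepted' : 'rejected';
}
```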
        Why don't you use (??{ }) for that?
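
        A sketch of what that could look like, using the same kind of made-up rule (the whole string must be an even number) as an illustration. The block returns a pattern as a string, and that pattern is what succeeds or fails at that point in the match.

```perl
use strict;
use warnings;

# Hypothetical rule for illustration: the whole string must be an even number.
my $even_deferred = qr/
    \A (\d+) \z
    (??{ $1 % 2 == 0 ? '(?=)' : '(?!)' })   # returned string is matched as a pattern
/x;

for my $candidate (qw(124 13)) {
    printf "%s: %s\n", $candidate,
        $candidate =~ $even_deferred ? 'accepted' : 'rejected';
}
```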


      History has it that Mr Wilde's response was, "You will, Harvey, you will." So, in this case, I guess I should simply say, "I will" :)

      In C, assert(true); could still be classed as an "assertion". The fact that the assertion is always true doesn't change that. perlre says:

      This zero-width assertion evaluate any embedded Perl code. It always succeeds,

      so 'code block assertion' as a phrase to describe (?{ ... }) makes a certain amount of sense, to me at least.
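
      A small sketch of that "always succeeds" behaviour; the string and the counter variable are just illustrative. The block deliberately evaluates to a false value, and the match goes through anyway.

```perl
use strict;
use warnings;

my $ran = 0;

# The block's last value is 0 (false), yet the overall match still
# succeeds: a bare (?{ ... }) does not influence success or failure.
my $matched = 'abc' =~ /a(?{ $ran++; 0 })b/ ? 1 : 0;

print "matched=$matched, block ran at least once: ", ($ran ? 'yes' : 'no'), "\n";
```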

      And the (??{ ... }) is described as a "postponed regular subexpression".

      Both are a bit wordy, but it would be nice to have terms for them, rather than constantly needing to use the notation. Maybe CBA and PP-RE? Just a thought.



        FWIW, I tend to think of them as "RE eval" and "deferred eval", influenced in part by:

        perl -wle 'sub a { print +(caller)[1] } ""=~/(?{ a() })/'

        Hugo