in reply to Re^3: Splitting on escapable delimiter
in thread Splitting on escapable delimiter

Wow, I totally should have seen that!

Okay so now that mutual correctness has been established, it is time for optimality checking. It turns out that your method is about 10% faster.

I called my method unroll, and your method rollahead.

Here are the benchtests:

Benchmark: timing 100000 iterations of rollahead, unroll...
rollahead: 18.3956 wallclock secs (17.94 usr + 0.00 sys = 17.94 CPU) @ 5575.07/s (n=100000)
unroll: 22.1357 wallclock secs (20.58 usr + 0.00 sys = 20.58 CPU) @ 4859.56/s (n=100000)
                         Rate unroll rollahead
unroll           4860/s       --          -13%
rollahead     5575/s    15%            --


and the code:
use strict; use Benchmark ':all', ':hireswallclock'; my $x = "#@##@###@####@#####@"; my $y = reverse $x; my $z = "$x$x$x$y$y$x$x$y$y$y$y$y$x$x$x$x"; my $r = timethese( 100000, { unroll => sub { my @x = ([unroll($x)],[unroll($y)],[unroll($z)]) }, rollahead => sub { my @x = ([rollahead($x)],[rollahead($y)],[rollahead($z)]) }, } ); cmpthese($r); sub unroll { my @x = $_[0] =~ /(?:^|@)((?:##|#@|[^#@])*)/g; for(@x){ $_ =~ s/##/#/g; $_ =~ s/#@/@/g; } @x } sub rollahead { my $x = shift; $x = reverse $x; my @x = reverse(split/\@(?=(?:##)*(?!#))/,$x,-1); for(@x){ $_ = reverse; $_ =~ s/##/#/g; $_ =~ s/#@/@/g; } @x }
Is there a monk who could explain why rollahead is faster? I was surprised considering the number of calls to reverse. I can only guess that somewhere deep inside the guts of the Perl-Regex-Beasty, that the optimizer droids are nasty hardcore with lookaround automata but can't be bothered with alternation.

Replies are listed 'Best First'.
Re^5: Splitting on escapable delimiter
by Anonymous Monk on Mar 29, 2008 at 01:30 UTC
    I get
    Benchmark: timing 100000 iterations of rollahead, unroll, unroll2... rollahead: 22.9776 wallclock secs (22.93 usr + 0.02 sys = 22.95 CPU) + @ 4357.30/s (n=100000) unroll: 23.0715 wallclock secs (23.03 usr + 0.02 sys = 23.05 CPU) + @ 4338.39/s (n=100000) unroll2: 22.3888 wallclock secs (22.35 usr + 0.03 sys = 22.38 CPU) + @ 4468.28/s (n=100000) Rate unroll rollahead unroll2 unroll 4338/s -- -0% -3% rollahead 4357/s 0% -- -2% unroll2 4468/s 3% 3% --
    which still rates rollahead as faster than unroll, but barely. Interesting that runs on different machines can produce a 15% change in relative performance. (I am running perl 5.8.6.)

    unroll2 is unroll with a more efficient regexp (one fewer alternation). That it is slightly faster suggests that the alternation is indeed slowing things down.

    sub unroll2 { my @x = $_[0] =~ /(?:^|@)((?:#[#@]|[^#@])*)/g; for(@x){ $_ =~ s/##/#/g; $_ =~ s/#@/@/g; } @x }
      Entirely excellent. Wow, I have enjoyed this thread very much. Thank you.