Re^4: Splitting on escapable delimiter

Wow, I totally should have seen that!

Okay so now that mutual correctness has been established, it is time for optimality checking. It turns out that your method is about 10% faster.

I called my method unroll, and your method rollahead.

Here are the benchtests:

Benchmark: timing 100000 iterations of rollahead, unroll...
rollahead: 18.3956 wallclock secs (17.94 usr + 0.00 sys = 17.94 CPU) @ 5575.07/s (n=100000)
unroll: 22.1357 wallclock secs (20.58 usr + 0.00 sys = 20.58 CPU) @ 4859.56/s (n=100000)
                         Rate unroll rollahead
unroll           4860/s       --          -13%
rollahead     5575/s    15%            --

and the code:

use strict;

use Benchmark ':all', ':hireswallclock';

my $x = "#@##@###@####@#####@";
my $y = reverse $x;
my $z = "$x$x$x$y$y$x$x$y$y$y$y$y$x$x$x$x";

my $r = timethese(
    100000,
    {
        unroll    => sub {
           my @x = ([unroll($x)],[unroll($y)],[unroll($z)])
        },
        rollahead => sub {
           my @x = ([rollahead($x)],[rollahead($y)],[rollahead($z)])
        },
    }
);
cmpthese($r);

sub unroll {
    my @x = $_[0] =~ /(?:^|@)((?:##|#@|[^#@])*)/g;
    for(@x){
        $_ =~ s/##/#/g;
        $_ =~ s/#@/@/g;
    } @x
}

sub rollahead {
    my $x = shift;
    $x = reverse $x;
    my @x = reverse(split/\@(?=(?:##)*(?!#))/,$x,-1);  
    for(@x){
        $_ = reverse;
        $_ =~ s/##/#/g;
        $_ =~ s/#@/@/g;
    } @x
}
[download]

Is there a monk who could explain why rollahead is faster? I was surprised considering the number of calls to reverse. I can only guess that somewhere deep inside the guts of the Perl-Regex-Beasty, that the optimizer droids are nasty hardcore with lookaround automata but can't be bothered with alternation.

Comment on Re^4: Splitting on escapable delimiter Download Code

Replies are listed 'Best First'.
Re^5: Splitting on escapable delimiter by Anonymous Monk on Mar 29, 2008 at 01:30 UTC
I get `Benchmark: timing 100000 iterations of rollahead, unroll, unroll2... rollahead: 22.9776 wallclock secs (22.93 usr + 0.02 sys = 22.95 CPU) + @ 4357.30/s (n=100000) unroll: 23.0715 wallclock secs (23.03 usr + 0.02 sys = 23.05 CPU) + @ 4338.39/s (n=100000) unroll2: 22.3888 wallclock secs (22.35 usr + 0.03 sys = 22.38 CPU) + @ 4468.28/s (n=100000) Rate unroll rollahead unroll2 unroll 4338/s -- -0% -3% rollahead 4357/s 0% -- -2% unroll2 4468/s 3% 3% --` [download] which still rates rollahead as faster than unroll, but barely. Interesting that runs on different machines can produce a 15% change in relative performance. (I am running perl 5.8.6.) unroll2 is unroll with a more efficient regexp (one fewer alternation). That it is slightly faster suggests that the alternation is indeed slowing things down. `sub unroll2 { my @x = $_[0] =~ /(?:^\|@)((?:#[#@]\|[^#@])*)/g; for(@x){ $_ =~ s/##/#/g; $_ =~ s/#@/@/g; } @x }` [download]	[reply] [d/l] [select]
Re^6: Splitting on escapable delimiter by mobiusinversion (Beadle) on Mar 29, 2008 at 07:14 UTC
Entirely excellent. Wow, I have enjoyed this thread very much. Thank you.	[reply]