in reply to Re^2: Splitting on escapable delimiter
in thread Splitting on escapable delimiter

*reads documentation for split*

Ah, I need to add a -1 as a third parameter to split. Good spot.

Replies are listed 'Best First'.
Re^4: Splitting on escapable delimiter
by mobiusinversion (Beadle) on Mar 28, 2008 at 23:40 UTC
    Wow, I totally should have seen that!

    Okay so now that mutual correctness has been established, it is time for optimality checking. It turns out that your method is about 10% faster.

    I called my method unroll, and your method rollahead.

    Here are the benchtests:

    Benchmark: timing 100000 iterations of rollahead, unroll...
    rollahead: 18.3956 wallclock secs (17.94 usr + 0.00 sys = 17.94 CPU) @ 5575.07/s (n=100000)
    unroll: 22.1357 wallclock secs (20.58 usr + 0.00 sys = 20.58 CPU) @ 4859.56/s (n=100000)
                             Rate unroll rollahead
    unroll           4860/s       --          -13%
    rollahead     5575/s    15%            --


    and the code:
    use strict; use Benchmark ':all', ':hireswallclock'; my $x = "#@##@###@####@#####@"; my $y = reverse $x; my $z = "$x$x$x$y$y$x$x$y$y$y$y$y$x$x$x$x"; my $r = timethese( 100000, { unroll => sub { my @x = ([unroll($x)],[unroll($y)],[unroll($z)]) }, rollahead => sub { my @x = ([rollahead($x)],[rollahead($y)],[rollahead($z)]) }, } ); cmpthese($r); sub unroll { my @x = $_[0] =~ /(?:^|@)((?:##|#@|[^#@])*)/g; for(@x){ $_ =~ s/##/#/g; $_ =~ s/#@/@/g; } @x } sub rollahead { my $x = shift; $x = reverse $x; my @x = reverse(split/\@(?=(?:##)*(?!#))/,$x,-1); for(@x){ $_ = reverse; $_ =~ s/##/#/g; $_ =~ s/#@/@/g; } @x }
    Is there a monk who could explain why rollahead is faster? I was surprised considering the number of calls to reverse. I can only guess that somewhere deep inside the guts of the Perl-Regex-Beasty, that the optimizer droids are nasty hardcore with lookaround automata but can't be bothered with alternation.
      I get
      Benchmark: timing 100000 iterations of rollahead, unroll, unroll2... rollahead: 22.9776 wallclock secs (22.93 usr + 0.02 sys = 22.95 CPU) + @ 4357.30/s (n=100000) unroll: 23.0715 wallclock secs (23.03 usr + 0.02 sys = 23.05 CPU) + @ 4338.39/s (n=100000) unroll2: 22.3888 wallclock secs (22.35 usr + 0.03 sys = 22.38 CPU) + @ 4468.28/s (n=100000) Rate unroll rollahead unroll2 unroll 4338/s -- -0% -3% rollahead 4357/s 0% -- -2% unroll2 4468/s 3% 3% --
      which still rates rollahead as faster than unroll, but barely. Interesting that runs on different machines can produce a 15% change in relative performance. (I am running perl 5.8.6.)

      unroll2 is unroll with a more efficient regexp (one fewer alternation). That it is slightly faster suggests that the alternation is indeed slowing things down.

      sub unroll2 { my @x = $_[0] =~ /(?:^|@)((?:#[#@]|[^#@])*)/g; for(@x){ $_ =~ s/##/#/g; $_ =~ s/#@/@/g; } @x }
        Entirely excellent. Wow, I have enjoyed this thread very much. Thank you.