Re: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

Be aware that perl version and platform may matter.

I have a couple of solutions derived from the work of moritz and avar that show some inconsistency.

Update: The specific solutions below are incorrect, as dragonchild and ikegami have pointed out to me. The advice about times varying still applies.

On Windows XP with Strawberry Perl 5.8.8 (Athlon XP 2400+, 1 GB RAM), my solutions are consistently about 50% faster than avar's avar2_pos_inplace.
On Mandriva Linux (x86, Athlon 1000, 512 MB RAM) with perl 5.8.7, they are consistently slower by 10-15%.
Running the same code under ActiveState 5.8.0 (build 804) that I ran under Strawberry shows avar2_pos_inplace beating my solutions by 5-20%, which is a much wider range than on the Linux perl.
Perl v5.8.6 built for cygwin-thread-multi-64int on the XP 2400+ shows mrm_1 and avar2_pos_inplace in a dead heat, swapping places back and forth between runs.

In case anyone else wants to try my slight changes:

sub mrm_1 {
    my ( $s1, $s2 ) = @_;
    use bytes;
    my $pos = 0;
    while (-1 < ( $pos = index $$s1, '\0', $pos ) ) {
        substr( $$s1, $pos, 1 ) = substr( $s2, $pos, 1 );
    }
}

sub mrm_3 {
    my ( $s1, $s2 ) = @_;
    use bytes;

    my @zeros = ();
    my $pos = 0;
    while ( -1 < ( $pos = index $$s1, '\0', $pos ) ) {
        push @zeros, $pos;
    }
    for ( @zeros ) {
        substr( $$s1, $_, 1 ) = substr( $s2, $_, 1 );
    }
}
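
For anyone picking this up later, here is a minimal sketch of an index()-based variant that avoids three problems visible in the code above. Whether these are the flaws dragonchild and ikegami had in mind is not stated in this node. The name mrm_1_fixed is hypothetical, and passing both strings as scalar references is an assumption about the benchmark's calling convention.

```perl
use strict;
use warnings;

# Hypothetical corrected variant of mrm_1 (name and calling convention
# are assumptions: both arguments are scalar references to equal-length
# byte strings).
sub mrm_1_fixed {
    my ( $s1, $s2 ) = @_;
    use bytes;
    my $pos = 0;

    # "\0" in double quotes is a single NUL byte; '\0' in single quotes
    # is the two-character string backslash-zero.
    while ( -1 < ( $pos = index $$s1, "\0", $pos ) ) {
        substr( $$s1, $pos, 1 ) = substr( $$s2, $pos, 1 );
        $pos++;    # step past this byte so a NUL in $$s2 cannot loop forever
    }
}

my $s1 = "a\0c\0";
my $s2 = "wxyz";
mrm_1_fixed( \$s1, \$s2 );
print "$s1\n";    # prints "axcz"
```

The increment after each replacement also matters for an mrm_3-style loop, which would otherwise keep pushing the same offset.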

Interestingly, the cost of building the extra list of indexes in mrm_3 is within the margin of benchmarking error, at least on some perls. Sometimes it beats mrm_1 and sometimes it doesn't. At least once on my Linux box they tied (exactly 1393/s each). mrm_2 lands about midway through the results, so I didn't bother to show it.

Perhaps more interesting is that Corion's code works for me most of the time, but sometimes tries to perform a substr() outside the string. I haven't tried to figure out exactly why. That goes to show the debugging price for being too clever.

Update: fixed the index() error that demerphq points out above. The results seem to be coming up about the same. I'm guessing the chance of the test data having '\0' at index 0 hadn't bitten very hard yet.
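
The pre-fix code is not shown here, so the following is only a guess at the kind of pitfall involved: index() returns 0 for a match at the start of the string and -1 for no match, so a bare truth test gets both cases backwards. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $data = "\0abc\0";

# index() returns the zero-based offset of the match, or -1 if none.
my $first   = index $data, "\0";    # 0: a match at the very start
my $missing = index $data, "z";     # -1: no match

# A bare truth test treats offset 0 as false and -1 as true.
my $truth_at_start = $first   ? 1 : 0;
my $truth_no_match = $missing ? 1 : 0;

print "first=$first missing=$missing\n";
print "truth test: at_start=$truth_at_start no_match=$truth_no_match\n";
```

Comparing against -1, as the loops above do, handles both cases.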

Update 2: corrected the comparison of ranges so that it says it is between ActiveState's perl and the perl on the Linux machine. It previously referred, incorrectly, to Strawberry.

Re^2: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
by mr_mischief (Monsignor) on Sep 13, 2007 at 02:19 UTC

Okay, I already updated that a couple of times for corrections, and this is new data, so I'm replying to myself.

Update: As with the node above, the implementations are broken, but they passed the tests when they should not have. The theme of the node, that different perl builds are doing drastically different things with the same code, stands.

I downloaded, compiled, and installed 5.9.5 on my Linux box. I also have a few more tweaks I've tried. Here are some result summaries (the Linux box with 5.9.5, the first test listed, ran for 30 seconds; the rest still ran for 2):

mrm_3, mrm_4, mrm_5, mrm_1, avar2_pos_inplace, and moritz are the tops in 5.9.5 on my 1 GHz, 512 MB RAM Athlon with Mandriva 2006 community edition, in that order. The first five are only separated by 2%, and I ran this test at cmpthese(-30,...) instead of -2 for extra reliability.
Strawberry 5.8.8 has them as mrm_1, mrm_2, mrm_4, mrm_4, avar2_pos_inplace, and moritz.
AS 5.8.0 has avar2_pos_inplace, mrm_3, mrm_4, mrm_1, mrm_5, and moritz. It still shows avar2_pos_inplace ahead of the next place by 5-20%.
cygperl 5.8.6 still shows avar2_pos_inplace in a dead heat with several of the mrm_ solutions. The top five change order on nearly every run. moritz's solution comes in sixth reliably.
perl 5.8.7 on the Linux box shows avar2_pos_inplace, mrm_1, mrm_4, mrm_5, mrm_3, then moritz. avar2_pos_inplace varies its lead from 4% to about 14% over mrm_1.
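
For anyone who wants to reproduce this kind of comparison on their own perl, here is a rough sketch of a cmpthese() harness. The two candidates, the test data, and the $cpu_seconds value are stand-ins invented for illustration, not the thread's actual entries or benchmark script. It checks that the candidates agree before timing them, since a fast wrong answer is easy to miss.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: equal-length strings, NUL bytes scattered in $base.
my $base = join '', map { $_ % 7 ? 'a' : "\0" } 1 .. 10_000;
my $fill = 'b' x length $base;

# Stand-in candidate 1: walk the NUL bytes with index().
sub by_index {
    my ( $s1, $s2 ) = @_;
    my $pos = 0;
    while ( -1 < ( $pos = index $$s1, "\0", $pos ) ) {
        substr( $$s1, $pos, 1 ) = substr( $$s2, $pos, 1 );
        $pos++;
    }
}

# Stand-in candidate 2: build a mask (0xFF where $$s1 is NUL, 0 elsewhere)
# and merge with string bitwise operators.
sub by_mask {
    my ( $s1, $s2 ) = @_;
    ( my $mask = $$s1 ) =~ tr/\0\x01-\xFF/\xFF\0/;
    $$s1 |= ( $$s2 & $mask );
}

# Check that both candidates agree before trusting any timings.
my ( $check_a, $check_b ) = ( $base, $base );
by_index( \$check_a, \$fill );
by_mask( \$check_b, \$fill );
die "candidates disagree\n" unless $check_a eq $check_b;

# A negative count means "run each candidate for at least this many
# CPU seconds"; the post used 2 and 30.
my $cpu_seconds = 1;
cmpthese( -$cpu_seconds, {
    by_index => sub { my $copy = $base; by_index( \$copy, \$fill ) },
    by_mask  => sub { my $copy = $base; by_mask( \$copy, \$fill ) },
} );
```

Each timed sub works on a fresh copy because both candidates modify their first argument in place. The copy cost is included in every timing, so the ratios are compressed somewhat.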

I should note that moritz's solution is between 50% and 75% slower than the top pure-Perl solution in all of these tests, and the rest of the ones I've tested fall below that.

I should also note that my Linux 5.8.7 does nearly twice as many iterations per second of every solution (of those faster than about 200 iterations per second, anyway) as my 5.9.5 does, so I'm curious as to whether that's a development version thing or if my new perl just isn't compiled with as much optimization as the one that came with the distro. Switching to -O4 from -O2 for optimization, replacing some older x86-family lib references in the makefiles, and rebuilding doesn't help much. I'm guessing the devel branch just isn't tuned at the source level as much as the stable branch, which makes sense.
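
One way to compare how two perls were built is to ask each one for its recorded build settings through the core Config module. This is a small sketch to run under each perl in turn. It shows the compiler flags that were recorded at build time, which may explain some of a speed difference but not necessarily all of it.

```perl
use strict;
use warnings;
use Config;

# %Config records how this perl binary was configured and built.
for my $key (qw(version archname cc optimize ccflags)) {
    my $value = defined $Config{$key} ? $Config{$key} : '(undef)';
    printf "%-9s %s\n", $key, $value;
}
```

The same values are available from the command line with perl -V:optimize and similar.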

Here's my code for mrm_4 and mrm_5:

sub mrm_4 {
    # from [bart]'s vec()
    my ($s1, $s2) = @_;
    use bytes;

    my $pos = 0;
    while ( -1 < ( $pos = index $$s1, '\0', $pos ) ) {
        vec( $$s1, $pos, 8 ) ||= vec( $s2, $pos, 8 );
    }
}

sub mrm_5 {
    # from moritz's, seeing if four-arg substr() is
    # faster or slower than lvalue substr()
    my ( $s1, $s2 ) = @_;
    use bytes;
    my $pos = 0;
    while ( -1 < ( $pos = index $$s1, '\0', $pos ) ) {
        substr( $$s1, $pos, 1, substr( $s2, $pos, 1 ) );
    }
}
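
As a quick sanity check that the three replacement forms being compared (lvalue substr(), four-arg substr(), and vec() with ||=) do the same thing to a single byte, here is a small sketch. The variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $src = "wxyz";
my ( $lvalue, $four_arg, $bits ) = ("a\0c\0") x 3;
my $pos = 1;    # offset of the first NUL byte

# Lvalue substr(): assign to the one-byte slice.
substr( $lvalue, $pos, 1 ) = substr( $src, $pos, 1 );

# Four-arg substr(): the replacement is passed as the fourth argument.
substr( $four_arg, $pos, 1, substr( $src, $pos, 1 ) );

# vec() with 8-bit elements: ||= only assigns when the byte is zero.
vec( $bits, $pos, 8 ) ||= vec( $src, $pos, 8 );

print "all three agree\n"
    if $lvalue eq $four_arg and $four_arg eq $bits;
```

Only the byte at offset 1 is replaced here, so the trailing NUL stays in all three strings.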
