in reply to Re^5: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
in thread Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

With correction "\x00" in place of the erroneous '\0' on Strawberry:
Rate ikegami_s ikegami_s 36.4/s -- avar 185/s 408% avar2 192/s 426% avar2_pos 324/s 788% ikegami_tr 329/s 804% moritz 786/s 2056% mrm_5 1561/s 4183% mrm_3 1578/s 4229% avar2_pos_inplace 1610/s 4318% mrm_4 1620/s 4344% mrm_1 1640/s 4400% mrm_6 1899/s 5111% ok 1 - mrm_1 gets some value ok 2 - mrm_4 gets same value ok 3 - avar2_pos_inplace gets same value ok 4 - moritz gets same value ok 5 - mrm_3 gets same value ok 6 - mrm_5 gets same value ok 7 - avar2_pos gets same value ok 8 - ikegami_tr gets same value ok 9 - mrm_6 gets same value ok 10 - ikegami_s gets same value ok 11 - avar gets same value ok 12 - avar2 gets same value
And this is with this code:
#!/usr/bin/perl use 5.6.0; use strict; use warnings FATAL => 'all'; use Benchmark qw( cmpthese ); my $s1 = do_rand(0, 100_000); my $s2 = do_rand(1, 100_000); my $subs = { # 'split1' => sub { my $s3 = split1( $s1, $s2 ) }, # 'substr1' => sub { my $s3 = substr1( $s1, $s2 ) }, # 'kyle' => sub { my $s3 = kyle( $s1, $s2 ) }, 'moritz' => sub { my $s3 = moritz( $s1, $s2 ) }, # 'corion' => sub { my $s3 = corion( $s1, $s2 ) }, 'ikegami_s' => sub { my $s3 = ikegami_s( $s1, $s2 ) }, 'ikegami_tr' => sub { my $s3 = ikegami_tr( $s1, $s2 ) }, 'avar' => sub { my $s3 = avar( $s1, $s2 ) }, 'avar2' => sub { my $s3 = avar2( $s1, $s2 ) }, 'avar2_pos' => sub { my $s3 = avar2_pos( $s1, $s2 ) }, 'avar2_pos_inplace' => sub { avar2_pos_inplace( \$s1, $s2 ); $s1 +}, 'mrm_1' => sub { mrm_1( \$s1, $s2 ); $s1 }, # 'mrm_2' => sub { mrm_2( \$s1, $s2 ); $s1 }, 'mrm_3' => sub { mrm_3( \$s1, $s2 ); $s1 }, 'mrm_4' => sub { mrm_4( \$s1, $s2 ); $s1 }, 'mrm_5' => sub { mrm_5( \$s1, $s2 ); $s1 }, 'mrm_6' => sub { mrm_6( \$s1, \$s2 ); $s1 }, }; cmpthese( -2, $subs ); use Test::More; plan 'tests' => scalar keys %{$subs}; my $s3; foreach my $subname ( keys %{$subs} ) { my $sub = $subs->{$subname}; if ( defined $s3 ) { is( $sub->(), $s3, "$subname gets same value" ); } else { $s3 = $sub->(); ok( defined $s3, "$subname gets some value" ); } } sub split1 { my ($s1, $s2) = @_; my @s1 = split //, $s1; my @s2 = split //, $s2; foreach my $idx ( 0 .. $#s1 ) { if ( $s1[$idx] eq chr(0) ) { $s1[$idx] = $s2[$idx]; } } return join '', @s1; } sub substr1 { my ($s1, $s2) = @_; for my $idx ( 0 .. length($s1) ) { if ( substr($s1,$idx,1) eq chr(0) ) { substr($s1, $idx, 1) = substr($s2, $idx, 1); } } return $s1; } sub kyle { my ($s1, $s2) = @_; my $out = $s1; while ( $s1 =~ m/\000/g ) { my $pos = pos; substr( $out, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $out; } sub moritz { my ($s1, $s2) = @_; my $pos = 0; while ( -1 < ( $pos = index $s1, "\000", $pos ) ) { substr( $s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $s1; } sub ikegami_s { my ($s1, $s2) = @_; (my $mask = $s1) =~ s/[^\x00]/\xFF/g; return ($s1 & $mask) | ($s2 & ~$mask); } sub ikegami_tr { my ($s1, $s2) = @_; (my $mask = $s1) =~ tr/\x00/\xFF/c; return ($s1 & $mask) | ($s2 & ~$mask); } sub corion { my ($s1, $s2) = @_; my $ofs = 0; return join "", map { $ofs += length; $_ => substr $s2, $ofs++, 1 } + split /\0/, $s1, -1; } sub avar { my ($s1, $s2) = @_; my $s3 = $s1; { use bytes; $s3 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; } $s3; } sub avar2 { my ($s1, $s2) = @_; use bytes; $s1 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; return $s1; } sub avar2_pos { my ($s1, $s2) = @_; use bytes; $s1 =~ s/\0/substr $s2, pos($s1), 1/eg; return $s1; } sub avar2_pos_inplace { my ($s1, $s2) = @_; use bytes; $$s1 =~ s/\0/substr $s2, pos($$s1), 1/eg; } sub mrm_1 { my ( $s1, $s2 ) = @_; # from [moritz]'s work use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } } sub mrm_3 { my ( $s1, $s2 ) = @_; # from moritz's, builds a separate loop of zeros use bytes; my @zeros = (); my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { push @zeros, $pos; } for ( @zeros ) { substr( $$s1, $_, 1 ) = substr( $s2, $_, 1 ); } } sub mrm_4 { # from [bart]'s vec() my ($s1, $s2) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { vec( $$s1, $pos, 8 ) ||= vec( $s2, $pos, 8 ); } } sub mrm_5 { # from moritz's, seeing if four-arg substr() is faster or slower t +han lvalue substr() my ( $s1, $s2 ) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1, substr( $s2, $pos, 1 ) ); } } sub mrm_6 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $$s1, "\x00", $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } } # This makes sure that $s1 has chr(0)'s in it and $s2 does not. sub do_rand { my $min = shift; my $len = shift; my $n = ""; for (1 .. $len) { $n .= chr( rand(255-$min)+$min ); } return $n; } #sub do_rand { # my $n = (shift) ? int(rand(255)) : int(rand(254)) + 1; # return chr( $n ); #} __END__

Replies are listed 'Best First'.
Re^7: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
by ikegami (Patriarch) on Sep 14, 2007 at 02:05 UTC

    I fixed the in-place tests so they actually worked on the intended test data. And to make the benchmarks worth anything, I made inline those that weren't inline.

    By the way, mrm_3 looped forever if $s1 contained a NUL. I fixed it by adding a + 1 to index's last arg.

    1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - ikegami4 gets same value ok 4 - mrm_1 gets same value ok 5 - mrm_4 gets same value ok 6 - bart gets same value ok 7 - mrm_3 gets same value ok 8 - split1 gets same value ok 9 - moritz gets same value ok 10 - mrm_5 gets same value ok 11 - avar2_pos gets same value ok 12 - ikegami1 gets same value ok 13 - ikegami2 gets same value ok 14 - ikegami3 gets same value ok 15 - mrm_6 gets same value ok 16 - avar2 gets same value done. Rate split1 substr1 ikegami1 corion avar2 mrm_6 avar2_pos + ikegami2 bart ikegami3 mrm_3 mrm_4 moritz ikegami4 mrm_1 mrm_5 split1 1.08/s -- -96% -97% -100% -100% -100% -100% + -100% -100% -100% -100% -100% -100% -100% -100% -100% substr1 28.0/s 2505% -- -22% -94% -94% -95% -96% + -97% -97% -97% -98% -98% -98% -99% -99% -99% ikegami1 36.1/s 3252% 29% -- -92% -92% -93% -95% + -96% -97% -97% -98% -98% -98% -98% -98% -99% corion 465/s 43105% 1558% 1189% -- -3% -12% -37% + -46% -56% -57% -70% -74% -75% -75% -78% -83% avar2 479/s 44448% 1610% 1229% 3% -- -9% -35% + -44% -55% -55% -69% -73% -74% -75% -78% -82% mrm_6 526/s 48782% 1776% 1358% 13% 10% -- -29% + -38% -50% -51% -66% -71% -71% -72% -76% -80% avar2_pos 742/s 68898% 2548% 1959% 60% 55% 41% -- + -13% -30% -31% -52% -59% -59% -61% -65% -72% ikegami2 855/s 79348% 2949% 2270% 84% 78% 63% 15% + -- -19% -20% -44% -53% -53% -55% -60% -68% bart 1054/s 97924% 3662% 2825% 127% 120% 101% 42% + 23% -- -1% -31% -42% -42% -44% -51% -61% ikegami3 1070/s 99384% 3718% 2868% 130% 123% 104% 44% + 25% 1% -- -30% -41% -41% -43% -50% -60% mrm_3 1532/s 142352% 5367% 4150% 230% 220% 191% 106% + 79% 45% 43% -- -15% -16% -19% -29% -43% mrm_4 1804/s 167619% 6337% 4904% 288% 276% 243% 143% + 111% 71% 69% 18% -- -1% -5% -16% -33% moritz 1822/s 169343% 6403% 4955% 292% 280% 247% 146% + 113% 73% 70% 19% 1% -- -4% -15% -32% ikegami4 1891/s 175685% 6647% 5144% 307% 295% 260% 155% + 121% 79% 77% 23% 5% 4% -- -12% -29% mrm_1 2150/s 199814% 7573% 5864% 363% 349% 309% 190% + 152% 104% 101% 40% 19% 18% 14% -- -20% mrm_5 2675/s 248608% 9446% 7320% 476% 458% 409% 260% + 213% 154% 150% 75% 48% 47% 41% 24% --
      Okay, now that tests and benchmarks are working (thanks, ikegami), I'll do my comparative runs and show the differences between the different perl builds on this.

      The fastest pure-Perl solution in every case seems to be mrm_5, and usually by a decent margin. That's a version of moritz's code slightly tweaked with some of the tips taken from avar's code. moritz's own code (as tweaked for in-place fairness by ikegami) seems to be near the top quite consistently. So if pure Perl speed is the goal, moritz seems to have been on the right track all along. As my code for mrm_5 is just variations on improving his, I think he should get most of the credit for it (especially since he didn't go stupid halfway through and break it like I did).

      Strawberry 5.8.8 on WinXP, AthlonXP 2400+, 1GB

      ActivePerl 5.8.0 on same machine Died during tests
      1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - mrm_1 gets same value ok 4 - mrm_4 gets same value ok 5 - split1 gets same value ok 6 - moritz gets same value ok 7 - avar2_pos gets same value ok 8 - ikegami1 gets same value ok 9 - ikegami2 gets same value ok 10 - ikegami3 gets same value ok 11 - ikegami4 gets same value ok 12 - bart gets same value ok 13 - mrm_3 gets same value ok 14 - mrm_5 gets same value
      cygperl 5.8.7 on same machine perl 5.8.7 on Mandriva 2006, Athlon 1000, 512 MB perl 5.9.5 (gcc 4.0.1, -O4 (4.0.1-5mdk)) on the above Mandriva

      Just for kicks: miniperl 5.9.5 on Mandriva

Re^7: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
by ikegami (Patriarch) on Sep 14, 2007 at 01:14 UTC
    Depending on the order in which the tests are run, between half and all of the tests are worthless because the NULs are removed from $s1 by the first run of the first in-place test.