in reply to Re^4: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
in thread Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

'\0' is not the same as chr(0). By changing to chr(0), your code drops dramatically in the rankings.

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
  • Comment on Re^5: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

Replies are listed 'Best First'.
Re^6: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
by mr_mischief (Monsignor) on Sep 13, 2007 at 16:36 UTC
    Update: Everything in this node is based on a silly mistake and is irrelevant. The readmore has my original lunatic ramblings.

    dragonchild's statement above is correct. chr(0) is a touch less efficient than "\0" or just plain 0 it seems, but not nearly so much as to make any difference in the real world. Still, the tests pass with the wrong string. The version of the benchmark and test code I grabbed from an earlier node must've reinforced my blunder.

      chr(0) is just a byte with all 8 bits set to 0. It's considered the NUL byte in many languages because it's useful to do so.

      Tossing in bytes::chr() slowed your version down even more. :-(

      Update: Corrected per ikegami's response.


      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

        NULL is a special pointer in C, and an exceptional value in in SQL and VB.
        NUL is ASCII character 0.

        I believe you are refereing to NUL.

        In response to your updated node:

        Look at an ASCII chart (2, 3, 4). What's at position 0? It's a byte with all zeros, and yes it's called NUL. It's also referred to as a 'null byte' (all lowercase, four letters). While it's true a null/zero byte can represent something other than NUL, NUL is always represented as a null/zero byte (at least in ASCII and EBCDIC).

        When dealing with eight-bit bytes, it shouldn't matter if you have an ASCII character, '\0', 0, "\x00", or "\0". vec(), ikegami's tr/\x00/\xFF/, and anything using use bytes; is working at the byte (or bit, in the case of vec()) level, and not necessarily working on "character" data. Update: it should matter if you have '\0'. It shouldn't matter about the rest. The test isn't failing, though.

        I don't need your beer. I'm just trying to help. You can do whatever you like, but I'm not sure where you're getting the idea that '\0' is producing a different end product in these cases. Try "\000" or "\x00" instead, and see if it changes anything at all. I'm guessing using chr(0) is changing absolutely nothing but the speed. Update: and I based this on the results of the Test::More tests that said it was all producing the same output. Apparently, either it's working by some fluke, or the tests are broken.

        Take a look at this:

        Rate split1 mrm_7 mrm_8 mrm_6 split1 1.08/s -- -100% -100% -100% mrm_7 1906/s 176180% -- -0% -44% mrm_8 1910/s 176481% 0% -- -44% mrm_6 3381/s 312486% 77% 77% -- 1..4 ok 1 - split1 gets some value ok 2 - mrm_7 gets same value ok 3 - mrm_8 gets same value ok 4 - mrm_6 gets same value

        and here's the code:

        #!/usr/bin/perl use 5.6.0; use strict; use warnings FATAL => 'all'; use Benchmark qw( cmpthese ); my $s1 = do_rand(0, 100_000); my $s2 = do_rand(1, 100_000); my $subs = { 'split1' => sub { my $s3 = split1( $s1, $s2 ) }, 'mrm_6' => sub { mrm_6( \$s1, \$s2 ); $s1 }, 'mrm_7' => sub { mrm_7( \$s1, \$s2 ); $s1 }, 'mrm_8' => sub { mrm_8( \$s1, \$s2 ); $s1 }, }; cmpthese( -5, $subs ); use Test::More; plan 'tests' => scalar keys %{$subs}; my $s3; foreach my $subname ( keys %{$subs} ) { my $sub = $subs->{$subname}; if ( defined $s3 ) { is( $sub->(), $s3, "$subname gets same value" ); } else { $s3 = $sub->(); ok( defined $s3, "$subname gets some value" ); } } sub split1 { my ($s1, $s2) = @_; my @s1 = split //, $s1; my @s2 = split //, $s2; foreach my $idx ( 0 .. $#s1 ) { if ( $s1[$idx] eq chr(0) ) { $s1[$idx] = $s2[$idx]; } } return join '', @s1; } sub mrm_6 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $$s1, '\0', $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } } sub mrm_7 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $$s1, "\000", $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } } sub mrm_8 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; my $chr = chr 0; while ( -1 < ( $pos = bytes::index( $$s1, $chr, $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } } # This makes sure that $s1 has chr(0)'s in it and $s2 does not. sub do_rand { my $min = shift; my $len = shift; my $n = ""; for (1 .. $len) { $n .= chr( rand(255-$min)+$min ); } return $n; } #sub do_rand { # my $n = (shift) ? int(rand(255)) : int(rand(254)) + 1; # return chr( $n ); #} __END__

        If you don't trust Test::More, I guess you could make smaller sample data strings and visually inspect them. Update:Maybe we should trust Test::More, but take a more carefullook at the tests for the benchmarking and testing code being used from above.

Re^6: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
by mr_mischief (Monsignor) on Sep 13, 2007 at 21:08 UTC
    With correction "\x00" in place of the erroneous '\0' on Strawberry:
    Rate ikegami_s ikegami_s 36.4/s -- avar 185/s 408% avar2 192/s 426% avar2_pos 324/s 788% ikegami_tr 329/s 804% moritz 786/s 2056% mrm_5 1561/s 4183% mrm_3 1578/s 4229% avar2_pos_inplace 1610/s 4318% mrm_4 1620/s 4344% mrm_1 1640/s 4400% mrm_6 1899/s 5111% ok 1 - mrm_1 gets some value ok 2 - mrm_4 gets same value ok 3 - avar2_pos_inplace gets same value ok 4 - moritz gets same value ok 5 - mrm_3 gets same value ok 6 - mrm_5 gets same value ok 7 - avar2_pos gets same value ok 8 - ikegami_tr gets same value ok 9 - mrm_6 gets same value ok 10 - ikegami_s gets same value ok 11 - avar gets same value ok 12 - avar2 gets same value
    And this is with this code:
    #!/usr/bin/perl use 5.6.0; use strict; use warnings FATAL => 'all'; use Benchmark qw( cmpthese ); my $s1 = do_rand(0, 100_000); my $s2 = do_rand(1, 100_000); my $subs = { # 'split1' => sub { my $s3 = split1( $s1, $s2 ) }, # 'substr1' => sub { my $s3 = substr1( $s1, $s2 ) }, # 'kyle' => sub { my $s3 = kyle( $s1, $s2 ) }, 'moritz' => sub { my $s3 = moritz( $s1, $s2 ) }, # 'corion' => sub { my $s3 = corion( $s1, $s2 ) }, 'ikegami_s' => sub { my $s3 = ikegami_s( $s1, $s2 ) }, 'ikegami_tr' => sub { my $s3 = ikegami_tr( $s1, $s2 ) }, 'avar' => sub { my $s3 = avar( $s1, $s2 ) }, 'avar2' => sub { my $s3 = avar2( $s1, $s2 ) }, 'avar2_pos' => sub { my $s3 = avar2_pos( $s1, $s2 ) }, 'avar2_pos_inplace' => sub { avar2_pos_inplace( \$s1, $s2 ); $s1 +}, 'mrm_1' => sub { mrm_1( \$s1, $s2 ); $s1 }, # 'mrm_2' => sub { mrm_2( \$s1, $s2 ); $s1 }, 'mrm_3' => sub { mrm_3( \$s1, $s2 ); $s1 }, 'mrm_4' => sub { mrm_4( \$s1, $s2 ); $s1 }, 'mrm_5' => sub { mrm_5( \$s1, $s2 ); $s1 }, 'mrm_6' => sub { mrm_6( \$s1, \$s2 ); $s1 }, }; cmpthese( -2, $subs ); use Test::More; plan 'tests' => scalar keys %{$subs}; my $s3; foreach my $subname ( keys %{$subs} ) { my $sub = $subs->{$subname}; if ( defined $s3 ) { is( $sub->(), $s3, "$subname gets same value" ); } else { $s3 = $sub->(); ok( defined $s3, "$subname gets some value" ); } } sub split1 { my ($s1, $s2) = @_; my @s1 = split //, $s1; my @s2 = split //, $s2; foreach my $idx ( 0 .. $#s1 ) { if ( $s1[$idx] eq chr(0) ) { $s1[$idx] = $s2[$idx]; } } return join '', @s1; } sub substr1 { my ($s1, $s2) = @_; for my $idx ( 0 .. length($s1) ) { if ( substr($s1,$idx,1) eq chr(0) ) { substr($s1, $idx, 1) = substr($s2, $idx, 1); } } return $s1; } sub kyle { my ($s1, $s2) = @_; my $out = $s1; while ( $s1 =~ m/\000/g ) { my $pos = pos; substr( $out, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $out; } sub moritz { my ($s1, $s2) = @_; my $pos = 0; while ( -1 < ( $pos = index $s1, "\000", $pos ) ) { substr( $s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $s1; } sub ikegami_s { my ($s1, $s2) = @_; (my $mask = $s1) =~ s/[^\x00]/\xFF/g; return ($s1 & $mask) | ($s2 & ~$mask); } sub ikegami_tr { my ($s1, $s2) = @_; (my $mask = $s1) =~ tr/\x00/\xFF/c; return ($s1 & $mask) | ($s2 & ~$mask); } sub corion { my ($s1, $s2) = @_; my $ofs = 0; return join "", map { $ofs += length; $_ => substr $s2, $ofs++, 1 } + split /\0/, $s1, -1; } sub avar { my ($s1, $s2) = @_; my $s3 = $s1; { use bytes; $s3 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; } $s3; } sub avar2 { my ($s1, $s2) = @_; use bytes; $s1 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; return $s1; } sub avar2_pos { my ($s1, $s2) = @_; use bytes; $s1 =~ s/\0/substr $s2, pos($s1), 1/eg; return $s1; } sub avar2_pos_inplace { my ($s1, $s2) = @_; use bytes; $$s1 =~ s/\0/substr $s2, pos($$s1), 1/eg; } sub mrm_1 { my ( $s1, $s2 ) = @_; # from [moritz]'s work use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } } sub mrm_3 { my ( $s1, $s2 ) = @_; # from moritz's, builds a separate loop of zeros use bytes; my @zeros = (); my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { push @zeros, $pos; } for ( @zeros ) { substr( $$s1, $_, 1 ) = substr( $s2, $_, 1 ); } } sub mrm_4 { # from [bart]'s vec() my ($s1, $s2) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { vec( $$s1, $pos, 8 ) ||= vec( $s2, $pos, 8 ); } } sub mrm_5 { # from moritz's, seeing if four-arg substr() is faster or slower t +han lvalue substr() my ( $s1, $s2 ) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1, substr( $s2, $pos, 1 ) ); } } sub mrm_6 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $$s1, "\x00", $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } } # This makes sure that $s1 has chr(0)'s in it and $s2 does not. sub do_rand { my $min = shift; my $len = shift; my $n = ""; for (1 .. $len) { $n .= chr( rand(255-$min)+$min ); } return $n; } #sub do_rand { # my $n = (shift) ? int(rand(255)) : int(rand(254)) + 1; # return chr( $n ); #} __END__

      I fixed the in-place tests so they actually worked on the intended test data. And to make the benchmarks worth anything, I made inline those that weren't inline.

      By the way, mrm_3 looped forever if $s1 contained a NUL. I fixed it by adding a + 1 to index's last arg.

      1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - ikegami4 gets same value ok 4 - mrm_1 gets same value ok 5 - mrm_4 gets same value ok 6 - bart gets same value ok 7 - mrm_3 gets same value ok 8 - split1 gets same value ok 9 - moritz gets same value ok 10 - mrm_5 gets same value ok 11 - avar2_pos gets same value ok 12 - ikegami1 gets same value ok 13 - ikegami2 gets same value ok 14 - ikegami3 gets same value ok 15 - mrm_6 gets same value ok 16 - avar2 gets same value done. Rate split1 substr1 ikegami1 corion avar2 mrm_6 avar2_pos + ikegami2 bart ikegami3 mrm_3 mrm_4 moritz ikegami4 mrm_1 mrm_5 split1 1.08/s -- -96% -97% -100% -100% -100% -100% + -100% -100% -100% -100% -100% -100% -100% -100% -100% substr1 28.0/s 2505% -- -22% -94% -94% -95% -96% + -97% -97% -97% -98% -98% -98% -99% -99% -99% ikegami1 36.1/s 3252% 29% -- -92% -92% -93% -95% + -96% -97% -97% -98% -98% -98% -98% -98% -99% corion 465/s 43105% 1558% 1189% -- -3% -12% -37% + -46% -56% -57% -70% -74% -75% -75% -78% -83% avar2 479/s 44448% 1610% 1229% 3% -- -9% -35% + -44% -55% -55% -69% -73% -74% -75% -78% -82% mrm_6 526/s 48782% 1776% 1358% 13% 10% -- -29% + -38% -50% -51% -66% -71% -71% -72% -76% -80% avar2_pos 742/s 68898% 2548% 1959% 60% 55% 41% -- + -13% -30% -31% -52% -59% -59% -61% -65% -72% ikegami2 855/s 79348% 2949% 2270% 84% 78% 63% 15% + -- -19% -20% -44% -53% -53% -55% -60% -68% bart 1054/s 97924% 3662% 2825% 127% 120% 101% 42% + 23% -- -1% -31% -42% -42% -44% -51% -61% ikegami3 1070/s 99384% 3718% 2868% 130% 123% 104% 44% + 25% 1% -- -30% -41% -41% -43% -50% -60% mrm_3 1532/s 142352% 5367% 4150% 230% 220% 191% 106% + 79% 45% 43% -- -15% -16% -19% -29% -43% mrm_4 1804/s 167619% 6337% 4904% 288% 276% 243% 143% + 111% 71% 69% 18% -- -1% -5% -16% -33% moritz 1822/s 169343% 6403% 4955% 292% 280% 247% 146% + 113% 73% 70% 19% 1% -- -4% -15% -32% ikegami4 1891/s 175685% 6647% 5144% 307% 295% 260% 155% + 121% 79% 77% 23% 5% 4% -- -12% -29% mrm_1 2150/s 199814% 7573% 5864% 363% 349% 309% 190% + 152% 104% 101% 40% 19% 18% 14% -- -20% mrm_5 2675/s 248608% 9446% 7320% 476% 458% 409% 260% + 213% 154% 150% 75% 48% 47% 41% 24% --
        Okay, now that tests and benchmarks are working (thanks, ikegami), I'll do my comparative runs and show the differences between the different perl builds on this.

        The fastest pure-Perl solution in every case seems to be mrm_5, and usually by a decent margin. That's a version of moritz's code slightly tweaked with some of the tips taken from avar's code. moritz's own code (as tweaked for in-place fairness by ikegami) seems to be near the top quite consistently. So if pure Perl speed is the goal, moritz seems to have been on the right track all along. As my code for mrm_5 is just variations on improving his, I think he should get most of the credit for it (especially since he didn't go stupid halfway through and break it like I did).

        Strawberry 5.8.8 on WinXP, AthlonXP 2400+, 1GB

        ActivePerl 5.8.0 on same machine Died during tests
        1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - mrm_1 gets same value ok 4 - mrm_4 gets same value ok 5 - split1 gets same value ok 6 - moritz gets same value ok 7 - avar2_pos gets same value ok 8 - ikegami1 gets same value ok 9 - ikegami2 gets same value ok 10 - ikegami3 gets same value ok 11 - ikegami4 gets same value ok 12 - bart gets same value ok 13 - mrm_3 gets same value ok 14 - mrm_5 gets same value
        cygperl 5.8.7 on same machine perl 5.8.7 on Mandriva 2006, Athlon 1000, 512 MB perl 5.9.5 (gcc 4.0.1, -O4 (4.0.1-5mdk)) on the above Mandriva

        Just for kicks: miniperl 5.9.5 on Mandriva

      Depending on the order in which the tests are run, between half and all of the tests are worthless because the NULs are removed from $s1 by the first run of the first in-place test.