in reply to Re^3: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
in thread Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

I think demerphq mentioned it already, but index() returns 0 if the 0-th character matches. It returns -1 if nothing in the string matches ( unless someone has played with $[ ).

I also had a couple of other thoughts. Surely the Inline::C solutions are faster, but I'm still trying to wring performance out of pure Perl. 1. Why not use $s2 through the reference as well? and 2. What about use bytes (); and then explicitly using bytes::substr and bytes::index instead of doing the import?

This code tests both of those together, for a whopping improvement over my solutions using the import and just using $s1 by reference.

sub mrm_6 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $$s1, '\0', $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } }

The results are impressive for such simple changes:

Strawberry Perl 5.8.8 on WinXP, AthlonXP 2400+, 1Gig

Rate ikegami_s ikegami_s 37.0/s -- avar 188/s 409% avar2 192/s 418% avar2_pos 282/s 663% ikegami_tr 342/s 825% moritz 783/s 2018% avar2_pos_inplace 1640/s 4338% mrm_3 2436/s 6491% mrm_4 2449/s 6524% mrm_5 2459/s 6553% mrm_1 2517/s 6709% mrm_6 3372/s 9023%
cygperl 5.8.6 on the same machine as above
Rate ikegami_s ikegami_s 37.9/s -- avar 285/s 650% avar2 310/s 718% ikegami_tr 316/s 733% avar2_pos 709/s 1769% moritz 1002/s 2543% mrm_5 3631/s 9474% mrm_3 3841/s 10027% mrm_4 3913/s 10219% mrm_1 4044/s 10562% avar2_pos_inplace 4393/s 11484% mrm_6 6237/s 16345%
perl 5.8.7 on Mandriva Linux 2006 Athlon 1000, 512MB RAM
Rate ikegami_s ikegami_s 17.2/s -- avar 168/s 876% avar2 187/s 983% ikegami_tr 205/s 1091% avar2_pos 307/s 1684% moritz 620/s 3499% mrm_4 1088/s 6218% avar2_pos_inplace 1184/s 6775% mrm_3 1184/s 6775% mrm_5 1224/s 7006% mrm_1 1240/s 7101% mrm_6 1921/s 11052%
... and perhaps a sign of good things to come: perl 5.9.5 on the above-mentioned Linux box
Rate ikegami_s ikegami_s 13.3/s -- avar 172/s 1189% ikegami_tr 176/s 1220% avar2 182/s 1267% avar2_pos 258/s 1837% moritz 412/s 2987% avar2_pos_inplace 776/s 5722% mrm_3 780/s 5749% mrm_1 785/s 5788% mrm_5 798/s 5882% mrm_4 806/s 5942% mrm_6 2683/s 20026%
ActivePerl 5.8.0 on the Windows box didn't fare so well under this one

I killed the benchmark for ActivePerl at 7 minutes CPU time, over 23 million page faults, and over 175MB of memory usage. I guess there's probably a bug in bytes or in perl in that build that's causing the thrashing.

Replies are listed 'Best First'.
Re^5: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
by dragonchild (Archbishop) on Sep 13, 2007 at 16:28 UTC
    '\0' is not the same as chr(0). By changing to chr(0), your code drops dramatically in the rankings.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      Update: Everything in this node is based on a silly mistake and is irrelevant. The readmore has my original lunatic ramblings.

      dragonchild's statement above is correct. chr(0) is a touch less efficient than "\0" or just plain 0 it seems, but not nearly so much as to make any difference in the real world. Still, the tests pass with the wrong string. The version of the benchmark and test code I grabbed from an earlier node must've reinforced my blunder.

        chr(0) is just a byte with all 8 bits set to 0. It's considered the NUL byte in many languages because it's useful to do so.

        Tossing in bytes::chr() slowed your version down even more. :-(

        Update: Corrected per ikegami's response.


        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      With correction "\x00" in place of the erroneous '\0' on Strawberry:
      Rate ikegami_s ikegami_s 36.4/s -- avar 185/s 408% avar2 192/s 426% avar2_pos 324/s 788% ikegami_tr 329/s 804% moritz 786/s 2056% mrm_5 1561/s 4183% mrm_3 1578/s 4229% avar2_pos_inplace 1610/s 4318% mrm_4 1620/s 4344% mrm_1 1640/s 4400% mrm_6 1899/s 5111% ok 1 - mrm_1 gets some value ok 2 - mrm_4 gets same value ok 3 - avar2_pos_inplace gets same value ok 4 - moritz gets same value ok 5 - mrm_3 gets same value ok 6 - mrm_5 gets same value ok 7 - avar2_pos gets same value ok 8 - ikegami_tr gets same value ok 9 - mrm_6 gets same value ok 10 - ikegami_s gets same value ok 11 - avar gets same value ok 12 - avar2 gets same value
      And this is with this code:
      #!/usr/bin/perl use 5.6.0; use strict; use warnings FATAL => 'all'; use Benchmark qw( cmpthese ); my $s1 = do_rand(0, 100_000); my $s2 = do_rand(1, 100_000); my $subs = { # 'split1' => sub { my $s3 = split1( $s1, $s2 ) }, # 'substr1' => sub { my $s3 = substr1( $s1, $s2 ) }, # 'kyle' => sub { my $s3 = kyle( $s1, $s2 ) }, 'moritz' => sub { my $s3 = moritz( $s1, $s2 ) }, # 'corion' => sub { my $s3 = corion( $s1, $s2 ) }, 'ikegami_s' => sub { my $s3 = ikegami_s( $s1, $s2 ) }, 'ikegami_tr' => sub { my $s3 = ikegami_tr( $s1, $s2 ) }, 'avar' => sub { my $s3 = avar( $s1, $s2 ) }, 'avar2' => sub { my $s3 = avar2( $s1, $s2 ) }, 'avar2_pos' => sub { my $s3 = avar2_pos( $s1, $s2 ) }, 'avar2_pos_inplace' => sub { avar2_pos_inplace( \$s1, $s2 ); $s1 +}, 'mrm_1' => sub { mrm_1( \$s1, $s2 ); $s1 }, # 'mrm_2' => sub { mrm_2( \$s1, $s2 ); $s1 }, 'mrm_3' => sub { mrm_3( \$s1, $s2 ); $s1 }, 'mrm_4' => sub { mrm_4( \$s1, $s2 ); $s1 }, 'mrm_5' => sub { mrm_5( \$s1, $s2 ); $s1 }, 'mrm_6' => sub { mrm_6( \$s1, \$s2 ); $s1 }, }; cmpthese( -2, $subs ); use Test::More; plan 'tests' => scalar keys %{$subs}; my $s3; foreach my $subname ( keys %{$subs} ) { my $sub = $subs->{$subname}; if ( defined $s3 ) { is( $sub->(), $s3, "$subname gets same value" ); } else { $s3 = $sub->(); ok( defined $s3, "$subname gets some value" ); } } sub split1 { my ($s1, $s2) = @_; my @s1 = split //, $s1; my @s2 = split //, $s2; foreach my $idx ( 0 .. $#s1 ) { if ( $s1[$idx] eq chr(0) ) { $s1[$idx] = $s2[$idx]; } } return join '', @s1; } sub substr1 { my ($s1, $s2) = @_; for my $idx ( 0 .. length($s1) ) { if ( substr($s1,$idx,1) eq chr(0) ) { substr($s1, $idx, 1) = substr($s2, $idx, 1); } } return $s1; } sub kyle { my ($s1, $s2) = @_; my $out = $s1; while ( $s1 =~ m/\000/g ) { my $pos = pos; substr( $out, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $out; } sub moritz { my ($s1, $s2) = @_; my $pos = 0; while ( -1 < ( $pos = index $s1, "\000", $pos ) ) { substr( $s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $s1; } sub ikegami_s { my ($s1, $s2) = @_; (my $mask = $s1) =~ s/[^\x00]/\xFF/g; return ($s1 & $mask) | ($s2 & ~$mask); } sub ikegami_tr { my ($s1, $s2) = @_; (my $mask = $s1) =~ tr/\x00/\xFF/c; return ($s1 & $mask) | ($s2 & ~$mask); } sub corion { my ($s1, $s2) = @_; my $ofs = 0; return join "", map { $ofs += length; $_ => substr $s2, $ofs++, 1 } + split /\0/, $s1, -1; } sub avar { my ($s1, $s2) = @_; my $s3 = $s1; { use bytes; $s3 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; } $s3; } sub avar2 { my ($s1, $s2) = @_; use bytes; $s1 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; return $s1; } sub avar2_pos { my ($s1, $s2) = @_; use bytes; $s1 =~ s/\0/substr $s2, pos($s1), 1/eg; return $s1; } sub avar2_pos_inplace { my ($s1, $s2) = @_; use bytes; $$s1 =~ s/\0/substr $s2, pos($$s1), 1/eg; } sub mrm_1 { my ( $s1, $s2 ) = @_; # from [moritz]'s work use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } } sub mrm_3 { my ( $s1, $s2 ) = @_; # from moritz's, builds a separate loop of zeros use bytes; my @zeros = (); my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { push @zeros, $pos; } for ( @zeros ) { substr( $$s1, $_, 1 ) = substr( $s2, $_, 1 ); } } sub mrm_4 { # from [bart]'s vec() my ($s1, $s2) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { vec( $$s1, $pos, 8 ) ||= vec( $s2, $pos, 8 ); } } sub mrm_5 { # from moritz's, seeing if four-arg substr() is faster or slower t +han lvalue substr() my ( $s1, $s2 ) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1, substr( $s2, $pos, 1 ) ); } } sub mrm_6 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $$s1, "\x00", $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } } # This makes sure that $s1 has chr(0)'s in it and $s2 does not. sub do_rand { my $min = shift; my $len = shift; my $n = ""; for (1 .. $len) { $n .= chr( rand(255-$min)+$min ); } return $n; } #sub do_rand { # my $n = (shift) ? int(rand(255)) : int(rand(254)) + 1; # return chr( $n ); #} __END__

        I fixed the in-place tests so they actually worked on the intended test data. And to make the benchmarks worth anything, I made inline those that weren't inline.

        By the way, mrm_3 looped forever if $s1 contained a NUL. I fixed it by adding a + 1 to index's last arg.

        1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - ikegami4 gets same value ok 4 - mrm_1 gets same value ok 5 - mrm_4 gets same value ok 6 - bart gets same value ok 7 - mrm_3 gets same value ok 8 - split1 gets same value ok 9 - moritz gets same value ok 10 - mrm_5 gets same value ok 11 - avar2_pos gets same value ok 12 - ikegami1 gets same value ok 13 - ikegami2 gets same value ok 14 - ikegami3 gets same value ok 15 - mrm_6 gets same value ok 16 - avar2 gets same value done. Rate split1 substr1 ikegami1 corion avar2 mrm_6 avar2_pos + ikegami2 bart ikegami3 mrm_3 mrm_4 moritz ikegami4 mrm_1 mrm_5 split1 1.08/s -- -96% -97% -100% -100% -100% -100% + -100% -100% -100% -100% -100% -100% -100% -100% -100% substr1 28.0/s 2505% -- -22% -94% -94% -95% -96% + -97% -97% -97% -98% -98% -98% -99% -99% -99% ikegami1 36.1/s 3252% 29% -- -92% -92% -93% -95% + -96% -97% -97% -98% -98% -98% -98% -98% -99% corion 465/s 43105% 1558% 1189% -- -3% -12% -37% + -46% -56% -57% -70% -74% -75% -75% -78% -83% avar2 479/s 44448% 1610% 1229% 3% -- -9% -35% + -44% -55% -55% -69% -73% -74% -75% -78% -82% mrm_6 526/s 48782% 1776% 1358% 13% 10% -- -29% + -38% -50% -51% -66% -71% -71% -72% -76% -80% avar2_pos 742/s 68898% 2548% 1959% 60% 55% 41% -- + -13% -30% -31% -52% -59% -59% -61% -65% -72% ikegami2 855/s 79348% 2949% 2270% 84% 78% 63% 15% + -- -19% -20% -44% -53% -53% -55% -60% -68% bart 1054/s 97924% 3662% 2825% 127% 120% 101% 42% + 23% -- -1% -31% -42% -42% -44% -51% -61% ikegami3 1070/s 99384% 3718% 2868% 130% 123% 104% 44% + 25% 1% -- -30% -41% -41% -43% -50% -60% mrm_3 1532/s 142352% 5367% 4150% 230% 220% 191% 106% + 79% 45% 43% -- -15% -16% -19% -29% -43% mrm_4 1804/s 167619% 6337% 4904% 288% 276% 243% 143% + 111% 71% 69% 18% -- -1% -5% -16% -33% moritz 1822/s 169343% 6403% 4955% 292% 280% 247% 146% + 113% 73% 70% 19% 1% -- -4% -15% -32% ikegami4 1891/s 175685% 6647% 5144% 307% 295% 260% 155% + 121% 79% 77% 23% 5% 4% -- -12% -29% mrm_1 2150/s 199814% 7573% 5864% 363% 349% 309% 190% + 152% 104% 101% 40% 19% 18% 14% -- -20% mrm_5 2675/s 248608% 9446% 7320% 476% 458% 409% 260% + 213% 154% 150% 75% 48% 47% 41% 24% --
        Depending on the order in which the tests are run, between half and all of the tests are worthless because the NULs are removed from $s1 by the first run of the first in-place test.