Re^4: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

I think demerphq mentioned it already, but index() returns 0 if the 0-th character matches. It returns -1 if nothing in the string matches ( unless someone has played with $[ ).

I also had a couple of other thoughts. Surely the Inline::C solutions are faster, but I'm still trying to wring performance out of pure Perl. 1. Why not use $s2 through the reference as well? and 2. What about use bytes (); and then explicitly using bytes::substr and bytes::index instead of doing the import?

This code tests both of those together, for a whopping improvement over my solutions using the import and just using $s1 by reference.

sub mrm_6 {
    # from mrn_5, testing bytes::misc explicitly instead of importing
    # also in-place using of $s2
    my ( $s1, $s2 ) = @_;
    use bytes ();
    my $pos = 0;
    while ( -1 < ( $pos = bytes::index( $$s1, '\0', $pos ) ) ) {
        bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) )
+;
    }
}
[download]

The results are impressive for such simple changes:

Strawberry Perl 5.8.8 on WinXP, AthlonXP 2400+, 1Gig

                    Rate ikegami_s
ikegami_s         37.0/s        --
avar               188/s      409%
avar2              192/s      418%
avar2_pos          282/s      663%
ikegami_tr         342/s      825%
moritz             783/s     2018%
avar2_pos_inplace 1640/s     4338%
mrm_3             2436/s     6491%
mrm_4             2449/s     6524%
mrm_5             2459/s     6553%
mrm_1             2517/s     6709%
mrm_6             3372/s     9023%
[download]

cygperl 5.8.6 on the same machine as above

                    Rate ikegami_s
ikegami_s         37.9/s        --
avar               285/s      650%
avar2              310/s      718%
ikegami_tr         316/s      733%
avar2_pos          709/s     1769%
moritz            1002/s     2543%
mrm_5             3631/s     9474%
mrm_3             3841/s    10027%
mrm_4             3913/s    10219%
mrm_1             4044/s    10562%
avar2_pos_inplace 4393/s    11484%
mrm_6             6237/s    16345%
[download]

perl 5.8.7 on Mandriva Linux 2006 Athlon 1000, 512MB RAM

                    Rate ikegami_s  
ikegami_s         17.2/s        --  
avar               168/s      876%
avar2              187/s      983%
ikegami_tr         205/s     1091%
avar2_pos          307/s     1684%
moritz             620/s     3499%
mrm_4             1088/s     6218%
avar2_pos_inplace 1184/s     6775%
mrm_3             1184/s     6775%
mrm_5             1224/s     7006%
mrm_1             1240/s     7101%
mrm_6             1921/s    11052%
[download]

... and perhaps a sign of good things to come: perl 5.9.5 on the above-mentioned Linux box

                    Rate ikegami_s
ikegami_s         13.3/s        --
avar               172/s     1189%
ikegami_tr         176/s     1220%
avar2              182/s     1267%
avar2_pos          258/s     1837%
moritz             412/s     2987%
avar2_pos_inplace  776/s     5722%
mrm_3              780/s     5749%
mrm_1              785/s     5788%
mrm_5              798/s     5882%
mrm_4              806/s     5942%
mrm_6             2683/s    20026%
[download]

ActivePerl 5.8.0 on the Windows box didn't fare so well under this one

I killed the benchmark for ActivePerl at 7 minutes CPU time, over 23 million page faults, and over 175MB of memory usage. I guess there's probably a bug in bytes or in perl in that build that's causing the thrashing.

Comment on Re^4: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) Select or Download Code

Replies are listed 'Best First'.
Re^5: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by dragonchild (Archbishop) on Sep 13, 2007 at 16:28 UTC
'\0' is not the same as chr(0). By changing to chr(0), your code drops dramatically in the rankings. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply]
Re^6: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 13, 2007 at 16:36 UTC
Update: Everything in this node is based on a silly mistake and is irrelevant. The readmore has my original lunatic ramblings. dragonchild's statement above is correct. chr(0) is a touch less efficient than "\0" or just plain 0 it seems, but not nearly so much as to make any difference in the real world. Still, the tests pass with the wrong string. The version of the benchmark and test code I grabbed from an earlier node must've reinforced my blunder. Read more... (2 kB)	[reply] [d/l] [select]
Re^7: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by dragonchild (Archbishop) on Sep 13, 2007 at 16:50 UTC
chr(0) is just a byte with all 8 bits set to 0. It's considered the NUL byte in many languages because it's useful to do so. Tossing in bytes::chr() slowed your version down even more. :-( Update: Corrected per ikegami's response. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply]
Re^8: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 13, 2007 at 17:19 UTC
Re^9: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 13, 2007 at 17:52 UTC
Some notes below your chosen depth have not been shown here
Re^8: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 13, 2007 at 18:23 UTC
Re^9: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 13, 2007 at 19:32 UTC
Re^9: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 13, 2007 at 19:14 UTC
Re^6: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 13, 2007 at 21:08 UTC
With correction "\x00" in place of the erroneous '\0' on Strawberry: Rate ikegami_s ikegami_s 36.4/s -- avar 185/s 408% avar2 192/s 426% avar2_pos 324/s 788% ikegami_tr 329/s 804% moritz 786/s 2056% mrm_5 1561/s 4183% mrm_3 1578/s 4229% avar2_pos_inplace 1610/s 4318% mrm_4 1620/s 4344% mrm_1 1640/s 4400% mrm_6 1899/s 5111% ok 1 - mrm_1 gets some value ok 2 - mrm_4 gets same value ok 3 - avar2_pos_inplace gets same value ok 4 - moritz gets same value ok 5 - mrm_3 gets same value ok 6 - mrm_5 gets same value ok 7 - avar2_pos gets same value ok 8 - ikegami_tr gets same value ok 9 - mrm_6 gets same value ok 10 - ikegami_s gets same value ok 11 - avar gets same value ok 12 - avar2 gets same value [download] And this is with this code: #!/usr/bin/perl use 5.6.0; use strict; use warnings FATAL => 'all'; use Benchmark qw( cmpthese ); my $s1 = do_rand(0, 100_000); my $s2 = do_rand(1, 100_000); my $subs = { # 'split1' => sub { my $s3 = split1( $s1, $s2 ) }, # 'substr1' => sub { my $s3 = substr1( $s1, $s2 ) }, # 'kyle' => sub { my $s3 = kyle( $s1, $s2 ) }, 'moritz' => sub { my $s3 = moritz( $s1, $s2 ) }, # 'corion' => sub { my $s3 = corion( $s1, $s2 ) }, 'ikegami_s' => sub { my $s3 = ikegami_s( $s1, $s2 ) }, 'ikegami_tr' => sub { my $s3 = ikegami_tr( $s1, $s2 ) }, 'avar' => sub { my $s3 = avar( $s1, $s2 ) }, 'avar2' => sub { my $s3 = avar2( $s1, $s2 ) }, 'avar2_pos' => sub { my $s3 = avar2_pos( $s1, $s2 ) }, 'avar2_pos_inplace' => sub { avar2_pos_inplace( \$s1, $s2 ); $s1 +}, 'mrm_1' => sub { mrm_1( \$s1, $s2 ); $s1 }, # 'mrm_2' => sub { mrm_2( \$s1, $s2 ); $s1 }, 'mrm_3' => sub { mrm_3( \$s1, $s2 ); $s1 }, 'mrm_4' => sub { mrm_4( \$s1, $s2 ); $s1 }, 'mrm_5' => sub { mrm_5( \$s1, $s2 ); $s1 }, 'mrm_6' => sub { mrm_6( \$s1, \$s2 ); $s1 }, }; cmpthese( -2, $subs ); use Test::More; plan 'tests' => scalar keys %{$subs}; my $s3; foreach my $subname ( keys %{$subs} ) { my $sub = $subs->{$subname}; if ( defined $s3 ) { is( $sub->(), $s3, "$subname gets same value" ); } else { $s3 = $sub->(); ok( defined $s3, "$subname gets some value" ); } } sub split1 { my ($s1, $s2) = @_; my @s1 = split //, $s1; my @s2 = split //, $s2; foreach my $idx ( 0 .. $#s1 ) { if ( $s1[$idx] eq chr(0) ) { $s1[$idx] = $s2[$idx]; } } return join '', @s1; } sub substr1 { my ($s1, $s2) = @_; for my $idx ( 0 .. length($s1) ) { if ( substr($s1,$idx,1) eq chr(0) ) { substr($s1, $idx, 1) = substr($s2, $idx, 1); } } return $s1; } sub kyle { my ($s1, $s2) = @_; my $out = $s1; while ( $s1 =~ m/\000/g ) { my $pos = pos; substr( $out, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $out; } sub moritz { my ($s1, $s2) = @_; my $pos = 0; while ( -1 < ( $pos = index $s1, "\000", $pos ) ) { substr( $s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } return $s1; } sub ikegami_s { my ($s1, $s2) = @_; (my $mask = $s1) =~ s/[^\x00]/\xFF/g; return ($s1 & $mask) \| ($s2 & ~$mask); } sub ikegami_tr { my ($s1, $s2) = @_; (my $mask = $s1) =~ tr/\x00/\xFF/c; return ($s1 & $mask) \| ($s2 & ~$mask); } sub corion { my ($s1, $s2) = @_; my $ofs = 0; return join "", map { $ofs += length; $_ => substr $s2, $ofs++, 1 } + split /\0/, $s1, -1; } sub avar { my ($s1, $s2) = @_; my $s3 = $s1; { use bytes; $s3 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; } $s3; } sub avar2 { my ($s1, $s2) = @_; use bytes; $s1 =~ s/(\0)/substr $s2, $+[0]-1, 1/eg; return $s1; } sub avar2_pos { my ($s1, $s2) = @_; use bytes; $s1 =~ s/\0/substr $s2, pos($s1), 1/eg; return $s1; } sub avar2_pos_inplace { my ($s1, $s2) = @_; use bytes; $$s1 =~ s/\0/substr $s2, pos($$s1), 1/eg; } sub mrm_1 { my ( $s1, $s2 ) = @_; # from [moritz]'s work use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1 ) = substr( $s2, $pos, 1 ); } } sub mrm_3 { my ( $s1, $s2 ) = @_; # from moritz's, builds a separate loop of zeros use bytes; my @zeros = (); my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { push @zeros, $pos; } for ( @zeros ) { substr( $$s1, $_, 1 ) = substr( $s2, $_, 1 ); } } sub mrm_4 { # from [bart]'s vec() my ($s1, $s2) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { vec( $$s1, $pos, 8 ) \|\|= vec( $s2, $pos, 8 ); } } sub mrm_5 { # from moritz's, seeing if four-arg substr() is faster or slower t +han lvalue substr() my ( $s1, $s2 ) = @_; use bytes; my $pos = 0; while ( -1 < ( $pos = index $$s1, "\x00", $pos ) ) { substr( $$s1, $pos, 1, substr( $s2, $pos, 1 ) ); } } sub mrm_6 { # from mrn_5, testing bytes::misc explicitly instead of importing # also in-place using of $s2 my ( $s1, $s2 ) = @_; use bytes (); my $pos = 0; while ( -1 < ( $pos = bytes::index( $$s1, "\x00", $pos ) ) ) { bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) ) +; } } # This makes sure that $s1 has chr(0)'s in it and $s2 does not. sub do_rand { my $min = shift; my $len = shift; my $n = ""; for (1 .. $len) { $n .= chr( rand(255-$min)+$min ); } return $n; } #sub do_rand { # my $n = (shift) ? int(rand(255)) : int(rand(254)) + 1; # return chr( $n ); #} __END__ [download]	[reply] [d/l] [select]
Re^7: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 14, 2007 at 02:05 UTC
I fixed the in-place tests so they actually worked on the intended test data. And to make the benchmarks worth anything, I made inline those that weren't inline. By the way, mrm_3 looped forever if `$s1` contained a NUL. I fixed it by adding a `+ 1` to `index`'s last arg. Read more... (6 kB) 1..16 ok 1 - corion gets some value ok 2 - substr1 gets same value ok 3 - ikegami4 gets same value ok 4 - mrm_1 gets same value ok 5 - mrm_4 gets same value ok 6 - bart gets same value ok 7 - mrm_3 gets same value ok 8 - split1 gets same value ok 9 - moritz gets same value ok 10 - mrm_5 gets same value ok 11 - avar2_pos gets same value ok 12 - ikegami1 gets same value ok 13 - ikegami2 gets same value ok 14 - ikegami3 gets same value ok 15 - mrm_6 gets same value ok 16 - avar2 gets same value done. Rate split1 substr1 ikegami1 corion avar2 mrm_6 avar2_pos + ikegami2 bart ikegami3 mrm_3 mrm_4 moritz ikegami4 mrm_1 mrm_5 split1 1.08/s -- -96% -97% -100% -100% -100% -100% + -100% -100% -100% -100% -100% -100% -100% -100% -100% substr1 28.0/s 2505% -- -22% -94% -94% -95% -96% + -97% -97% -97% -98% -98% -98% -99% -99% -99% ikegami1 36.1/s 3252% 29% -- -92% -92% -93% -95% + -96% -97% -97% -98% -98% -98% -98% -98% -99% corion 465/s 43105% 1558% 1189% -- -3% -12% -37% + -46% -56% -57% -70% -74% -75% -75% -78% -83% avar2 479/s 44448% 1610% 1229% 3% -- -9% -35% + -44% -55% -55% -69% -73% -74% -75% -78% -82% mrm_6 526/s 48782% 1776% 1358% 13% 10% -- -29% + -38% -50% -51% -66% -71% -71% -72% -76% -80% avar2_pos 742/s 68898% 2548% 1959% 60% 55% 41% -- + -13% -30% -31% -52% -59% -59% -61% -65% -72% ikegami2 855/s 79348% 2949% 2270% 84% 78% 63% 15% + -- -19% -20% -44% -53% -53% -55% -60% -68% bart 1054/s 97924% 3662% 2825% 127% 120% 101% 42% + 23% -- -1% -31% -42% -42% -44% -51% -61% ikegami3 1070/s 99384% 3718% 2868% 130% 123% 104% 44% + 25% 1% -- -30% -41% -41% -43% -50% -60% mrm_3 1532/s 142352% 5367% 4150% 230% 220% 191% 106% + 79% 45% 43% -- -15% -16% -19% -29% -43% mrm_4 1804/s 167619% 6337% 4904% 288% 276% 243% 143% + 111% 71% 69% 18% -- -1% -5% -16% -33% moritz 1822/s 169343% 6403% 4955% 292% 280% 247% 146% + 113% 73% 70% 19% 1% -- -4% -15% -32% ikegami4 1891/s 175685% 6647% 5144% 307% 295% 260% 155% + 121% 79% 77% 23% 5% 4% -- -12% -29% mrm_1 2150/s 199814% 7573% 5864% 363% 349% 309% 190% + 152% 104% 101% 40% 19% 18% 14% -- -20% mrm_5 2675/s 248608% 9446% 7320% 476% 458% 409% 260% + 213% 154% 150% 75% 48% 47% 41% 24% -- [download]	[reply] [d/l] [select]
Re^8: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 14, 2007 at 14:39 UTC
Re^7: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 14, 2007 at 01:14 UTC
Depending on the order in which the tests are run, between half and all of the tests are worthless because the NULs are removed from `$s1` by the first run of the first in-place test.	[reply] [d/l]