Re^6: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

Update: Everything in this node is based on a silly mistake and is irrelevant. The readmore has my original lunatic ramblings.

dragonchild's statement above is correct. chr(0) is a touch less efficient than "\0" or just plain 0 it seems, but not nearly so much as to make any difference in the real world. Still, the tests pass with the wrong string. The version of the benchmark and test code I grabbed from an earlier node must've reinforced my blunder.

Isn't chr(0) a null byte? Are you not on an ASCII machine? It should be the same, and should be saving a call.

If you need eq to work as well as ==, then it's a simple matter of "\0" instead of '\0'.

Also, if you're going to insist on using chr() here, use bytes::chr() instead, as that's the point of use bytes;.

Update:

[chris@localhost perl]$ perl -e '$foo = q{\0}; $bar = chr(0); print "match!\n" if $foo eq $bar;'

[chris@localhost perl]$ perl -e '$foo = q{\0}; $bar = chr(0); print "match!\n" if $foo == $bar;'
match!

[chris@localhost perl]$ perl -e '$foo = qq{\0}; $bar = chr(0); print "match!\n" if $foo eq $bar;'
match!
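The three one-liners above differ only in quoting and in the comparison operator. A minimal sketch of why they behave that way, assuming a stock Perl 5 (the variable names are illustrative and not from the original post):

```perl
use strict;
use warnings;

my $single = '\0';      # single quotes: a backslash followed by the digit 0
my $double = "\0";      # double quotes: \0 is an escape for the NUL byte
my $nul    = chr(0);

print "single-quoted length: ", length($single), "\n";   # 2
print "double-quoted length: ", length($double), "\n";   # 1

# String comparison looks at the actual bytes.
my $eq_single = ( $single eq $nul ) ? 1 : 0;   # 0
my $eq_double = ( $double eq $nul ) ? 1 : 0;   # 1

# Numeric comparison converts both strings to numbers first. Neither one
# looks like a number, so both become 0 and compare equal. With warnings
# enabled Perl would complain here, hence the scoped "no warnings".
my $num_equal;
{
    no warnings 'numeric';
    $num_equal = ( $single == $nul ) ? 1 : 0;   # 1
}

print "eq with single quotes: $eq_single\n";
print "eq with double quotes: $eq_double\n";
print "== with single quotes: $num_equal\n";
```

So the `==` match in the second one-liner says nothing about the bytes being the same; only the `eq` results do.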

Re^7: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by dragonchild (Archbishop) on Sep 13, 2007 at 16:50 UTC
chr(0) is just a byte with all 8 bits set to 0. It's considered the NUL byte in many languages because it's useful to do so. Tossing in bytes::chr() slowed your version down even more. :-(

Update: Corrected per ikegami's response.

My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re^8: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 13, 2007 at 17:19 UTC
`NULL` is a special pointer in C, and an exceptional value in SQL and VB. `NUL` is ASCII character 0. I believe you are referring to `NUL`.
Re^9: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 13, 2007 at 17:52 UTC
More to the point, since we've been working with byte values instead of characters for some time now (the very purpose of use bytes), isn't chr(0) indeed the same as '\0'? NUL is a null (notice the lowercase -- just meaning 0-valued and not anything special in jargon terms -- this is Webster's definition 4b or definition 6) byte. chr(0) is a zero-valued byte. NUL is chr(0). Right?

Update: I was thinking for some reason the last couple of days that single-character escapes worked in single quotes, and that only variable interpolation didn't. I don't know why, because I know better than that. I'll blame extreme tiredness, as that's likely the cause.

I'm not an optimization guru or anything, but isn't working with the data at a lower level a common optimization technique? IIRC, it's a big part of why the core C language uses chars that are basically small integers, and why strings are represented by arrays of chars. It's easy for the compiler, and it's very efficient. It's certainly not because it makes programming string-heavy projects easier.

I would really like to see the benchmark run that is showing my code failing the "same output" test, if such data exists.

Update: This concerns me about the test, then, because my broken version doesn't seem to have ever killed the test.
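For reference, here is a small sketch of which spellings really are the same single zero-valued byte and what that means for a search with index(). The sample string and variable names are made up for illustration:

```perl
use strict;
use warnings;

# Five spellings that should all yield one byte with value 0.
my @spellings = ( chr(0), "\0", "\000", "\x00", pack( 'C', 0 ) );

my $all_same = 1;
for my $s (@spellings) {
    $all_same = 0 unless length($s) == 1 && $s eq chr(0);
}
print "all spellings identical: $all_same\n";   # 1

# A short hypothetical input with a NUL at offset 2.
my $data = "ab\0cd";

my $pos_double = index( $data, "\0" );   # finds the NUL byte
my $pos_single = index( $data, '\0' );   # looks for backslash + "0"

print "index with \"\\0\": $pos_double\n";   # 2
print "index with '\\0': $pos_single\n";     # -1
```

With the single-quoted needle the search never matches a NUL, so a loop built on it has nothing to replace.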
Re^10: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 13, 2007 at 18:36 UTC
Re^11: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 13, 2007 at 18:41 UTC
Re^8: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by mr_mischief (Monsignor) on Sep 13, 2007 at 18:23 UTC
In response to your updated node: Look at an ASCII chart (2, 3, 4). What's at position 0? It's a byte with all zeros, and yes it's called NUL. It's also referred to as a 'null byte' (all lowercase, four letters). While it's true a null/zero byte can represent something other than NUL, NUL is always represented as a null/zero byte (at least in ASCII and EBCDIC).

When dealing with eight-bit bytes, it shouldn't matter if you have an ASCII character, ~~'\0'~~, 0, "\x00", or "\0". vec(), ikegami's `tr/\x00/\xFF/`, and anything using `use bytes;` is working at the byte (or bit, in the case of vec()) level, and not necessarily working on "character" data. Update: it should matter if you have '\0'. It shouldn't matter about the rest. The test isn't failing, though.

I don't need your beer. I'm just trying to help. You can do whatever you like, but I'm not sure where you're getting the idea that '\0' is producing a different end product in these cases. Try "\000" or "\x00" instead, and see if it changes anything at all. I'm guessing using `chr(0)` is changing absolutely nothing but the speed. Update: and I based this on the results of the Test::More tests that said it was all producing the same output. Apparently, either it's working by some fluke, or the tests are broken.
Take a look at this:

               Rate   split1    mrm_7    mrm_8    mrm_6
    split1   1.08/s       --    -100%    -100%    -100%
    mrm_7    1906/s  176180%       --      -0%     -44%
    mrm_8    1910/s  176481%       0%       --     -44%
    mrm_6    3381/s  312486%      77%      77%       --
    1..4
    ok 1 - split1 gets some value
    ok 2 - mrm_7 gets same value
    ok 3 - mrm_8 gets same value
    ok 4 - mrm_6 gets same value

and here's the code:

    #!/usr/bin/perl

    use 5.6.0;

    use strict;
    use warnings FATAL => 'all';

    use Benchmark qw( cmpthese );

    my $s1 = do_rand(0, 100_000);
    my $s2 = do_rand(1, 100_000);

    my $subs = {
        'split1' => sub { my $s3 = split1( $s1, $s2 ) },
        'mrm_6' => sub { mrm_6( \$s1, \$s2 ); $s1 },
        'mrm_7' => sub { mrm_7( \$s1, \$s2 ); $s1 },
        'mrm_8' => sub { mrm_8( \$s1, \$s2 ); $s1 },
    };

    cmpthese( -5, $subs );

    use Test::More;
    plan 'tests' => scalar keys %{$subs};

    my $s3;
    foreach my $subname ( keys %{$subs} ) {
        my $sub = $subs->{$subname};
        if ( defined $s3 ) {
            is( $sub->(), $s3, "$subname gets same value" );
        }
        else {
            $s3 = $sub->();
            ok( defined $s3, "$subname gets some value" );
        }
    }

    sub split1 {
        my ($s1, $s2) = @_;
        my @s1 = split //, $s1;
        my @s2 = split //, $s2;
        foreach my $idx ( 0 .. $#s1 ) {
            if ( $s1[$idx] eq chr(0) ) {
                $s1[$idx] = $s2[$idx];
            }
        }
        return join '', @s1;
    }

    sub mrm_6 {
        # from mrn_5, testing bytes::misc explicitly instead of importing
        # also in-place using of $s2
        my ( $s1, $s2 ) = @_;
        use bytes ();
        my $pos = 0;
        while ( -1 < ( $pos = bytes::index( $$s1, '\0', $pos ) ) ) {
            bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) );
        }
    }

    sub mrm_7 {
        # from mrn_5, testing bytes::misc explicitly instead of importing
        # also in-place using of $s2
        my ( $s1, $s2 ) = @_;
        use bytes ();
        my $pos = 0;
        while ( -1 < ( $pos = bytes::index( $$s1, "\000", $pos ) ) ) {
            bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) );
        }
    }

    sub mrm_8 {
        # from mrn_5, testing bytes::misc explicitly instead of importing
        # also in-place using of $s2
        my ( $s1, $s2 ) = @_;
        use bytes ();
        my $pos = 0;
        my $chr = chr 0;
        while ( -1 < ( $pos = bytes::index( $$s1, $chr, $pos ) ) ) {
            bytes::substr( $$s1, $pos, 1, bytes::substr( $$s2, $pos, 1 ) );
        }
    }

    # This makes sure that $s1 has chr(0)'s in it and $s2 does not.
    sub do_rand {
        my $min = shift;
        my $len = shift;
        my $n = "";
        for (1 .. $len) {
            $n .= chr( rand(255-$min)+$min );
        }
        return $n;
    }

    #sub do_rand {
    #    my $n = (shift) ? int(rand(255)) : int(rand(254)) + 1;
    #    return chr( $n );
    #}

    __END__

If you don't trust Test::More, I guess you could make smaller sample data strings and visually inspect them.

Update: Maybe we should trust Test::More, but take a more careful look at the tests for the benchmarking and testing code being used from above.
Re^9: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 13, 2007 at 19:32 UTC
You clobber `$s1`, changing the inputs used for benchmarking and for testing. The `$s1` in your test script contains no NULs!
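To make the clobbering concrete, here is a rough sketch of the problem and one possible way around it: hand each in-place routine a fresh copy of the input. The sub name `fill_nuls_in_place` and the tiny sample strings are hypothetical stand-ins, not code from the thread, and the loop assumes the second string holds no NULs (as `do_rand(1, ...)` is meant to guarantee).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the in-place mrm_* subs: every NUL in $$s1 is
# overwritten with the byte at the same offset in $$s2.
# Assumes $$s2 contains no NULs; otherwise the loop would never finish.
sub fill_nuls_in_place {
    my ( $s1, $s2 ) = @_;
    my $pos = 0;
    while ( -1 < ( $pos = index( $$s1, "\0", $pos ) ) ) {
        substr( $$s1, $pos, 1, substr( $$s2, $pos, 1 ) );
    }
    return;
}

my $orig_s1 = "a\0b\0";   # small made-up input with two NULs
my $s2      = "wxyz";

# tr with an empty replacement list only counts; it does not modify.
my $nuls_before = ( $orig_s1 =~ tr/\x00// );

# Work on a copy so the shared input keeps its NULs for the next caller.
my $copy = $orig_s1;
fill_nuls_in_place( \$copy, \$s2 );

my $nuls_in_copy = ( $copy    =~ tr/\x00// );
my $nuls_in_orig = ( $orig_s1 =~ tr/\x00// );

print "NULs before:          $nuls_before\n";    # 2
print "NULs left in copy:    $nuls_in_copy\n";   # 0
print "NULs left in original: $nuls_in_orig\n";  # 2
```

Copying inside the timed sub adds its own cost to every run, so benchmark numbers would include the copy; that trade-off is the price of giving each iteration the same input.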
Re^9: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) by ikegami (Patriarch) on Sep 13, 2007 at 19:14 UTC
"I based this on the results of the Test::More tests that said it was all producing the same output. Apparently, either it's working by some fluke, or the tests are broken."

Something is definitely wrong with the testing. The error shows up when and only when I comment out `cmpthese`. Even if I change `do_rand` to the following to make sure there is always NUL in the returned string:

    sub do_rand {
        my $min = shift;
        my $len = shift;
        {
            my $n = "";
            for (1 .. $len) {
                $n .= chr( rand(255-$min)+$min );
            }
            if ($min == 0 && $n !~ /\x00/) {
                print("REDO!\n");
                redo;
            }
            return $n;
        }
    }

Updated.