in reply to Bitwise operators on binary objects

I was wondering if there is a way to do the 0/1 counts directly on the packed version

Yes, unpack can do that:

$ perl -le' my $string = q/01010110111000/; print $string =~ tr/1//; my $packed_string = pack q/b*/, $string; print unpack q/%32b*/, $packed_string; ' 7 7

Replies are listed 'Best First'.
Re^2: Bitwise operators on binary objects
by bart (Canon) on Jan 31, 2011 at 12:11 UTC
    Very nice.

    BTW in contrast to the normal unpack templates, which are (as any experienced perler knows) documented under pack, this feature is documented only under unpack:

    In addition to fields allowed in pack(), you may prefix a field with a %<number> to indicate that you want a <number>-bit checksum of the items instead of the items themselves.
Re^2: Bitwise operators on binary objects
by albert (Monk) on Jan 31, 2011 at 13:51 UTC
    This is a very neat trick. I turned it into code for my application, and then benchmarked it against my original suggestion, as well as following the suggestion of bit vectors via Bit::Vector::Overload.

    This unpack trick is significantly faster than my previous transliteration based counting. I couldn't figure out the correct bitwise construct to count my '00', but I can derive that from the total expected. The Bit::Vector approach is particularly slow, as I didn't see a way to preload my binary version of the data into the vectors. I'm not sure where the overhead is coming from, but I also doubt that I should get anything better than the unpack approach.

             Rate bitvec string unpack
    bitvec  102/s     --   -97%   -99%
    string 3276/s  3126%     --   -63%
    unpack 8801/s  8565%   169%     --
    
    Code used in testing:
    use Bit::Vector::Overload; use Benchmark qw(cmpthese); use String::Random; #Generate random strings of 01 my $foo = new String::Random; my $length = 600; my @strings; my @bstrings; for ( my $i = 0 ; $i < 10 ; $i++ ) { my $string = $foo->randregex("[01]{$length}"); push @strings, $string; push @bstrings, pack qq{b$length}, $string; } cmpthese( -3, { 'string' => sub { for ( my $i = 0 ; $i < @strings ; $i++ ) { for ( my $j = $i + 1 ; $j < @strings ; $j++ ) { my $string1 = $strings[$i]; my $string2 = $strings[$j]; my ( $c01, $c10, $c11 ) = ( # ( $string1 | $string2 ) =~ tr[0][0], # count 00: COUNT by + math below ( ~$string1 & $string2 ) =~ tr[\1][\1], # c +ount 01 ( $string1 & ~$string2 ) =~ tr[\1][\1], # c +ount 10 ( $string1 & $string2 ) =~ tr[1][1], # c +ount 11 ); my $c00 = $length - $c01 - $c10 - $c11; } } }, 'bitvec' => sub { for ( my $i = 0 ; $i < @strings ; $i++ ) { for ( my $j = $i + 1 ; $j < @strings ; $j++ ) { my $string1 = $strings[$i]; my $string2 = $strings[$j]; my $vec1 = Bit::Vector->new_Bin( $length, $string1 + ); my $vec2 = Bit::Vector->new_Bin( $length, $string2 + ); my ( $v01, $v10, $v11 ) = ( #abs( ~$vec1 & ~$vec2 ), # count 00: COUNT by +math below abs( ~$vec1 & $vec2 ), # count 01 abs( $vec1 & ~$vec2 ), # count 10 abs( $vec1 & $vec2 ), # count 11 ); my $v0 = $length - $v01 - $v10 - $v11; } } }, 'unpack' => sub { for ( my $i = 0 ; $i < @bstrings ; $i++ ) { for ( my $j = $i + 1 ; $j < @bstrings ; $j++ ) { my $bstring1 = $bstrings[$i]; my $bstring2 = $bstrings[$j]; my $p01 = unpack q/%32b*/, ~$bstring1 & $bstr +ing2; my $p10 = unpack q/%32b*/, $bstring1 & ~$bstr +ing2; my $p11 = unpack q/%32b*/, $bstring1 & $bstri +ng2; my $p00 = $length - $p01 - $p10 - $p11; } } } } );