This is a very neat trick. I turned it into code for my application, and then benchmarked it against my original suggestion, as well as following the suggestion of bit vectors via Bit::Vector::Overload.

This unpack trick is significantly faster than my previous transliteration based counting. I couldn't figure out the correct bitwise construct to count my '00', but I can derive that from the total expected. The Bit::Vector approach is particularly slow, as I didn't see a way to preload my binary version of the data into the vectors. I'm not sure where the overhead is coming from, but I also doubt that I should get anything better than the unpack approach.

         Rate bitvec string unpack
bitvec  102/s     --   -97%   -99%
string 3276/s  3126%     --   -63%
unpack 8801/s  8565%   169%     --
Code used in testing:
use Bit::Vector::Overload; use Benchmark qw(cmpthese); use String::Random; #Generate random strings of 01 my $foo = new String::Random; my $length = 600; my @strings; my @bstrings; for ( my $i = 0 ; $i < 10 ; $i++ ) { my $string = $foo->randregex("[01]{$length}"); push @strings, $string; push @bstrings, pack qq{b$length}, $string; } cmpthese( -3, { 'string' => sub { for ( my $i = 0 ; $i < @strings ; $i++ ) { for ( my $j = $i + 1 ; $j < @strings ; $j++ ) { my $string1 = $strings[$i]; my $string2 = $strings[$j]; my ( $c01, $c10, $c11 ) = ( # ( $string1 | $string2 ) =~ tr[0][0], # count 00: COUNT by + math below ( ~$string1 & $string2 ) =~ tr[\1][\1], # c +ount 01 ( $string1 & ~$string2 ) =~ tr[\1][\1], # c +ount 10 ( $string1 & $string2 ) =~ tr[1][1], # c +ount 11 ); my $c00 = $length - $c01 - $c10 - $c11; } } }, 'bitvec' => sub { for ( my $i = 0 ; $i < @strings ; $i++ ) { for ( my $j = $i + 1 ; $j < @strings ; $j++ ) { my $string1 = $strings[$i]; my $string2 = $strings[$j]; my $vec1 = Bit::Vector->new_Bin( $length, $string1 + ); my $vec2 = Bit::Vector->new_Bin( $length, $string2 + ); my ( $v01, $v10, $v11 ) = ( #abs( ~$vec1 & ~$vec2 ), # count 00: COUNT by +math below abs( ~$vec1 & $vec2 ), # count 01 abs( $vec1 & ~$vec2 ), # count 10 abs( $vec1 & $vec2 ), # count 11 ); my $v0 = $length - $v01 - $v10 - $v11; } } }, 'unpack' => sub { for ( my $i = 0 ; $i < @bstrings ; $i++ ) { for ( my $j = $i + 1 ; $j < @bstrings ; $j++ ) { my $bstring1 = $bstrings[$i]; my $bstring2 = $bstrings[$j]; my $p01 = unpack q/%32b*/, ~$bstring1 & $bstr +ing2; my $p10 = unpack q/%32b*/, $bstring1 & ~$bstr +ing2; my $p11 = unpack q/%32b*/, $bstring1 & $bstri +ng2; my $p00 = $length - $p01 - $p10 - $p11; } } } } );

In reply to Re^2: Bitwise operators on binary objects by albert
in thread Bitwise operators on binary objects by albert

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.