comment on

This is a very neat trick. I turned it into code for my application, and then benchmarked it against my original suggestion, as well as following the suggestion of bit vectors via Bit::Vector::Overload.

This unpack trick is significantly faster than my previous transliteration based counting. I couldn't figure out the correct bitwise construct to count my '00', but I can derive that from the total expected. The Bit::Vector approach is particularly slow, as I didn't see a way to preload my binary version of the data into the vectors. I'm not sure where the overhead is coming from, but I also doubt that I should get anything better than the unpack approach.

         Rate bitvec string unpack
bitvec  102/s     --   -97%   -99%
string 3276/s  3126%     --   -63%
unpack 8801/s  8565%   169%     --

Code used in testing:

use Bit::Vector::Overload;
use Benchmark qw(cmpthese);
use String::Random;

#Generate random strings of 01
my $foo = new String::Random;

my $length = 600;
my @strings;
my @bstrings;

for ( my $i = 0 ; $i < 10 ; $i++ ) {
    my $string = $foo->randregex("[01]{$length}");
    push @strings, $string;
    push @bstrings, pack qq{b$length}, $string;
}

cmpthese(
    -3,
    {
        'string' => sub {
            for ( my $i = 0 ; $i < @strings ; $i++ ) {
                for ( my $j = $i + 1 ; $j < @strings ; $j++ ) {
                    my $string1 = $strings[$i];
                    my $string2 = $strings[$j];
                    my ( $c01, $c10, $c11 ) = (
     # ( $string1 | $string2 )  =~ tr[0][0],      # count 00: COUNT by
+ math below
                        ( ~$string1 & $string2 ) =~ tr[\1][\1],    # c
+ount 01
                        ( $string1 & ~$string2 ) =~ tr[\1][\1],    # c
+ount 10
                        ( $string1 & $string2 )  =~ tr[1][1],      # c
+ount 11

                    );
                    my $c00 = $length - $c01 - $c10 - $c11;
                }
            }
        },
        'bitvec' => sub {
            for ( my $i = 0 ; $i < @strings ; $i++ ) {
                for ( my $j = $i + 1 ; $j < @strings ; $j++ ) {
                    my $string1 = $strings[$i];
                    my $string2 = $strings[$j];

                    my $vec1 = Bit::Vector->new_Bin( $length, $string1
+ );
                    my $vec2 = Bit::Vector->new_Bin( $length, $string2
+ );

                    my ( $v01, $v10, $v11 ) = (

                     #abs( ~$vec1 & ~$vec2 ),    # count 00: COUNT by 
+math below
                        abs( ~$vec1 & $vec2 ),    # count 01
                        abs( $vec1 & ~$vec2 ),    # count 10
                        abs( $vec1 & $vec2 ),     # count 11

                    );
                    my $v0 = $length - $v01 - $v10 - $v11;
                }
            }

        },
        'unpack' => sub {
            for ( my $i = 0 ; $i < @bstrings ; $i++ ) {
                for ( my $j = $i + 1 ; $j < @bstrings ; $j++ ) {
                    my $bstring1 = $bstrings[$i];
                    my $bstring2 = $bstrings[$j];
                    my $p01      = unpack q/%32b*/, ~$bstring1 & $bstr
+ing2;
                    my $p10      = unpack q/%32b*/, $bstring1 & ~$bstr
+ing2;
                    my $p11      = unpack q/%32b*/, $bstring1 & $bstri
+ng2;
                    my $p00      = $length - $p01 - $p10 - $p11;
                }
            }
          }

    }
);
[download]

In reply to Re^2: Bitwise operators on binary objects by albert
in thread Bitwise operators on binary objects by albert

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.