
I have a series of strings (all of equal length) which contain only 0s and 1s, such as:

111011001010011110100010100001
111010000010010110000010000001
000101011100001000110101110000
000101111101001001111101111110
111011001010111110100010100001
000100010100000000010001010000

For each unique pair of strings, I want to count how many positions hold 00, 01, 10, and 11 as I step through the two strings character by character. (In the example above, for the first two strings, there are 15 of '00', 0 of '01', 5 of '10', and 10 of '11'.)
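One observation that may help: for a pair of strings of length n, the four counts are not independent. If a and b are the numbers of 1s in the two strings, then c10 = a - c11, c01 = b - c11 and c00 = n - c11 - c10 - c01, so only c11 really has to be computed per pair. The minimal sketch below (the helper name `pair_counts` is hypothetical) gets c11 with Perl's string bitwise AND, which works here because `'1' & '1'` is `'1'` while any AND involving `'0'` is `'0'`, and then counts the `'1'`s with `tr///`. It assumes the strings have never been used in numeric context and that the `bitwise` feature is not enabled (under that feature the string operator is spelled `&.`).

```perl
use strict;
use warnings;

# Hypothetical helper: returns (c00, c01, c10, c11) for two equal-length
# strings made only of '0' and '1'. Assumes both arguments are plain
# strings (never used as numbers) and the 'bitwise' feature is off, so
# '&' performs a character-by-character string AND.
sub pair_counts {
    my ( $s1, $s2 ) = @_;
    my $len   = length $s1;
    my $and   = $s1 & $s2;             # '1' only where both strings have '1'
    my $c11   = ( $and =~ tr/1// );    # tr/// with empty replacement just counts
    my $ones1 = ( $s1 =~ tr/1// );     # number of 1s in the first string
    my $ones2 = ( $s2 =~ tr/1// );     # number of 1s in the second string
    my $c10   = $ones1 - $c11;
    my $c01   = $ones2 - $c11;
    my $c00   = $len - $c11 - $c10 - $c01;
    return ( $c00, $c01, $c10, $c11 );
}

my @c = pair_counts( '111011001010011110100010100001',
                     '111010000010010110000010000001' );
print join( "\t", @c ), "\n";    # prints 15, 0, 5, 10 (tab-separated)
```

For the first two example strings this reproduces the hand count of 15, 0, 5 and 10.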

Since I want to look at all pairs among thousands of these strings, speed is of the essence. I am currently doing the following, but would appreciate suggestions for alternative strategies that might be faster.

In words:
I put the strings into an array of arrays, where each sub-array holds one string split into single characters. Then I iterate over every pair of elements in the outer array and walk the two sub-arrays in parallel to count the 00, 01, 10, and 11 positions.

In code:

use strict;
use warnings;

my @strings = qw/111011001010011110100010100001
  111010000010010110000010000001
  000101011100001000110101110000
  000101111101001001111101111110
  111011001010111110100010100001
  000100010100000000010001010000/;

# Replace each string in place with a reference to an array of its
# characters (foreach aliases $string to the element, so @strings changes).
foreach my $string (@strings) {
    my @items = split //, $string;
    $string = \@items;
}

# Visit every unique pair (i < j) and tally the four column patterns.
for my $i ( 0 .. $#strings ) {
    for my $j ( $i + 1 .. $#strings ) {
        my ( $c00, $c01, $c10, $c11 ) = ( 0, 0, 0, 0 );
        for my $k ( 0 .. $#{ $strings[$i] } ) {
            $c00++ if $strings[$i][$k] == 0 && $strings[$j][$k] == 0;
            $c01++ if $strings[$i][$k] == 0 && $strings[$j][$k] == 1;
            $c10++ if $strings[$i][$k] == 1 && $strings[$j][$k] == 0;
            $c11++ if $strings[$i][$k] == 1 && $strings[$j][$k] == 1;
        }
        print join( "\t", $i, $j, $c00, $c01, $c10, $c11 ), "\n";
    }
}

Because I have many thousands of these strings to analyze in unique pairs, speed matters. (For reference, my real-world strings are between 120 and 180 characters long.) Does any wise monk have a suggestion for speeding this up? Or can someone reassure me that I can't do much better than this?
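With thousands of strings the number of pairs, n(n-1)/2, dominates the cost (5,000 strings would already give 12,497,500 pairs), so it probably pays to move as much work as possible out of the pair loop. The sketch below is one possible approach, not a tuned or benchmarked solution. Each string is packed once into a bit vector with `pack 'b*'` (a 120 to 180 character string becomes 15 to 23 bytes) and its 1s are counted once; each pair then needs only a string AND of the two packed vectors plus a bit count via `unpack '%32b*'`, and the other three counts follow from c10 = a - c11, c01 = b - c11 and c00 = n - c11 - c10 - c01, where a and b are the per-string counts of 1s. The sub name `all_pair_counts` is hypothetical, all strings are assumed to have equal length, and, as with any string bitwise operator, the `bitwise` feature is assumed to be off.

```perl
use strict;
use warnings;

my @strings = qw/111011001010011110100010100001
  111010000010010110000010000001
  000101011100001000110101110000
  000101111101001001111101111110
  111011001010111110100010100001
  000100010100000000010001010000/;

# Hypothetical helper: returns one [i, j, c00, c01, c10, c11] array ref per
# unique pair (i < j). Assumes every string has the same length.
sub all_pair_counts {
    my @s   = @_;
    my $len = length $s[0];

    # Work done once per string: pack the '0'/'1' characters into a bit
    # vector (8 columns per byte) and count how many 1s it contains.
    my @packed = map { pack( 'b*', $_ ) } @s;
    my @ones   = map { tr/1// } @s;

    my @results;
    for my $i ( 0 .. $#s ) {
        for my $j ( $i + 1 .. $#s ) {
            # String AND of the packed vectors, then sum its set bits.
            # Padding bits in the final byte are 0 in both vectors,
            # so they never add to the count.
            my $c11 = unpack( '%32b*', $packed[$i] & $packed[$j] );
            my $c10 = $ones[$i] - $c11;
            my $c01 = $ones[$j] - $c11;
            my $c00 = $len - $c11 - $c10 - $c01;
            push @results, [ $i, $j, $c00, $c01, $c10, $c11 ];
        }
    }
    return @results;
}

print join( "\t", @$_ ), "\n" for all_pair_counts(@strings);
```

It prints the same tab-separated `i j c00 c01 c10 c11` lines as the loop in the question. Whether it is actually faster on real data is worth confirming with the core `Benchmark` module before relying on it.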

Thanks wise monks,
-albert

In reply to Speeding permutation counting by albert
