Re: Comparing two arrays and counting pairwise comparisons

Your description is ambiguous. With the second string all the same char we have to infer either you want

char1 str1 against char1 str2, char2 str1 against char2 str2 or you want all the permutaions ie
char1 str1 against char(1..n) str2, char2 str1 against char(1..n) str2....

I am assuming you want all the permutations. There are two general approaches. Brute force and the more clever approach. Brute force won't scale well as it is O(n^2). The clever approach is 0(n) plus an O(n^2) whisker that represents the number of possible tokens. As you can see both approaches yield the same results. Assuming those are the results you wanted ;-)

use Data::Dumper;

my $site1 = 'AATKKM' x 20;
my $site2 = 'GGGGGA' x 20;

# brute force
my (%hash, $loops);
for my $base1( split //, $site1 ) {
    for my $base2( split //, $site2 ) {
        $hash{"${base1}_$base2"}++;
        $loops++;
    }
}

# precount incidence of tokens
my (%s1, %s2, %hash_e, $loops_e);
do{ $s1{$_}++; $loops_e++ } for split //, $site1;
do{ $s2{$_}++; $loops_e++ } for split //, $site2;

# still loop within loop but the are now far fewer loops to do
# as we only do one per token pair and calculate the total pairs
# from our precount data
for my $base1( keys %s1 ) {
    for my $base2( keys %s2 ) {
        $hash_e{"${base1}_$base2"} = $s1{$base1}*$s2{$base2};
        $loops_e++;
    }
}
print "Brute force loops $loops\nEfficeient Loops  $loops_e\n\n";
print Data::Dumper->Dump([\%hash, \%s1, \%s2, \%hash_e], [qw( hash s1 
+s2 hash_e)] );

__DATA__
Brute force loops 14400
Efficeient Loops  248

$hash = {
          'T_G' => '2000',
          'A_A' => '800',
          'T_A' => '400',
          'M_G' => '2000',
          'M_A' => '400',
          'K_G' => '4000',
          'K_A' => '800',
          'A_G' => '4000'
        };
$s1 = {
        'A' => '40',
        'K' => '40',
        'T' => '20',
        'M' => '20'
      };
$s2 = {
        'G' => '100',
        'A' => '20'
      };
$hash_e = {
            'T_G' => '2000',
            'A_A' => '800',
            'T_A' => '400',
            'M_G' => '2000',
            'M_A' => '400',
            'K_G' => '4000',
            'K_A' => '800',
            'A_G' => '4000'
          };
[download]

cheers

tachyon

Comment on Re: Comparing two arrays and counting pairwise comparisons Download Code

Replies are listed 'Best First'.
Re^2: Comparing two arrays and counting pairwise comparisons by BrowserUk (Patriarch) on Oct 19, 2004 at 01:34 UTC
(From memory!) Your algorithm is O(N^2) but mine is O(N) True enough, but which N? I don't think your code comes close to answering the question. Your treating all the strings in each array as a single concatenated string. I'm not, and I don't believe that's what the OP wants. Using your algorithm on my test data produces this: Read more... (3 kB) Which is solving a totally different problem to the one I belive the OP wants to solve. Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon	[reply] [d/l]

Replies are listed 'Best First'.

Re^2: Comparing two arrays and counting pairwise comparisons
by BrowserUk (Patriarch) on Oct 19, 2004 at 01:34 UTC

(From memory!)

Your algorithm is O(N^2) but mine is O(N)

True enough, but which N?

I don't think your code comes close to answering the question. Your treating all the strings in each array as a single concatenated string. I'm not, and I don't believe that's what the OP wants.

Using your algorithm on my test data produces this: