
You really should try to answer your questions before asking them.

The simple way is to sort the two letters before concatenating them, then use the result as the index into %counts:

Add
sub get_index { $_[0] lt $_[1] ? ($_[0].$_[1]) : ($_[1].$_[0]) }
and replace
$counts{$site1_parts[$i].$site2_parts[$i]}++ while ($i--);
with
$counts{get_index($site1_parts[$i], $site2_parts[$i])}++ while ($i--);

my @site1 = qw( AGTTTT );
my @site2 = qw( GAKKHT );
gives
AG: 2, HT: 1, KT: 2, TT: 1
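
Put together as a self-contained script, the change might look like the sketch below. Splitting two literal strings directly and printing the counts on one line are assumptions made to keep it short; only get_index and the %counts update come from the code above.

```perl
use strict;
use warnings;

# Return the two letters concatenated in sorted order, so that
# ('G', 'A') and ('A', 'G') both map to the key 'AG'.
sub get_index { $_[0] lt $_[1] ? ($_[0].$_[1]) : ($_[1].$_[0]) }

# The example strings, split into single characters.
my @site1_parts = split(//, 'AGTTTT');
my @site2_parts = split(//, 'GAKKHT');

# Walk the positions from last to first, counting each normalized pair.
my %counts;
my $i = @site1_parts;
$counts{get_index($site1_parts[$i], $site2_parts[$i])}++ while ($i--);

# Print the counts sorted by key.
print(join(', ', map { "$_: $counts{$_}" } sort keys %counts), "\n");
```

Because the key is built from the sorted letters, A/G and G/A land in the same bucket, which is why AG comes out as 2 here.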

Re^4: a bit more help with pairwise comparisons between strings in arrays.
by replicant4 (Novice) on Oct 21, 2004 at 15:39 UTC
    Thanks again. I am a beginner at Perl, so I've been trying to answer my questions myself, but it was no good. I have another one for you, which might be a bit trickier. I need to calculate the frequencies with which specific combinations appear. After I do the pairwise comparisons and store the counts in @count_collection (that is, the total number of times each specific combination appears), I have to do the same for individual sites and calculate the frequency. In terms of the code, that means dividing a specific counter from %counts (say, the one for AG) by the number of times that combination appears in the AoH. That is, I need to calculate countAG/total_count_AG. I don't know if this is possible. I suspect that since the keys of the hashes are similar it can be done, but I am not sure how. Once again, thanks for your patience and your time.

      I renamed some of the variables to more appropriate names.

      use strict;
      use warnings;

      sub get_index { $_[0] lt $_[1] ? ($_[0].$_[1]) : ($_[1].$_[0]) }

      my @site1 = qw( AATKKM AKTKKM );
      my @site2 = qw( GGGGGG HHHHHH );

      my %pair_info;
      my @pair_info_by_site;

      {
         # Count pairs.

         my $site1;
         my $site2;

         foreach $site1 (@site1) {
            my @site1_parts = split(//, $site1);
            my %pair_info_this_site;

            foreach $site2 (@site2) {
               my @site2_parts = split(//, $site2);

               my $i = @site1_parts;
               while ($i--) {
                  # Sort the letters of the pair.
                  my $pair = get_index($site1_parts[$i], $site2_parts[$i]);

                  # Add to pair count.
                  $pair_info{$pair}++;

                  # Add to pair count for this site.
                  $pair_info_this_site{$pair}[0]++;
               }
            }

            push(@pair_info_by_site, \%pair_info_this_site);
         }
      }

      {
         # Calculate frequencies.

         my $pair;
         my $site;

         foreach $pair (keys(%pair_info)) {
            my $pair_count = $pair_info{$pair};

            foreach $site (@pair_info_by_site) {
               my $site_pair_count = $$site{$pair}[0];

               # Calculate the frequency for this site.
               $$site{$pair}[1] = $site_pair_count / $pair_count;
            }
         }
      }

      {
         # Output everything.

         my $pair;
         my $site;

         print("Totals$/");
         print("======$/");
         foreach $pair (sort keys %pair_info) {
            printf("%s: count = %2d$/", $pair, $pair_info{$pair});
         }
         print($/);

         print("By Site$/");
         print("=======$/");
         foreach $site (@pair_info_by_site) {
            foreach $pair (sort keys %pair_info) {
               printf("%s: count = %2d freq = %5.3f$/", $pair, @{$$site{$pair}});
            }
            print($/);
         }
      }

      __END__
      Totals
      ======
      AG: count =  3
      AH: count =  3
      GK: count =  5
      GM: count =  2
      GT: count =  2
      HK: count =  5
      HM: count =  2
      HT: count =  2

      By Site
      =======
      AG: count =  2 freq = 0.667
      AH: count =  2 freq = 0.667
      GK: count =  2 freq = 0.400
      GM: count =  1 freq = 0.500
      GT: count =  1 freq = 0.500
      HK: count =  2 freq = 0.400
      HM: count =  1 freq = 0.500
      HT: count =  1 freq = 0.500

      AG: count =  1 freq = 0.333
      AH: count =  1 freq = 0.333
      GK: count =  3 freq = 0.600
      GM: count =  1 freq = 0.500
      GT: count =  1 freq = 0.500
      HK: count =  3 freq = 0.600
      HM: count =  1 freq = 0.500
      HT: count =  1 freq = 0.500
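
One thing to watch for in the frequency step: with the sample data every pair occurs at every site, but if a pair never occurs at a particular site, $$site{$pair}[0] is undef and the division warns about an uninitialized value under use warnings. Here is a minimal sketch of one way to guard against that, assuming a missing pair should simply count as zero. The helper name site_frequency and the sample data are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: frequency of $pair at one site, treating a pair
# that never occurs at that site as a count of 0.
sub site_frequency {
    my ($site, $pair, $total) = @_;
    my $count = exists($$site{$pair}) ? $$site{$pair}[0] : 0;
    return $total ? $count / $total : 0;
}

# Made-up data in the same shape as %pair_info and one element of
# @pair_info_by_site: GT is counted overall but is absent from this site.
my %pair_info = (AG => 3, GT => 2);
my %site      = (AG => [2]);

foreach my $pair (sort keys %pair_info) {
    printf("%s: freq = %5.3f\n", $pair, site_frequency(\%site, $pair, $pair_info{$pair}));
}
```

Checking with exists before dereferencing also avoids autovivifying an empty entry for the missing pair in the per-site hash.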