To avoid confusing the combinations ("A","B") and ("B","A") you need to "normalize" the keys, i.e. in this case sort the key "vector" into a unique "representative" (here ("A","B")).
Now you can either use a HoH, $h{A}{B}++, or use the old multi-dim feature from Perl 4 and separate the keys with a comma: $h{A,B}++
I prefer the latter.
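A minimal sketch of both options, assuming a pair that arrives once in each order (the names %hoh and %flat are only for illustration):

```perl
use strict;
use warnings;

my ( %hoh, %flat );

# The same pair arrives in both orders.
for my $pair ( [ 'B', 'A' ], [ 'A', 'B' ] ) {

    # Normalize: sorting maps both orders to the representative ("A","B").
    my ( $x, $y ) = sort @$pair;

    $hoh{$x}{$y}++;       # hash of hashes
    $flat{ $x, $y }++;    # multi-dim emulation, one flat key joined with $;
}

print "HoH:  $hoh{A}{B}\n";
print "flat: ", $flat{ 'A', 'B' }, "\n";
```

Both counters end up at 2, because the sort collapses ("B","A") into ("A","B") before anything is counted.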
Code example to follow...¹
UPDATE: IIRC the comma (i.e. the multi-dim feature) is safer than an arbitrary separator like "_", but since I can't find the docs for multi-dim keys² and your input already uses space as a separator, you should stick with space, i.e. change the code below to $count{"@keys[$a,$b]"}++, which then prints:
    {
      "aesG gly4" => 2,
      "aesG phil" => 2,
      "aesG tomD" => 2,
      "gly4 phil" => 2,
      "gly4 tomD" => 3,
      "phil tomD" => 2,
    }
Cheers Rolf
1)
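    use strict;
    use warnings;
    use Data::Dump "pp";

    my %count;

    while (<DATA>) {
        chomp;
        my @keys = sort split;    # normalize: sorting gives one representative per pair

        # visit every unordered pair of names on this line exactly once
        for my $a ( 0 .. $#keys ) {
            for my $b ( $a + 1 .. $#keys ) {
                # pp [ $keys[$a], $keys[$b] ];
                $count{ $keys[$a], $keys[$b] }++;
            }
        }
    }

    pp \%count;

    __DATA__
    tomD gly4 phil
    aesG tomD gly4
    phil aesG
    phil aesG tomD gly4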
prints
    {
      "aesG\34gly4" => 2,
      "aesG\34phil" => 2,
      "aesG\34tomD" => 2,
      "gly4\34phil" => 2,
      "gly4\34tomD" => 3,
      "phil\34tomD" => 2,
    }
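2) see $; in perlvar: it is the subscript separator that joins multi-dim keys, and its default value is "\034", which is the \34 in the output above.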
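To make the role of $; concrete, here is a small sketch, assuming the default separator is in effect (the variables $sep, $same, $first and $second are only for illustration). It shows that a comma-separated subscript is just a joined string, and how to split such a key back into its parts:

```perl
use strict;
use warnings;

my %count;
$count{ 'gly4', 'tomD' } = 3;    # stored under join($;, 'gly4', 'tomD')

# The same element, addressed with an explicitly joined key.
my $same = $count{ join( $;, 'gly4', 'tomD' ) };

# Copy the separator into a plain variable so it is easy to use in a regex.
my $sep = $;;

# Recover the parts of each key by splitting on the separator.
for my $key ( sort keys %count ) {
    my ( $first, $second ) = split /\Q$sep\E/, $key;
    print "$first $second => $count{$key}\n";
}
```

This only round-trips safely as long as none of the names contains the separator character itself.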