Tallying co-occurence of numbers

K_Edw has asked for the wisdom of the Perl Monks concerning the following question:

Given a tab-delimited .txt file where data comes in pairs of lines, such as:

1    7    848773    75    A    74
1    7    848576    74    A    0
2    16    785802    75    A    0
2    16    786009    75    A    74
3    7    848576    75    A    74
3    7    848773    74    A    0

I wish to tally the frequency of all unique number combinations - in the above example the desired output is:

7     848576    848773    2
16    785802    786009    1

The pair 848773-848576 counts as the same combination as 848576-848773.

I'm not sure how to solve this problem. I'm accessing each pair of lines via:

    my @F2 = split("\t", $_);
    my $partner = <$IN2>;
    my @F3 = split("\t", $partner);

And then thought to use a multi-dimensional hash:

my %count;
$count{$F2[1]}{$F2[2]}{$F3[2]}++;

However, this would not handle equivalent pairs, which is the part I'm not sure how to do. I'm also not sure how I would then print the result easily. What would be a good way to do it?

Re: Tallying co-occurence of numbers by choroba (Cardinal) on Jun 17, 2016 at 13:32 UTC
If the order of the pair is not important, always store the lesser number first.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my %count;
    while (<>) {
        my @F1 = split /\t/;
        my @F2 = split /\t/, <>;
        my ($num, $from, $to) = (@F1[ 1, 2 ], $F2[2]);
        ($from, $to) = ($to, $from) if $from > $to;
        ++$count{$num}{$from}{$to};
    }
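The snippet in this reply builds the tally but stops before printing it. A minimal sketch of the whole round trip follows, assuming the input is exactly the six sample lines from the question. The sub names `tally_pairs` and `print_tally` are made up for illustration, and the in-memory filehandle stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read lines two at a time from a filehandle and
# return a hash ref of the form $count->{$num}{$low}{$high}.
sub tally_pairs {
    my ($fh) = @_;
    my %count;
    while (my $first = <$fh>) {
        my $second = <$fh>;
        last unless defined $second;            # skip a trailing unpaired line
        my ($num, $from) = (split /\t/, $first)[1, 2];
        my $to           = (split /\t/, $second)[2];
        ($from, $to) = ($to, $from) if $from > $to;   # lesser value first
        ++$count{$num}{$from}{$to};
    }
    return \%count;
}

# Hypothetical helper: print one tab-separated line per combination,
# walking each level of the hash in numeric order.
sub print_tally {
    my ($count) = @_;
    for my $num (sort { $a <=> $b } keys %$count) {
        for my $from (sort { $a <=> $b } keys %{ $count->{$num} }) {
            for my $to (sort { $a <=> $b } keys %{ $count->{$num}{$from} }) {
                print join("\t", $num, $from, $to, $count->{$num}{$from}{$to}), "\n";
            }
        }
    }
}

# Sample data from the question, rebuilt with real tabs.
my $sample = join '', map { join("\t", @$_) . "\n" } (
    [1, 7,  848773, 75, 'A', 74],
    [1, 7,  848576, 74, 'A', 0],
    [2, 16, 785802, 75, 'A', 0],
    [2, 16, 786009, 75, 'A', 74],
    [3, 7,  848576, 75, 'A', 74],
    [3, 7,  848773, 74, 'A', 0],
);
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";

print_tally(tally_pairs($in));
# Prints (tab-separated):
# 7     848576  848773  2
# 16    785802  786009  1
```

To run it against a real file, open that file instead of the in-memory string and pass the handle to `tally_pairs`.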
Re^2: Tallying co-occurence of numbers by K_Edw (Beadle) on Jun 17, 2016 at 13:34 UTC
I didn't think of that, thank you.
Re: Tallying co-occurence of numbers by BrowserUk (Patriarch) on Jun 17, 2016 at 14:22 UTC
"And then thought to use a multi-dimensional hash:"

If you have a very large number of lines, multi-dimensional hashes can consume a prodigious amount of memory. You can achieve the same thing with a single-level hash by combining the values into a composite key.

I don't quite follow your criteria for producing your output from the input, but basically, rather than the multi-level hash you've shown, do:

    my %count;
    ++$count{ join $;, sort { $a <=> $b } $F2[1], $F2[2], $F3[2] };

`$;` is a global whose value is a control character that won't show up in normal text; and by sorting your values (you may only want to sort two of them rather than all three) you deal with the reversed-duplicates problem.
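As a rough illustration of the composite-key idea, here is a small sketch that sorts only the two positions and leaves the first number in place, which is one reading of the "sort two of them" remark and is an assumption. The sub name `pair_key` is hypothetical, and the three sample triples are taken from the question's data.

```perl
use strict;
use warnings;

my $sep = $;;    # subscript separator, "\034" by default

# Hypothetical helper: build one flat key from the group number and
# the two positions, with the positions in ascending numeric order.
sub pair_key {
    my ($num, $pos1, $pos2) = @_;
    return join $sep, $num, sort { $a <=> $b } $pos1, $pos2;
}

my %count;
++$count{ pair_key(@$_) } for (
    [7,  848773, 848576],
    [16, 785802, 786009],
    [7,  848576, 848773],
);

# Split each key back into its fields for printing; most frequent first.
for my $key (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print join("\t", split(/\Q$sep\E/, $key), $count{$key}), "\n";
}
# Prints (tab-separated):
# 7     848576  848773  2
# 16    785802  786009  1
```

The key is a plain string, so printing only needs one `split` per key rather than three nested loops.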
Re^2: Tallying co-occurence of numbers by K_Edw (Beadle) on Jun 17, 2016 at 20:02 UTC
I tried this but it seems to use up around the same amount of memory as a multi-dimensional hash (both are very high). Is there a way to do this without having to store a large hash?
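One possible way to avoid holding a large hash, not something proposed in this thread, is to write one canonical line per pair, sort those lines outside Perl (for example with the system `sort` utility), and then count runs of identical adjacent lines. The sketch below shows the two Perl halves under that assumption. The sub names `canonical_line` and `count_runs` are hypothetical, and the demo feeds `count_runs` a small pre-sorted string instead of real sorted output.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one pair of input lines into a single
# canonical key line, with the lesser position first.
sub canonical_line {
    my ($first, $second) = @_;
    my ($num, $from) = (split /\t/, $first)[1, 2];
    my $to           = (split /\t/, $second)[2];
    ($from, $to) = ($to, $from) if $from > $to;
    return join("\t", $num, $from, $to) . "\n";
}

# Hypothetical helper: read SORTED key lines and call $emit->($key, $n)
# once per run of identical lines. Only the current run is held in memory.
sub count_runs {
    my ($fh, $emit) = @_;
    my ($prev, $n);
    while (my $line = <$fh>) {
        chomp $line;
        if (defined $prev && $line eq $prev) {
            $n++;
            next;
        }
        $emit->($prev, $n) if defined $prev;
        ($prev, $n) = ($line, 1);
    }
    $emit->($prev, $n) if defined $prev;
}

# Demo: the three canonical lines from the question's data, already sorted.
my $sorted = "16\t785802\t786009\n7\t848576\t848773\n7\t848576\t848773\n";
open my $in, '<', \$sorted or die "Cannot open in-memory handle: $!";
count_runs($in, sub { print "$_[0]\t$_[1]\n" });
# Prints (tab-separated):
# 16    785802  786009  1
# 7     848576  848773  2
```

The trade-off is extra disk I/O and a sort pass in exchange for memory use that no longer grows with the number of distinct combinations.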
Re^3: Tallying co-occurence of numbers by BrowserUk (Patriarch) on Jun 17, 2016 at 20:28 UTC
What are the ranges of your 3 numbers?
Re^4: Tallying co-occurence of numbers by K_Edw (Beadle) on Jun 18, 2016 at 09:38 UTC
Re^5: Tallying co-occurence of numbers by BrowserUk (Patriarch) on Jun 18, 2016 at 10:02 UTC
Re^5: Tallying co-occurence of numbers by BrowserUk (Patriarch) on Jun 18, 2016 at 15:51 UTC