K_Edw has asked for the wisdom of the Perl Monks concerning the following question:

Given a tab-delimited .txt file where data comes in pairs of lines, such as:

1 7 848773 75 A 74
1 7 848576 74 A 0
2 16 785802 75 A 0
2 16 786009 75 A 74
3 7 848576 75 A 74
3 7 848773 74 A 0

I wish to tally the frequency of all unique number combinations - in the above example the desired output is:

7 848576 848773 2
16 785802 786009 1

848773-848576 being the same as 848576-848773.

I'm not sure how to solve this problem. I'm accessing each pair of lines via:
my @F2 = split("\t", $_);
my $partner = <$IN2>;
my @F3 = split("\t", $partner);

And then thought to use a multi-dimensional hash:

my %count;
$count{$F2[1]}{$F2[2]}{$F3[2]}++;
However, this would not handle equivalent pairs, which is the part I'm not sure how to do. I'm also not sure how I would then print the result out easily. What would be a good way to do it?

Replies are listed 'Best First'.
Re: Tallying co-occurence of numbers
by choroba (Cardinal) on Jun 17, 2016 at 13:32 UTC
    If the order of the pair is not important, always store the lesser number first.
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %count;
    while (<>) {
        my @F1 = split /\t/;
        my @F2 = split /\t/, <>;
        my ($num, $from, $to) = (@F1[ 1, 2 ], $F2[2]);
        ($from, $to) = ($to, $from) if $from > $to;
        ++$count{$num}{$from}{$to};
    }
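
The question also asked how to print the result. A minimal sketch of one way to do that, assuming a `%count` hash shaped like the one built above (`$count{$num}{$from}{$to}`); the helper name `format_counts` and the hard-coded sample tallies are made up for illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: flatten the three-level hash
# $count{$num}{$from}{$to} into tab-separated report lines.
sub format_counts {
    my ($count) = @_;
    my @lines;
    for my $num (sort { $a <=> $b } keys %$count) {
        for my $from (sort { $a <=> $b } keys %{ $count->{$num} }) {
            for my $to (sort { $a <=> $b } keys %{ $count->{$num}{$from} }) {
                push @lines,
                    join "\t", $num, $from, $to, $count->{$num}{$from}{$to};
            }
        }
    }
    return @lines;
}

# Sample tallies corresponding to the data in the question.
my %count = (
    7  => { 848576 => { 848773 => 2 } },
    16 => { 785802 => { 786009 => 1 } },
);
print "$_\n" for format_counts(\%count);
```

Each level is sorted numerically, so rows come out ordered by the second column and then by the two positions.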

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      I didn't think of that, thank you.
Re: Tallying co-occurence of numbers
by BrowserUk (Patriarch) on Jun 17, 2016 at 14:22 UTC
    And then thought to use a multi-dimensional hash:

    If you have a very large number of lines, using multi-dimensional hashes can consume a prodigious amount of memory.

    You can achieve the same thing using a single level hash by combining the values into a composite key.

    I don't quite follow your criteria for producing your output from the input, but basically rather than the multi-level hash you've shown, do:

    my %count;
    ++$count{ join $;, sort{ $a <=> $b } $F2[1], $F2[2], $F3[2] };

    $; is a global whose value is a control character that won't show up in normal text; and by sorting your values (you may only want to sort two of them rather than all three) you deal with the reversed duplicates problem.
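
A minimal sketch of the composite-key idea with only the two positions sorted, which may be closer to what the question wants. The helper name `pair_key` and the hard-coded sample triples are assumptions for illustration, not part of the original reply:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: build one flat key from the shared second column
# and the two positions, with the positions in ascending numeric order.
sub pair_key {
    my ($num, $pos1, $pos2) = @_;
    return join $;, $num, sort { $a <=> $b } $pos1, $pos2;
}

my %count;

# Sample line pairs from the question, reduced to the three fields used.
for my $pair ([7, 848773, 848576], [16, 785802, 786009], [7, 848576, 848773]) {
    ++$count{ pair_key(@$pair) };
}

# Split each key back into its parts on $; to print the tallies.
for my $key (keys %count) {
    print join("\t", split(/$;/, $key), $count{$key}), "\n";
}
```

Keeping the second column out of the sort means it always stays first in the key, so it can never be confused with a position. The rows are printed in hash order here; sort the keys if a fixed order matters.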


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.
      I tried this, but it seems to use around the same amount of memory as the multi-dimensional hash (both are very high). Is there a way to do this without having to store a large hash?

        What are the ranges of your 3 numbers?

