in reply to Tallying co-occurence of numbers

And then thought to use a multi-dimensional hash:

If you have a very large number of lines, using multi-dimensional hashes can consume a prodigious amount of memory.

You can achieve the same thing with a single-level hash by combining the values into a composite key.

I don't quite follow your criteria for producing your output from the input, but basically rather than the multi-level hash you've shown, do:

my %count; ++$count{ join $;, sort{ $a <=> $b } $F2[1], $F2[2], $F3[2] };

$; is a global variable (the subscript separator) whose value is a control character ("\034" by default) that won't show up in normal text; and by sorting your values (you may only want to sort two of them rather than all three) you deal with the reversed duplicates problem.
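To make that a bit more concrete, here is a minimal, self-contained sketch. The sample rows in @rows are made up for illustration, and I've assumed that only the second and third numbers should be order-insensitive, so only those two are sorted; adjust to taste. $SUBSEP is just the English-module name for $;.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );    # gives $SUBSEP as a readable alias for $;

# Hypothetical sample data: one (first, second, third) triple per input line.
my @rows = ( [ 3, 100, 200 ], [ 3, 200, 100 ], [ 7, 100, 200 ] );

my %count;
for my $row (@rows) {
    my ( $n1, $n2, $n3 ) = @$row;

    # Assumption: only the 2nd/3rd numbers may appear reversed, so sort
    # just that pair, then join all three into one composite key.
    my $key = join $SUBSEP, $n1, sort { $a <=> $b } $n2, $n3;
    ++$count{$key};
}

# Split the composite key back into its parts when printing.
for my $key ( sort keys %count ) {
    print join( "\t", split( /\Q$SUBSEP\E/, $key ), $count{$key} ), "\n";
}
```

With the sample rows above, the reversed pair (100,200)/(200,100) collapses into a single key with a count of 2.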


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.

Re^2: Tallying co-occurence of numbers
by K_Edw (Beadle) on Jun 17, 2016 at 20:02 UTC
    I tried this but it seems to use up around the same amount of memory as a multi-dimensional array (both are very high). Is there a way to do this without having to store a large hash?

      What are the ranges of your 3 numbers?



        Quite large.

        1st Number: 1-20

        2nd + 3rd Number: 1-1,200,000

        Running it on a real sample, I get around 3,000,000 unique lines when printing the hash out (all 3 numbers + frequency per line).