
Quite large:

1st number: 1-20

2nd and 3rd numbers: 1-1,200,000

Running it on a real sample, I get around 3,000,000 unique lines when printing the hash out (all 3 numbers + frequency per line).

Re^5: Tallying co-occurence of numbers
by BrowserUk (Patriarch) on Jun 18, 2016 at 10:02 UTC

    You could try packing your numbers into a 64-bit int; it might save some space:

    ++$hash{ pack 'Q', $n_1to20 * 1.2e6**2 + $a_1to1_2e6 * 1.2e6 + $b_1to1_2e6 };

    It depends on the mix of sizes of the larger numbers. (I'll think on it some more.)
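    A minimal sketch of that packing with a matching decoder, in case you need the three numbers back when printing the hash out. This is an illustration, not BrowserUk's code: it assumes a 64-bit perl (pack 'Q' is not available otherwise) and the ranges from the first post. `encode_triplet`, `decode_triplet` and `BASE` are hypothetical names, and the base is 1_200_001 (one more than the largest value) rather than 1.2e6, so that plain modulo decoding also works when a number is exactly 1,200,000.

```perl
use strict;
use warnings;

# Hypothetical base: one more than the largest 2nd/3rd number (1..1_200_000),
# so every value is a valid "digit" and decoding with % is exact.
use constant BASE => 1_200_001;

# Combine (x, y, z) into one integer. With x <= 20 the result is about 3e13,
# far below 2**53, so even floating-point arithmetic stays exact.
sub encode_triplet {
    my ( $x, $y, $z ) = @_;
    return $x * BASE**2 + $y * BASE + $z;
}

# Recover (x, y, z) from the combined integer.
sub decode_triplet {
    my ($k) = @_;
    my $z    = $k % BASE;
    my $rest = ( $k - $z ) / BASE;    # exact: $k - $z is a multiple of BASE
    my $y    = $rest % BASE;
    my $x    = ( $rest - $y ) / BASE;
    return ( $x, $y, $z );
}

# Tally using the packed 8-byte string as the hash key (needs a 64-bit perl).
my %hash;
for my $t ( [ 3, 17, 1_200_000 ], [ 3, 17, 1_200_000 ], [ 20, 1, 1 ] ) {
    ++$hash{ pack 'Q', encode_triplet(@$t) };
}

# Print each triplet with its frequency, one per line.
for my $key ( sort keys %hash ) {
    my @triplet = decode_triplet( unpack 'Q', $key );
    print "@triplet $hash{$key}\n";
}
```

    The binary keys sort in no meaningful order, so sort the decoded triplets instead if the output order matters.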

    Also, try pre-extending your hash to 3 million: keys %hash = 3e6;
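    A minimal sketch of the pre-extension in context. Assigning to `keys` as an lvalue only allocates buckets (perlfunc notes the request is rounded up to the next power of two) and creates no keys. `@lines` is a hypothetical stand-in for the real input, and the `join $;` key is just one of the key schemes compared further down.

```perl
use strict;
use warnings;

my %count;

# Pre-extend: allocate buckets for ~3 million keys up front, so the bucket
# array is not repeatedly doubled as the hash grows. No keys are created.
keys(%count) = 3_000_000;

# Hypothetical sample input: one space-separated triplet per line.
my @lines = ( '1 5 7', '1 5 7', '2 3 4' );

for my $line (@lines) {
    my ( $x, $y, $z ) = split ' ', $line;
    ++$count{ join $;, $x, $y, $z };
}

printf "%d distinct triplets\n", scalar keys %count;
```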


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.
Re^5: Tallying co-occurence of numbers
by BrowserUk (Patriarch) on Jun 18, 2016 at 15:51 UTC

    You can gain a tad more by truncating the 64-bit int to 6 bytes (the largest combined value, roughly 3e13, is below 2**48, so the top two bytes are always zero), but you're into a world of diminishing returns:

    #! perl -slw
    use strict;
    #use Math::Random::MT qw[ rand ];
    use Devel::Size qw[ total_size ];

    our $S //= 1;    # random seed; can be set from the command line (-s switch), e.g. -S=42
    srand( $S );

    my %hash;
    for( 1 .. 1e6 ) {
        my( $x, $y, $z ) = ( int( rand 20 ), int( rand 1.2e6 ), int( rand 1.2e6 ) );

        # Enable one key scheme at a time:
        # ++$hash{ $x }{ $y }{ $z };
        # ++$hash{ join $;, $x, $y, $z };
        # ++$hash{ pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };

        # Caution: on unpack, 'A6' strips trailing NUL and space bytes, so distinct
        # values can collapse to one key ('a6' keeps all six bytes). Taking the first
        # six bytes of a native 'Q' also assumes a little-endian machine.
        ++$hash{ unpack 'A6', pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };
    }
    print total_size( \%hash ), ' ', scalar keys %hash;

    __END__
    Key scheme                                                          total_size (bytes)  vs. nested
    ++$hash{ $x }{ $y }{ $z };                                          269 897 378
    ++$hash{ join $;, $x, $y, $z };                                     106 036 953         40%
    ++$hash{ pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };                98 388 672         36.5%
    ++$hash{ unpack 'A6', pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };   96 193 539         35.6%
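    On the 'A6' point: when unpacking, 'A' strips trailing NUL and whitespace bytes, so two different 48-bit values can end up as the same key, and truncating a native-order 'Q' only keeps the low bytes on a little-endian machine. A sketch of a 6-byte key that avoids both, assuming a 64-bit perl 5.10 or later (for the '<' byte-order modifier); `key6` and `unkey6` are hypothetical names.

```perl
use strict;
use warnings;

use constant BASE => 1_200_000;    # as in the benchmark (values 0 .. BASE-1)

# Build a 6-byte key. 'Q<' packs little-endian, so the first six bytes are the
# low-order ones; 'a6' keeps trailing NUL/space bytes, unlike 'A6'.
sub key6 {
    my ( $x, $y, $z ) = @_;
    my $k = $x * BASE**2 + $y * BASE + $z;
    die "value does not fit in 48 bits\n" if $k >= 2**48;
    return unpack 'a6', pack 'Q<', $k;
}

# Restore the integer by padding the two zero high-order bytes back on.
sub unkey6 {
    my ($key) = @_;
    return unpack 'Q<', $key . "\0\0";
}

# 'A6' collapses these two distinct values (0 and 0x20 << 32) to one key;
# 'a6' keeps them apart.
my ( $v1, $v2 ) = ( 0, 0x20 << 32 );
printf "A6 same key: %s\n",
    unpack( 'A6', pack 'Q<', $v1 ) eq unpack( 'A6', pack 'Q<', $v2 ) ? 'yes' : 'no';
printf "a6 same key: %s\n",
    unpack( 'a6', pack 'Q<', $v1 ) eq unpack( 'a6', pack 'Q<', $v2 ) ? 'yes' : 'no';
```

    `substr( pack( 'Q<', $k ), 0, 6 )` would give the same 6-byte string as `unpack 'a6'` here.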

    Beyond that, if it is still a problem, print the triplets to stdout, pipe them through your system's sort, and then into another perl script that counts them:

    C:\test>perl -E"say join ' ', int( rand 20 ), sort{ $a<=>$b } int( rand 1.2e6 ), int( rand 1.2e6 ) for 1 .. 1e6" | sort | perl -nle"if( defined $last and $last eq $_ ){ ++$n }else{ print qq[$last : $n] if defined $last; $n=1 } $last=$_; END{ print qq[$last : $n] if defined $last }" | wc -l
    999978
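    The counting stage can also live in a small script instead of a one-liner. This is a sketch under the same assumption that the input is already sorted, so identical triplets are adjacent; it emits each group exactly once, including the last one, and only ever holds the current run in memory. `count_sorted_runs` is a hypothetical name, and it takes filehandles so it can be tried on in-memory data.

```perl
use strict;
use warnings;

# Read sorted lines from $in and write "line : count" to $out, one per
# distinct line. Memory use is constant: only the current run is kept.
sub count_sorted_runs {
    my ( $in, $out ) = @_;
    my ( $last, $n );
    while ( my $line = <$in> ) {
        chomp $line;
        if ( defined $last && $line eq $last ) {
            ++$n;
            next;
        }
        # A new run starts: flush the previous one (if any).
        print {$out} "$last : $n\n" if defined $last;
        ( $last, $n ) = ( $line, 1 );
    }
    # Flush the final run, which the loop itself never prints.
    print {$out} "$last : $n\n" if defined $last;
    return;
}

# Demo on in-memory data in place of STDIN/STDOUT.
open my $in, '<', \"1 2 3\n1 2 3\n4 5 6\n" or die $!;
my $result = '';
open my $out, '>', \$result or die $!;
count_sorted_runs( $in, $out );
close $out;
print $result;
```

    In the real pipeline the script placed after `sort` would simply call `count_sorted_runs( \*STDIN, \*STDOUT );`.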
