Greetings, a_salway! This is 10 years late! Recently, chuma requested similarly the Long List is Long challenge. For testing, I modified eyepopslikeamosquito's file generator to emit 16 million unique key-value pairs, in parallel. The key-names length are 8 chars minimum ~ 19 chars maximum, sufficient for uniqueness.

use v5.30; use MCE; { # Generate a random word, inspired by eyepopslikeamosquito's # gen-llil.pl generator https://perlmonks.org/?node_id=11148681 my $ordmin = ord('A'); my $ordmax = ord('z') + 1; sub gen_random_word { my $nchar = shift; # the number of random chars to append my $word = ""; for (1 .. $nchar) { my $ord = $ordmin + int( rand($ordmax - $ordmin) ); # try again if 91 [, 92 \, 93 ], 94 ^, 95 _, or 96 ` while ($ord > 90 && $ord < 97) { $ord = $ordmin + int( rand($ordmax - $ordmin) ); } $word .= chr($ord); } return $word; } } srand(42); MCE->new( max_workers => MCE::Util::get_ncpu(), sequence => [ 1, 16e6 ], bounds_only => 1, chunk_size => 8e3, init_relay => 1, posix_exit => 1, user_func => sub { my ($mce, $chunk_ref, $chunk_id) = @_; my $seq_beg = $chunk_ref->[0]; my $seq_end = $chunk_ref->[1]; my $out = ""; # Worker seeds generator using internal seed and wid value. # Compute similarly using the chunk_id value. my $seed = abs(MCE->seed - ($chunk_id * 1e5)) % 1073741780; srand($seed); for ($seq_beg .. $seq_end) { $out .= gen_random_word(int(rand(12) + 8)); $out .= "\t" . int(rand(1e4) + 1) . "\n"; } # Write output directly and orderly via relay. MCE::relay { print $out }; } )->run;

I tried three programs for sorting the output similarly to the Long List is Long challenge. That is count descending and key names ascending. Parsort is a program included with GNU parallel. Another variant is mcesort using mini-MCE integrated into the code.

Testing was captured on an AMD Ryzen Threadripper 3970X machine, 3600 MHz DDR4. The output file md5sum is c6da8fd098664155c190782f1dc0ca7f.

$ rm -f out; time perl demo.pl | LC_ALL=C sort -k2rn > out real 0m34.779s user 1m39.925s sys 0m00.555s $ rm -f out; time perl demo.pl | LC_ALL=C parsort -k2rn > out real 0m04.484s user 1m17.538s sys 0m00.575s $ rm -f out; time perl demo.pl | LC_ALL=C mcesort -k2rn > out real 0m04.287s user 1m54.859s sys 0m01.748s

Next, a C++ demonstration running llil4map_buf2.cc. Set SSO_LIMIT to 20 for best performance, ~ line 131. Key names plus null character greater than SSO_LIMIT are dynamically allocated using blocks of memory.

$ time perl demo.pl > out.in real 0m01.396s user 1m16.659s sys 0m00.287s $ rm -f out; ./llil4map2 out.in > out llil4map start use OpenMP SSO_LIMIT 12 SSO_LIMIT 20 use boost sort get properties 0.166 secs 0.159 secs map to vector 0.055 secs 0.077 secs vector stable sort 0.201 secs 0.113 secs write stdout 0.097 secs 0.068 secs total time 0.521 secs 0.419 secs count lines 16000000 count unique 16000000

In reply to Re: What is the most memory efficient way to (sort) and print a hash? by marioroy
in thread What is the most memory efficient way to (sort) and print a hash? by a_salway

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.