in reply to What is the most memory efficient way to (sort) and print a hash?

Greetings, a_salway! This is 10 years late! Recently, chuma requested similarly the Long List is Long challenge. For testing, I modified eyepopslikeamosquito's file generator to emit 16 million unique key-value pairs, in parallel. The key-names length are 8 chars minimum ~ 19 chars maximum, sufficient for uniqueness.

use v5.30; use MCE; { # Generate a random word, inspired by eyepopslikeamosquito's # gen-llil.pl generator https://perlmonks.org/?node_id=11148681 my $ordmin = ord('A'); my $ordmax = ord('z') + 1; sub gen_random_word { my $nchar = shift; # the number of random chars to append my $word = ""; for (1 .. $nchar) { my $ord = $ordmin + int( rand($ordmax - $ordmin) ); # try again if 91 [, 92 \, 93 ], 94 ^, 95 _, or 96 ` while ($ord > 90 && $ord < 97) { $ord = $ordmin + int( rand($ordmax - $ordmin) ); } $word .= chr($ord); } return $word; } } srand(42); MCE->new( max_workers => MCE::Util::get_ncpu(), sequence => [ 1, 16e6 ], bounds_only => 1, chunk_size => 8e3, init_relay => 1, posix_exit => 1, user_func => sub { my ($mce, $chunk_ref, $chunk_id) = @_; my $seq_beg = $chunk_ref->[0]; my $seq_end = $chunk_ref->[1]; my $out = ""; # Worker seeds generator using internal seed and wid value. # Compute similarly using the chunk_id value. my $seed = abs(MCE->seed - ($chunk_id * 1e5)) % 1073741780; srand($seed); for ($seq_beg .. $seq_end) { $out .= gen_random_word(int(rand(12) + 8)); $out .= "\t" . int(rand(1e4) + 1) . "\n"; } # Write output directly and orderly via relay. MCE::relay { print $out }; } )->run;

I tried three programs for sorting the output similarly to the Long List is Long challenge. That is count descending and key names ascending. Parsort is a program included with GNU parallel. Another variant is mcesort using mini-MCE integrated into the code.

Testing was captured on an AMD Ryzen Threadripper 3970X machine, 3600 MHz DDR4. The output file md5sum is c6da8fd098664155c190782f1dc0ca7f.

$ rm -f out; time perl demo.pl | LC_ALL=C sort -k2rn > out real 0m34.779s user 1m39.925s sys 0m00.555s $ rm -f out; time perl demo.pl | LC_ALL=C parsort -k2rn > out real 0m04.484s user 1m17.538s sys 0m00.575s $ rm -f out; time perl demo.pl | LC_ALL=C mcesort -k2rn > out real 0m04.287s user 1m54.859s sys 0m01.748s

Next, a C++ demonstration running llil4map_buf2.cc. Set SSO_LIMIT to 20 for best performance, ~ line 131. Key names plus null character greater than SSO_LIMIT are dynamically allocated using blocks of memory.

$ time perl demo.pl > out.in real 0m01.396s user 1m16.659s sys 0m00.287s $ rm -f out; ./llil4map2 out.in > out llil4map start use OpenMP SSO_LIMIT 12 SSO_LIMIT 20 use boost sort get properties 0.166 secs 0.159 secs map to vector 0.055 secs 0.077 secs vector stable sort 0.201 secs 0.113 secs write stdout 0.097 secs 0.068 secs total time 0.521 secs 0.419 secs count lines 16000000 count unique 16000000