I also ran it in parallel using MCE, with 7 workers, each processing the words that begin with one range of characters.

#!/usr/bin/env perl

# https://perlmonks.org/?node_id=11148669

use warnings;
use strict;

use Judy::HS qw/ Set Get Free /;
use Sort::Packed 'sort_packed';
use MCE;

my $DATA_TEMPLATE    = 'nZ10';
my $DATA_SIZE        = 12;
my $COUNT_SIZE_BYTES = 2;
my $COUNT_SIZE_BITS  = 16;
my $COUNT_MAX        = int( 2 ** $COUNT_SIZE_BITS - 1 );

@ARGV or die "usage: $0 file...\n";
my @llil_files = @ARGV;

for (@llil_files) {
    die "Cannot open '$_'" unless -r "$_";
}

# MCE gather and parallel routines.

my $DATA = '';

sub gather_routine {
    $DATA .= $_[0];
}

sub parallel_routine {
    my $char_range = $_;
    my ( $data, $current, $judy ) = ( '', 0 );
    for my $fname (@llil_files) {
        open( my $fh, '<', $fname ) or die $!;
        while ( <$fh> ) {
            if (/^[${char_range}]/) {
                chomp;
                my ( $word, $count ) = split /\t/;
                ( undef, my $val ) = Get( $judy, $word );
                if ( defined $val ) {
                    vec( $data, $val * $DATA_SIZE / $COUNT_SIZE_BYTES,
                         $COUNT_SIZE_BITS ) -= $count
                }
                else {
                    $data .= pack $DATA_TEMPLATE, $COUNT_MAX - $count, $word;
                    Set( $judy, $word, $current );
                    $current ++
                }
            }
        }
        close $fh;
    }
    Free( $judy );
    MCE->gather( $data );
}

# Run parallel using MCE.

warn "my_test start\n";
my $tstart1 = time;

MCE->new(
    input_data  => ['a-d','e-h','i-l','m-p','q-t','u-x','y-z'],
    max_workers => 7,
    chunk_size  => 1,
    posix_exit  => 1,
    gather      => \&gather_routine,
    user_func   => \&parallel_routine,
    use_threads => 0,
)->run(1);

my $tend1 = time;
warn "get_properties : ", $tend1 - $tstart1, " secs\n";

my $tstart2 = time;
sort_packed "C$DATA_SIZE", $DATA;

$| = 0; # enable output buffering
while ( $DATA ) {
    my ( $count, $word ) = unpack $DATA_TEMPLATE, substr $DATA, 0, $DATA_SIZE, '';
    printf "%s\t%d\n", $word, $COUNT_MAX - $count
}

my $tend2 = time;
warn "sort + output : ", $tend2 - $tstart2, " secs\n";
warn "total : ", $tend2 - $tstart1, " secs\n";

__END__

$ time perl mce_judyhs.pl big1.txt big2.txt big3.txt >out3.txt
my_test start
get_properties : 5 secs
sort + output : 5 secs
total : 10 secs

real 0m9.794s
user 0m35.719s
sys 0m0.257s
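The record layout may not be obvious from the script alone, so here is a minimal core-Perl sketch of the same idea. Each word becomes a fixed 12-byte record (`nZ10`: a big-endian 16-bit value followed by a NUL-padded word), and the stored value is `$COUNT_MAX - $count`, so a plain ascending byte sort yields descending counts with ties broken by ascending word. In this sketch a regular hash stands in for Judy::HS and Perl's built-in `sort` stands in for `sort_packed`; both substitutions, the `%index` and `@input` names, and the sample data are assumptions for illustration only, not what the script above uses.

```perl
use strict;
use warnings;

my $DATA_TEMPLATE = 'nZ10';      # 2-byte big-endian count + 10-byte NUL-padded word
my $DATA_SIZE     = 12;
my $COUNT_MAX     = 2**16 - 1;

# Hypothetical input: (word, count) pairs, with 'apple' appearing twice.
my @input = ( [ apple => 1 ], [ pear => 7 ], [ fig => 3 ], [ apple => 2 ] );

# %index maps word => record number (a stand-in for the Judy::HS array).
my ( $data, $next, %index ) = ( '', 0 );
for my $pair (@input) {
    my ( $word, $count ) = @$pair;
    if ( defined( my $slot = $index{$word} ) ) {
        # Record $slot starts at byte $slot * 12, i.e. 16-bit element $slot * 6.
        # Subtracting from the inverted value adds to the real count.
        vec( $data, $slot * $DATA_SIZE / 2, 16 ) -= $count;
    }
    else {
        $data .= pack $DATA_TEMPLATE, $COUNT_MAX - $count, $word;
        $index{$word} = $next++;
    }
}

# Stand-in for sort_packed: split into 12-byte records and sort them bytewise.
my @records = sort { $a cmp $b } unpack "(a$DATA_SIZE)*", $data;

my @lines;
for my $rec (@records) {
    my ( $inverted, $word ) = unpack $DATA_TEMPLATE, $rec;
    push @lines, sprintf "%s\t%d", $word, $COUNT_MAX - $inverted;
}
print "$_\n" for @lines;
```

With this layout, words are limited to 9 characters (`Z10` always reserves the last byte for the NUL) and summed counts must fit in 16 bits; both limits would need widening for other data sets.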

In reply to Re^4: Rosetta Code: Long List is Long -- Parallel by marioroy
in thread Rosetta Code: Long List is Long by eyepopslikeamosquito
