At the time of testing, I wanted to set the number of threads for parsort but could not do so consistently. So I created a wrapper script named "parallel" and placed it in a directory (/usr/local/bin) that comes before /usr/bin, where the real "parallel" resides, in PATH.
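A quick way to confirm that the wrapper really shadows the original is to ask bash which file it would run. This is only a sketch: first_in_path is a hypothetical helper name of mine, and it relies on the bash builtin "type -P", which does a PATH lookup for the given name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GNU Parallel): print the file that
# bash would execute for the given command name, based on PATH.
first_in_path() {
    type -P "$1"
}

# List every "parallel" found in PATH, in search order. With the wrapper
# installed, /usr/local/bin/parallel should be listed before /usr/bin/parallel.
type -aP parallel || echo "parallel not found in PATH"
```

If the shell already ran the real parallel earlier in the session, running "hash -r" first clears the cached location so the wrapper is picked up.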

#!/usr/bin/env bash
# Wrapper script for parallel.
# Whoa!!! GNU Parallel assumes you want to consume all CPU cores.
# Unfortunately, one cannot specify the number of threads for parsort.

CMD="/usr/bin/parallel"

if [[ -z "$PARALLEL_NUM_THREADS" ]]; then
   exec "$CMD" "$@"
elif [[ "$#" -eq 1 && "$1" == "--number-of-threads" ]]; then
   echo $PARALLEL_NUM_THREADS; exit 0
elif [[ "$1" == "-j" ]]; then
   shift; shift; exec "$CMD" -j $PARALLEL_NUM_THREADS "$@"
else
   exec "$CMD" -j $PARALLEL_NUM_THREADS "$@"
fi

The environment variable prevents parsort (which runs /usr/bin/parallel behind the scenes) from going semi-wild on a big machine with many CPU cores.
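To see what the wrapper above would do without actually running GNU Parallel, its branching can be mirrored in a function that prints the resulting command line instead of exec'ing it. This is a sketch for illustration only: wrapper_dry_run is a hypothetical name, and the logic is copied from the wrapper's four branches.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run version of the wrapper: same branches, but it
# prints the command that would be run rather than calling exec.
wrapper_dry_run() {
    local cmd="/usr/bin/parallel"
    if [[ -z "$PARALLEL_NUM_THREADS" ]]; then
        # Variable unset or empty: pass everything through untouched.
        printf '%s\n' "$cmd $*"
    elif [[ "$#" -eq 1 && "$1" == "--number-of-threads" ]]; then
        # Answer the thread-count query from the environment variable.
        printf '%s\n' "$PARALLEL_NUM_THREADS"
    elif [[ "$1" == "-j" ]]; then
        # Drop the caller's "-j N" and substitute our own limit.
        shift 2
        printf '%s\n' "$cmd -j $PARALLEL_NUM_THREADS $*"
    else
        # No leading -j given: add one.
        printf '%s\n' "$cmd -j $PARALLEL_NUM_THREADS $*"
    fi
}

PARALLEL_NUM_THREADS=12 wrapper_dry_run -j 64 sort
# prints: /usr/bin/parallel -j 12 sort
```

Note that, like the wrapper itself, this only overrides a -j that appears as the first argument.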

export PARALLEL_NUM_THREADS=12

LC_ALL=C parsort -k1 big{1,2,3}.txt | ./tally-count | LC_ALL=C parsort -k2nr >out.txt

The GNU Parallel "parsort" / "tally-count" combination may be useful for Chuma in the originating "Long list is long" thread. Chuma wrote, "the files are pretty large (up to a couple hundred MB), there are quite a few of them (2064)."


In reply to Re^2: Rosetta Code: Long List is Long - GNU Parallel by marioroy
in thread Rosetta Code: Long List is Long by eyepopslikeamosquito
