in reply to How do you parallelize STDIN for large file processing?

One way to handle the line ordering is to have each worker write its output to a separate file, prefixing every output line with the number of the corresponding line in the original file, followed by a comma. As long as each worker emits its lines in increasing line-number order, you can then just merge the files:
sort --field-separator=, --key=1,1 --numeric-sort --merge worker_output_* | cut --delimiter=, --fields=2-

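--merge is key: it tells sort that each input file is already sorted, so it only has to interleave them instead of re-sorting everything.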
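
To make the merge step concrete, here is a small sketch with two made-up worker files. The file contents, the temporary directory, and the `merged` variable are assumptions for illustration only, and the long option names assume GNU coreutils.

```shell
# Hypothetical demo data: two workers that each handled some of the input lines.
# Each file is already in increasing line-number order, which --merge relies on.
workdir=$(mktemp -d)
printf '1,alpha\n3,gamma\n10,kappa\n' > "$workdir/worker_output_1"
printf '2,beta\n4,delta\n' > "$workdir/worker_output_2"

# Interleave the pre-sorted files by the numeric first field,
# then strip the line-number prefix with cut.
merged=$(sort --field-separator=, --key=1,1 --numeric-sort --merge "$workdir"/worker_output_* \
    | cut --delimiter=, --fields=2-)

printf '%s\n' "$merged"
rm -r "$workdir"
```

Line 10 is in there on purpose: without --numeric-sort it would compare as text and land before line 2. Using --fields=2- rather than --fields=2 keeps any commas that appear in the data itself.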

Re^2: How do you parallelize STDIN for large file processing?
by jandrew (Chaplain) on Feb 06, 2009 at 05:28 UTC
    Would Tie::File be helpful?
    If you knew which line each result came from, you could slot it right back in where it started.

    use Tie::File;
    tie @array, 'Tie::File', filename or die ...;

    ### at the end
    # $LineIn is a 0-based record index: $array[0] is the first line of the file
    my ($Toad, $LineIn) = $MagicWand->HocusPocus;
    $array[$LineIn] = $Toad;
    http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm
      Definitely not. The goal is to speed things up, not set the hard drive on fire.