Re: How do you parallelize STDIN for large file processing?

One way to handle the line ordering is to have each worker write its output to a separate file, with each line prefixed by the line number it had in the original file, followed by a comma. Then just merge the files:

sort  --field-separator=, \
      --key=1,1 \
      --numeric-sort \
      --merge \
      worker_output_* \
| cut --delimiter=, \
      --fields=2-

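--merge is key: as long as each worker writes its lines in ascending line-number order, sort only has to interleave files that are already sorted instead of re-sorting everything.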
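For what it's worth, here is a minimal end-to-end sketch of the idea, assuming GNU sort and cut (for the long options). The worker count, the chunk_* file names, the round-robin split, and the upper-casing "work" are all stand-ins of mine, not part of the original problem; a real worker would do its own processing and just keep the number prefix.

```shell
#!/bin/sh
# Sketch only: number the lines, deal them out to workers, merge by line number.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Tiny sample input standing in for the large file (one line contains a comma).
printf '%s\n' alpha bravo charlie 'delta, with a comma' echo foxtrot golf > input.txt

workers=3   # hypothetical worker count

# Prefix each line with "<line number>," and deal lines round-robin into
# one chunk per worker. Each chunk stays in ascending line-number order.
awk -v n="$workers" '{ print NR "," $0 > ("chunk_" (NR % n)) }' input.txt

# Each background "worker" upper-cases the payload (placeholder for real work)
# and leaves the line-number prefix untouched.
for chunk in chunk_*; do
    awk '{ i = index($0, ","); print substr($0, 1, i) toupper(substr($0, i + 1)) }' \
        "$chunk" > "worker_output_${chunk#chunk_}" &
done
wait

# Merge the already-sorted worker files on the numeric first field,
# then strip the line-number prefix.
sort --field-separator=, --key=1,1 --numeric-sort --merge worker_output_* \
  | cut --delimiter=, --fields=2- > merged.txt

cat merged.txt
```

Because cut keeps fields 2 onward, commas inside the original line survive the round trip.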

Re^2: How do you parallelize STDIN for large file processing? by jandrew (Chaplain) on Feb 06, 2009 at 05:28 UTC
Would Tie::File be helpful? If you knew which line each result came from, you could slot it right back in where it started.

tie @array, 'Tie::File', filename or die ...;
### at the end
my ($Toad, $LineIn) = $MagicWand->HocusPocus;
$array[$LineIn] = $Toad;

http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm
Re^3: How do you parallelize STDIN for large file processing? by ikegami (Patriarch) on Feb 06, 2009 at 05:32 UTC
Definitely not. The goal is to speed things up, not set the hard drive on fire.