in reply to How do you parallelize STDIN for large file processing?

Perhaps I'm missing something. If it's a static file (that is, not an input stream), you can start multiple workers, each at a different byte offset into the file. Since you've established that the data is newline-delimited, every worker except the one at offset 0 skips forward to just past the first newline it finds, so it always starts on a line boundary. To avoid losing lines, each worker keeps going until it has finished the last line that begins at or before the end of its range, because the next worker discards everything up to its first newline. Each worker then builds a "chapter" of output, and you cat the chapters together in offset order afterwards. The workers can be processes or threads, and a plain seek gets each one to its offset.
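Here is a rough sketch of that idea in Python, not a tuned implementation. The names `chunk_ranges`, `process_chunk` and `process_file` are made up for illustration, and `line.upper()` is only a stand-in for whatever per-line work you actually do. One small variation on the description above: each worker seeks to one byte before its offset and discards through the next newline, so a worker handles exactly the lines that begin inside its own range and a line that starts right on a boundary is neither dropped nor handled twice.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunk_ranges(size, workers):
    """Split [0, size) into at most `workers` contiguous byte ranges."""
    step = max(1, -(-size // workers))  # ceiling division, at least 1 byte
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def process_chunk(path, start, end):
    """Return the "chapter" for every line that begins in [start, end)."""
    chapter = []
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and discard through the next newline. If the
            # byte before `start` is itself a newline, this consumes only
            # that byte and we begin exactly at `start`.
            f.seek(start - 1)
            f.readline()
        # A line that begins before `end` is read in full, even if it
        # runs past `end`; the next worker skips that tail.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            chapter.append(line.upper())  # placeholder per-line work
    return b"".join(chapter)


def process_file(path, workers=4, executor_cls=ProcessPoolExecutor):
    """Run one worker per byte range and join the chapters in offset order."""
    ranges = chunk_ranges(os.path.getsize(path), workers)
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(process_chunk, path, s, e) for s, e in ranges]
        return b"".join(f.result() for f in futures)
```

Passing `ThreadPoolExecutor` as `executor_cls` gives you the thread flavour instead of processes. The sketch collects each chapter in memory for brevity; for a truly large file you would more likely have each worker write its chapter to its own file and cat those together at the end.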

The reason I say this is that it's not clear whether accepting pipe-ish data (STDIN) is part of the problem statement or part of the solution (a design approach). If it's the former, never mind.
