Having the parent read in the data and hand off each piece to the appropriate thread(s) (I'm guessing Thread::Queue might be a good way) is the most general method that springs to my mind.
Something like this maybe?:
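A minimal sketch of what I mean, assuming 4 workers, line-oriented records, and a trivial s[a][A] substitution standing in for the real processing:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $Q = Thread::Queue->new;

## Workers block on the shared queue until the parent
## enqueues an undef terminator for each of them.
my @workers = map {
    threads->create( sub {
        while( defined( my $line = $Q->dequeue ) ) {
            $line =~ s[a][A];       ## placeholder "processing"
            print $line;
        }
    } );
} 1 .. 4;

$Q->enqueue( $_ ) while <>;         ## parent reads; workers process
$Q->enqueue( (undef) x 4 );         ## one terminator per worker
$_->join for @workers;
```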
Problem: That will take ~4 hours to process a 3.5 GB file. And that's with the output redirected to nul so there is no competition for the disk head.
I'd probably do something similar except using processes and simple pipes, as I've often done.
So something like this, but using processes instead of threads perhaps?:
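Something along these lines, assuming each child runs the same one-liner filter and the parent round-robins records to them over simple pipes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $N = 4;

## Piped opens: each '|-' open forks a child running the filter,
## with its STDIN connected to the returned handle.
my @kids = map {
    open my $fh, '|-', qq{$^X -pe"s[a][A]"}
        or die "fork/pipe failed: $!";
    $fh;
} 1 .. $N;

my $i = 0;
while( <> ) {
    print { $kids[ $i++ % $N ] } $_;    ## round-robin distribution
}
close $_ for @kids;                     ## EOF lets the children drain and exit
```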
This fares better and only takes ~15 minutes to process the 3.5GB.
But all that effort is for naught as this:
C:\test>perl -pe"s[a][A]" phrases.txt >nul
processes the same 3.5GB in exactly the same way, but in less than 2 minutes.
Now the "processing" in all these examples is pretty light, just a single pass over each record, but it serves to highlight the scale of the overhead involved in distributing the records, and how heavyweight the per-record processing would need to be before either of these distribution mechanisms became viable.
The pipes mechanism is more efficient than the shared queues, and should be pretty much the same between processes as it is between threads, so I doubt there is much to be gained by going that route.
Maybe you have some mechanism in mind that will radically alter the overheads equation; but there is an awful lot of in-the-mind's-eye expertise around here, and having spent a long time trying to find a good way to do this very thing, I'd really like to be educated by someone who has actually made it work.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.