in reply to Re: some forking help
in thread some forking help

With a 100MB file and 50+ strings to search for, there could be some speed advantage to forking a separate process for each search string and letting them run in parallel, especially if the regexen are precompiled before forking.

Of course, the sheer simplicity of merlyn's solution probably outweighs whatever time parallelism might save, once you realize that the tricky task of gathering up the individual counts from each of the child processes is not as straightforward as it may at first appear.
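To make the "gathering up the counts" point concrete, here is a minimal sketch of the forking approach, assuming one pipe per child: each child scans the file for its one precompiled pattern and writes its count back up the pipe. The file contents and patterns are made-up examples, and this is not merlyn's posted solution.

```perl
use strict;
use warnings;

# Made-up sample data so the sketch is self-contained;
# in the real problem this would be the 100MB file.
my $file = '/tmp/forkscan_demo.txt';
open my $out, '>', $file or die "open: $!";
print {$out} "foo bar\nfoo foo\nbaz 42\n";
close $out;

my @patterns = ( qr/foo/, qr/ba[rz]/ );   # precompiled before forking

my %reader;                 # pattern => read end of that child's pipe
for my $re (@patterns) {
    pipe( my $r, my $w ) or die "pipe: $!";
    defined( my $pid = fork ) or die "fork: $!";
    if ( $pid == 0 ) {      # child: scan the file, report one count, exit
        close $r;
        open my $fh, '<', $file or die "open: $!";
        my $count = 0;
        while (<$fh>) { $count++ while /$re/g }
        print {$w} "$count\n";
        exit 0;
    }
    close $w;               # parent keeps only the read end
    $reader{$re} = $r;
}

# Parent: gather the individual counts -- the part a single-process
# solution never has to worry about.
for my $re (@patterns) {
    chomp( my $count = readline $reader{$re} );
    print "$re => $count\n";
}
wait for @patterns;         # reap the children
```

With the sample data above, the `foo` child reports 3 matches and the `ba[rz]` child reports 2. Note that each child re-reads the whole file, so any win depends on the file being in the buffer cache and on having multiple CPUs to scan it on.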

dmm

You can give a man a fish and feed him for a day ...
Or, you can teach him to fish and feed him for a lifetime

Re: Re(2): some forking help
by fokat (Deacon) on Dec 25, 2001 at 00:32 UTC
    The only way a fork()ing solution would be faster than the solutions posted so far is on an MP (multiprocessor) machine, where each process could scan the file separately. This assumes that the file fits within the buffer cache.

    Otherwise, the cost of the context switches will make this solution run slower.

    Just my $0.02 :)

    Merry Christmas to all the fellow monks!

      I'm not sure. My gut feeling is that searching a file is fairly I/O-bound, and therefore involves a significant amount of waiting on the disk regardless; why not capitalize on that by waiting in parallel?

      dmm

      You can give a man a fish and feed him for a day ...
      Or, you can teach him to fish and feed him for a lifetime