in reply to Forking Multiple Regex's on a Single String

First of all, if all children write to the same filehandle, their output will be interleaved and you will get messy data.

More importantly, you're forking at the wrong level of granularity. Forking involves overhead, so you want a few long-lived copies of your program, each doing a substantial amount of work, rather than many short-lived ones. Furthermore, it is not obvious to me whether your code will be CPU-bound or I/O-bound, but forking can work to your benefit either way.

If it takes a noticeable amount of time to process a file, then my suggestion would be to use something like Parallel::ForkManager to fork off one job per file and keep a fixed number of jobs going at once. Have each job write its output file to a separate directory, and then reassemble the output into a single file later.
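
A minimal sketch of that approach, not a drop-in solution. The names run_jobs, $PATTERN, the out directory, combined.out, and the limit of 4 jobs are all placeholders of mine, and the /ERROR/ match merely stands in for whatever regexes you actually run. It also assumes the input files have distinct basenames.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);
use Parallel::ForkManager;

my $PATTERN = qr/ERROR/;    # placeholder for the real regexes

# Process each file in its own child, at most $max_jobs at a time,
# then merge the per-file outputs in the original file order.
sub run_jobs {
    my ($files, $out_dir, $combined, $max_jobs) = @_;
    make_path($out_dir);

    my $pm = Parallel::ForkManager->new($max_jobs);
    for my $file (@$files) {
        $pm->start and next;    # parent: go on to the next file

        # Child: write to its own output file, never a shared handle.
        my $part = "$out_dir/" . basename($file) . '.out';
        open my $in,  '<', $file or die "Cannot read $file: $!";
        open my $out, '>', $part or die "Cannot write $part: $!";
        while (my $line = <$in>) {
            print {$out} $line if $line =~ $PATTERN;
        }
        close $out or die "Cannot close $part: $!";
        $pm->finish;            # child exits here
    }
    $pm->wait_all_children;

    # Parent: reassemble the pieces into a single file.
    open my $all, '>', $combined or die "Cannot write $combined: $!";
    for my $file (@$files) {
        my $part = "$out_dir/" . basename($file) . '.out';
        open my $in, '<', $part or die "Cannot read $part: $!";
        while (my $line = <$in>) {
            print {$all} $line;
        }
        close $in;
    }
    close $all or die "Cannot close $combined: $!";
    return;
}

run_jobs(\@ARGV, 'out', 'combined.out', 4) if @ARGV;
```

Run it as something like `perl script.pl logs/*.txt`. Because every child has its own output file, nothing is interleaved, and the merge step decides the final order.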

If individual files are virtually instantaneous to process but you have a lot of files, then you'll want to divide the list of files between copies of the program.
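
For that second case, here is a rough sketch using only the built-in fork and waitpid. The helpers split_files, process_file, and run_workers, the worker.out prefix, and the count of 4 workers are hypothetical names I made up for illustration, and the /ERROR/ match is again a placeholder for the real work.

```perl
use strict;
use warnings;

# Deal the files out round-robin into $n lists, one per worker.
sub split_files {
    my ($n, @files) = @_;
    my @chunks = map { [] } 1 .. $n;
    push @{ $chunks[ $_ % $n ] }, $files[$_] for 0 .. $#files;
    return @chunks;
}

# Placeholder for the real per-file work.
sub process_file {
    my ($file, $out) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        print {$out} $line if $line =~ /ERROR/;
    }
    close $in;
    return;
}

# Fork one long-lived worker per non-empty list and wait for them all.
sub run_workers {
    my ($n, $out_prefix, @files) = @_;
    my @pids;
    my $id = 0;
    for my $chunk (split_files($n, @files)) {
        my $worker = $id++;
        next unless @$chunk;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: one output file per worker, then exit.
            my $name = "$out_prefix.$worker";
            open my $out, '>', $name or die "Cannot write $name: $!";
            process_file($_, $out) for @$chunk;
            close $out or die "Cannot close $name: $!";
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;
    return scalar @pids;
}

run_workers(4, 'worker.out', @ARGV) if @ARGV;
```

Here the fork cost is paid once per worker instead of once per file, which is the point when each file takes almost no time to process.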

All of this advice, of course, is predicated on the assumption that you are using an OS where forking does something useful for you. Which means that I hope you're using something other than Windows.

Re^2: Forking Multiple Regex's on a Single String
by GrandFather (Saint) on Aug 19, 2006 at 22:22 UTC

    Parallel::ForkManager is available through ppm so someone considers it sufficiently useful under Windows to have gone to the effort of adding it to an ActiveState repository.


    DWIM is Perl's answer to Gödel
      My understanding is that it kind of works, but fork is very heavyweight on Windows. It is also implemented using threading, which may reduce how willing the scheduler is to spread the work across multiple CPUs. Therefore I wouldn't expect the same performance benefits from fork under Windows that I would on Unix, Linux, or OS X.

      Then again, as always, this is dated information from someone who doesn't use Windows. This may have changed, improved, etc. But if it has, I am not aware of it.

        I don't know how the module is implemented on Windows. However, I do know that Windows is happy to allocate threads from the same process to different CPUs. There are other issues with fork under Windows that may incur a startup cost per fork, but they shouldn't make much difference during the subsequent execution of the forked process.


        DWIM is Perl's answer to Gödel