in reply to Re^3: Parallel processing on Windows
in thread Parallel processing on Windows

I don't really like the other explanations here, so let me try too:

Most people think of "threads" as additional execution points running through the same code and the same data as each other. Perl does not offer that option. And actually, I'm glad it doesn't, because in the Java and C++ code I've written that does true "threading", it is extremely easy to introduce bugs when touching the same data structures. Getting "threading" right is massively complicated, requires rigorous design principles, and IMHO has no place in a quick-and-easy scripting language.

What Perl does offer as "ithreads" is a lot more like fork/wait. When you start an ithread, it clones the current perl program (within the existing address space, creating a new parallel interpreter for the clone), runs the clone in parallel, and then passes data back to the main program. You can do the same thing by creating a pipe, forking, running things in the child, and writing the result through the pipe to the parent. ithreads make this convenient, but there are also perl modules that make fork/serialize/wait convenient.
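To make the "pipe, fork, write the result back" recipe concrete, here is a minimal sketch using only core Perl. The names sum_squares and run_in_child are made up for this illustration, and the result is a single number so that one line of text is enough to carry it; anything more complex would need serializing first.

```perl
use strict;
use warnings;

# Hypothetical workload for illustration: sum the squares of some numbers.
sub sum_squares {
    my $sum = 0;
    $sum += $_ * $_ for @_;
    return $sum;
}

# Run $code->(@args) in a forked child and return its (single-line) result.
sub run_in_child {
    my ($code, @args) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compute, write the result into the pipe, and exit.
        close $reader;
        print {$writer} $code->(@args), "\n";
        close $writer;
        exit 0;
    }

    # Parent: read the result from the pipe, then reap the child.
    close $writer;
    my $result = <$reader>;
    close $reader;
    waitpid($pid, 0);

    die "child returned no data" unless defined $result;
    chomp $result;
    return $result;
}

my $result = run_in_child(\&sum_squares, 1 .. 4);
print "child computed: $result\n";
```

The parent only ever sees what the child wrote into the pipe; nothing the child did to its own copy of the program's variables leaks back.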

So, what are the decision points for choosing ithreads vs. fork/wait?

Summing it up,

  1. If you are on Windows and your program will only ever run on Windows, you might as well use ithreads because they are simpler than fork() and give the same result.
  2. If you are on Linux, compile your perl without ithreads so that it runs faster, and use perl modules to make forking/collecting data easier.
  3. If you want your program to run in multiple environments, use the forking perl modules, because not all perls have ithreads enabled, and the special modules usually do something more efficient than "clone everything" when starting a new worker.
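For comparison, the ithreads version of "start a worker and collect its result" might look like the sketch below. It assumes nothing beyond core Perl; sum_squares and run_in_thread are hypothetical names, and the %Config check is there because, as point 3 says, not every perl is built with ithreads.

```perl
use strict;
use warnings;
use Config;

# Hypothetical workload for illustration: sum the squares of some numbers.
sub sum_squares {
    my $sum = 0;
    $sum += $_ * $_ for @_;
    return $sum;
}

# Run $code->(@args) in an ithread and return its scalar result.
# Returns undef if this perl was built without ithreads.
sub run_in_thread {
    my ($code, @args) = @_;
    return unless $Config{useithreads};

    require threads;

    # The new thread gets its own clone of the interpreter and its data.
    my $thr = threads->create($code, @args);

    # join() waits for the thread and hands back its return value.
    my $result = $thr->join;
    return $result;
}

my $result = run_in_thread(\&sum_squares, 1 .. 4);
print defined $result
    ? "thread computed: $result\n"
    : "this perl was built without ithreads\n";
```

There is no pipe and no serialization step here: join() returns the worker's value directly, which is the convenience being traded against portability.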

Re^5: Parallel processing on Windows
by hippo (Archbishop) on Sep 21, 2022 at 09:19 UTC

    That's a nice overview (++). There are a couple of statements presented as facts which on inspection seem not to be.

    However, usually result data is small compared to input data

    I would certainly agree with "sometimes", but "usually" without any citation seems just to be an opinion. Perhaps the problem space in which you most work has such a feature but it would be surprising to find it to be universally (or even broadly) true.

    Linux users prefer ithreads to be disabled for the speed boost.

    Linux users who care about the speed boost at the expense of flexibility prefer ithreads to be disabled for the speed boost. The rest of us don't.

    I'm a Linux user and am quite happy to use threads. The interface is pretty slick and for some scenarios, threads are a perfect fit. In others, forked processes are more appropriate and in those scenarios I'm happy enough to use fork instead. Horses for courses.


    🦛