Re^4: Parallel processing on Windows

I don't really like the other explanations here, so let me try too:

Most people think of "threads" as additional execution points running through the same code and the same data as each other. Perl does not offer that option. And actually, I'm glad it doesn't, because in the Java and C++ code I've written that does true "threading", it is extremely easy to introduce bugs when touching the same data structures. Getting "threading" right is massively complicated, requires rigorous design principles, and IMHO has no place in a quick-and-easy scripting language.

What Perl does offer as "ithreads" is a lot more like fork/wait. When you start an ithread, it clones the current perl program (but within the existing address space, creating a new parallel interpreter for the clone), executes in parallel, and then passes data back to the main program. You can do the same thing by creating a pipe, forking, running things, and writing the result through a pipe to the parent. ithreads make this convenient; but there are also perl modules that make fork/serialize/wait convenient.
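The pipe/fork/wait pattern described above can be sketched roughly as follows. This is only a minimal illustration using core Perl; the helper name `run_in_child` is made up for this example, and the result is passed back as a plain string rather than through a real serializer.

```perl
use strict;
use warnings;

# Run $work->() in a forked child and return whatever it produced, as a string.
# run_in_child is a hypothetical helper name, not part of any module.
sub run_in_child {
    my ($work) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compute, write the result into the pipe, and exit.
        close $reader;
        print {$writer} $work->();
        close $writer;
        exit 0;
    }

    # Parent: read everything the child wrote, then reap the child.
    close $writer;
    my $result = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $result;
}

my $sum = run_in_child(sub {
    my $total = 0;
    $total += $_ for 1 .. 100;
    return $total;
});
print "child computed: $sum\n";
```

For anything beyond a single scalar you would need to serialize the result before printing it to the pipe, which is the part the convenience modules take care of.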

So, what are the decision points for choosing ithreads vs. fork/wait?

Perl ithreads clone the entire interpreter. Not just the parts you need in the thread, but the whole interpreter, which can be massive if you use big frameworks. On Linux, when you fork, memory is copied lazily, on demand (copy-on-write), so the fork itself is cheap.
Using Perl ithreads keeps all the data in the same memory address space, so it is theoretically faster to move results back to the main interpreter. On Linux, with fork, you have to serialize results to bytes, send them through a pipe, and deserialize them. However, ~~usually~~ in my limited experience, result data is small compared to input data, so Linux fork() probably still wins.
Enabling Perl ithreads in a build of Perl makes the whole interpreter run slower (for technical reasons), even when threads aren't used. Performance-focused Linux users prefer ithreads to be disabled for the speed boost.
On Windows, fork() *is* an ithread, because Windows doesn't have fork. In fact, this is the only reason ithreads were added to perl: the interpreter cloning already existed to support fake Windows forking.
Windows fork() has bad side-effects that you would not expect if you were familiar with fork from Linux. For instance, file handles are shared between parent and child. If the forked child closes a file handle, the parent loses it too.
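To make the "clone, don't share" behaviour of ithreads concrete, here is a small sketch. It assumes nothing beyond the core `threads` and `Config` modules, and it checks `$Config{useithreads}` first so that it also runs (and just reports the fact) on a perl built without ithreads. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Config;

my @data = (1, 2, 3);
my $report;

if ($Config{useithreads}) {
    require threads;

    # The new thread works on its own clone of @data, so the push
    # below is not visible to the parent interpreter.
    my $thr = threads->create(sub {
        push @data, 4;
        return scalar @data;
    });

    # join() waits for the thread and hands back its return value.
    my $child_count = $thr->join;

    $report = "thread saw $child_count items, parent still has " . scalar(@data);
}
else {
    $report = "this perl was built without ithreads";
}

print "$report\n";
```

The only thing that crosses back to the parent is the value returned through `join`, which is what makes this feel like fork/wait rather than shared-memory threading.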

Summing it up,

If you are on Windows and your program will only ever run on Windows, you might as well use ithreads because they are simpler than fork() and give the same result.
If you are on Linux, compile your perl without ithreads so that it runs faster, and use perl modules to make forking/collecting data easier.
If you want your program to run in multiple environments, use the forking perl modules, because not all perls have ithreads enabled, and the special modules usually do something more efficient than "clone everything" when starting a new worker.
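If a script has to decide at run time which mechanism is available, one possible approach is to look at the build configuration. The sketch below is only an assumption about how you might encode the advice above: `pick_strategy` is a hypothetical helper, and it simply prefers a real fork() where the perl has one and falls back to ithreads otherwise.

```perl
use strict;
use warnings;
use Config;

# Choose a parallelism mechanism from a Config-like hash reference.
# pick_strategy is a hypothetical helper, not part of any module.
sub pick_strategy {
    my ($cfg) = @_;
    return 'fork'     if $cfg->{d_fork};       # native fork(), e.g. Linux
    return 'ithreads' if $cfg->{useithreads};  # threaded perl without native fork
    return 'serial';                           # neither: run the work in-process
}

print "strategy for this perl: ", pick_strategy(\%Config), "\n";
```

Taking the configuration as a parameter rather than reading `%Config` directly keeps the decision easy to test against settings from other platforms.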

Replies are listed 'Best First'.

Re^5: Parallel processing on Windows
by hippo (Archbishop) on Sep 21, 2022 at 09:19 UTC

That's a nice overview (++). There are a couple of statements presented as facts which on inspection seem not to be.

However, usually result data is small compared to input data

I would certainly agree with "sometimes", but "usually" without any citation seems just to be an opinion. Perhaps the problem space in which you most work has such a feature but it would be surprising to find it to be universally (or even broadly) true.

Linux users prefer ithreads to be disabled for the speed boost.

Linux users who care about the speed boost at the expense of flexibility prefer ithreads to be disabled for the speed boost. The rest of us don't.

I'm a Linux user and am quite happy to use threads. The interface is pretty slick and for some scenarios, threads are a perfect fit. In others, forked processes are more appropriate and in those scenarios I'm happy enough to use fork instead. Horses for courses.

🦛
