in reply to Re^2: Parallel processing on Windows
in thread Parallel processing on Windows

I am looking at the threads modules (threads and threads::shared) and it's not encouraging. The threads module comes with this warning:
The "interpreter-based threads" provided by Perl are not the fast, lightweight system for multitasking that one might expect or hope for. Threads are implemented in a way that make them easy to misuse. Few people know how to use them correctly or will be able to provide help. The use of interpreter-based threads in perl is officially discouraged.
But the doc doesn't say what you should do about the discouragement. Should I give it a try, or is there something else or newer for this? Or is this just kind of impossible in Perl on Windows...

Re^4: Parallel processing on Windows
by choroba (Cardinal) on Sep 20, 2022 at 19:46 UTC
Re^4: Parallel processing on Windows
by bliako (Abbot) on Sep 20, 2022 at 20:26 UTC

    GNU parallel has never failed me (under Unix, of course). It is a Perl script using threads and Thread::Queue.

    Reading (diagonally) the long discussion cited by choroba as to why the word "discouraged" was used, I did not find real arguments except perhaps that threads::shared at one time could not handle cloning deep, complex data structures for sharing (as I understand it, this works now), and that you may not be able to find help.

    Corion's suggestion in Re: Parallel processing on Windows has served me well for all my parallel needs. I used How to create thread pool of ithreads (the posts by BrowserUK in there) as my starting point.

    There is also marioroy's MCE, which I have never used. It looks solid. See Reusable threads demo for how it is used as an alternative to the threads + Thread::Queue paradigm.

    bw bliako

    Edit: Another point in the long discussion mentioned above is the performance of a thread-enabled perl, along with the overhead of creating a new thread. The latter is mostly irrelevant when you follow the pool-of-workers model (the threads' queue), where a number of threads (workers) are created once and then keep processing your data queue. The former is the cost of a perl compiled with thread support, which can be really hindered by the various locks put in place to protect you against race conditions etc. in a potentially threaded environment. That penalty applies whether or not your program uses threads; it is the price of wanting Perl to be able to run threads at all.
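
    The pool-of-workers model described above can be sketched roughly as follows. This is only a minimal illustration, assuming a thread-enabled perl with a Thread::Queue recent enough to provide end(); the worker count, the job list and process_item() are made up for the example and are not taken from the posts linked above.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $n_workers = 4;                    # assumption: tune to your CPU count
my $jobs      = Thread::Queue->new;   # work items go in here
my $results   = Thread::Queue->new;   # workers report back here

# Hypothetical unit of work; replace with the real task.
sub process_item {
    my ($n) = @_;
    return $n * $n;
}

# Each worker blocks on dequeue() until a job arrives and leaves the
# loop once the queue has been ended and drained (dequeue returns undef).
sub worker {
    while (defined(my $item = $jobs->dequeue)) {
        $results->enqueue([ $item, process_item($item) ]);
    }
    return;
}

# Create the workers once; they are reused for every job.
my @workers;
push @workers, threads->create(\&worker) for 1 .. $n_workers;

$jobs->enqueue(1 .. 20);
$jobs->end;                           # signal that no more work is coming
$_->join for @workers;

# All workers have finished, so a non-blocking drain is enough.
my %square;
while (defined(my $r = $results->dequeue_nb)) {
    $square{ $r->[0] } = $r->[1];
}
print "$_ => $square{$_}\n" for sort { $a <=> $b } keys %square;
```

    The thread-creation cost is paid only $n_workers times here, no matter how many jobs are queued.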

      Reading (diagonally) the long discussion cited by choroba as to why the word "discouraged" was used, I did not find real arguments except perhaps that

      All the long discussions now are just fork users trolling threads users.

Re^4: Parallel processing on Windows
by NERDVANA (Priest) on Sep 21, 2022 at 08:19 UTC

    I don't really like the other explanations here, so let me try too:

    Most people think of "threads" as additional execution points running around in the same code and same data as each other. Perl does not offer that option. And actually, I'm glad it doesn't, because in the Java and C++ I've written that does true "threading" it is extremely easy to introduce bugs when touching the same data structures. Getting "threading" right is massively complicated and requires rigorous design principles and IMHO has no place in a quick-and-easy scripting language.

    What Perl does offer as "ithreads" is a lot more like fork/wait. When you start an ithread, it clones the current perl program (but within the existing address space, creating a new parallel interpreter for the clone), executes in parallel, and then passes data back to the main program. You can do the same thing by creating a pipe, forking, running things, and writing the result through a pipe to the parent. ithreads make this convenient; but there are also perl modules that make fork/serialize/wait convenient.
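
    For reference, the pipe/fork/wait pattern mentioned above looks roughly like this. It is a minimal sketch using only core built-ins; expensive() is a hypothetical stand-in for the real work, and the result is a single line of text so that no serialization module is needed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real work done in the child.
sub expensive {
    my ($n) = @_;
    return $n * 2;
}

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: compute, write the result into the pipe, and exit.
    close $reader;
    print {$writer} expensive(21), "\n";
    close $writer;                    # flushes the buffered output
    exit 0;
}

# Parent: read the child's result, then reap the child.
close $writer;
my $result = <$reader>;
close $reader;
waitpid($pid, 0);
chomp $result;
print "child $pid returned $result\n";
```

    Anything more structured than a line of text would have to be serialized before being written to the pipe, which is the part the helper modules take care of.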

    So, what are the decision points for choosing ithreads vs. fork/wait?

    • Perl ithreads clone the entire interpreter. Not just the parts you need in the thread, but the whole interpreter, which can be massive if you use big frameworks. On Linux, when you fork, the operation of cloning memory happens lazily on demand.
    • Using Perl ithreads keeps all the data in the same memory address space, so it is theoretically faster to move results back to the main interpreter. On Linux, with fork, you have to serialize results to bytes, through a pipe, and de-serialize. However, usually in my limited experience, result data is small compared to input data, so Linux fork() probably still wins.
    • Enabling Perl ithreads in a build of Perl makes the whole interpreter run slower (for technical reasons), even when threads aren't used. Performance-focused Linux users prefer ithreads to be disabled for the speed boost.
    • On Windows, fork() *is* an ithread, because Windows doesn't have fork. In fact, this is the only reason ithreads were added to perl: the mechanism already existed to support fake Windows forking.
    • Windows fork() has bad side-effects that you would not expect if you were familiar with fork from Linux. For instance, file handles are shared between parent and child. If the forked child closes a file handle, the parent loses it too.

    Summing it up,

    1. If you are on Windows and your program will only ever run on Windows, you might as well use ithreads because they are simpler than fork() and give the same result.
    2. If you are on Linux, compile your perl without ithreads so that it runs faster, and use perl modules to make forking/collecting data easier.
    3. If you want your program to run in multiple environments, use the forking perl modules, because not all perls have ithreads enabled, and the special modules usually do something more efficient than "clone everything" when starting a new worker.
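
    To illustrate point 1, here is roughly what the ithreads version of "run in parallel, then pass data back" looks like. This is a minimal sketch assuming a thread-enabled perl; slow_square() is a hypothetical placeholder for the real work.

```perl
use strict;
use warnings;
use threads;

# Hypothetical placeholder for the real work.
sub slow_square {
    my ($n) = @_;
    return $n * $n;
}

# Creating a thread clones the interpreter; the extra arguments are
# passed to the sub. Assigning to a scalar makes the thread's return
# context scalar.
my @threads;
for my $n (1 .. 4) {
    my $thr = threads->create(\&slow_square, $n);
    push @threads, $thr;
}

# join() waits for the thread and hands its return value to the caller.
my @results;
push @results, $_->join for @threads;

print "@results\n";   # prints "1 4 9 16"
```

    No pipe and no explicit serialization are needed, because join() returns the value directly.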

      That's a nice overview (++). There are a couple of statements presented as facts which on inspection seem not to be.

      However, usually result data is small compared to input data

      I would certainly agree with "sometimes", but "usually" without any citation seems just to be an opinion. Perhaps the problem space in which you most work has such a feature but it would be surprising to find it to be universally (or even broadly) true.

      Linux users prefer ithreads to be disabled for the speed boost.

      Linux users who care about the speed boost at the expense of flexibility prefer ithreads to be disabled for the speed boost. The rest of us don't.

      I'm a Linux user and am quite happy to use threads. The interface is pretty slick and for some scenarios, threads are a perfect fit. In others, forked processes are more appropriate and in those scenarios I'm happy enough to use fork instead. Horses for courses.


      🦛

Re^4: Parallel processing on Windows (threads discouraged shit)
by Anonymous Monk on Sep 20, 2022 at 21:17 UTC
    Ignore that shit. It literally came about because Unix users got tired of not helping with threading questions. People who had never tried threads agreed on a scary warning: https://www.nntp.perl.org/group/perl.perl5.porters/2014/03/msg213382.html
    It's self-admitted fearmongering from IRC burnouts: "we're tired of trying to discourage folks from using threads and how to use threads on the irc ... let's scare them in the documentation". It doesn't belong there. See Re^2: Splitting large array for threads. and follow it deep; basically, using threads has caveats, so we'll misuse the "discouraged" label ... dumb.
    Subject: PATCH add discouragement warning to perl threads documentation
    
    The common reactions to someone asking for help with threads even in #p5p
    being: "You're doing it wrong!" or "You have brain damage!" This commit
    attempts to reduce the number of such incidences by putting a huge warning
    on the threads documentation that should discourage all but the most
    determined.