in reply to Parallel processing on Windows
The interesting thing about fork is that it works at all.
If you are not bound by compatibility with a program that already uses fork(), I would rather look at threads and communicate between them via Thread::Queue. This is a far better approach than trying to make fork emulation work under Windows, at least as long as you initialize resources within each thread instead of trying to share them between threads.
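A minimal sketch of the threads plus Thread::Queue approach described above, assuming a perl built with thread support. The names `crunch` and `run_pool` and the sum-of-squares job are hypothetical, chosen only for illustration; each worker sets up what it needs inside its own thread and nothing is shared except the two queues.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical CPU-bound job: sum of squares from 1 to $n.
sub crunch {
    my ($n) = @_;
    my $sum = 0;
    $sum += $_ * $_ for 1 .. $n;
    return $sum;
}

# Run crunch() over @$jobs with $workers threads; returns { job => result }.
sub run_pool {
    my ($workers, $jobs) = @_;
    my $todo = Thread::Queue->new;   # work items going to the workers
    my $done = Thread::Queue->new;   # [job, result] pairs coming back

    my @pool;
    for (1 .. $workers) {
        push @pool, threads->create(sub {
            # Per-thread resources (handles, connections) would be opened here.
            while (defined(my $job = $todo->dequeue)) {
                $done->enqueue([$job, crunch($job)]);
            }
        });
    }

    $todo->enqueue(@$jobs);
    $todo->end;                      # dequeue returns undef once the queue is drained
    $_->join for @pool;

    my %result;
    while (defined(my $pair = $done->dequeue_nb)) {
        $result{ $pair->[0] } = $pair->[1];
    }
    return \%result;
}

my $result = run_pool(4, [10, 100, 1000]);
print "$_ => $result->{$_}\n" for sort { $a <=> $b } keys %$result;
```

The workers are created once and reused for every job, so the cost of starting a thread is paid only `$workers` times.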
Re^2: Parallel processing on Windows
by BernieC (Pilgrim) on Sep 20, 2022 at 19:13 UTC
Just to experiment, I tried fork()/wait() because I'm familiar with that. I wrote a little test program to see if the forked children were running, and the output showed negative process IDs that I haven't a clue about (and, indeed, I looked at Task Manager and neither process was in the list), so I couldn't tell if they were really independent processes and would run on different CPU cores. Back in my Unix days I wrote a complete TCP server in Perl! It worked like a champ. It sucks that Windows doesn't have a fork/kill/wait process structure, so it looks like I have a lot of learning to do to get something like that to work on Win10. Thanks!
by Marshall (Canon) on Sep 20, 2022 at 23:10 UTC
However, it sounds like just using threads is the best way for your compute-bound project. The less data you share between threads (ideally nothing that is r/w), the better. Threads can get complicated if there is a lot of sharing going on.
This is a code fragment from some code years ago... The program has about 70,000 input strings. For each input string, it is desired to know which of the other input strings are "close enough" according to some complex rules. For each input, a regex is generated that is run against all other inputs. This is an N×N algorithm. For 70K inputs it took ~1.5 hours. I have a 4 core machine. Running 4 threads, the speedup was something like 3.8x (can't get to exactly 4.0, but that is a very good result). So execution time went to ~20 minutes, and that was "good enough", so I stopped improving things.
Anyway, see below for an example of parallelizing a number cruncher job.
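The original fragment is not reproduced in this thread. As a stand-in, here is a rough sketch of splitting an N×N comparison across a fixed number of threads with no shared read/write data. Everything here is an assumption made for illustration: `is_close` is a toy rule (equal length, at most one differing character) in place of the "complex rules", and `close_pairs` is a hypothetical name.

```perl
use strict;
use warnings;
use threads;

# Toy stand-in for the real matching rules: equal length and at most
# one differing byte (counted by XOR-ing the two strings).
sub is_close {
    my ($x, $y) = @_;
    return 0 if length($x) != length($y);
    my $diff = ($x ^ $y) =~ tr/\0//c;
    return $diff <= 1;
}

# Compare every input against all others, with the outer loop divided
# among $nthreads threads. Each thread only reads its own copy of @inputs
# and hands its matches back through join().
sub close_pairs {
    my ($nthreads, @inputs) = @_;
    my @thr;
    for my $t (0 .. $nthreads - 1) {
        push @thr, threads->create({ context => 'list' }, sub {
            my @found;
            # Thread $t takes rows $t, $t + $nthreads, $t + 2*$nthreads, ...
            for (my $i = $t; $i < @inputs; $i += $nthreads) {
                for my $j (0 .. $#inputs) {
                    next if $i == $j;
                    push @found, "$inputs[$i] ~ $inputs[$j]"
                        if is_close($inputs[$i], $inputs[$j]);
                }
            }
            return @found;
        });
    }
    my @all = map { $_->join } @thr;
    @all = sort @all;
    return @all;
}

print "$_\n" for close_pairs(4, qw(cat cot dog cut));
```

Interleaving the rows, rather than giving each thread one contiguous block, is one simple way to keep the threads roughly equally busy when some inputs are more expensive than others.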
by eyepopslikeamosquito (Archbishop) on Sep 22, 2022 at 08:16 UTC
"Back in my Unix days I wrote a complete TCP server in Perl! worked like a champ. Sucks that Windows doesn't have a fork/kill/wait ..."
Note that you can write network servers in Perl that work fine on both Unix and Windows, without forking and without threads, simply by taking an event-driven approach via IO::Select. Here's a complete working example of one I used for testing Syslog a while back: Test Syslog Server
by BernieC (Pilgrim) on Sep 20, 2022 at 19:27 UTC
"The "interpreter-based threads" provided by Perl are not the fast, lightweight system for multitasking that one might expect or hope for. Threads are implemented in a way that make them easy to misuse. Few people know how to use them correctly or will be able to provide help. The use of interpreter-based threads in perl is officially discouraged."
But the doc doesn't say what you should do about the discouragement. Should I give it a try, or is there something else/newer for this? Or is this just kinda impossible in Perl on Windows...
by choroba (Cardinal) on Sep 20, 2022 at 19:46 UTC
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
by bliako (Abbot) on Sep 20, 2022 at 20:26 UTC
GNU parallel has never failed me (under Unix, of course). It is a Perl script using threads and Thread::Queue. Reading (diagonally) the long discussion cited by choroba as to why the word "discouraged" was used, I did not find real arguments except perhaps that threads::shared at some time could not handle the cloning of deep, complex data structures to be shared (as I understand it, now it works) and also that you may not be able to find help.
Corion's suggestion in Re: Parallel processing on Windows has served me well for all my parallel needs. I have used How to create thread pool of ithreads (the posts by BrowserUK in there) as my starting point. There is also marioroy's MCE, which I have never used. It looks solid. See Reusable threads demo on how it is used as an alternative to the threads + Thread::Queue paradigm.
bw bliako
Edit: Another point in the long discussion mentioned above is the performance of a thread-enabled perl and also the overhead of creating a new thread. The latter is mostly irrelevant when you follow the model of a pool of workers (the threads' queue), where a number of threads (workers) are created once and then keep processing your data queue. Then there is the performance of a perl compiled to enable threads, which can be really hindered by the various locks put in place to protect you against race conditions etc. in a potentially threaded environment. That penalty applies whether or not you use threads; it comes from building Perl so that it is able to run threads.
by Anonymous Monk on Sep 20, 2022 at 21:18 UTC
by NERDVANA (Priest) on Sep 21, 2022 at 08:19 UTC
I don't really like the other explanations here, so let me try too:
Most people think of "threads" as additional execution points running around in the same code and same data as each other. Perl does not offer that option. And actually I'm glad it doesn't, because in the Java and C++ I've written that does true "threading" it is extremely easy to introduce bugs when touching the same data structures. Getting "threading" right is massively complicated and requires rigorous design principles and IMHO has no place in a quick-and-easy scripting language.
What Perl does offer as "ithreads" is a lot more like fork/wait. When you start an ithread, it clones the current perl program (but within the existing address space, creating a new parallel interpreter for the clone), executes in parallel, and then passes data back to the main program. You can do the same thing by creating a pipe, forking, running things, and writing the result through the pipe to the parent. ithreads make this convenient, but there are also perl modules that make fork/serialize/wait convenient.
So, what are the decision points for choosing ithreads vs. fork/wait? Summing it up,
by hippo (Archbishop) on Sep 21, 2022 at 09:19 UTC
by Anonymous Monk on Sep 20, 2022 at 21:17 UTC
Subject: PATCH add discouragement warning to perl threads documentation
The common reactions to someone asking for help with threads even in #p5p being: "You're doing it wrong!" or "You have brain damage!" This commit attempts to reduce the number of such incidences by putting a huge warning on the threads documentation that should discourage all but the most determined.
Re^2: Parallel processing on Windows
by BernieC (Pilgrim) on Sep 20, 2022 at 23:26 UTC
by eyepopslikeamosquito (Archbishop) on Sep 21, 2022 at 05:21 UTC