in reply to Re: threading a perl script
in thread threading a perl script

Good message, pleasant reading.

Now, given that a 10-second job saves only 2 seconds when the algorithm is parallelized using threads - that makes me think thread usage is even narrower than I initially thought.

A threaded perl is some 10-15% slower than an unthreaded one, and we pay that price every time, just for the eventual possibility of saving 20% of the runtime?
(and only after reorganizing the flow of the program, which adds some 80% or more complexity :) )

Add to this that most GUI libraries (actually, all GUI libraries) are also single-threaded.

Unthreaded perl rulez! :)

Vadim.

Re^3: threading a perl script
by BrowserUk (Patriarch) on Apr 23, 2011 at 13:18 UTC

    If that is all you took from my post, then you shouldn't be celebrating.

    If all the programs you write do nothing more complicated than wc or fgrep; if in your world a 1GB file represents nothing more important than say 1 day's twaterrings, and you only need to count the number of '!'s used; if between reading and writing your important data you need to do nothing of any consequence; then stick with your un-threaded perl, because it can do nothing very, very quickly.

    On the other hand, if you are (say) a bio-geneticist, and that 1GB of data represents 10,000 x 100k-base sequences, each of which needs to be fuzzy matched at every offset against each of 25,000 x 25-base sub-sequences - a process that on a single core takes a week or more - then you'll be rather more interested in using perl's threading to reduce that to under a day on a commodity 8-core box. And even more grateful for that threading when you're given access to the departmental 128-core processor and your script completes in under 2 hours, with nothing more than a change of command line argument.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      My everyday programming is indeed mostly text processing - just enough for single-threaded perl.
      It is not simply 'wc' or grep, as you said, but text transforms with XML modules or something like that -
      well within single-threaded perl's reach.

      Actually, I do not agree with your point that I will get much benefit from threaded perl on multicore hardware for complicated tasks.
      The great drawback of perl threads - that they are heavy, and also unstable - will make me find another solution.
      Increasing CPU power is not the way to go.
      Most probably I will use number-crunching libraries from perl (LAPACK, or maybe a specialized library).

      Perl has overwhelming strength in text processing, GUI, etc.; it's compact and nice, plus we have CPAN - that's brilliant.
      But using it for calculation-intensive tasks is just wrong.
      And then - when program speed is not enough - increasing the number of CPUs is even worse :)

      Addition: we have a fresh fork-related bug for Windows perl today - http://rt.perl.org/rt3/Ticket/Display.html?id=89142 - at least ActivePerl 5.12.1 and 5.10.0 crash on that simple program.
      Threads are just not stable enough.

      PS do 128-core processors exist? :o :o

        PS do 128-core processors exist?

        Oh yes. And even more besides, if you have the cash. But next year they'll be cheaper. And the year after that, cheaper still.

        • See the IBM Power 750: 4 SMP processors, each with 8 cores; each core with 4 hardware threads, giving 128 concurrent threads of execution.

          Or its big brother, the Power 795: 32 SMP processors, each with 8 cores and 4 hardware threads apiece, for 1024 concurrent threads.

        • Or HP Superdome II with 256 cores.
        • Or Sparc M9000 with 256 cores.
        Actually, I do not agree with your point that I will get much benefit from threaded perl on multicore hardware for complicated tasks. Increasing CPU power is not the way to go.

        Sorry, but you are wrong. Due to the physical limits of the silicon, the chip-fabs cannot increase clock speeds any higher than they currently are without risking thermal runaway, so increasing the number of cores is the only way to go. And as you can see from the above, the hardware guys are already going that way.

        Most probably I will use number-crunching libraries from perl (LAPACK, or maybe a specialized library).
        Perl has overwhelming strength in text processing, GUI, etc.; it's compact and nice, plus we have CPAN - that's brilliant. But using it for calculation-intensive tasks is just wrong.

        Math libraries don't help where no math is involved. DNA work is all text processing. By your own words, one of perl's strengths.

        The speed of comparing two strings is entirely limited by the speed of the processor. And clock speeds are not increasing. The only way to speed up text processing is to compare more than two strings at once.
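
        A minimal sketch of what "more than two at once" means in practice: two comparisons proceeding concurrently via the core threads module. The hamming() scorer and the sample strings here are illustrative stand-ins, not code from any of the scripts discussed below:

        #!/usr/bin/perl
        # Two string comparisons running concurrently on two OS threads.
        use strict;
        use warnings;
        use threads;

        sub hamming {    # mismatch count for two equal-length strings
            my ( $s, $t ) = @_;
            return ( $s ^ $t ) =~ tr/\0//c;    # XOR leaves NUL where bytes agree
        }

        my @pairs = ( [ 'ACGTACGT', 'ACGTTCGT' ], [ 'GGGGCCCC', 'GGGGCCCA' ] );

        my @thr = map { threads->create( \&hamming, @$_ ) } @pairs;
        print "pair $_: ", $thr[$_]->join, " mismatches\n" for 0 .. $#thr;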

        Addition: we have a fresh fork-related bug for Windows perl today

        Who cares? Don't use fork on Windows.

        Perl has overwhelming strength in text processing, GUI, etc.; it's compact and nice, plus we have CPAN - that's brilliant. But using it for calculation-intensive tasks is just wrong. And then - when program speed is not enough - increasing the number of CPUs is even worse :)

        The great drawback of perl threads - that they are heavy, and also unstable

        The (so-called) heaviness is irrelevant: 47MB (see below) is a mere drop in the ocean of my 4GB of RAM. And for £40 I could double that. Heck, my browser is currently using 973MB as I type this.

        And you're wrong about the stability too. They've been very stable on Windows for several years now. There is (or was until recently), still a memory leak with spawning threads on *nix, but if you do the sensible thing and only spawn 1 or 2 threads per core, that is entirely manageable. Mind you, if more people actually used threads on *nix, instead of burying their heads in the sand as a defence against learning the inevitable "new thing", that would probably have been fixed long ago.

        This 60-line, standalone, single-threaded DNA fuzzy matching script runs in just under 10MB, uses 25% of my (£400) 4-core box, and takes 6 minutes 54.5 seconds to fuzzy match 25,000 25-base motifs against each 100k-base sequence. For that 1GB/10,000-sequence file I mentioned, that means a total elapsed time of 48 days, or just under 7 weeks.
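
        (That script isn't reproduced here. A much-reduced sketch of its single-threaded shape - with the motif list, input handling, and mismatch threshold replaced by illustrative stand-ins - might look like this:)

        #!/usr/bin/perl
        # Reduced sketch of the single-threaded shape; not the original script.
        # Assumes one sequence per line on STDIN; @motifs and MAX_MISS are stand-ins.
        use strict;
        use warnings;

        use constant { MOTIF_LEN => 25, MAX_MISS => 2 };
        my @motifs = ( 'ACGTACGTACGTACGTACGTACGTA' );    # illustrative 25-base motif

        while ( my $seq = <STDIN> ) {
            chomp $seq;
            for my $motif ( @motifs ) {
                # Slide the motif across every offset of the sequence.
                for my $off ( 0 .. length( $seq ) - MOTIF_LEN ) {
                    # XOR trick again: non-NUL bytes mark mismatched positions.
                    my $miss = ( substr( $seq, $off, MOTIF_LEN ) ^ $motif )
                               =~ tr/\0//c;
                    print "$motif matches at $off ($miss mismatches)\n"
                        if $miss <= MAX_MISS;
                }
            }
        }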

        This 75-line, standalone, multi-threaded DNA fuzzy matching script runs in 47MB, uses 100% of my (£400) 4-core box, and also takes 7 minutes of CPU to fuzzy match 25,000 25-base motifs against each 100k-base sequence. But it processes 4 sequences at a time, so the total elapsed time for that 1GB/10,000-sequence file falls to just over 12 days.
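
        (Likewise not reproduced here. A reduced sketch of the threaded shape - a shared Thread::Queue feeding the same illustrative matching loop to a pool of workers sized by a -NTHREADS command line switch - might look like this:)

        #!/usr/bin/perl
        # Reduced sketch of the threaded shape; not the original script.
        # Workers pull whole sequences from a shared queue; the matching
        # details are the same illustrative stand-ins as in the sketch above.
        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        use constant { MOTIF_LEN => 25, MAX_MISS => 2 };
        my @motifs = ( 'ACGTACGTACGTACGTACGTACGTA' );    # illustrative

        my ( $nthreads ) = map { /^-NTHREADS=(\d+)$/ ? $1 : () } @ARGV;
        $nthreads ||= 4;    # default: one worker per core on a 4-core box

        my $Q = Thread::Queue->new;

        my @workers = map { threads->create( sub {
            while ( defined( my $seq = $Q->dequeue ) ) {
                for my $motif ( @motifs ) {
                    for my $off ( 0 .. length( $seq ) - MOTIF_LEN ) {
                        my $miss = ( substr( $seq, $off, MOTIF_LEN ) ^ $motif )
                                   =~ tr/\0//c;
                        print "$motif at $off ($miss mismatches)\n"
                            if $miss <= MAX_MISS;
                    }
                }
            }
        } ) } 1 .. $nthreads;

        # NB: an unbounded queue will buffer the whole file; fine for a sketch.
        while ( my $seq = <STDIN> ) { chomp $seq; $Q->enqueue( $seq ) }
        $Q->enqueue( ( undef ) x $nthreads );    # one end-of-work marker per worker
        $_->join for @workers;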

        On that IBM Power 750, with the addition of a simple command line switch, -NTHREADS=128, you can expect that to drop to just 9 hours! On the 795, with -NTHREADS=1024, 1 hour and 10 minutes.
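
        (The arithmetic: 48 days of single-core work is roughly 1152 core-hours; 1152 / 128 ≈ 9 hours, and 1152 / 1024 ≈ 1.1 hours.)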

        You aren't going to see those sorts of gains from using a math library; nor from re-coding the program in C; nor from finding a "better algorithm".

        And if you are doing any serious XML processing, those sorts of gains are available to you too. Via threading.
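
        The same boss/worker pattern carries straight over: queue the filenames, and let each worker chew through its share. A sketch, with a crude regex count standing in for the real per-file XML work (a real script would construct its XML parser inside each worker, never sharing one across threads):

        #!/usr/bin/perl
        # Sketch: per-file XML work spread over a small worker pool.
        # The regex element count is a stand-in for real parsing.
        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        use constant NWORKERS => 4;

        # Pre-load the queue with filenames plus one end-marker per worker.
        my $Q = Thread::Queue->new( @ARGV, ( undef ) x NWORKERS );

        my @workers = map { threads->create( sub {
            while ( defined( my $file = $Q->dequeue ) ) {
                open my $fh, '<', $file or warn "$file: $!" and next;
                my $doc   = do { local $/; <$fh> };
                my $elems = () = $doc =~ /<\w+[\s>\/]/g;    # crude element count
                print "$file: $elems elements\n";
            }
        } ) } 1 .. NWORKERS;

        $_->join for @workers;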


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        You're conflating unrelated things; fork is not implemented using the threads module.

        The bug doesn't happen if you use -Mencoding