latejita has asked for the wisdom of the Perl Monks concerning the following question:

Continuing my journey on being able to calculate things on certain algebras (see sort function and parity of the permutation), I noticed that when generating lots of permutations Perl uses 100% of one of my 4 processors while the other 3 just sit around, only playing classical music.

Is there a way I can use the power of all my processors? If so, what's the best way to do it?

Thank you very much.

Replies are listed 'Best First'.
Re: Multi-kernel processors.
by jethro (Monsignor) on Feb 17, 2011 at 13:55 UTC

    Yes, there is. But there is never a best way in all circumstances. It depends on what operating system you are on, how easy it is to separate your task into parallel subtasks (and your data into "subdata"), how dynamically you want your processor power to be distributed, and how much comfort you want.

    But if I don't want to split hairs, I might answer: if you are on Linux (or another Unix-like system, even a Mac) you would probably use fork(), likely with the help of a module like Parallel::ForkManager. On Windows, threads are commonly used for parallel tasks.
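    A minimal sketch of the fork() route using only core Perl (fork, pipe, waitpid — no CPAN modules needed; the worker count, range, and summing task here are invented purely for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a range of integers into one chunk per worker, fork a child
# for each chunk, and collect the partial results through pipes.
my $max     = 1000;
my $workers = 4;
my $chunk   = int( $max / $workers );

my @kids;
for my $w ( 0 .. $workers - 1 ) {
    my $lo = $w * $chunk + 1;
    my $hi = ( $w == $workers - 1 ) ? $max : ( $w + 1 ) * $chunk;
    pipe( my $rd, my $wr ) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {        # child: do the work, report, exit
        close $rd;
        my $sum = 0;
        $sum += $_ for $lo .. $hi;
        print {$wr} "$sum\n";
        close $wr;
        exit 0;
    }
    close $wr;                # parent: remember the child and the read end
    push @kids, [ $pid, $rd ];
}

my $total = 0;
for my $kid (@kids) {
    my ( $pid, $rd ) = @$kid;
    chomp( my $part = <$rd> );
    $total += $part;
    waitpid $pid, 0;
}
print "total = $total\n";     # sum of 1..1000, i.e. 500500
```

    Each child runs on whatever core the OS scheduler picks, so with four children all four processors get used.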

Re: Multi-kernel processors.
by chrestomanci (Priest) on Feb 17, 2011 at 14:04 UTC

    I wrote about multi-threaded sorting in a thread about recursion.

    Consider the canonical example of quicksort in Haskell:

    quicksort :: (Ord a) => [a] -> [a]
    quicksort [] = []
    quicksort (x:xs) =
        let smallerSorted = quicksort [a | a <- xs, a <= x]
            biggerSorted  = quicksort [a | a <- xs, a > x]
        in  smallerSorted ++ [x] ++ biggerSorted

    The way I read that code, the list gets split into the elements smaller than the pivot and those larger than it. Each sub-list is recursively sorted, and the Haskell runtime is free to perform those two sorts in parallel.

    This shows that you can use multiple threads to do sorting. However, as BrowserUk wrote in his insightful reply, just because you can use recursive threads to do sorting does not mean that it is a good idea.

    I guess if you have very big lists to sort, and the time taken to sort on one processor is very long, then you could consider implementing the first level or two of a recursive quicksort yourself. Something like this (untested):

    sub multi_thread_sort {
        my @list  = @_;
        my $pivot = shift @list;
        my @lowList  = grep { $_ <  $pivot } @list;
        my @highList = grep { $_ >= $pivot } @list;
        # Now start two threads to sort the two lists concurrently.
        return ( @lowList, $pivot, @highList );
    }
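    One possible way to fill in that comment, sketched with core threads (assumes a threads-enabled perl; returning array references from each thread side-steps the context issues of join()):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# One level of parallel quicksort: partition around a pivot, then
# sort the two halves in separate threads and splice them back together.
sub two_thread_sort {
    my @list = @_;
    return @list if @list < 2;
    my $pivot = shift @list;
    my @low   = grep { $_ <  $pivot } @list;
    my @high  = grep { $_ >= $pivot } @list;

    # Each thread returns an array reference, so join() in scalar
    # context hands back the whole sorted half.
    my $t_low  = threads->create( sub { [ sort { $a <=> $b } @low  ] } );
    my $t_high = threads->create( sub { [ sort { $a <=> $b } @high ] } );

    return ( @{ $t_low->join }, $pivot, @{ $t_high->join } );
}

my @sorted = two_thread_sort( 5, 3, 8, 1, 9, 2 );
print "@sorted\n";    # 1 2 3 5 8 9
```

    Note that spawning a Perl ithread is expensive (the interpreter is cloned), so this only pays off when each half is large enough to dwarf the thread start-up cost.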
Re: Multi-kernel processors.
by BrowserUk (Patriarch) on Feb 17, 2011 at 17:24 UTC
    when generating lots of permutations.... Is there a way I can use the power of all my processors?

    It depends on how you are generating your permutations.

    Assuming your permutations are (or can be) derived from a linear range of integers, then partitioning that range into sub-ranges and running each in a separate thread brings almost linear gains for cpu-bound processing.

    This is a contrived example, but could be representative of the type of processing you're doing based on what you've said:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    use threads;
    use Time::HiRes qw[ time ];

    sub thread {
        my $tid = threads->tid;
        my( $start, $end ) = @_;
        warn "$tid: processing $start to $end\n";
        my $count = 0;
        for my $mon ( $start .. $end ) {
            $count += unpack '%32b*', pack 'N', $mon;
        }
        return $count;
    }

    our $T //= 1;
    our $M //= 1e6;

    my( $step, $s, @ranges ) = ( int $M / $T, 0 );
    push( @ranges, [ $s, $s + $step - 1 ] ), $s += $step for 1 .. $T;
    $ranges[ -1 ][ 1 ] = $M;

    my $start = time;
    my @threads = map { threads->create( \&thread, @$_ ); } @ranges;

    my $total = 0;
    $total += $_->join for @threads;

    printf "Total = %d (in %f seconds using $T threads)\n",
        $total, time() - $start;
    __END__
    C:\test>888714 -M=100e6 -T=1
    1: processing 0 to 100e6
    Total = 1314447116 (in 46.340000 seconds using 1 threads)

    C:\test>888714 -M=100e6 -T=2
    1: processing 0 to 49999999
    2: processing 50000000 to 100e6
    Total = 1314447116 (in 23.269000 seconds using 2 threads)

    C:\test>888714 -M=100e6 -T=3
    1: processing 0 to 33333332
    2: processing 33333333 to 66666665
    3: processing 66666666 to 100e6
    Total = 1314447116 (in 15.878000 seconds using 3 threads)

    C:\test>888714 -M=100e6 -T=4
    1: processing 0 to 24999999
    2: processing 25000000 to 49999999
    3: processing 50000000 to 74999999
    4: processing 75000000 to 100e6
    Total = 1314447116 (in 12.960000 seconds using 4 threads)

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Multi-kernel processors.
by anonymized user 468275 (Curate) on Feb 17, 2011 at 14:14 UTC
    Multiple cores only come into play for multiple processes. That's why Parallel::ForkManager is being suggested, to force process separation of parts of your code.
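    A minimal Parallel::ForkManager sketch of that process separation (assuming the CPAN module is available; the squared-sum task and worker count here are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

# Four worker processes, each summing the squares of one quarter of
# 1..100; run_on_finish collects the partial sums back in the parent.
my $pm    = Parallel::ForkManager->new(4);
my $total = 0;

$pm->run_on_finish( sub {
    my ( $pid, $exit, $ident, $signal, $core, $data ) = @_;
    $total += $$data;    # $data is the reference passed to finish()
} );

for my $w ( 0 .. 3 ) {
    $pm->start and next;                      # parent loops on; child falls through
    my ( $lo, $hi ) = ( $w * 25 + 1, ( $w + 1 ) * 25 );
    my $sum = 0;
    $sum += $_ * $_ for $lo .. $hi;
    $pm->finish( 0, \$sum );                  # child exits, result goes to parent
}
$pm->wait_all_children;

print "total = $total\n";    # sum of squares 1..100, i.e. 338350
```

    Because each worker is a separate process, the OS is free to schedule them on separate cores.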

    One world, one people

Re: Multi-kernel processors.
by DrHyde (Prior) on Feb 18, 2011 at 10:50 UTC

    Sitting there playing classical music sounds like a fine use of spare cycles to me!

    But if you want to instead devote one CPU to less processor-intensive computer-generated doof-doof noises while having three of them running perl, you'll need to split your task into separate processes or threads, which the OS can then schedule onto the different processors.

Re: Multi-kernel processors.
by locked_user sundialsvc4 (Abbot) on Feb 17, 2011 at 18:58 UTC

    Even as you observe that only one CPU resource is being fully utilized, remember to consider all of the various resources that could be sources of contention. Real memory (i.e. actual RAM chips) should be fully utilized if possible, but not to the extent that significant virtual-memory paging begins to occur. (Don't forget that a fair amount of RAM is, and should be, used for file buffers.) The practical amount of disk-I/O throughput that your system is capable of, is always a fundamental constraint. It is quite common that a system that is operating at or near its actual pragmatic capacity to do work is "barely using" its CPU resources, simply because they are so incredibly faster than everything else in the machine.

      The practical amount of disk-I/O throughput that your system is capable of, is always a fundamental constraint.

      You're talking bollocks again!

      If a process is using 100% cpu, it cannot be doing any I/O. If it is doing no I/O then I/O cannot be a constraint of any kind, let alone a fundamental one.

      Why do you continue to spout such shite?
