in reply to Perl and autoparallelization

Hyperthreading (assuming you're talking about Intel's Hyperthreading technology) 'just happens' (or not). If the code in the program is conducive to being hyperthreaded, it will be; otherwise it won't. You do not control it.

You might derive some extra benefit from hyperthreading if you compiled Perl using Intel's C compiler (which is huge and very expensive), as they may well have added optimisations to their compiler that make the compiled code more conducive to hyperthreading, but the differences are likely to be small due to the 'data is code' nature of Perl (and other interpreters).

As Zaxo pointed out, most file-crunching programs are IO-bound, not CPU-bound, so multi-tasking them is often of little benefit. If your task is IO-bound, then you are better off buying a faster disk, or perhaps splitting your files across multiple (real, physical, not virtual) disks.

In the rare event that your processing is CPU-intensive--e.g. each file is small but requires a large amount of processing; some gene work might fit this category--you could probably benefit from multi-tasking the overall load across several processors. If each read-process-write cycle is entirely independent of the others, then simply splitting the input files into one group per processor (e.g. [a-f]*, [g-m]*, [n-t]*, [u-z]* for 4 processors) and starting one copy of the program to handle each group is probably as simple as it gets, and reasonably effective.
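
For illustration, a minimal sketch of that split-and-fork approach might look like this; the glob patterns and the process_file() routine are placeholders for your real file layout and processing code:

    #!/usr/bin/perl
    # Sketch only: fork one worker process per group of input files,
    # then wait for them all to finish. process_file() is a stand-in
    # for whatever read-process-write the real program does.
    use strict;
    use warnings;

    my @groups = ( '[a-f]*', '[g-m]*', '[n-t]*', '[u-z]*' );  # one glob per processor

    my @pids;
    for my $pattern (@groups) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {                    # child: handle this group only
            process_file($_) for glob $pattern;
            exit 0;
        }
        push @pids, $pid;                     # parent: remember the child
    }
    waitpid( $_, 0 ) for @pids;               # wait for every worker

    sub process_file {
        my ($file) = @_;
        # ... read, crunch, write ...
    }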

If your task requires each processing cycle to have some knowledge of other processing cycles, then running one thread per processor may be easier.
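
One (hypothetical) way to share that knowledge with Perl's ithreads is a variable marked :shared and protected by lock(); this is only a sketch of the idea:

    # Sketch only: one ithread per (assumed) processor, all updating a
    # shared running total so each cycle can see the others' results.
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $total :shared = 0;                # visible to every thread

    my @workers = map {
        threads->create( sub {
            my ($n) = @_;
            for ( 1 .. 100 ) {
                lock($total);             # serialise access to the shared value
                $total += $n;
            }
        }, $_ );
    } 1 .. 4;                             # one thread per processor

    $_->join for @workers;
    print "total = $total\n";             # 1000 = (1+2+3+4) * 100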

As you can see, determining what benefit, if any, can be derived from multi-tasking a process, and how best to achieve it, requires fairly detailed knowledge of both the processing required and the system on which it is going to run.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Replies are listed 'Best First'.
Re^2: Perl and autoparallelization
by Anonymous Monk on Jun 07, 2004 at 15:39 UTC
    Splitting the files is an attractive and simple solution, provided the CPU requirements for processing each file are comparable, or there are enough of them for the differences to average out. Suppose that assumption isn't true? How would you use Perl to manage a job queue which sent the next file for processing to the next available processor?

      I'd use one worker thread per processor and a Thread::Queue of the files to be processed. The main thread sets up (or feeds, if the list is very large, e.g. >~10,000) the Q with the files to be processed.

      The threads take the first file off the Q, process it and then loop back and get the next until the Q is empty.

      This is extremely simple to code and, since 5.8.3, appears to be very stable as far as memory consumption is concerned, though I haven't done any really long runs using Thread::Queue.

      Once the threads are spawned, no new threads or processes need to be created or destroyed, which makes it pretty efficient. All the sharing and locking required is taken care of by the tested and proven Thread::Queue module.

      I might try varying the number of threads up and down to see what gave the optimal throughput.
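
      A minimal sketch of that boss/worker arrangement, assuming four processors and a hypothetical process_file() routine and file glob, might look like this:

          use strict;
          use warnings;
          use threads;
          use Thread::Queue;

          my $q = Thread::Queue->new;            # the shared work queue

          # One worker per processor: pull a filename, process it, repeat
          # until the main thread signals there is no more work.
          my @workers = map {
              threads->create( sub {
                  while ( defined( my $file = $q->dequeue ) ) {
                      process_file($file);       # stand-in for the real work
                  }
              } );
          } 1 .. 4;

          # Main thread feeds the queue, then sends one undef per worker
          # so each of them knows to exit its loop.
          $q->enqueue($_) for glob '*.dat';      # hypothetical file list
          $q->enqueue(undef) for @workers;

          $_->join for @workers;

          sub process_file {
              my ($file) = @_;
              # ... read, crunch, write ...
          }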


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Another choice might be POE. I haven't used it, but it seems like it would be a good choice for this kind of load-balancing thing, especially if there's a chance that it will later exceed the capacity of one machine.

      --
      Spring: Forces, Coiled Again!