bedanta has asked for the wisdom of the Perl Monks concerning the following question:
Hi,
I have a Perl script that reads two files and performs some operations on them. It takes about 35 minutes to finish, both on a server with one processor and on a server with multiple processors.
How can I make the Perl script take advantage of the multiple processors?
Please help.
Regards,
Bedanta
Re: Taking advantage of multi-processor architecture
by dragonchild (Archbishop) on Feb 11, 2005 at 14:19 UTC
|
| [reply] |
|
I realize that the OP's question undoubtedly deals with SMP, but I've made it a mission to point out that TIMTOWTDI when it comes to multiple-processor machines.
<pedantic>
SMP = Symmetric Multi Processing - a certain way of using more than one processor at a time, usually limited to systems of eight or fewer processors. Compare NUMA.
NUMA = Non-Uniform Memory Access - a type of multi-processor arrangement in which memory access times differ depending upon which processor is using which part of memory, necessitating that the OS keep track of which parts of memory it uses or assigns to applications. Compare SMP, see The Linux Scalability Effort for some examples.
Update (for sake of a more complete list):
There's also Asymmetric Multi Processing (AMP), in which certain code must be run on certain processors instead of any process being assigned to any processor. NUMA is generally closer to SMP than AMP.
Some definitions of SMP only address its differences from AMP, saying that SMP means that any process can run on any processor. This means that most NUMA machines are SMP with additional scheduling concerns.
As I've heard SMP defined, it is usually clarified that not only can any processor handle any process, but that the machine also allows each processor the same access to all of the system's main memory. This definition makes SMP and NUMA distinct.
</pedantic>
| [reply] |
Re: Taking advantage of multi-processor architecture
by inman (Curate) on Feb 11, 2005 at 14:10 UTC
|
Perl will run the script in a single thread of execution. To take advantage of multiple processors, you need to re-architect your app to use either threads or multiple processes. Please note that thread / process allocation is the responsibility of the OS. The technique merely gives the OS the ability to assign the threads / processes to different processors. On some systems, you will be able to assign CPU affinity for a process. | [reply] |
Re: Taking advantage of multi-processor architecture
by hardburn (Abbot) on Feb 11, 2005 at 14:10 UTC
|
Taking advantage of concurrency ranges from highly trivial up to Halting-Problem difficult. It all depends on what your goals are.
You'll need to take a look at what your code does to the data. Let's say you've got a big loop that processes all the data. Does that loop's next iteration depend on the last one completing? If not, you can split the data in half, then fork() the processes, giving each process half the data. Then get the data back together.
If this loop does depend on the last iteration completing, your task may be difficult or impossible.
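A minimal sketch of the split-and-fork() idea above, assuming the iterations really are independent. The names process_item and parallel_map are hypothetical, the squaring is a stand-in for the real loop body, and sending results back over a pipe as tab-separated text is just one way to "get the data back together". The data is dealt out round-robin rather than cut into contiguous halves; with two workers each process still gets half.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical per-item work; stands in for whatever the real loop body does.
sub process_item {
    my ($n) = @_;
    return $n * $n;
}

# Run process_item() over @$items using $workers child processes.
# Each child handles every $workers-th item and writes "index<TAB>result"
# lines to its own pipe; the parent reassembles them in the original order.
sub parallel_map {
    my ($workers, $items) = @_;
    my @children;

    for my $w (0 .. $workers - 1) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: compute this worker's share and send it to the parent.
            close $reader;
            for (my $i = $w; $i < @$items; $i += $workers) {
                print {$writer} $i, "\t", process_item($items->[$i]), "\n";
            }
            close $writer;        # flushes the buffered output
            POSIX::_exit(0);      # leave without running the parent's cleanup code
        }

        # Parent: keep only the read end of this child's pipe.
        close $writer;
        push @children, [ $pid, $reader ];
    }

    my @results;
    for my $child (@children) {
        my ($pid, $reader) = @$child;
        while (my $line = <$reader>) {
            chomp $line;
            my ($i, $value) = split /\t/, $line, 2;
            $results[$i] = $value;
        }
        close $reader;
        waitpid($pid, 0);
    }
    return @results;
}

my @squares = parallel_map(2, [ 1 .. 10 ]);
print "@squares\n";
```

With two workers this prints the squares of 1 through 10 in their original order. For real data the results would need a serialization that survives tabs and newlines, and the per-item work has to be heavy enough to outweigh the cost of forking and piping.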
"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
| [reply] |
|
Actually, depending on how IO-intensive the work is, you may find that it's advantageous to split into more processes (or threads) than you have physical processors. If one of the workers becomes blocked waiting on IO, the OS will be able to schedule another, which can get useful work done while the first one is waiting for its data.
One application of this trick: if you're using GNU make, tell it to run n+1 jobs on an n-processor box (e.g. make -j 3 on a dual-CPU machine). While compiles are usually CPU-bound, there's usually enough IO slack that it'll finish a little faster than if you'd just run -j 2.
| [reply] |
Re: Taking advantage of multi-processor architecture
by neilwatson (Priest) on Feb 11, 2005 at 14:19 UTC
Re: Taking advantage of multi-processor architecture
by BrowserUk (Patriarch) on Feb 11, 2005 at 14:29 UTC
|
Not enough information.
How big are the files?
What operations is it performing on them?
Depending upon what the script is doing, you may be able to cut the processing time in half on a dual processor machine, or it might not make any difference at all.
If you post the code, or a working program that does the same basic steps as your real code, you would stand some chance of getting meaningful advice.
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
| [reply] |
Re: Taking advantage of multi-processor architecture
by bluto (Curate) on Feb 11, 2005 at 18:34 UTC
|
Since you've only posted a general question, we can only give you general advice. In order to optimize a program, you must know two things. What resource is it limited by (usually CPU speed or poor code design; real memory; disk speed)? How can you restructure your code to either limit its dependence on that resource or parallelize access to it? Others have mentioned forking/threading, but these can be hard to use if you are inexperienced with them and don't have the time to learn. Some other things you may want to consider...
If your script is performing a lot of IO (reading and writing), consider separating the files onto different physical disks and, importantly, don't access files through things like NFS mounts. Sometimes this alone can double the throughput, especially if you are reading and writing two files at the same time on the same physical disk.
You really do not want the system to be swapping while your program is running, since it will slow things down a lot. This is often caused by trying to manipulate massively large data structures in memory in Perl. If your script is using lots of memory (e.g. reading two large files completely into memory before processing), consider processing each line as you read it in. If you need them in arrays, consider using something like Tie::File. One common example is trying to sort a massive array within Perl itself. Sometimes you can call an external utility to do this for you much more quickly (e.g. GNU sort).
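To illustrate the "process each line as you read it" suggestion, here is a minimal sketch. The file contents and the summarize_file name are made up for the demonstration (the OP's actual operation is unknown); the point is only that the while loop holds one line in memory at a time instead of slurping the whole file into an array.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical operation: count the lines and sum the first column,
# reading one line at a time so memory use stays flat for any file size.
sub summarize_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";

    my ($lines, $total) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        my ($value) = split /\s+/, $line;
        next unless defined $value && length $value;   # skip blank lines
        $lines++;
        $total += $value;
    }
    close $fh;
    return ($lines, $total);
}

# Demonstration data: a temporary file with 1000 lines such as "42 item42".
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "$_ item$_\n" for 1 .. 1000;
close $tmp_fh;

my ($lines, $total) = summarize_file($tmp_path);
print "lines=$lines total=$total\n";
```

By contrast, something like my @all = <$fh>; pulls every line into memory at once, which is what tends to push a script with large inputs into swap.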
If you give more details, I'm sure someone can help out more. | [reply] |