Re: How do I run subroutines in parallel?
by Zaxo (Archbishop) on Sep 09, 2002 at 07:42 UTC
Unless you are running on a multiprocessor box or a cluster, 'parallel' really just means multiprocessing. If you fork coprocesses, they can get their own time slice whenever they are ready to run, so your job effectively gets more time from the system.
Parallel::ForkManager is very convenient for that sort of thing. The GPL and Artistic licenses both encourage redistribution under sane terms.
(Added): Threads are another possibility, but they do not gain you time slices on *nix. They're mainly useful if the parallelized routines are I/O bound.
Perl 5.003 is terribly old; there was no thread support then. On the brighter side, fork is fine on SMP and other multi-CPU systems, and will give you true parallel processing in the children.
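A minimal sketch of the fork approach might look like this (the subroutine names task_a/task_b are made up for illustration):

```perl
#!/usr/bin/perl
# Minimal fork sketch: run two hypothetical subroutines in parallel
# children and wait for both to finish.
use strict;
use warnings;

sub task_a { print "task_a done\n" }
sub task_b { print "task_b done\n" }

my @pids;
for my $code (\&task_a, \&task_b) {
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {          # child: run the subroutine, then exit
        $code->();
        exit 0;
    }
    push @pids, $pid;         # parent: remember the child's PID
}
waitpid($_, 0) for @pids;     # reap all the children
print "parent: all children finished\n";
```

The parent must wait for (reap) its children, or they linger as zombies.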
After Compline, Zaxo
Unless you are running on a multiprocessor box or a cluster, 'parallel' really just means multiprocessing. If you fork coprocesses, they can get their own time slice whenever they are ready to run, so your job effectively gets more time from the system.
Good point. My first reaction upon reading the question was that forking and threading can improve your performance ONLY when you've got tasks that have a lot of 'waiting time', such as retrieving a bunch of files over a network. Your point about there being more timeslices for the code when spread over more processes proved me wrong before making an ass of myself in public :-)
I do wonder whether this makes much difference when you don't have a lot of other processes running, though, and whether you could get the same sort of gain by raising (or lowering, depending on how you look at it) the priority of the process. Does anyone have pointers to benchmarks/articles on this subject?
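On the priority angle: Perl exposes the Unix nice level directly through the getpriority and setpriority builtins (see perlfunc), so it is easy to experiment. A small Unix-only sketch that bumps the current process's niceness by one step:

```perl
#!/usr/bin/perl
# Reading and raising our own "nice" level with Perl's builtins.
# WHICH = 0 is PRIO_PROCESS, WHO = 0 means the current process.
use strict;
use warnings;

my $before = getpriority(0, 0);
setpriority(0, 0, $before + 1);   # unprivileged processes may only raise niceness
my $after  = getpriority(0, 0);
print "nice level: $before -> $after\n";
```

Lowering the nice value (raising priority) generally requires root, so this only demonstrates the unprivileged direction.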
--
Joost downtime n. The period during which a system
is error-free and immune from user input.
Yes we are running this on a multiprocessor box. I understood that threads were unstable in perl. Then again we are using perl 5.003.
I need to look into fork tomorrow. This might be what I need.
How do I call the subroutine? Do I need to split the subroutine into its own Perl file and call: fork(perl subroutine.pl)
Also, I need to save the stdout/stderr. If I run things in parallel I think the output will be "mixed up". Correct?
Sorry for such dumb questions...I'm not fluent in perl. :) I've rarely ever used modules... and I've never used threads in perl.
And I can't find code examples to do this.
Thanks for all the help!
If I run things in parallel I think the output will be "mixed up". Correct?
Yes and no. You may redirect the output of each child into a different file.
You may also use file locking to make sure that the messages are written to a file in a certain order.
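As a sketch of the separate-files idea (the file names here are invented for illustration): each child reopens its own STDOUT and STDERR before doing any work, so parallel output never interleaves.

```perl
#!/usr/bin/perl
# Each child sends all of its output to its own log file,
# so nothing from parallel children gets mixed together.
use strict;
use warnings;

my @pids;
for my $id (1 .. 2) {
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {
        # child: redirect STDOUT to its own file, STDERR to the same place
        open STDOUT, '>', "child$id.out" or die "open: $!";
        open STDERR, '>&', \*STDOUT      or die "dup: $!";
        print "child $id reporting\n";
        exit 0;
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

# afterwards the parent can read the files back in any order it likes
for my $id (1 .. 2) {
    open my $fh, '<', "child$id.out" or die "open: $!";
    print "child$id.out says: ", scalar <$fh>;
}
```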
There are other issues that you have not considered. Once you fork, any contact between parent and child will need to use some IPC (InterProcess Communication) method in order to give data to a child, and to retrieve it. These include TCP Sockets, UNIX Sockets, shared memory & semaphores, RDBMS and finally simple files.
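Of those, a plain pipe is the simplest place to start. A sketch in which the child sends one computed value back up the pipe to the parent (the summing loop is just a stand-in for real work):

```perl
#!/usr/bin/perl
# Simple one-way IPC: a pipe from child to parent.
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";

defined(my $pid = fork()) or die "fork failed: $!";
if ($pid == 0) {               # child
    close $reader;             # child only writes
    my $sum = 0;
    $sum += $_ for 1 .. 10;    # stand-in for real work
    print {$writer} "$sum\n";
    close $writer;
    exit 0;
}

close $writer;                 # parent only reads
chomp(my $result = <$reader>);
close $reader;
waitpid($pid, 0);              # reap the child
print "child computed: $result\n";
```

For two-way traffic you need a pair of pipes or a socketpair; see perlipc for the full menu.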
I wish you good luck in your fight against the daemons of deadlock and concurrency.
Re: How do I run subroutines in parallel?
by atcroft (Abbot) on Sep 09, 2002 at 07:00 UTC
Would either Parallel::ForkManager or Schedule::Parallel be of assistance? While I lack experience with the latter, my experience with the former has (so far) been that the jobs (code) are loaded into a queue, N items are run, and the next job in the queue starts when another completes. Because they are child processes, they inherit the open filehandles, IIRC.
Re: How do I run subroutines in parallel?
by talexb (Chancellor) on Sep 09, 2002 at 13:06 UTC
Re: How do I run subroutines in parallel?
by zentara (Cardinal) on Sep 09, 2002 at 16:11 UTC
Maybe this node by abigail-II would interest you:
forking code
It is an easy-to-use forking method. If you look at it closely, it lets you pass code references to the children, so I don't see why you couldn't pass different subroutines for them to execute.
Here is a sample; I hope this helps:
#!/usr/bin/perl
# For afork, the first argument is an array - a child will be forked
# for each array element. The second argument indicates the maximum
# number of children that may be alive at one time. The third argument
# is a code reference; this is the code that will be executed by the
# child. One argument will be given to this code fragment: for afork,
# the array element is passed. Note that this code assumes no other
# children will be spawned, and that $SIG{CHLD} hasn't been set to
# IGNORE.
# by Abigail

if ($#ARGV < 0) { @ARGV = qw( 1 2 3 4 5 ) }

afork(\@ARGV, 10, \&hello);
print "Main says: All done now\n";

sub hello {
    my $data = $_[0];
    print "hello world from $data\n";
}

##################################################
sub afork (\@$&) {
    my ($data, $max, $code) = @_;
    my $c = 0;
    foreach my $data (@$data) {
        wait unless ++$c <= $max;
        die "Fork failed: $!\n" unless defined (my $pid = fork);
        exit $code->($data) unless $pid;
    }
    1 until -1 == wait;
}
#####################################################
Re: How do I run subroutines in parallel?
by gmpassos (Priest) on Sep 09, 2002 at 23:19 UTC
What you want is threads. Here is an example of how to do this. Note, this script is only for Perl 5.8.0; don't use any other version. I recommend that you compile Perl 5.8.0 without PerlIO, because if you are using sockets with PerlIO you will hit a lot of bugs! PerlIO is very good, but right now it has some bugs left to fix (see the perlbug list); we need to wait for Perl 5.8.1 or get the latest patches.
This example shows how to use the sharing feature of threads. If you want a variable that can be shared by the other threads, just declare it like in the example, with : shared at the end, and don't forget to load the threads::shared module!
Another thing: note that main is a thread too! If it goes away (exits), the other threads will be closed too. To wait for a thread, use join: $thread->join
#!/usr/bin/perl
use threads;
use threads::shared;

$| = 1;

my ($global) : shared;

my $thr1 = threads->new(\&TEST, 1);
my $thr2 = threads->new(\&TEST, 2);

my @ReturnData = $thr1->join;
print "Thread 1 returned: @ReturnData\n";

@ReturnData = $thr2->join;    # note: no second "my" - reuse the array
print "Thread 2 returned: @ReturnData\n";

########
# TEST #
########

sub TEST {
    my ($id) = @_;
    for (0 .. 10) {
        {
            lock($global);    # avoid a race on the shared counter
            $global++;
        }
        print "id: $id >> $_ >> GLB: $global\n";
        sleep(1);
    }
    return ($id);
}
Graciliano M. P.
"The creativity is the expression of the liberty". | [reply] [d/l] |
Re: How do I run subroutines in parallel?
by plauterb (Acolyte) on Sep 10, 2002 at 01:17 UTC
Re: How do I run subroutines in parallel?
by Brutha (Friar) on Sep 10, 2002 at 08:05 UTC
I have not seen a comment yet about Perl on Windows, so here are my comments.
I tried the thread approach, which does not work (ActivePerl 5.6, build 62x). Now I use the Win32::ProcFarm system by Toby Everett (Artistic licence) for parallelization of code under Win32. It lets me start child processes, which are Perl scripts, and do remote procedure calls over sockets, passing complete Perl data structures as parameters. It is not quite clean yet, but it works nicely for me.
Depending on your tasks, parallelization is not always a matter of CPUs. My app is a (preforking) server with a fixed number of children (for now). It accepts mainframe socket connections. My children are a downloader, which initiates source download from the mainframe via FTP, and a preparer, which compiles the downloaded objects. Downloading is a task that really allows parallel processing of other tasks. Another parallel piece feeds a GUI with status information.
Do not underestimate the coordination of parallel processes if they do not have isolated tasks.
I do logging with Log::Agent, which works great for me; I think it is a great module.
Ask me for more information.
Update:
As Massyn answered to this post, it makes a great difference which OS parent and child processes run on. I could not get communication via pipes, files, or exit codes to work on Windows. My app depends on a Windows product, so I used a Windows-only solution. The Anonymous Monk did not say which platform he is on.
Brutha
From what I've seen and heard, running this type of thing on Unix is a lot easier than on a Windows platform; then again, you can't always move your code from one OS to another.
Although fork is one way to go, the & operator on Unix lets a script run in the background. You can try backticks, adding an & at the end to put the script into background mode. If the backticks don't work, try the system() function.
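A sketch of that backgrounding approach (this relies on a Unix shell; the inline one-liner stands in for a real script name):

```perl
#!/usr/bin/perl
# Launching work in the background via the shell's & operator.
# system() returns as soon as the shell does; the job keeps running
# detached while the parent carries on.
use strict;
use warnings;

my $status = system(q{perl -e 'sleep 1' &});
print "launch status: $status (0 means the shell accepted the job)\n";
print "parent continues immediately\n";
```

Note you get no easy way to collect the job's exit status this way, which is one reason fork/waitpid is usually preferred.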
I don't do this a lot, so I may be wrong. I would like to know whether you manage to succeed at this.
Remember that some Perl functions, like fork, rely on the operating system, and they behave differently on different platforms.
Re: How do I run subroutines in parallel?
by Oicu812 (Initiate) on Sep 10, 2002 at 12:45 UTC
Try the fork() call. It copies your entire process, and the two copies run simultaneously. fork returns the child's PID (non-zero) to the parent and zero to the child, so you can tell which is which...
defined(my $pid = fork()) or die "fork failed: $!";
if ($pid == 0)
{
    SOME_SUB();          # child
}
else
{
    SOME_OTHER_SUB();    # parent
    waitpid($pid, 0);    # reap the child when it finishes
}
The child will run until it hits an exit or die.... (Note the defined check: fork returns undef on failure, which would otherwise be mistaken for the child.)
O
what is a monk
by Anonymous Monk on Sep 10, 2002 at 02:11 UTC
Hey,
I was just wondering what a monk is and what you do at your meetings, for one of my school assignments. If you can help me, I would greatly appreciate it.
Thank you!