Re: Parallel running the process
by roboticus (Chancellor) on Mar 06, 2012 at 19:01 UTC
ariesb2b:
One simple way is to:
- Split the file listing the databases into chunks
- Run the script multiple times, in parallel
- Concatenate the output files into your report
...roboticus
When your only tool is a hammer, all problems look like your thumb.
Thanks roboticus. Yes, that is definitely one way to do it.
But it involves a lot of manual work.
I have to run the script monthly and will be putting it inside a cron job.
Is there any way I can do this within the script itself?
For example, the processes running in the background and each one writing its output to the desired file as and when it completes.
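A minimal sketch of one way to do that from a single cron-driven script, using Parallel::ForkManager to cap how many databases are queried at once. The file name databases.txt, the do_one_database() routine, and the limit of four workers are illustrative assumptions, not part of the original script:

#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);   # run at most 4 databases at once

open my $list_fh, '<', 'databases.txt' or die "databases.txt: $!";
while (my $database = <$list_fh>) {
    chomp $database;
    next unless length $database;

    $pm->start and next;                  # parent: schedule the next database

    # Child process: query this one database and write its own output
    # file, so no two workers ever share a filehandle.
    do_one_database($database, "report.$database.out");

    $pm->finish;                          # child exits here
}
close $list_fh;
$pm->wait_all_children;                   # block until every worker is done

sub do_one_database {
    my ($database, $outfile) = @_;
    # ... the existing per-database query code would go here ...
}

Each child writes its own output file, so the monthly report is just a concatenation of the report.*.out files once wait_all_children returns.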
#!/usr/bin/perl
use strict;
use warnings;
use autodie;

my $num_jobs = 4;

# Split the work up: deal the input lines round-robin into $num_jobs chunk files
open my $IFH, '<', 'data.inp';
my @OFH;
open $OFH[$_ - 1], '>', "data.$_" for 1 .. $num_jobs;
my $cnt = 0;
while (<$IFH>) {
    ++$cnt;
    my $FH = $OFH[$cnt % $num_jobs];
    print {$FH} $_;
}
close $OFH[$_ - 1] for 1 .. $num_jobs;

# Do the work: fork one child per chunk and run the original script on it
my @pids;
for my $j (1 .. $num_jobs) {
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {
        exec "perl orig_do_job --infile=data.$j --outfile=data.out.$j"
            or die "exec failed: $!";
    }
    push @pids, $pid;
}

# Wait for every child to finish before collecting the results
waitpid $_, 0 for @pids;
`cat data.out.* >data.out`;
...roboticus
When your only tool is a hammer, all problems look like your thumb.
Re: Parallel running the process
by BrowserUk (Patriarch) on Mar 06, 2012 at 19:45 UTC
Re: Parallel running the process
by i5513 (Pilgrim) on Mar 06, 2012 at 23:40 UTC
Hi,
I recommend using pdsh.
In your case it should be:
- modify your script to receive only the database name as a parameter (and to do only the work needed for that database); the databases file contains all the databases, one per line (a minimal sketch of such a wrapper follows below)
- $ pdsh -w^databases -R exec perl-script %h | tee outputs
- then work with the outputs (see dshbak), probably with a script that collects the info
I hope that helps
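A minimal sketch of the wrapper the first point describes, assuming the per-database work can be factored into a single routine; the name run_report_for() is only illustrative:

#!/usr/bin/perl
# Wrapper invoked by pdsh as:  perl-script <database-name>
use strict;
use warnings;

my $database = shift @ARGV
    or die "usage: $0 <database-name>\n";

# Do only the work needed for this one database and print to STDOUT,
# so pdsh (and dshbak) can collect and label the output afterwards.
print run_report_for($database);

sub run_report_for {
    my ($db) = @_;
    # ... the existing query code, restricted to a single database ...
    return '';
}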
Re: Parallel running the process
by sundialsvc4 (Abbot) on Mar 06, 2012 at 21:33 UTC
Could you also please give us some idea of how your query or queries are constructed, and a brief idea of what indexes (if any) might be in play? My intuitive sense is that “2 to 3 hour” run-times might be avoidable, and if that be the case it would make all the difference.
Otherwise, I think, the operative question would be how well your database servers can handle whatever it is that you are doing. Particularly if your queries are resource-intensive (and let us for the moment presume that they unfortunately must be), the answer is most likely going to be determined by just how many such queries your hardware and software can handle, not specifically “which database” is the specified target. (“This or that database” might merely boil down to a choice between directories... functionally irrelevant.)
Re: Parallel running the process
by marioroy (Prior) on Nov 25, 2012 at 06:59 UTC
MCE is a new Perl module recently added to CPAN. This is how one may use MCE to process in parallel. MCE is both a chunking and a parallel engine; in this case, chunk_size is set to 1. That option is not strictly needed, as calling the foreach method sets it to 1 anyway.
The sendto method can be used to serialize data from workers to a file. MCE also provides a do method, which passes data to a callback function that runs in the main process.
$chunk_ref is a reference to an array. MCE provides both foreach and forchunk methods; in this case the array contains only one entry because chunk_size is set to 1.
The main page at http://code.google.com/p/many-core-engine-perl/ contains three images. The second one shows the bank-queuing model used in MCE, with chunking applied to it.
use strict;
use warnings;
use MCE;

## Parse command line argument for $database_list
my ($database_list) = @ARGV;

my $mce = MCE->new(
    max_workers => 4,
    chunk_size  => 1
);

$mce->foreach("$database_list", sub {
    my ($self, $chunk_ref, $chunk_id) = @_;
    my $database = $chunk_ref->[0];
    my @result = ();

    ## Query the database

    $self->sendto('file:/path/to/result.out', @result);
});
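For the do method mentioned above, a hedged sketch of that alternative: each worker hands its rows to a callback that runs in the main process, which can then write everything through a single filehandle. The callback name gather_results and the output path are illustrative, and $database_list is again assumed to hold the path to the databases file as in the example above.

use strict;
use warnings;
use MCE;

my ($database_list) = @ARGV;

## The callback runs in the main process, so one output handle is safe
open my $out_fh, '>', '/path/to/result.out' or die $!;

sub gather_results {
    my (@result) = @_;
    print {$out_fh} @result;
}

my $mce = MCE->new(max_workers => 4, chunk_size => 1);

$mce->foreach("$database_list", sub {
    my ($self, $chunk_ref, $chunk_id) = @_;
    my $database = $chunk_ref->[0];
    my @result = ();
    ## Query the database, then pass the rows back to the main process
    $self->do('gather_results', @result);
});

close $out_fh;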