Re: Parallel running the process
by roboticus (Chancellor) on Mar 06, 2012 at 19:01 UTC
ariesb2b:
One simple way is to:
- Split the file listing the databases into chunks
- Run the script multiple times, in parallel
- Concatenate the output files into your report
...roboticus
When your only tool is a hammer, all problems look like your thumb.
Thanks roboticus. Yes, that is definitely one way to do it.
But it involves a lot of manual work.
I have to run the script monthly and will be putting it inside a cron job.
Is there any way I can do this within the script itself?
For example, the processes running in the background and each one writing its output to the desired file as and when it completes.
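A minimal sketch of one way to do that from a single cron-driven script, using Parallel::ForkManager to cap how many databases are queried at once. The file name databases.txt, the do_one_database() routine, and the limit of four workers are illustrative assumptions, not part of the original script:

#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);   # run at most 4 databases at once

open my $list_fh, '<', 'databases.txt' or die "databases.txt: $!";
while (my $database = <$list_fh>) {
    chomp $database;
    next unless length $database;

    $pm->start and next;                  # parent: schedule the next database

    # Child process: query this one database and write its own output
    # file, so no two workers ever share a filehandle.
    do_one_database($database, "report.$database.out");

    $pm->finish;                          # child exits here
}
close $list_fh;
$pm->wait_all_children;                   # block until every worker is done

sub do_one_database {
    my ($database, $outfile) = @_;
    # ... the existing per-database query code would go here ...
}

Each child writes its own output file, so the monthly report is just a concatenation of the report.*.out files once wait_all_children returns.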
#!/usr/bin/perl
use strict;
use warnings;
use autodie;

my $num_jobs = 4;

# Split the work up: deal the input lines round-robin into $num_jobs chunk files
open my $IFH, '<', 'data.inp';
my @OFH;
open $OFH[$_ - 1], '>', "data.$_" for 1 .. $num_jobs;
my $cnt = 0;
while (<$IFH>) {
    ++$cnt;
    my $FH = $OFH[$cnt % $num_jobs];
    print {$FH} $_;
}
close $OFH[$_ - 1] for 1 .. $num_jobs;

# Do the work: fork one child per chunk and run the original script on it
my @pids;
for my $j (1 .. $num_jobs) {
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {
        exec "perl orig_do_job --infile=data.$j --outfile=data.out.$j"
            or die "exec failed: $!";
    }
    push @pids, $pid;
}

# Wait for every child to finish before collecting the results
waitpid $_, 0 for @pids;
`cat data.out.* >data.out`;
...roboticus
When your only tool is a hammer, all problems look like your thumb.
Re: Parallel running the process
by BrowserUk (Patriarch) on Mar 06, 2012 at 19:45 UTC
Re: Parallel running the process
by i5513 (Pilgrim) on Mar 06, 2012 at 23:40 UTC
Hi,
I recommend using pdsh.
In your case it should be:
- modify your script to receive only the database name as a parameter (and to do only the work needed for that database); the databases file contains all the databases, one per line (a minimal sketch of such a wrapper follows below)
- $ pdsh -w^databases -R exec perl-script %h | tee outputs
- then work with the outputs (see dshbak), probably with a script that collects the info
I hope that helps
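A minimal sketch of the wrapper the first point describes, assuming the per-database work can be factored into a single routine; the name run_report_for() is only illustrative:

#!/usr/bin/perl
# Wrapper invoked by pdsh as:  perl-script <database-name>
use strict;
use warnings;

my $database = shift @ARGV
    or die "usage: $0 <database-name>\n";

# Do only the work needed for this one database and print to STDOUT,
# so pdsh (and dshbak) can collect and label the output afterwards.
print run_report_for($database);

sub run_report_for {
    my ($db) = @_;
    # ... the existing query code, restricted to a single database ...
    return '';
}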
Re: Parallel running the process
by sundialsvc4 (Abbot) on Mar 06, 2012 at 21:33 UTC
Could you also please give us some idea of how your query or queries are constructed, and a brief idea of what indexes (if any) might be in play? My intuitive sense is that “2 to 3 hour” run-times might be avoidable, and if that be the case it would make all the difference.
Otherwise, I think, the operative question would be how well your database servers can handle whatever it is that you are doing. Particularly if your queries are resource-intensive (and let us for the moment presume that they unfortunately must be), the answer is most likely going to be determined by just how many such queries your hardware and software can handle, not specifically “which database” is the specified target. (“This or that database” might merely boil down to a choice between directories... functionally irrelevant.)
Re: Parallel running the process
by marioroy (Prior) on Nov 25, 2012 at 06:59 UTC
MCE is a new Perl module recently added to CPAN. This is how one may use MCE to process in parallel. MCE is both a chunking and a parallel engine; in this case, chunk_size is set to 1. That option is not strictly needed, as calling the foreach method sets it to 1 anyway.
The sendto method can be used to serialize data from workers to a file. MCE also provides a do method, which passes data to a callback function that runs in the main process.
$chunk_ref is a reference to an array. MCE provides both foreach and forchunk methods; in this case the array contains only one entry because chunk_size is set to 1.
The main page at http://code.google.com/p/many-core-engine-perl/ contains three images. The second one shows the bank-queuing model used in MCE, with chunking applied to it.
use strict;
use warnings;
use MCE;

## Parse command line argument for $database_list
my ($database_list) = @ARGV;

my $mce = MCE->new(
    max_workers => 4,
    chunk_size  => 1
);

$mce->foreach("$database_list", sub {
    my ($self, $chunk_ref, $chunk_id) = @_;
    my $database = $chunk_ref->[0];
    my @result = ();

    ## Query the database

    $self->sendto('file:/path/to/result.out', @result);
});
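For the do method mentioned above, a hedged sketch of that alternative: each worker hands its rows to a callback that runs in the main process, which can then write everything through a single filehandle. The callback name gather_results and the output path are illustrative, and $database_list is again assumed to hold the path to the databases file as in the example above.

use strict;
use warnings;
use MCE;

my ($database_list) = @ARGV;

## The callback runs in the main process, so one output handle is safe
open my $out_fh, '>', '/path/to/result.out' or die $!;

sub gather_results {
    my (@result) = @_;
    print {$out_fh} @result;
}

my $mce = MCE->new(max_workers => 4, chunk_size => 1);

$mce->foreach("$database_list", sub {
    my ($self, $chunk_ref, $chunk_id) = @_;
    my $database = $chunk_ref->[0];
    my @result = ();
    ## Query the database, then pass the rows back to the main process
    $self->do('gather_results', @result);
});

close $out_fh;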