in reply to best strategy

One cannot help but think that at this point your best strategy would be to actually start writing some code.

In the time since you first asked this question, you could have written prototypes of both a forking and a threaded solution and now be in a position to do some empirical tests to determine which works best on your particular setup.

With a little care, the subroutines for issuing the queries and comparing the results should be reusable by both prototypes without change. You have already written code for the threading infrastructure. Knocking up a forking equivalent using Parallel::ForkManager should be relatively simple. Once you have both, you will be in a position to make some real progress on deciding which is going to work best in your environment, as well as deciding whether moving to a Perl solution is really going to produce any benefit over your existing C++ solution.
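As a rough sketch of that split, and nothing more: fetch_rows and compare_rows below are hypothetical names, fetch_rows fabricates a few rows instead of calling your real client, and plain fork/waitpid stands in for Parallel::ForkManager so that only core Perl is needed. The point is that compare_rows knows nothing about how the rows were obtained, so a threaded driver could call it unchanged.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real client call; a real version would
# run the query against $port and return its output lines.
sub fetch_rows {
    my ($port) = @_;
    my @rows = ("1|alpha|x\n", "2|beta|y\n", "3|gamma|z\n");
    $rows[1] = "2|beta|CHANGED\n" if $port == 22610;    # simulate one difference
    return @rows;
}

# Shared by both prototypes: indices of rows that differ after sorting.
sub compare_rows {
    my ($prod, $test) = @_;
    my @p    = sort @$prod;
    my @t    = sort @$test;
    my $last = $#p > $#t ? $#p : $#t;
    return grep { !defined $p[$_] || !defined $t[$_] || $p[$_] ne $t[$_] } 0 .. $last;
}

# Forking driver: one child per port, each writing its rows to "$port.txt".
sub run_forked {
    my (@ports) = @_;
    my @pids;
    for my $port (@ports) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                                 # child
            open my $out, '>', "$port.txt" or die "open $port.txt: $!";
            print {$out} fetch_rows($port);
            close $out or die "close $port.txt: $!";
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;    # wait for every child before reading results

    my %rows;
    for my $port (@ports) {
        open my $in, '<', "$port.txt" or die "open $port.txt: $!";
        $rows{$port} = [<$in>];
        close $in;
    }
    return \%rows;
}

my $rows = run_forked(22600, 22610);
my @diff = compare_rows($rows->{22600}, $rows->{22610});
print "rows differing at sorted index: @diff\n";
```

A threaded driver would replace only run_forked; the fetch and compare subroutines stay as they are.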

On the basis of the sparse information you've provided across your 3 threads on this subject, my gut feel is that a threaded solution will be the most flexible and efficient. As your comparisons seem to be consuming the bulk of the time, a reusable pool of workers will have less startup overhead and cause the least memory thrashing. It will also require the least infrastructural overhead to control the asynchronicity.
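To make "a reusable pool of workers" concrete, here is a minimal sketch using the core threads and Thread::Queue modules. It assumes a perl built with ithreads; compare_pair and run_pool are hypothetical names, and the rows are made up. The workers are started once and fed jobs through a queue, rather than one thread being created per comparison.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical comparison job: count differing columns in a pair of rows.
sub compare_pair {
    my ($prod_row, $test_row) = @_;
    my @p    = split /\|/, $prod_row;
    my @t    = split /\|/, $test_row;
    my $last = $#p > $#t ? $#p : $#t;
    return scalar grep { !defined $p[$_] || !defined $t[$_] || $p[$_] ne $t[$_] } 0 .. $last;
}

sub run_pool {
    my ($n_workers, @pairs) = @_;
    my $jobs    = Thread::Queue->new;
    my $results = Thread::Queue->new;

    # Start the workers once; each loops until it dequeues undef.
    my @workers = map {
        threads->create(sub {
            while (defined(my $job = $jobs->dequeue)) {
                my ($id, $prod_row, $test_row) = @$job;
                $results->enqueue("$id:" . compare_pair($prod_row, $test_row));
            }
        });
    } 1 .. $n_workers;

    my $id = 0;
    $jobs->enqueue([$id++, @$_]) for @pairs;
    $jobs->enqueue(undef) for @workers;    # one stop marker per worker
    $_->join for @workers;

    # All workers have finished, so a non-blocking drain sees every result.
    my %diffs;
    while (defined(my $r = $results->dequeue_nb)) {
        my ($job_id, $count) = split /:/, $r;
        $diffs{$job_id} = $count;
    }
    return \%diffs;
}

my $diffs = run_pool(2, ["1|a|b", "1|a|b"], ["2|c|d", "2|c|X"]);
print "pair $_: $diffs->{$_} differing column(s)\n" for sort keys %$diffs;
```

Whether two workers or twelve is the right number is exactly the kind of thing you can only find out by measuring on your own hardware.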

But, given your lack of information regarding the performance of the hardware setup--network bandwidth and latency--along with the spread of inherent hardware parallelism available--from 4 to 12 cpus--and the only measure of where the bottlenecks of the existing system lie, being hearsay that "the comparison is where most of the time is spent", attempting to draw any conclusions is only ever going to be speculation.

The only ways you are going to come up with any definitive answers are to:

  1. perform some deep analysis of the existing system and attempt to extrapolate that to your two alternative implementations;
  2. knock up some prototypes and perform some measurements.

And the latter approach will be quicker to do; require less in-depth knowledge; and provide the most accurate results.
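The measuring part need not be elaborate. A minimal sketch, assuming that overall wall-clock time is the figure you care about: best_wallclock is a hypothetical helper, and the two entries in %prototype are placeholders that merely sleep; substitute your real forked and threaded drivers.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Run $code $runs times and return the fastest wall-clock time in seconds.
sub best_wallclock {
    my ($code, $runs) = @_;
    my $best;
    for (1 .. $runs) {
        my $start = time;
        $code->();
        my $elapsed = time - $start;
        $best = $elapsed if !defined $best || $elapsed < $best;
    }
    return $best;
}

# Placeholders only: replace each sub with a call to the real prototype.
my %prototype = (
    forked   => sub { sleep 0.02 },
    threaded => sub { sleep 0.01 },
);

my %time = map { $_ => best_wallclock($prototype{$_}, 3) } keys %prototype;
printf "%-8s %.3fs\n", $_, $time{$_}
    for sort { $time{$a} <=> $time{$b} } keys %time;
```

Time the whole run of each prototype, end to end, on the real data and the real machines; numbers taken from placeholder workloads like the ones above tell you nothing.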


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: best strategy
by libvenus (Sexton) on Aug 25, 2008 at 12:32 UTC

    Well, I have done exactly that, though I am unable to use Parallel::ForkManager as it is not available in my environment.

    Using threads

    use strict;
    use warnings;
    use Benchmark;
    use threads;    # needed for threads->new below

    #open(PROD,"/ms/user/j/juyva/dev/files_xls_tmpl_cfg_nonscripts/prod.txt") || die " $! ";
    #my @allProd = <PROD>;
    #close PROD;
    #open(TEST,"/ms/user/j/juyva/dev/files_xls_tmpl_cfg_nonscripts/test.txt") || die " $! ";
    #my @allTest = <TEST>;
    #close TEST;

    my @allPort = qw(22600 22610);
    my %hashOp;

    # Start one worker thread per port, then join them all.
    sub boss {
        for(my $i = 0;$i < @allPort; $i++) {
            my $thr = threads->new(\&worker,$allPort[$i]);
        }
        foreach my $thr (threads->list) {
            # Don't join the main thread or ourselves
            if ($thr->tid && !threads::equal($thr, threads->self)) {
                $thr->join;
            }
        }
    }

    # Run the external client for one port, writing its output to "$port.txt".
    sub worker {
        my $port = shift;
        my $timeTakenDm = timeit(1,sub { system(" /ms/dist/pcs/bin/client hqsas501 $port 200 \-f /ms/user/j/juyva/dev/files_xls_tmpl_cfg_nonscripts/sql1.clmod.NEW.txt > $port.txt " )});
        print "Dm took:",timestr($timeTakenDm),"\n";
        if ($? == -1) {
            print "failed to execute: $!\n";
        }
        elsif ($? & 127) {
            printf "child died with signal %d, %s coredump\n", ($? & 127), ($? & 128) ? 'with' : 'without';
        }
        else {
            printf "child exited with value %d\n", $? >> 8;
        }
    }

    my $obj = timeit(1, sub {
        my $thrboss = threads->new(\&boss);
        $thrboss->join;
        my (@allProd,@allTest);
        foreach my $port (@allPort) {
            open(HAN,"$port.txt") || die " $!";
            my @temp = <HAN>;
            $hashOp{$port} = \@temp;
            #print $hashOp{$port};
            close HAN;
        }
        my $timeSort = timeit(1, sub {
            @allProd = sort @{$hashOp{"22600"}};
            @allTest = sort @{$hashOp{"22610"}};
        });
        print "sort took:",timestr($timeSort),"\n";
        #my @allProd = sort @{$hashOp{"22600"}};
        #my @allTest = sort @{$hashOp{"22610"}};
        #print @allProd;
        #print @allTest;

        # Row counts differ: show the surplus rows and ask whether to go on.
        unless(@allProd == @allTest) {
            print " inside unequal rows returned\n";
            my $whichhasmoreelements = @allProd > @allTest ? 'allProd' : 'allTest';
            if($whichhasmoreelements =~ /Prod/) {
                print " the no of lines do not match prod has more rows are they are \n";
                my @tempallProd = @allProd;
                # The surplus starts at index scalar(@allTest), not one before it.
                my @diffProdTest = splice(@tempallProd,scalar(@allTest),(@allProd - @allTest));
                print @diffProdTest;
                print " do u want to continue : enter y/n ";
                my $choice = <STDIN>;
                exit if($choice =~ /^n$/i);
            }
            else {
                print " the no of lines do not match test has more rows are they are \n";
                my @tempallTest = @allTest;
                my @diffProdTest = splice(@tempallTest,scalar(@allProd),(@allTest - @allProd));
                print @diffProdTest;
                print " do u want to continue : enter y/n ";
                my $choice = <STDIN>;
                print " $choice ";
                exit if($choice =~ /^n$/i);
            }
        }

        # Compare row by row; for rows that differ, compare column by column.
        for( my $i = 0;$i < (@allProd > @allTest ? @allProd : @allTest); $i++) {
            unless($allProd[$i] eq $allTest[$i]) {
                my @defaultProd = split/\|/,$allProd[$i];
                my @defaultTest = split/\|/,$allTest[$i];
                unless(@defaultProd == @defaultTest) {
                    my $whichhasmoreelements = @defaultProd > @defaultTest ? 'defaultProd' : 'defaultTest';
                    if($whichhasmoreelements =~ /Prod/) {
                        print " the no of lines do not match prod has more rows are they are \n";
                        my @tempallProd = @defaultProd;
                        my @diffProdTest = splice(@tempallProd,scalar(@defaultTest),(@defaultProd - @defaultTest));
                        print @diffProdTest;
                        print " do u want to continue : enter y/n ";
                        my $choice = <STDIN>;
                        exit if($choice =~ /^n$/i);
                    }
                    else {
                        print " the no of lines do not match test has more rows are they are \n";
                        my @tempallTest = @defaultTest;
                        my @diffProdTest = splice(@tempallTest,scalar(@defaultProd),(@defaultTest - @defaultProd));
                        print @diffProdTest;
                        print " do u want to continue : enter y/n ";
                        my $choice = <STDIN>;
                        print " $choice ";
                        exit if($choice =~ /^n$/i);
                    }
                }
                for( my $col = 0;$col < (@defaultProd > @defaultTest ? @defaultProd : @defaultTest); $col++) {
                    unless($defaultProd[$col] eq $defaultTest[$col]) {
                        print " Column $col differs::";
                        print " PROD value $defaultProd[$col] : TEST value $defaultTest[$col] \n";
                    }
                }
            }
        }
    });
    print "code took:",timestr($obj),"\n";

    Using multiple processes

    use strict;
    use Benchmark ;
    use warnings;

    my @allPort = qw(22600 22610);
    my %hashOp;

    # Fork one child per port; each child runs the external client and exits.
    sub spawnChild {
        for (0..$#allPort) {
            my $cpid = fork();
            die unless defined $cpid;
            if (! $cpid) {
                # This is the child
                #my $wait = int rand 4;
                #sleep $wait;
                #print "Child $$ exiting after $wait seconds\n";
                print "$_\n";
                my $port = $allPort[$_];
                my $timeTakenDm = timeit(1,sub { system(" /ms/dist/pcs/bin/client hqsas501 $port 200 \-f /ms/user/j/juyva/dev/files_xls_tmpl_cfg_nonscripts/sql1.clmod.NEW.txt > $port.txt " )});
                if ($? == -1) {
                    print "failed to execute: $!\n";
                }
                elsif ($? & 127) {
                    printf "child died with signal %d, %s coredump\n", ($? & 127), ($? & 128) ? 'with' : 'without';
                }
                else {
                    printf "child exited with value %d\n", $? >> 8;
                }
                # $cpid is always 0 in the child, so report our own pid instead
                print "Dm took $$:",timestr($timeTakenDm),"\n";
                exit;
            }
        }
    }

    # Just parent code, after this
    my $obj = timeit(1, sub {
        &spawnChild;
        # Wait for every child before reading its output file
        while ((my $cpid = wait()) != -1) {
            print "Waited for child $cpid\n";
        }
        my (@allProd,@allTest);
        foreach my $port (@allPort) {
            open(HAN,"$port.txt") || die " $!";
            my @temp = <HAN>;
            $hashOp{$port} = \@temp;
            #print $hashOp{$port};
            close HAN;
        }
        my $timeSort = timeit(1, sub {
            @allProd = sort @{$hashOp{"22600"}};
            @allTest = sort @{$hashOp{"22610"}};
        });
        print "sort took:",timestr($timeSort),"\n";
        #my @allProd = sort @{$hashOp{"22600"}};
        #my @allTest = sort @{$hashOp{"22610"}};
        #print @allProd;
        #print @allTest;

        # Row counts differ: show the surplus rows and ask whether to go on.
        unless(@allProd == @allTest) {
            print " inside unequal rows returned\n";
            my $whichhasmoreelements = @allProd > @allTest ? 'allProd' : 'allTest';
            if($whichhasmoreelements =~ /Prod/) {
                print " the no of lines do not match prod has more rows are they are \n";
                my @tempallProd = @allProd;
                # The surplus starts at index scalar(@allTest), not one before it.
                my @diffProdTest = splice(@tempallProd,scalar(@allTest),(@allProd - @allTest));
                print @diffProdTest;
                print " do u want to continue : enter y/n ";
                my $choice = <STDIN>;
                exit if($choice =~ /^n$/i);
            }
            else {
                print " the no of lines do not match test has more rows are they are \n";
                my @tempallTest = @allTest;
                my @diffProdTest = splice(@tempallTest,scalar(@allProd),(@allTest - @allProd));
                print @diffProdTest;
                print " do u want to continue : enter y/n ";
                my $choice = <STDIN>;
                print " $choice ";
                exit if($choice =~ /^n$/i);
            }
        }

        # Compare row by row; for rows that differ, compare column by column.
        for( my $i = 0;$i < (@allProd > @allTest ? @allProd : @allTest); $i++) {
            unless($allProd[$i] eq $allTest[$i]) {
                my @defaultProd = split/\|/,$allProd[$i];
                my @defaultTest = split/\|/,$allTest[$i];
                unless(@defaultProd == @defaultTest) {
                    my $whichhasmoreelements = @defaultProd > @defaultTest ? 'defaultProd' : 'defaultTest';
                    if($whichhasmoreelements =~ /Prod/) {
                        print " the no of lines do not match prod has more rows are they are \n";
                        my @tempallProd = @defaultProd;
                        my @diffProdTest = splice(@tempallProd,scalar(@defaultTest),(@defaultProd - @defaultTest));
                        print @diffProdTest;
                        print " do u want to continue : enter y/n ";
                        my $choice = <STDIN>;
                        exit if($choice =~ /^n$/i);
                    }
                    else {
                        print " the no of lines do not match test has more rows are they are \n";
                        my @tempallTest = @defaultTest;
                        my @diffProdTest = splice(@tempallTest,scalar(@defaultProd),(@defaultTest - @defaultProd));
                        print @diffProdTest;
                        print " do u want to continue : enter y/n ";
                        my $choice = <STDIN>;
                        print " $choice ";
                        exit if($choice =~ /^n$/i);
                    }
                }
                for( my $col = 0;$col < (@defaultProd > @defaultTest ? @defaultProd : @defaultTest); $col++) {
                    unless($defaultProd[$col] eq $defaultTest[$col]) {
                        print " Column $col differs::";
                        print " PROD value $defaultProd[$col] : TEST value $defaultTest[$col] \n";
                    }
                }
            }
        }
    });
    print "code took:",timestr($obj),"\n";
    print "Parent Exiting\n";

      I hate to say this, but if, as you say, your job is dependent upon your solution to this project, I seriously suggest that you seek help from a local mentor with access to the code, data and hardware.

      Everything about the code you've posted,

      • from the fact that you are timing sorts and identical external commands, which will take exactly the same time regardless of whether they are part of a forked or a threaded solution,
      • to the way you lay out your code,
      • to your use of C coding idioms rather than Perl idioms,

      suggests to me that you do not have the experience to tackle a project of this nature when your job is on the line as a result. I seriously wish you the very best of luck, but you need more help than can reasonably be provided through a forum such as this.
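
      By way of illustration of the last point only: the index-driven column loop could be written along the following lines. This is a sketch, not a drop-in replacement; diff_columns is a hypothetical name and the sample rows are invented.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Return [column, prod value, test value] for every column that differs
# between two pipe-delimited rows. A column present on only one side
# counts as a difference.
sub diff_columns {
    my ($prod_row, $test_row) = @_;
    chomp($prod_row, $test_row);
    my @prod = split /\|/, $prod_row, -1;    # -1 keeps trailing empty fields
    my @test = split /\|/, $test_row, -1;
    my $last = max($#prod, $#test);
    return map  { [ $_, $prod[$_] // '(missing)', $test[$_] // '(missing)' ] }
           grep { !defined $prod[$_] || !defined $test[$_] || $prod[$_] ne $test[$_] }
           0 .. $last;
}

for my $d (diff_columns("1|alpha|x\n", "1|beta|x|extra\n")) {
    my ($col, $prod, $test) = @$d;
    print "Column $col differs: PROD value $prod : TEST value $test\n";
}
```

      The list operators (split, grep, map) replace the counters, the temporary copies and the splice arithmetic.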


        BrowserUK is correct on all counts except one. Everyone has to learn the first time. Get a small job manager set up first, learn, expand.