talexb has asked for the wisdom of the Perl Monks concerning the following question:

I've just started using Parallel::ForkManager in a script, and it's pretty cool. The original, serial version of the script does a whole pile of API calls to BigCommerce for some information, and takes about 90 minutes to complete. Most of that time is spent waiting for the API response from BigCommerce.

The parallelized version with 6 kids takes 45 minutes to run (not bad). The same version with 50 kids (whee!) takes less than 13 minutes to complete. This is great, and the only problem I see is that the load gets up to between 15 and 17. The machine is a venerable IBM box with a Core 2 Duo CPU at 3 GHz, and doesn't seem to be too fussed by all of the processes zipping around.
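For readers who have not used the module, here is a minimal sketch of the pattern described above, not the actual script. The work list, the fetch_item helper and its simulated delay are all hypothetical stand-ins for the real BigCommerce calls; only the Parallel::ForkManager calls are the module's own.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes qw(sleep);

my $max_kids = 6;                  # the post tries 6, then 50
my @item_ids = (1 .. 12);          # hypothetical work list
my %results;                       # filled in by the parent

my $pm = Parallel::ForkManager->new($max_kids);

# Runs in the parent each time a kid exits; $data is whatever
# reference the kid passed to finish().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $results{$ident} = $$data if defined $data;
});

# Hypothetical stand-in for one slow API call.
sub fetch_item {
    my ($id) = @_;
    sleep 0.1;                     # pretend to wait on the network
    return $id * 2;
}

for my $id (@item_ids) {
    $pm->start($id) and next;      # parent: move on to the next id
    my $answer = fetch_item($id);  # child: do the slow call
    $pm->finish(0, \$answer);      # child: exit, hand result to parent
}
$pm->wait_all_children;

printf "collected %d results with %d kids\n", scalar(keys %results), $max_kids;
```

Changing $max_kids is the only knob needed to compare runs with different numbers of kids.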

There's 4G of RAM, and currently I see about 1G free; swap is the same size, and about 1/2M is used -- so just about nothing. The normal load for the machine is less than 1.0 -- right now, with two scripts running, it's about .50.
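One rough way to read those load figures is per core. On Linux the load average counts processes that are runnable or in uninterruptible sleep, so dividing by the core count gives a feel for how oversubscribed the CPU is. The sketch below assumes a Linux /proc filesystem and two cores (a Core 2 Duo); the helper names are made up for illustration.

```perl
use strict;
use warnings;

# Load average divided by core count: values well above 1.0 mean
# more processes wanting a core than there are cores to serve them.
sub load_per_core {
    my ($load, $cores) = @_;
    die "core count must be positive\n" unless $cores > 0;
    return $load / $cores;
}

# Read the 1-minute load average; returns undef where /proc is absent.
sub one_minute_load {
    open my $fh, '<', '/proc/loadavg' or return;
    my $line = <$fh>;
    close $fh;
    my ($load) = split ' ', $line;
    return $load;
}

# Figures from the post: idle load about 0.5, loaded 15 to 17, two cores.
printf "load %4.1f on 2 cores => %.2f per core\n", $_, load_per_core($_, 2)
    for 0.5, 15, 17;

my $now = one_minute_load();
print "current 1-minute load: $now\n" if defined $now;
```

A high per-core figure is not dangerous in itself; it mostly means other work on the box will feel slower while the script runs.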

Is it safe to run with that many kids, under that load?

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: OT: Parallel::ForkManager and suitable load (Ubuntu)
by tybalt89 (Monsignor) on Jan 31, 2024 at 22:01 UTC

    I'd monitor CPU temps, and if it's not getting too hot, it's OK.

      If it does get too hot you may find krun useful:

      krun: run command within temperature window

      The command:

          krun 80 60 make test

      will run the subcommand 'make test' while monitoring CPU temperatures; if at any point the temperature goes above 80C, the subcommand (process group) will be suspended until it falls below 60C, then allowed to resume.

      It relies on the 'sensors' library, which IIRC is available for Ubuntu as the lm-sensors package.
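The two thresholds form a hysteresis window: suspend above the high mark, resume only once below the low mark. The sketch below illustrates just that decision logic with made-up readings. It is not krun's source, and the next_state helper is hypothetical; a real tool would presumably stop and continue the process group with signals, which is an assumption on my part.

```perl
use strict;
use warnings;

# Decide the next state from the current one and a temperature reading.
# States: 'running' or 'suspended'.
sub next_state {
    my ($state, $temp, $high, $low) = @_;
    return 'suspended' if $state eq 'running'   && $temp > $high;
    return 'running'   if $state eq 'suspended' && $temp < $low;
    return $state;      # inside the window: no change
}

my $state = 'running';
for my $temp (55, 79, 81, 75, 61, 59, 70) {   # made-up readings
    $state = next_state($state, $temp, 80, 60);
    printf "%3d C -> %s\n", $temp, $state;
}
```

The gap between the two thresholds is what stops the job from flapping on and off when the temperature hovers near a single limit.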

      Excellent! The page at askubuntu.com provided me the assistance I needed, and the results are promising:

      $ sensors
      nouveau-pci-0100
      Adapter: PCI adapter
      GPU core:     +1.20 V  (min =  +1.20 V, max =  +1.32 V)
      fan1:           0 RPM
      temp1:        +63.0°C  (high = +95.0°C, hyst =  +3.0°C)
                             (crit = +115.0°C, hyst =  +2.0°C)
                             (emerg = +130.0°C, hyst = +10.0°C)

      coretemp-isa-0000
      Adapter: ISA adapter
      Core 0:       +50.0°C  (high = +80.0°C, crit = +100.0°C)
      Core 1:       +46.0°C  (high = +80.0°C, crit = +100.0°C)
      $

      I'll have this running when I do my next test.
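If watching by eye gets tedious, the core temperatures can be pulled out of that text with a small regex. This is only a sketch: the core_temps helper is hypothetical, the sample is copied from the coretemp section of the output above, and the 80C comparison uses the 'high' value reported there.

```perl
use strict;
use warnings;

# Pull "Core N: +TT.T" readings out of `sensors` text output.
# Returns a hash of core number => temperature in Celsius.
sub core_temps {
    my ($text) = @_;
    my %temp;
    while ($text =~ /^Core\s+(\d+):\s+\+?(-?\d+(?:\.\d+)?)/mg) {
        $temp{$1} = $2 + 0;
    }
    return %temp;
}

# Sample taken from the output above; a live run would use
#   my $text = `sensors`;
my $sample = <<'END';
coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +50.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:       +46.0°C  (high = +80.0°C, crit = +100.0°C)
END

my %temp = core_temps($sample);
for my $core (sort { $a <=> $b } keys %temp) {
    printf "core %d: %.1f C%s\n", $core, $temp{$core},
        $temp{$core} >= 80 ? "  (at or above 'high')" : '';
}
```

Running this in a loop with a short sleep while the script works would give a simple temperature log.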

      Alex / talexb / Toronto


      I just did a complete run, keeping an eagle eye on the temperature of the two cores.

      At the start, the temps were 50C+44C. After starting the script with 50 kids, the temperatures climbed to about 80C+77C by about a third of the way through. The 'high' value was 80C and the 'critical' value was 100C, so I watched to see whether the temperatures got significantly closer to 100C -- but they did not.

      After the script completed, the temperatures fell back down to a normal range -- 55C+51C within five minutes. So at this point, I'm fairly confident that running with 50 kids is safe for the hardware that I currently have.

      Thanks for all your input!

      Alex / talexb / Toronto


        Modern CPUs regulate their own clocks based on temperature, so you can crank it to the max. The question is whether the speed drops. Either way, that has nothing to do with parallelisation.

        If each job is purely CPU crunching you shouldn't see any improvement having more kids than you do cores.

      we put the PC case on a chair carefully arranged in front of my secondary fridge, placed the harddisk inside the fridge, and turned the fridge to maximum cooling

      -- from afoken's coolest recovery

      And if your PC gets too hot, just ask afoken for advice. :-)

      👁️🍾👍🦟
        > if your PC gets too hot, just ask afoken for advice

        *g*

        Having a research center with a good supply of liquid helium in town surely helps. ;-)

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: OT: Parallel::ForkManager and suitable load (Ubuntu)
by perlfan (Parson) on Feb 29, 2024 at 05:12 UTC
    I'd be more worried about memory creep, getting into a swapping situation, and the penalties associated with context switching -- unless you can pin the processes using numactl or the like. I regularly go for a high score on load testing and imagine that one day I might actually achieve full combustion, à la "halt and catch fire". :-)

      I'm in the process of setting up a new machine that has six cores, so perhaps that box will be better at handling this multi-kid version.

      Interestingly, the original server had stability problems, so I reduced the number of kids from 50 down to 20, and this only increased the run time from 15 minutes to 18 minutes. The load was about half the original.

      Alex / talexb / Toronto


        > Interestingly, the original server had stability problems, ...

        Greetings, talexb. That can come from many kids initiating a connection simultaneously in a tiny window. The following is P::FM-like code. The interesting bit is MCE::Hobo->yield, which is helpful for staggering connection instantiations. MCE::Hobo->yield defaults to 0.008 seconds on UNIX. Call yield prior to making a remote connection.

        use v5.10.1;
        use MCE::Hobo;

        MCE::Hobo->init(
            max_workers => 50,
            posix_exit  => 1,
            on_finish   => sub {
                my ($pid, $exit_code, $ident, $exit_signal, $error, $resp) = @_;
                print "child $pid completed: $ident => ", $resp->[0], "\n";
            }
        );

        foreach my $data ( 1..2000 ) {
            MCE::Hobo->create( $data, sub {
                MCE::Hobo->yield(0.008);
              # sleep 1;  # simulate connection instantiation
                [ $data * 2 ];
            });
        }

        MCE::Hobo->wait_all;

        Without the "sleep 1", the demonstration completes in ~ 0.8 seconds (without yield) and ~ 16.0 seconds (with yield). With the "sleep 1" enabled to simulate work, it takes ~ 42 seconds. No matter how many workers there are, the serialized delay will not allow more than 125 connection instantiations per second with MCE::Hobo->yield() at its default of 0.008 seconds.
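The figures above follow from simple arithmetic, sketched here with the numbers from the demonstration. The variable names are mine, and the two results are lower bounds, not predictions: the measured ~ 42 seconds sits a little above the 40 second floor, presumably from spawn overhead and the staggering itself.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

my $jobs     = 2000;    # loop count in the demonstration
my $workers  = 50;      # max_workers in the demonstration
my $interval = 0.008;   # seconds between yield slots
my $work     = 1;       # seconds per job with "sleep 1" enabled

my $max_rate     = 1 / $interval;                   # connection starts per second
my $stagger_time = $jobs * $interval;               # floor set by yield alone
my $work_time    = ceil($jobs / $workers) * $work;  # floor set by the work alone

printf "at most %.0f starts per second\n", $max_rate;
printf "yield alone: about %.1f s for %d jobs\n", $stagger_time, $jobs;
printf "work alone:  about %.1f s across %d workers\n", $work_time, $workers;
```

Since the staggering floor (16 s) is below the work floor (40 s), the yield costs little once each job does about a second of real work.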

        For use-cases like this, serialized delay/yield mitigates many workers initiating a connection concurrently in a tiny window, improving reliability.