i need this functionality for throttling purposes. daemons written in perl that sleep in the background based on load. the daemons doing maintenance shouldn't starve the ones making money. i know load is not perfect for this, but it's usable.

The point is, that won't work! By the time your code has obtained an instantaneous cpu load reading, it has changed. And by the time your code has decided what to do about the out-of-date reading it has, it has changed again.

Have you heard of Hysteresis?

Either you will drive your systems into chaotically unstable boom-bust cycles, where everything sleeps and then everything wakes up together, producing a huge burst of activity which they all detect, and so they all go back to sleep.

Or, you smooth the readings by averaging them together over a few seconds. Then your code is reacting to an approximation of the real situation as it was a few seconds ago. Net result is that anything from 10% to 30% of your cpu is spent accessing, calculating, and deciding whether to sleep or work; and wasting context switches by sleeping when you could be doing useful work.

And whatever calculations you come up with for your breakpoints through trial & error, every time a new process is added, or a new device; or one of either is removed; or you move to a faster cpu; or more cores; or get a faster disk; or the cpu drops the clock rate because one or more cores are overheating; all your calculations go right up the swanny.

Load balancing/throttling processes based upon instantaneous cpu loading doesn't work. It never has, and never will!

Besides which, every modern OS has a very simple, and extremely effective way of ensuring that "money making processes" are favoured over "maintenance processes". The keyword here is PRIORITY.

Look up "nice" if you're on a *nix system; or type help start at a command line prompt if you're on Windows. It can also be done programmatically.
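
For the *nix case, here is a minimal sketch of doing it from inside the daemon itself with Perl's built-in getpriority and setpriority. The helper name be_nice and the target value of 10 are made up for illustration, and it assumes a platform that implements setpriority(2); on one that doesn't, the built-in raises an exception.

```perl
use strict;
use warnings;

# Hypothetical helper: raise this process's nice value to at least $target.
# An unprivileged process may only increase its nice value, so this never
# tries to lower it. Returns the nice value in effect afterwards.
sub be_nice {
    my ($target) = @_;
    my $current = getpriority( 0, 0 );    # 0, 0 = PRIO_PROCESS, this process
    return $current if $current >= $target;
    setpriority( 0, 0, $target )
        or warn "setpriority failed: $!";
    return getpriority( 0, 0 );
}

# A maintenance daemon would call this once at start-up.
my $nice = be_nice(10);
print "running at nice $nice\n";
```

The scheduler then does the favouring on every time slice, with no polling in the daemon at all.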

saying "you dont need this really" just because perl cannot provide it without extra strap-ons is apologetics at its worst.

I'm certainly not apologising. Perl can provide it quite easily, but other than writing replacements for top/TaskManager, it doesn't have a use. Hence no demand.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^6: portable way to get system load average
by loadaverage (Novice) on Jul 07, 2010 at 12:42 UTC

    yes, i know about priority.

    you have made an awfully lot of assumptions about our systems and our programs... not that i dont agree with you about the load average. it's just a number and there are countless discussions online why it is not such a useful metric (especially as all systems calculate it differently). but it all depends on the context in which its results are used. if you have long running processes that put sustained load on your systems and when the thresholds defined are more like rough delimiters it is a great metric to have around for micromanaging whether a non-critical daemon should wake up or just say "oh, i see you are kind of busy-ish, i'll come back again later".
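
    The "rough delimiters" idea can be sketched as a two-threshold check: back off once the 1-minute figure rises above an upper bound, and resume only after it drops below a lower one, so the daemon does not flap around a single cut-off. The function name and the 2.0/4.0 thresholds are hypothetical, picked only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a non-critical daemon should work.
# $running is the previous decision (true = was working), $load is the
# 1-minute load average. The gap between $low and $high is the dead band.
sub should_run {
    my ( $running, $load, $low, $high ) = @_;
    return 0 if $load > $high;    # busy: back off
    return 1 if $load < $low;     # quiet: go ahead
    return $running ? 1 : 0;      # in between: keep the previous decision
}

my $running = 1;
for my $load ( 1.5, 3.0, 4.5, 3.0, 1.8 ) {
    $running = should_run( $running, $load, 2.0, 4.0 );
    printf "load %.1f -> %s\n", $load, $running ? 'work' : 'sleep';
}
```

    Note that the two readings of 3.0 give different answers (work on the way up, sleep on the way down); that is the whole point of the dead band.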

    i think we are getting offtopic. i posted this question because i thought i might have missed something in the immensely huge world of perl. no quick luck, so let's move on, thanks for the answers...

      you have made an awfully lot of assumptions about our systems and our programs...

      Um. No. I haven't. As someone who spent an awfully long time trying to satisfy a set of misbegotten requirements--that I'd identified as such very early on, but was not in a position to counter the bs--I can assure you, to the tune of €15 million development effort by some very clever people; it ... doesn't ... work!

      Take your example--"long running processes that put sustained load on your systems". If at the very instant that you choose to query the current cpu load average; a tcp packet arrives; or an asynchronous IO completes; or a process forks; or a process malloc()s an amount of memory that exceeds that process' current virtual memory footprint; or an interrupt (soft or hard) occurs; or any of a zillion other events happen on what was a clock-tick earlier a totally quiescent system; then you will receive back a (near) 100% reading. Because when a thread is ready to run, it will run at 100% cpu until it hits a reason to block.

      So, your system that may have been running at 1% or 2% load average for minutes (or hours or days) will, for that brief instant that you do your query, appear to be running flat out. And you are going to base long term decisions upon that?

      NB:Anything lasting more than a microsecond is "long term" in the context of modern cpus running at 2+ GHz.

      Translation: Even a system apparently running at 3% load average is actually running at 100% for a few clock cycles, and then 0% for a few more; and then 100% for a few more; and then 0% for a few more. And so on ad nauseam.

      Whatever system-defined mechanism you use (e.g. GetSystemTimes()) has already performed some essentially arbitrary calculation to produce a figure that is somewhere between 0% and 100%. And consider the time it takes (even for C code) to:

      1. issue the call;
      2. transfer through from ring 3 to ring 0;
      3. query the appropriate privileged cpu registers;
      4. wait for processor cache to be flushed;
      5. access the static memory location(s) holding the previous state;
      6. wait (if required) for the data cache to flush and the cache line to be re-populated;
      7. retrieve the raw value;
      8. perform its own averaging calculation;
      9. update the static state;
      10. stack the calculated value to be returned;
      11. transition through the call gates from ring 0 to ring 3.

      Now your code has its "instantaneous" cpu load average value... but it is already many thousands, if not millions, of clock cycles out-of-date.

      Nothing, repeat, nothing you can do with this value will in any way reflect reality. Because reality already moved on at every clock cycle between you issuing the request, and receiving the reply.

      You might as well, for all the difference in accuracy it will make, base your scheduling/priority/throttling mechanism upon

      my $load = int( rand 1000 ) / 10;

      Yes. What you receive, by the time you receive it, really is that arbitrary.


        you seem to have shifted the discussion to some "constantly current" system load, when i was (i hope) clearly talking about an average as in getloadavg(3):
        The system imposes a maximum of 3 samples, representing averages over the last 1, 5, and 15 minutes, respectively.
        
        as i said, in certain contexts this metric can be a useful indicator (as opposed to an absolute value).
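
        For reference, a core-only sketch of getting those three figures from Perl without any XS: read /proc/loadavg where it exists (Linux), otherwise parse the output of uptime. The helper names are invented here, and the regex assumes the usual "load average(s): a, b, c" wording with dot decimals, which is an assumption rather than a guarantee across platforms or locales.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the 1, 5 and 15 minute figures out of a line of
# uptime(1) output. Returns an empty list if the line does not match.
sub parse_uptime {
    my ($line) = @_;
    return $line =~ /load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/
        ? ( $1, $2, $3 )
        : ();
}

# Hypothetical helper: best-effort load averages, or an empty list.
sub load_averages {
    if ( open my $fh, '<', '/proc/loadavg' ) {    # Linux
        my $line = <$fh>;
        close $fh;
        return ( split ' ', $line )[ 0 .. 2 ] if defined $line;
    }
    my $out = qx{uptime 2>/dev/null};             # most other *nix systems
    return defined $out ? parse_uptime($out) : ();
}

my @avg = load_averages();
print @avg ? "1/5/15 min: @avg\n" : "load average not available\n";
```

        It is not portable to Windows, which has no load average in this sense.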
Re^6: portable way to get system load average
by JavaFan (Canon) on Jul 07, 2010 at 14:11 UTC
    By the time your code has obtained an instantaneous cpu load reading, it has changed.
    True, but that's not what the OP wants. He mentions an interface to getloadavg which returns the average load of the past 1, 5 and 15 minutes.

      I know. But as I already pointed out, basing real-time decisions upon what happened 128 billion clock cycles ago is like trying to play the stock market based upon events that happened circa the building of the Great Pyramid. And that's just the 1 minute figure... the other two are geometrically more useless.

      As any day-trader will tell you, even the 20 minute delay of the popular "free" stock market quotes is waaay too long.

        i think you lost me.

        the 1 minute means a simple rolling average, not "1 minute ago". i am making the decision based on an average calculated for "128 billion cycles" up to "just before now", not "128 billion cycles ago". that number is a "good enough" indicator for my purposes.
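
        One detail worth pinning down: on Linux, at least, the 1-minute figure is not a simple rolling mean but an exponentially damped average of the number of runnable (and uninterruptible) tasks, sampled roughly every 5 seconds, so older samples fade out gradually rather than dropping off the edge of a window. The sketch below imitates that update in floating point (the kernel uses fixed-point arithmetic); the function name and the sample scenario are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: one update step of an exponentially damped average.
# $period is the averaging period in seconds (60 for the "1 minute" figure),
# $interval is the sampling interval in seconds.
sub damped_update {
    my ( $load, $runnable, $period, $interval ) = @_;
    my $decay = exp( -$interval / $period );
    return $load * $decay + $runnable * ( 1 - $decay );
}

# Start idle, then keep exactly one task runnable for one minute
# (12 samples at 5-second intervals).
my $load = 0;
$load = damped_update( $load, 1, 60, 5 ) for 1 .. 12;
printf "after 60 s: %.2f\n", $load;    # 1 - exp(-1), about 0.63
```

        So a full minute of steady load shows up as only about 63% of its value in the "1 minute" figure, but the most recent samples do carry the most weight, which is the "up to just before now" behaviour described above.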