in reply to A lesson in statistics

For context:


po  Pages paged out
fr  Pages freed per second

Since po=1 and fr=0 is more than a million times "worse" than your "15 times" threshold, and yet I really doubt it represents a situation that you want to be worried about, I think your "15 times" criterion is not enough on its own.

Your problem sample data consists of samples where every single one has po >= 15*fr so, of course, it fires the "15 times" alarm. That problem lies more with your choice of alarm criterion than with your arithmetic.

If your "15 times" does a good job even for quite large values (it certainly doesn't for very small values), then perhaps you just need to add a minimum criterion. Forcing fr=1 as a minimum is a fine way of saying that po < 15 is never alarming.

So if po stays at 14 for many samples while fr stays at 0 for many samples, is that indicative of a problem? It goes off the scale for your stated "15 times" criterion. But it never reaches the criterion if you set a minimum of 1 for fr.

Is po=300, fr=15 really much more worrying than po=3000, fr=250?
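To make the effect of a minimum concrete, here is a minimal sketch of the floored ratio. The names floored_ratio and is_alarming are made up for illustration, and the default floor of 1 and threshold of 15 are simply the numbers discussed above, not a recommendation.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: po/fr with fr clamped to a minimum, so that
# fr=0 can no longer produce a division by zero or a huge ratio.
sub floored_ratio {
    my ($po, $fr, $min_fr) = @_;
    $min_fr = 1 unless defined $min_fr;    # assumed default floor
    return $po / max($fr, $min_fr);
}

# Hypothetical alarm test using the "15 times" threshold.
sub is_alarming {
    my ($po, $fr, $threshold) = @_;
    $threshold = 15 unless defined $threshold;
    return floored_ratio($po, $fr) >= $threshold ? 1 : 0;
}

# Samples taken from the discussion above, as [po, fr] pairs.
for my $sample ([1, 0], [14, 0], [15, 0], [300, 15], [3000, 250]) {
    my ($po, $fr) = @$sample;
    printf "po=%-5d fr=%-4d ratio=%6.2f alarm=%s\n",
        $po, $fr, floored_ratio($po, $fr),
        is_alarming($po, $fr) ? "yes" : "no";
}
```

With the floor in place, po=14, fr=0 stays below the threshold, while po=300, fr=15 (ratio 20) alarms and po=3000, fr=250 (ratio 12) does not, which is exactly the comparison worth questioning.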

So play with some more data and figure out criteria that better represent the situation you are worried about than just "15 times".

- tye        

Re^2: A lesson in statistics (no, specs)
by 0xbeef (Hermit) on Mar 20, 2007 at 05:26 UTC
    Sorry for misleading you, but my initial example is bogus - I merely tried to illustrate the problem I had in requiring the zero-values to be significant in the ratio.

    The real-life alert is called the Thrashing Severity Ratio, and it triggers at a po:fr ratio of 1/6 (about 17%). This is described by Tom Farwell in a writeup on paging spaces, and may be somewhat specific to IBM's AIX.

    My problem with that writeup is twofold:
    1. Periods of inactivity (0,0 values) are not given enough weight, which may lead to false positives.
    2. The overall volume (po counts 4 KB pages written to paging space) is not considered, so low-volume spikes may produce additional false positives (but NOT if they are sustained).
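
    One possible way to fold both concerns into the check is to evaluate a window of samples rather than single samples. This is only a sketch and not something taken from the writeup: window_is_thrashing, the window length, and the $min_avg_po cutoff are all assumptions made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical check over a window of [po, fr] samples.
# $min_avg_po is an assumed cutoff: the minimum average page-out
# rate per sample below which the window never alerts.
sub window_is_thrashing {
    my ($samples, $min_avg_po) = @_;
    return 0 unless @$samples;

    my $sum_po = sum(0, map { $_->[0] } @$samples);
    my $sum_fr = sum(0, map { $_->[1] } @$samples);

    # Idle (0,0) samples add nothing to the sums but still count in
    # the divisor, so they pull the average page-out rate down.
    my $avg_po = $sum_po / scalar @$samples;
    return 0 if $avg_po < $min_avg_po;

    # po:fr >= 1/6 rewritten as 6*po >= fr, which avoids dividing by fr=0.
    return 6 * $sum_po >= $sum_fr ? 1 : 0;
}

# Made-up windows of ten samples each, for illustration only.
my @spike     = (([0, 0]) x 9, [30, 0]);    # one small spike among idle samples
my @sustained = ([30, 100]) x 10;           # steady paging, po:fr = 0.3
my @healthy   = ([30, 400]) x 10;           # steady paging, po:fr = 0.075

printf "%-10s %s\n", $_->[0], window_is_thrashing($_->[1], 10) ? "alert" : "ok"
    for (["spike", \@spike], ["sustained", \@sustained], ["healthy", \@healthy]);
```

    With an assumed cutoff of 10 pages per sample, the isolated spike averages only 3 and is ignored, the sustained window alerts, and the healthy window stays below 1/6.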

    I should perhaps have mentioned the actual problem from the start, but I feared downvotes from Monks who feel that this discussion is not close enough to a pure Perl problem!

    Niel