in reply to std dev calculations slow over time

My guess is that the progressive slowdown comes from the sth_select_hydro SQL, which has to do progressively more work to find the 100 rows that you want.

I would select all the rows first and put the data into a text file, or RAM if you have enough of it.

You mentioned that Matlab was faster. The Matlab code may be smart enough to take advantage of the massive redundancy in your calculation. As you step through the array, the calculation operates on what is mostly the same list of numbers over and over. Between one step and the next, the window changes only at its two ends: the oldest point drops out and a new point comes in. If so, the Matlab routine can do a much smaller calculation, in effect subtracting the departing number from the running sum and adding the arriving one. It keeps the points of the window in a circular buffer and avoids recalculating from scratch. There are special forms of the statistical formulas for average and standard deviation that allow a result to be updated incrementally. The formulas are in the Wikipedia article on Standard Deviation, in the section 'Rapid Calculation Methods.'
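Here is a rough sketch of the running-sum idea in Python. I don't know what Matlab actually does internally, so treat this as an illustration only. The function name rolling_mean_std and its arguments are made up for the example. It keeps a running sum and a running sum of squares, and uses the sample (n-1) form of the standard deviation.

```python
from collections import deque
from math import sqrt


def rolling_mean_std(values, window):
    """Yield (mean, sample std dev) for every full window of `window` points.

    Hypothetical helper for illustration: each step costs O(1) work
    instead of re-summing the whole window.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample std dev")

    buf = deque()      # the points currently inside the window
    total = 0.0        # running sum of the window
    total_sq = 0.0     # running sum of squares of the window

    for x in values:
        # The new point enters the window.
        buf.append(x)
        total += x
        total_sq += x * x

        # The oldest point leaves once the window is over-full.
        if len(buf) > window:
            old = buf.popleft()
            total -= old
            total_sq -= old * old

        if len(buf) == window:
            mean = total / window
            var = (total_sq - window * mean * mean) / (window - 1)
            # Rounding can push a tiny variance slightly below zero.
            yield mean, sqrt(max(var, 0.0))


if __name__ == "__main__":
    for mean, std in rolling_mean_std([1, 2, 3, 4, 5], 3):
        print(mean, std)
```

One caution: the sum-of-squares form can lose precision when the values are large compared to their spread. If that bites, an incremental method such as Welford's is more stable.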

Also, the moving average is going to change very slowly from point to point. Usually you don't need that many points of a moving average, so you don't have to calculate them all. Computing only every Nth point is called decimation.
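A small sketch of decimation, again in Python and again with made-up names (decimated_window_means, step). It evaluates the window average only at every step-th position. With so few outputs, recomputing each window from scratch may be cheap enough on its own.

```python
def decimated_window_means(values, window, step):
    """Return the mean of each full window, taken only every `step` points.

    Hypothetical helper for illustration; `values` is any sequence
    that supports slicing, such as a list.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")

    means = []
    # Window start positions: 0, step, 2*step, ... while a full window fits.
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        means.append(sum(chunk) / window)
    return means


if __name__ == "__main__":
    data = list(range(10))
    print(decimated_window_means(data, window=4, step=3))
```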

It should work perfectly the first time! - toma