in reply to variance calculation

Actually, I think the problem starts at this line in the variance sub:
return (sum map { ($_ - $mean)**2 } @_) / $#_;
That usage of map has the effect of passing a list of values to your "sum" function, but you have written "sub sum" to expect an array reference, and the error arises when you try to dereference the first arg to sum().
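To illustrate the mismatch, here is a minimal sketch. The OP's sum() isn't quoted in this reply, so the one below is a hypothetical reconstruction that, as described, expects a single array reference and dereferences it. Under use strict, handing it the plain list that map returns makes that dereference die:

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the OP's sum(): it expects ONE argument,
# an array reference, and dereferences it.
sub sum {
    my ($aref) = @_;
    my $total = 0;
    $total += $_ for @$aref;    # dies under strict refs if $aref is not a ref
    return $total;
}

my @data = (1, 2, 3);

# Correct call: pass a reference to the array.
my $total = sum(\@data);
print "$total\n";    # 6

# Mismatched call: map returns a plain list, so $aref inside sum()
# is the number 1 (the first squared value), not a reference.
my $result = eval { sum(map { $_ ** 2 } @data) };
my $error  = $@;
print "error: $error" unless defined $result;
```

The second call fails with a message along the lines of "Can't use string ("1") as an ARRAY ref while "strict refs" in use".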

If you really want to pass an array reference to sum(), put square brackets around the map expression:

return (sum [map { ($_ - $mean)**2 } @_]) / $#_;
But then you'll discover another problem: since variance() is being passed an array ref (always just one arg), the expression $#_ will always be zero, so it'll die with an "Illegal division by zero" error. You need to divide by the number of elements in the array referenced by the first element of @_, and you also need to dereference that element to get the input list for map:
return (sum [ map { ($_ - $mean)**2 } @{$_[0]} ] ) / @{$_[0]};
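Putting the fix together as a minimal runnable sketch. The OP's own sum() and mean calculation aren't shown in this reply, so the array-ref sum() below and computing $mean inside variance() are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: takes an array reference, returns the total.
sub sum {
    my ($aref) = @_;
    my $total = 0;
    $total += $_ for @$aref;
    return $total;
}

sub variance {
    # Assumed: the mean is computed from the same referenced array.
    my $mean = sum($_[0]) / @{ $_[0] };
    # [ map ... ] builds an anonymous array ref for sum(); @{$_[0]} after
    # the "/" is in scalar context, so it yields the element count n.
    # (Dividing by $#{ $_[0] } instead would divide by n - 1.)
    return (sum [ map { ($_ - $mean)**2 } @{$_[0]} ]) / @{$_[0]};
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
my $var  = variance(\@data);
print "$var\n";    # 4
```

Note that `return (...) / ...` works as intended because return is exempt from the "looks like a function" rule, so the division is part of the returned expression.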
I'm kind of rusty with my statistics arithmetic -- maybe you actually want the number of array elements minus 1? That would be $#{$_[0]}.
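A small sketch of what each of these expressions evaluates to when a sub receives a single array ref; show_counts() is a hypothetical name used only for this demonstration:

```perl
use strict;
use warnings;

# Hypothetical helper, only to show the values of the expressions
# discussed above when called with one array reference.
sub show_counts {
    my $last_index_of_args = $#_;               # last index of @_: 0 for one arg
    my $count              = scalar @{ $_[0] }; # elements in the referenced array (n)
    my $last_index         = $#{ $_[0] };       # last index of that array (n - 1)
    return ($last_index_of_args, $count, $last_index);
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
my ($args_idx, $n, $n_minus_1) = show_counts(\@data);
print "args last index: $args_idx, n: $n, n - 1: $n_minus_1\n";
# args last index: 0, n: 8, n - 1: 7
```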

Re^2: variance calculation
by almut (Canon) on Jan 12, 2008 at 19:19 UTC
    ... maybe you actually want the number of array elements minus 1?

    Dividing by n is okay for purely descriptive purposes, i.e. when you're simply making a statement about the variability in the sample itself. If you're using the value as an estimate of the underlying population's variance, however, then use n - 1.

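    It can be shown that the variance of a concrete sample, computed with divisor n, is a biased (systematically too small) estimate of the variance of the population the sample is drawn from. Using n - 1 (Bessel's correction) compensates for that bias. For larger sample sizes it doesn't make much difference anyway, since the two results differ only by the factor n/(n - 1), which approaches 1.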
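A sketch of the standard argument behind "it can be shown", assuming n independent observations x_1, ..., x_n from a population with mean μ and variance σ² (these symbols are introduced here for the derivation, not taken from the thread):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2

\mathbb{E}\Big[\sum_{i=1}^{n} (x_i - \mu)^2\Big] = n\sigma^2, \qquad
\mathbb{E}\big[n(\bar{x} - \mu)^2\big] = n \cdot \operatorname{Var}(\bar{x}) = n \cdot \frac{\sigma^2}{n} = \sigma^2

\Rightarrow\quad
\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\Big] = \frac{n-1}{n}\,\sigma^2,
\qquad
\mathbb{E}\Big[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\Big] = \sigma^2
```

So the divisor-n version underestimates σ² on average by the factor (n - 1)/n, and dividing by n - 1 removes that bias.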

    In short, it's n for descriptive statistics, and n - 1 in the context of inferential statistics.
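A sketch of a variance() that lets the caller choose the divisor along these lines. The $sample flag is a hypothetical parameter, and this version swaps in sum() from the core List::Util module, which takes a plain list, so the output of map can be passed straight in without building an array ref:

```perl
use strict;
use warnings;
use List::Util qw(sum);    # core module; its sum() takes a LIST, not an array ref

# Hypothetical interface: $sample true  => inferential estimate, divide by n - 1
#                         $sample false => descriptive figure,   divide by n
sub variance {
    my ($aref, $sample) = @_;
    my $n = @$aref;
    die "variance() needs at least one value\n" unless $n;
    die "a sample variance needs at least two values\n" if $sample && $n < 2;
    my $mean = sum(@$aref) / $n;
    my $ss   = sum(map { ($_ - $mean)**2 } @$aref);    # sum of squared deviations
    return $ss / ($sample ? $n - 1 : $n);
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
printf "descriptive (n):   %.4f\n", variance(\@data, 0);    # 4.0000
printf "inferential (n-1): %.4f\n", variance(\@data, 1);    # 4.5714
```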

Re^2: variance calculation
by cdfd123 (Initiate) on Jan 12, 2008 at 17:47 UTC
    thanks