cdfd123 has asked for the wisdom of the Perl Monks concerning the following question:

Here I want to calculate the variance of a sample:
#!/usr/bin/perl -w
use strict;

open(FH,"$ARGV[0]") or die;
my @temp=<FH>;
close FH;

my $mean = Mean (\@temp);
my $variance = variance(\@temp);
print "$variance\n";

sub sum {
    my ($arrayref) = @_;
    my $result;
    foreach(@$arrayref) { $result+= $_; }
    return $result;
}

sub Mean {
    my ($arrayref) = @_;
    my $result;
    foreach (@$arrayref) { $result += $_ }
    return $result / @$arrayref;
}

sub variance {
    return (sum map { ($_ - $mean)**2 } @_) / $#_;
}
The error message is:

Can't use string ("2.47705346386633e+16") as an ARRAY ref while "strict refs" in use at variance_try1.pl line 16.
What does it mean?

Re: variance calculation
by FunkyMonk (Bishop) on Jan 12, 2008 at 13:47 UTC
    It means you're mixing lists and arrayrefs. You pass variance an arrayref, but you're processing @_ inside the subroutine.

    If your lists of numbers are small, just use arrays/lists throughout. Otherwise, make sure you properly dereference your arrayrefs.
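A minimal sketch of the "lists throughout" approach, assuming the numbers are already in a Perl array (the data below are made up for illustration, and the helper names are just examples). Every helper takes a plain list, so the output of map can go straight into sum() with no dereferencing at all. Following the original code's intent ($#_ as the divisor), variance() here divides by n - 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every helper takes a plain list of numbers in @_.
sub sum {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

sub mean {
    # @_ in numeric context is the number of elements
    return sum(@_) / @_;
}

# Sample variance: divides by n - 1, so it needs at least two values.
sub variance {
    my $m = mean(@_);    # computed here, no file-scoped $mean needed
    return sum( map { ($_ - $m) ** 2 } @_ ) / (@_ - 1);
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
printf "mean=%g variance=%g\n", mean(@data), variance(@data);
```

Computing the mean inside variance() also removes the hidden dependence on a global $mean having been set first.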

    I'm sure CPAN has modules that will do these calculations for you if you want a tried & tested solution; Statistics::Descriptive is one example.

Re: variance calculation
by graff (Chancellor) on Jan 12, 2008 at 17:24 UTC
    Actually, I think the problem starts at this line in the variance sub:
    return (sum map { ($_ - $mean)**2 } @_) / $#_;
    That usage of map has the effect of passing a list of values to your "sum" function, but you have written "sub sum" to expect an array reference, and the error arises when you try to dereference the first arg to sum().
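The large number in the error message can be explained from the same line. Inside the original variance(), @_ holds a single element, the reference \@temp, so map runs exactly once with $_ set to that reference. A reference used as a number evaluates to its memory address, so map returns one very large number, (address - $mean)**2, and sum() then tries to dereference that number as an array, which strict refs forbids. The sketch below reproduces this in isolation; the sub name squared_deviations and the sample values are hypothetical stand-ins.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-ins for the data read from the file and for $mean.
my @temp = (1, 2, 3);
my $mean = 2;

# Same map expression as in the original variance() sub.
sub squared_deviations {
    return map { ($_ - $mean) ** 2 } @_;
}

# Called the way variance() was called: with ONE array reference.
my @passed_to_sum = squared_deviations(\@temp);

print "number of values handed to sum(): ", scalar(@passed_to_sum), "\n";
print "is it a reference? ", (ref $passed_to_sum[0] ? "yes" : "no"), "\n";

# In numeric context the reference was turned into its memory address.
print "equals (address - mean)**2? ",
    ($passed_to_sum[0] == (refaddr(\@temp) - $mean) ** 2 ? "yes" : "no"), "\n";

# Dereferencing that number, as sum() does, reproduces the original error.
eval {
    my ($arrayref) = @passed_to_sum;
    my @items = @$arrayref;
    1;
} or print "error: $@";
```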

    If you really want to pass an array reference to sum(), put square brackets around the map expression:

    return (sum [map { ($_ - $mean)**2 } @_]) / $#_;
    But then you'll discover another problem: since variance() is being passed an array ref (always just one arg), the expression $#_ will always be zero, so it'll die with an "Illegal division by zero" error. You need to divide by the number of elements in the array referenced by the first element of @_, and you also need to dereference that element to get the input list for map:
    return (sum [ map { ($_ - $mean)**2 } @{$_[0]} ] ) / @{$_[0]};
    I'm kind of rusty with my statistics arithmetic -- maybe you actually want the number of array elements minus 1? That would be $#{$_[0]}.
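For the "dereference properly" route, here is a sketch that keeps the original sub names and passes a single array reference everywhere; its return line is equivalent to the corrected one above. Two assumptions differ from the original program: variance() computes its own mean instead of reading the file-scoped $mean, and the data are a hard-coded example array rather than lines read from $ARGV[0].

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every helper takes ONE array reference and dereferences it with @$ref.
sub sum {
    my ($arrayref) = @_;
    my $result = 0;
    $result += $_ for @$arrayref;
    return $result;
}

sub Mean {
    my ($arrayref) = @_;
    return sum($arrayref) / @$arrayref;    # divide by the element count
}

# Divides by n; use $#{$arrayref} (that is, n - 1) instead for the
# sample (inferential) estimate.
sub variance {
    my ($arrayref) = @_;
    my $mean = Mean($arrayref);
    return sum([ map { ($_ - $mean) ** 2 } @$arrayref ]) / @$arrayref;
}

my @temp = (2, 4, 4, 4, 5, 5, 7, 9);
my $variance = variance(\@temp);
print "$variance\n";
```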
      ... maybe you actually want the number of array elements minus 1?

      Dividing by n is okay for purely descriptive purposes, i.e. when you're simply making a statement about the variability in the sample itself. If you're using the value as an estimate of the underlying population's variance, however, then use n - 1.

      It can be shown that the variance of a concrete sample, computed by dividing by n, is a biased estimate of the variance of the population the sample is drawn from: on average it comes out too small. Using n - 1 compensates for that bias. For larger sample sizes it makes little difference anyway.
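The bias can be written out explicitly. In standard textbook notation (not taken from the thread: x̄ is the sample mean, σ² the population variance, and the sample is assumed to be n independent draws), the divide-by-n and divide-by-(n - 1) versions and their expected values are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat{\sigma}^2_n = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

\mathbb{E}\left[\hat{\sigma}^2_n\right] = \frac{n-1}{n}\,\sigma^2, \qquad
\mathbb{E}\left[s^2\right] = \sigma^2, \qquad
s^2 = \frac{n}{n-1}\,\hat{\sigma}^2_n
```

Dividing by @{$_[0]} gives the first estimate and dividing by $#{$_[0]} gives s². Because they differ only by the factor n/(n - 1), which tends to 1, the choice matters less as the sample grows.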

      In short, it's n for descriptive statistics, and n - 1 in the context of inferential statistics.
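To make the n versus n - 1 rule concrete, this sketch computes the sum of squared deviations once and divides it both ways, then prints the ratio n/(n - 1) for a few sample sizes to show how quickly the difference shrinks. The helper name both_variances and the data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (descriptive, inferential) variance of a
# list, i.e. the sum of squared deviations divided by n and by n - 1.
sub both_variances {
    my @x = @_;
    my $n = @x;
    die "need at least two values\n" if $n < 2;
    my $sum = 0;
    $sum += $_ for @x;
    my $mean = $sum / $n;
    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @x;    # sum of squared deviations
    return ($ss / $n, $ss / ($n - 1));
}

my ($descriptive, $inferential) = both_variances(2, 4, 4, 4, 5, 5, 7, 9);
printf "divide by n: %g, divide by n - 1: %g\n", $descriptive, $inferential;

# The two differ only by the factor n / (n - 1), which shrinks toward 1.
printf "n = %4d  factor = %.4f\n", $_, $_ / ($_ - 1) for 2, 10, 100, 1000;
```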

      thanks
Re: variance calculation
by ady (Deacon) on Jan 12, 2008 at 17:26 UTC