PerlMonks  

variance calculation

by cdfd123 (Initiate)
on Jan 12, 2008 at 12:33 UTC ( [id://662051]=perlquestion )

cdfd123 has asked for the wisdom of the Perl Monks concerning the following question:

I want to calculate the variance of a sample:
#!/usr/bin/perl -w
use strict;

open(FH,"$ARGV[0]") or die;
my @temp=<FH>;
close FH;

my $mean = Mean (\@temp);
my $variance = variance(\@temp);
print "$variance\n";

sub sum
{
    my ($arrayref) = @_;
    my $result;
    foreach(@$arrayref) { $result+= $_; }
    return $result;
}

sub Mean
{
    my ($arrayref) = @_;
    my $result;
    foreach (@$arrayref) { $result += $_ }
    return $result / @$arrayref;
}

sub variance
{
    return (sum map { ($_ - $mean)**2 } @_) / $#_;
}
The error shown is:

Can't use string ("2.47705346386633e+16") as an ARRAY ref while "strict refs" in use at variance_try1.pl line 16.
What does it mean?

Replies are listed 'Best First'.
Re: variance calculation
by FunkyMonk (Chancellor) on Jan 12, 2008 at 13:47 UTC
    It means you're mixing lists and arrayrefs. You pass variance an arrayref, but you're processing @_ inside the subroutine.

    If your lists of numbers are small, just use arrays/lists throughout. Otherwise, make sure you properly dereference your arrayrefs.

    I'm sure CPAN has many such modules that will do these calculations for you if you want a tried & tested solution.
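To illustrate the "lists throughout" option, here is a minimal sketch that passes plain lists and never takes a reference. It borrows sum from the core List::Util module in place of the hand-written one. The data values are made up for the example, and dividing by n - 1 is an assumption about which variance is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);    # core module, replaces the hand-written sum()

# Mean of a plain list of numbers (not an array reference).
sub mean {
    my @values = @_;
    return sum(@values) / @values;
}

# Sample variance of a plain list of numbers.
# Assumption: divide by n - 1; needs at least two values.
sub variance {
    my @values = @_;
    my $mean   = mean(@values);
    return sum( map { ($_ - $mean) ** 2 } @values ) / ( @values - 1 );
}

my @data = ( 2, 4, 4, 4, 5, 5, 7, 9 );    # made-up sample data
printf "mean = %.4f, variance = %.4f\n", mean(@data), variance(@data);
```

Because every sub receives the numbers themselves in @_, there is nothing to dereference and the "ARRAY ref" error cannot occur. The cost is that the list is copied on each call, which only matters for large inputs.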

Re: variance calculation
by graff (Chancellor) on Jan 12, 2008 at 17:24 UTC
    Actually, I think the problem starts at this line in the variance sub:
    return (sum map { ($_ - $mean)**2 } @_) / $#_;
    That usage of map has the effect of passing a list of values to your "sum" function, but you have written "sub sum" to expect an array reference, and the error arises when you try to dereference the first arg to sum().

    If you really want to pass an array reference to sum(), put square brackets around the map expression:

    return (sum [map { ($_ - $mean)**2 } @_]) / $#_;
    But then you'll discover another problem: since variance() is being passed an array ref (always just one arg), the expression $#_ will always be zero, so it'll die with a "divide by zero" error. You need to divide by the number of elements in the array that is referenced by the first element of @_, and you also need to dereference that element to get the input list for map:
    return (sum [ map { ($_ - $mean)**2 } @{$_[0]} ] ) / @{$_[0]};
    I'm kind of rusty with my statistics arithmetic -- maybe you actually want the number of array elements minus 1? That would be $#{$_[0]}.
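Putting those fixes together, the subs from the original post might look like the sketch below, which keeps the array-reference interface. Two things here are my own assumptions rather than part of the original script: variance() computes the mean itself instead of reading the file-level $mean, and the input is a hard-coded list in place of the file read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the numbers in the array that $arrayref points to.
sub sum {
    my ($arrayref) = @_;
    my $result = 0;
    $result += $_ foreach @$arrayref;
    return $result;
}

# Mean of the referenced array.
sub Mean {
    my ($arrayref) = @_;
    return sum($arrayref) / @$arrayref;
}

# Variance of the referenced array, dividing by n.
# Use $#$arrayref as the divisor instead to divide by n - 1.
sub variance {
    my ($arrayref) = @_;
    my $mean    = Mean($arrayref);
    my @squared = map { ( $_ - $mean )**2 } @$arrayref;    # dereference for map
    return sum( \@squared ) / @$arrayref;                  # pass a reference to sum()
}

my @temp = ( 2, 4, 4, 4, 5, 5, 7, 9 );    # stand-in for the lines read from the file
print variance( \@temp ), "\n";
```

Every sub now takes and dereferences exactly one array ref, so the callers and the callees agree on the calling convention.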
      ... maybe you actually want the number of array elements minus 1?

      Dividing by n is okay for purely descriptive purposes, i.e. when you're simply making a statement about the variability in the sample itself. If you're using the value as an estimate of the underlying population's variance, however, then use n - 1.

      It can be shown that the variance computed from a concrete sample with divisor n is a biased estimate of the variance in the population the sample is drawn from: on average it comes out too small. Using n - 1 compensates for that bias. For larger sample sizes, it doesn't make much difference anyway.
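For reference, the standard result behind that statement can be written as follows. It assumes the x_i are independent draws from a population with finite variance σ²; the notation is mine, not the poster's.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2\right]
  = \frac{n-1}{n}\,\sigma^2, \qquad
\mathbb{E}\!\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2\right]
  = \sigma^2
```

The factor (n - 1)/n approaches 1 as n grows, which is why the choice of divisor matters less for large samples.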

      In short, it's n for descriptive statistics, and n - 1 in the context of inferential statistics.
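As a rough sketch of how that choice could be exposed in code, the sub below takes a flag selecting the divisor. The name variance_of and the $is_sample parameter are hypothetical, chosen only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variance of the numbers in @$data.
#   $is_sample true  -> divide by n - 1 (estimate for the underlying population)
#   $is_sample false -> divide by n     (description of the data at hand)
sub variance_of {
    my ( $data, $is_sample ) = @_;
    my $n = @$data;

    my $mean = 0;
    $mean += $_ / $n for @$data;

    my $sum_sq = 0;
    $sum_sq += ( $_ - $mean )**2 for @$data;

    return $sum_sq / ( $is_sample ? $n - 1 : $n );
}

my @data = ( 2, 4, 4, 4, 5, 5, 7, 9 );    # made-up values
printf "divide by n:     %.4f\n", variance_of( \@data, 0 );
printf "divide by n - 1: %.4f\n", variance_of( \@data, 1 );
```

The two results differ by a factor of n / (n - 1), so for eight values the gap is still visible, while for thousands of values it is negligible.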

      thanks
Re: variance calculation
by ady (Deacon) on Jan 12, 2008 at 17:26 UTC
