chaoticset has asked for the wisdom of the Perl Monks concerning the following question:

Yes, it's simple. Yes, it's pathetic.

That's why I need help...!

This is essentially unedited; I decided I needed a standard deviation sub, so I wrote one. Then I remembered that I should get some of my code looked at for stylistic errors (and/or efficiency problems) and I figured, "Why not?" I stuck a little test of the sub on the end, and, here it is.

Be forewarned: This is probably bad code. This is most likely bad *design*. I'm using the shotgun theory here, which is essentially "code it and see what happens". I'm not good at any of this yet. You were warned.

#!c:\perl\bin\perl.exe -w use strict; use warnings; # This subroutine, given a set of values, should return the standard +deviation. sub standard_deviation { my $mean = 0; my $sample_size; my @differences; for (@_) { $mean = $mean + $_; } $sample_size = scalar(@_); $mean = $mean/$sample_size; for (@_) { @differences = (@differences, (($_-$mean)*($_-$mean))); } my $standard = 0; for (@differences) { $standard = $standard + $_; } $standard = sqrt($standard/$sample_size); return $standard; } my $stddev; my @values; @values = (22, 17, 29, 39, 20, 32); $stddev = standard_deviation(@values); print "The standard deviation is $stddev.";
Could this have been shorter? Should it have been longer? Is there some convention of math subs that's normally followed that I (forgot|don't know about)? Did I leave something out? Did I leave something in?

Anything at all, whatever it is, if it's not quite right or it just looks bad, please, let me know.

-----------------------
You are what you think.

Replies are listed 'Best First'.
Re: Code Review! Go Ahead, Rip It Up!
by japhy (Canon) on Jan 13, 2002 at 04:20 UTC
    Here you go. This one's free! ;)
    sub standard_deviation { # declared all at once my ($mean, $std_dev) = (0,0); # idioms are handy -- I like the postfix 'for' # and '$x = $x + $y' can be written as '$x += $y' $mean += $_ for @_; # no need for $size, it's used once, and is @_ # again, using the 'A = A OP B' => 'A OP= B' trick $mean /= @_; # appending to arrays is best done with push() # @big_array = (@big_array, $x) can get slow :( # for (@_) { # push @diffs, ($_ - $mean)**2; # } # that can be rewritten as a map() # @diffs = map { ($_ - $mean)**2 } @_; # but we only use @diffs ONCE, so we don't need it $std_dev += ($_ - $mean)**2 for @_; # and $size was removed, so we use @_ again return $std_dev / @_; }
    When it comes down to it, the function becomes:
    sub standard_deviation { my ($mean, $std_dev) = (0,0); $mean += $_ for @_; $mean /= @_; $std_dev += ($_ - $mean)**2 for @_; return $std_dev / @_; }

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Code Review! Go Ahead, Rip It Up!
by VSarkiss (Monsignor) on Jan 13, 2002 at 04:04 UTC

    I'm sure this isn't the answer you were looking for, but honestly, look in CPAN for general things like this. You'll save yourself a lot of missteps. For instance, Statistics::Descriptive seems to be just what you wanted.

    use Statistics::Descriptive; my $s = Statistics::Descriptive->new(); $s->AddData(22, 17, 29, 39, 20, 32); print "The standard deviation is ", $s->StandardDeviation();
    That's how I'd do it, anyway.

    HTH

Re: Code Review! Go Ahead, Rip It Up!
by dws (Chancellor) on Jan 13, 2002 at 04:05 UTC
      for (@_) { @differences = (@differences, (($_-$mean)*($_-$mean))); } To my eye,
    foreach ( @_ ) { push @differences, $_ - $mean; }
    reads better.

    In the fragment

    my $standard = 0; for (@differences) { $standard = $standard + $_; } $standard = sqrt($standard/$sample_size); return $standard;
    you use $standard for two semantically different (though related) purposes. Having variables that mean two (or more) different things within a subroutine can get you into trouble later (say, when you pick up the code a couple of months from now). You can eliminate this problem by writing
    my $sum_diff = 0; map { $sum_diff += $_ } @differences; return sqrt($sum_diff / $sample_size);

    You might also consider using 4 spaces for indentation.

    And at some point, somebody is going to pass an empty list...

Re: Code Review! Go Ahead, Rip It Up!
by blakem (Monsignor) on Jan 13, 2002 at 04:14 UTC
    Perhaps a trip to CPAN would have helped... I might use somthing like this:
    #!/usr/bin/perl -wT use strict; use Statistics::Lite qw(stddev); my @data = 20..30; my $standard_deviation = stddev(@data);

    -Blake

Re: Code Review! Go Ahead, Rip It Up!
by toma (Vicar) on Jan 13, 2002 at 15:34 UTC
    Writing your own numerical code is always a challenge, especially if you want robust results for a wide range of input data. It is easy to lose precision somewhere, or to miss an error condition, such as needing at least two data points.

    Dividing by the number of samples is incorrect (1). Standard deviation pseudocode is:

    s=sqrt((sum(samples)-mean(samples))/(num_samples-1))

    There are many interesting ways to calculate standard deviation. One provides incremental results. That is, it minimizes calculations and storage such that you can find the standard deviation of a stream of numbers without having all the samples in memory at the same time. You can just read in another value and and recalculate the standard deviation. It takes advantage of this form:

    s=sqrt((num_samples*sum(squares(samples))-square(sum(samples)))/(num +_samples*(num_samples-1)))
    So you just keep a track of the number of samples, the sum of the squares of the samples, and the sum of the samples. This is how the old hand calculators were able to calculate standard deviation using very little memory.

    (1) CRC Standard Mathematical Tables, 23rd ed, pg 573

    It should work perfectly the first time! - toma

Re: Code Review! Go Ahead, Rip It Up!
by s0ttle (Scribe) on Jan 13, 2002 at 10:48 UTC
    I didn't see anyone mention the use of the -w switch in conjunction with 'use warnings' which shouldn't be done, but I'm pretty sure that using both isn't necessary here.


    -tengo mas que aprender