Win has asked for the wisdom of the Perl Monks concerning the following question:

Please could someone tell me if the following looks ok?
sub correlation { my ($x_ref, $y_ref) = @_; my @array_of_x_values = @$x_ref; my @array_of_y_values = @$y_ref; my $output = "output_test.txt"; open (OUTPUT_TEST, ">$output"); print OUTPUT_TEST Dumper ($x_ref, $y_ref); #print Dumper length($x_ref); #print Dumper length($y_ref); my $lfit = Statistics::LineFit->new(); $lfit->setData($x_ref, $y_ref); my ($intercept, $slope) = $lfit->coefficients(); my $rSquared = $lfit->rSquared(); my $hashref = { a => $intercept, b => $slope, c => $rSquared }; my $number_of_elements_in_array = 0; my @x_values_for_points_above_the_trend_line; my @y_values_for_points_above_the_trend_line; my @x_values_for_points_below_the_trend_line; my @y_values_for_points_below_the_trend_line; my @x_values_for_points_on_the_trend_line; my @y_values_for_points_on_the_trend_line; foreach (@array_of_x_values){ $number_of_elements_in_array++; my $x_point = $_; if ($array_of_y_values[$number_of_elements_in_array] > $intercept ++ ($x_point * $slope)){ #points are above the trend line push (@x_values_for_points_above_the_trend_line, $x_point); push (@y_values_for_points_above_the_trend_line, $array_of_y_val +ues[$number_of_elements_in_array]); } elsif ($array_of_y_values[$number_of_elements_in_array] < $interce +pt + ($x_point * $slope)){ #points are below the trend line # $y_values_for_points_below_the_trend_line = push (@x_values_for_points_below_the_trend_line, $x_point); push (@y_values_for_points_below_the_trend_line, $array_of_y_val +ues[$number_of_elements_in_array]); } else { push (@x_values_for_points_on_the_trend_line, $x_point); push (@y_values_for_points_on_the_trend_line, $array_of_y_values +[$number_of_elements_in_array]); } } return $hashref, \@x_values_for_points_above_the_trend_line, \@y_val +ues_for_points_above_the_trend_line, \@x_values_for_points_below_the_ +trend_line, \@y_values_for_points_below_the_trend_line, \@x_values_fo +r_points_on_the_trend_line, \@y_values_for_points_on_the_trend_line; }

Replies are listed 'Best First'.
Re: Positioning in relation to trend line
by GrandFather (Saint) on Dec 05, 2007 at 20:26 UTC

    Looks pretty bad to me:

    • Inconsistent indentation.
    • Excess vertical white space.
    • Excessively long variable names.
    • Repeated repetition again and again of y point lookup
    • Needless manipulation of parallel arrays
    • Too many items returned - huge possibility for mismatched lists
    • Pointless copy of input arrays
    • Silly loop variable assignment: my $x_point = $_;. Should use for my $x_point (... instead
    • Dumb keys in stupidly named hashref

    I'd rework it to look something like:

    sub correlation { my ($x_ref, $y_ref) = @_; my $lfit = Statistics::LineFit->new (); $lfit->setData ($x_ref, $y_ref); my ($intercept, $slope) = $lfit->coefficients (); my $rSquared = $lfit->rSquared (); my %results = (intercept => $intercept, slope => $slope, rSquared => $rSquar +ed); foreach my $index (0 .. $#$x_ref) { my ($xPoint, $yPoint) = ($x_ref->[$index], $y_ref->[$index]); if ($yPoint > $intercept + ($xPoint * $slope)) { push @{$results{above}}, [$xPoint, $yPoint]; } elsif ($yPoint < $intercept + ($xPoint * $slope)) { push @{$results{below}}, [$xPoint, $yPoint]; } else { push @{$results{on}}, [$xPoint, $yPoint]; } } return \%results; }

    Perl is environmentally friendly - it saves trees
Re: Positioning in relation to trend line
by moritz (Cardinal) on Dec 05, 2007 at 20:04 UTC
    It strongly depends on your notion of "ok" ;-)

    The last line, in the very least, gives me headaches. Do you really want to return 7 references from a subroutine and remember what they are and in which order?

    I'd suggest the usage of a hash ref instead:

    return { metadata => $hashref, above => { x => ..., y => ..., }, below => { x => ..., y => ..., }, on => { ... }, };

    As for the mathematics involved I didn't see a flaw. But you should test it nevertheless, with small sets of selected data.