Positioning in relation to trend line

Win has asked for the wisdom of the Perl Monks concerning the following question:

Please could someone tell me if the following looks ok?



sub correlation {

  my ($x_ref, $y_ref) = @_;
  
  my @array_of_x_values = @$x_ref;
  my @array_of_y_values = @$y_ref;



  my $output = "output_test.txt";

  open (OUTPUT_TEST, ">$output");

  print OUTPUT_TEST Dumper ($x_ref, $y_ref);

  #print Dumper length($x_ref);
  #print Dumper length($y_ref);

  my $lfit = Statistics::LineFit->new();
  $lfit->setData($x_ref, $y_ref);
  
  my ($intercept, $slope) = $lfit->coefficients();
  my $rSquared = $lfit->rSquared();
  
  my $hashref = {
         a => $intercept, 
         b => $slope, 
         c => $rSquared
        };


  my $number_of_elements_in_array = 0;


  my @x_values_for_points_above_the_trend_line;
  my @y_values_for_points_above_the_trend_line;

  my @x_values_for_points_below_the_trend_line;
  my @y_values_for_points_below_the_trend_line;

  my @x_values_for_points_on_the_trend_line;
  my @y_values_for_points_on_the_trend_line;
  


  foreach (@array_of_x_values){
    
    $number_of_elements_in_array++;
    
    my $x_point = $_;

    if ($array_of_y_values[$number_of_elements_in_array] > $intercept 
++ ($x_point * $slope)){

      #points are above the trend line
      
      push (@x_values_for_points_above_the_trend_line, $x_point);
      push (@y_values_for_points_above_the_trend_line, $array_of_y_val
+ues[$number_of_elements_in_array]);
      

    }
    elsif ($array_of_y_values[$number_of_elements_in_array] < $interce
+pt + ($x_point * $slope)){
    
      #points are below the trend line

    #  $y_values_for_points_below_the_trend_line = 

      push (@x_values_for_points_below_the_trend_line, $x_point);
      push (@y_values_for_points_below_the_trend_line, $array_of_y_val
+ues[$number_of_elements_in_array]);
      
    }
    else {

      push (@x_values_for_points_on_the_trend_line, $x_point);
      push (@y_values_for_points_on_the_trend_line, $array_of_y_values
+[$number_of_elements_in_array]);

    }

  }

  return $hashref, \@x_values_for_points_above_the_trend_line, \@y_val
+ues_for_points_above_the_trend_line, \@x_values_for_points_below_the_
+trend_line, \@y_values_for_points_below_the_trend_line, \@x_values_fo
+r_points_on_the_trend_line, \@y_values_for_points_on_the_trend_line;
  

}
[download]

Comment on Positioning in relation to trend line Download Code

Replies are listed 'Best First'.
Re: Positioning in relation to trend line by GrandFather (Saint) on Dec 05, 2007 at 20:26 UTC
Looks pretty bad to me: Inconsistent indentation. Excess vertical white space. Excessively long variable names. Repeated repetition again and again of y point lookup Needless manipulation of parallel arrays Too many items returned - huge possibility for mismatched lists Pointless copy of input arrays Silly loop variable assignment: `my $x_point = $_;`. Should use `for my $x_point (...` instead Dumb keys in stupidly named hashref I'd rework it to look something like: sub correlation { my ($x_ref, $y_ref) = @_; my $lfit = Statistics::LineFit->new (); $lfit->setData ($x_ref, $y_ref); my ($intercept, $slope) = $lfit->coefficients (); my $rSquared = $lfit->rSquared (); my %results = (intercept => $intercept, slope => $slope, rSquared => $rSquar +ed); foreach my $index (0 .. $#$x_ref) { my ($xPoint, $yPoint) = ($x_ref->[$index], $y_ref->[$index]); if ($yPoint > $intercept + ($xPoint * $slope)) { push @{$results{above}}, [$xPoint, $yPoint]; } elsif ($yPoint < $intercept + ($xPoint * $slope)) { push @{$results{below}}, [$xPoint, $yPoint]; } else { push @{$results{on}}, [$xPoint, $yPoint]; } } return \%results; } [download] Perl is environmentally friendly - it saves trees	[reply] [d/l] [select]
Re: Positioning in relation to trend line by moritz (Cardinal) on Dec 05, 2007 at 20:04 UTC
It strongly depends on your notion of "ok" ;-) The last line, in the very least, gives me headaches. Do you really want to return 7 references from a subroutine and remember what they are and in which order? I'd suggest the usage of a hash ref instead: `return { metadata => $hashref, above => { x => ..., y => ..., }, below => { x => ..., y => ..., }, on => { ... }, };` [download] As for the mathematics involved I didn't see a flaw. But you should test it nevertheless, with small sets of selected data.	[reply] [d/l]