captainentropy has asked for the wisdom of the Perl Monks concerning the following question:

I'm not sure what would be the best title to properly give this request since I probably don't know what I'm really doing in the first place :( sorry. I'm having a problem with what is probably basic Perl structure. I can't figure it out, and a colleague of mine who has more Perl experience is stumped too.

I wrote this script that opens a file, reads in a set of coordinates and then using Math::Interpolate creates a curve. In a loop the curve is interpolated for Y from a range of X-values. Once all the new Y-values have been calculated it moves to the next row and repeats. OK, so far so good. It works as in that it doesn't crash and the data is more or less usable, however, the resulting curve does not pass through the Y-maximum that it should (which was in the control set). So, I need to scale the curve such that no point goes beyond the ceiling that it was given. How can I store this data in an array, calculate its max value, create a scaling factor based on the original max Y-value (Y-maxOld/Y-maxNew), and then multiply each element in that array with this scaling factor (and then repeat for the next set of coordinates)?

I know I can use List::Util qw(max) for finding the max value in an array or list, but I just don't know how to incorporate this.

Here's the code I'm using:
#!/usr/bin/perl use warnings; use Getopt::Long; use Getopt::ArgvFile; use Math::Interpolate qw(derivatives robust_interpolate); use Math::Round; use List::Util qw(max); @peaks = $ARGV[0]; open ( IN, "@peaks" ) or die "Can't open file: $!"; while (<IN>) { chomp; my @fields = split ("\t",$_); my @x=(0,$fields[4]*.25,$fields[4],($fields[3]-($fields[3]*0.25)), +$fields[3]); # these are the training X-values my @y=(0,$fields[6]*0.4,$fields[6],$fields[6]*0.4,0); # these are +the training Y-values my @dy = derivatives(\@x, \@y); my $iter=1; print "fixedStep chrom=$fields[0] start=$fields[1] step=1\n"; while ($iter <= $fields[3]) { my ($r_y, $r_dy) = robust_interpolate($iter, \@x, \@y); my @ynew2 = nearest(0.01,($r_y)); print "@ynew2\n"; $iter ++; } } close IN;

Here's a graph of some real data. The Excel curve is based on three control points, the Math::Interpolate one is based on five controls. The height of the Excel curve is what I need to scale the other one to.

https://www.dropbox.com/s/6dikruncc02427v/Interpolate_example1.pdf

Replies are listed 'Best First'.
Re: operate on an array created in a subroutine
by SuicideJunkie (Vicar) on Oct 15, 2013 at 21:52 UTC

    It would probably be a good idea to plot your results and then consider whether you are feeding the right data in and whether you've got that data in the right format, and whether you're using the right kind of interpolation for your task. The interpolation itself is almost certainly correct, and tweaking the values afterwards is never the right thing to do.

    If the interpolation is quadratic or higher, then you will likely see some points in the interpolation that are higher than your maximum sample value.

    Scaling the values down would make all the values (except zeroes) wrong in that case.

    Consider this interpolation, where data points are the stars: __ __/ \__ _/ *_ _* \_ / \ / \ * *

    Many of the interpolated points are much higher than the highest sample. As they should be.

    If your data is noisy, then the interpolation may not actually pass through any of the points. But that's also OK.

    * * * * * ---------------------- * * * * *

      I must have posted the update while you were replying. In my OP I put a link to the output. The output curve is right, there's nothing like you're suggesting. I only need to the shape of the curve to be approximate (since I have no idea what it really should look like), and although the X-value at Y-max isn't exact in my output, it's close enough for what I need, but its height has to be exact.

        The Excel curve is based on three control points, the Math::Interpolate one is based on five controls.

        I presume that means the excel plot is using a quadratic interpolation (3 pts).

        If that's true, then you need to set M:I to use a quadratic interpolation as well if you want the results to match closely. Even then, using a different set of data (5 pts instead of 3) will ensure that your interpolation is slightly different.

        If you want it to be exact, you need to use the exact same input.

      <awestruck></awestruck>

      Dude, you are so getting one of my ++s tomorrow.

      at least so far I haven't seen any weird output. I'll be on the lookout for sure.