oko1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, all:

I've just spent a while struggling with a problem that's come up a few times previously, and I'm hoping to get some help from the accomplished monks here. I've got it solved by brute force, and am hoping for a smarter (and perhaps more readable) solution.

I've lost a lot of weight lately, and wanted to plot it on a graph. Unfortunately, I haven't been keeping any kind of a regular track of it - I've been weighing myself whenever it occurs to me - and so I have a very scattered and scant set of data:

6/26/2010 334 8/12/2010 311.8 8/19/2010 308.4 9/5/2010 300.0 9/9/2010 298.6 9/14/2010 297.2 9/16/2010 293.6

Nevertheless, I decided to plot it. Despite my extremely rusty math skills, I figured out that I need to produce a timeline split into some number of intervals, and interpolate the weight at that time based on the two measurements around that point. After struggling with it for a bit, here's what I came up with:

#!/usr/bin/perl -wT use CGI::Carp qw/fatalsToBrowser warningsToBrowser/; use GD::Graph::bars; use POSIX qw/strftime/; use strict; $|++; my @wd; open my $w, "weightdata.txt" or die "weightdata.txt: $!\n"; for (<$w>){ chomp; next unless m{^\s*([\d/]+)\s+([\d.]+)}; my($m, $d, $y) = split /\//, $1; push @wd, [ strftime("%s", 0, 0, 0, $d, $m - 1, $y - 1900), $2 ]; } close $w; my ($tmin, $tmax) = ($wd[0][0], $wd[-1][0]); my ($wmax, $wmin) = ($wd[0][1], $wd[-1][1]); my $interval = ($tmax - $tmin) / 10; my($wcalc, $wgone, @data) = $wmax; for (my $n = $tmin; $n <= $tmax; $n += $interval){ if (defined $wd[1][0]){ shift @wd if $n >= $wd[1][0]; } else { last; } push @{$data[0]}, strftime("%m/%d/%Y", localtime($n)); if ($wd[0][0] == $n){ push @{$data[1]}, $wd[0][1]; push @{$data[2]}, 0; } else { my $tdiff = $wd[1][0] - $wd[0][0]; my $wdiff = $wd[0][1] - $wd[1][1]; $wcalc -= $wdiff / $tdiff * $interval; push @{$data[1]}, sprintf("%.2f", $wcalc); push @{$data[2]}, sprintf("%.2f", $wmax - $wcalc); } } push @{$data[0]}, strftime("%m/%d/%Y", localtime($tmax)); push @{$data[1]}, sprintf("%.2f", $wmin); push @{$data[2]}, sprintf("%.2f", $wmax - $wmin); my $graph = GD::Graph::bars->new(800, 400); $graph->set( x_label => 'Date', y_label => 'Weight changes', title => 'My Recent Weight Loss', y_max_value => 350, show_values => 1, overwrite => 1, long_ticks => 1, bar_spacing => 20, transparent => 0, ); $graph->set_legend("Weight", "Lbs lost"); my $img = $graph->plot(\@data); print "Content-type: image/png\n\n"; print $img->png;

There's got to be a smarter way to interpolate that, though. I started fiddling with figuring out the slope (sine) of each time/weight entry, but got lost. :( In general terms, I'd like to know how to plot data like this, arbitrary positions between unevenly-distributed points.

Thanks for any help you can offer!


--
"Language shapes the way we think, and determines what we can think about."
-- B. L. Whorf

Replies are listed 'Best First'.
Re: Interpolating data slope for multiple points
by GrandFather (Saint) on Sep 17, 2010 at 03:34 UTC

    The following seems to do what you want using a simple linear interpolation. The trick is figuring out which two data points to interpolate between. That's what the shift @rawData while ... is doing.

    use strict; use warnings; use POSIX; use constant kIntervals => 10.0; my @rawData; my $tmin; my $tmax; my $secPerDay = 60 * 60 * 24; for (<DATA>) { chomp; next unless m{^\s*([\d/]+)\s+([\d.]+)}; my ($m, $d, $y) = split /\//, $1; my $weight = $2; my $date = POSIX::mktime(0, 0, 0, $d, $m - 1, $y - 1900); $tmin = $date if ! $tmin; $date -= $tmin; $tmax = $date /= $secPerDay; push @rawData, [$date, $weight]; } my ($wmax, $wmin) = ($rawData[0][1], $rawData[-1][1]); my $interval = $tmax / kIntervals; my $wcalc = $wmax; my $wgone; my @data; for my $point (0 .. kIntervals) { my $epoch = $point * $interval; shift @rawData while @rawData > 1 && $rawData[1][0] < $epoch; last if ! @rawData; if (@rawData == 1) { push @data, [$epoch, $rawData[0][1]]; last; } my $m = $rawData[0][1]; my $s = ($rawData[1][1] - $rawData[0][1]) / ($rawData[1][0] - $raw +Data[0][0]); push @data, [$epoch, $m + $s * ($epoch - $rawData[0][0])]; } printf "%d\t%5.1f\n", $_->[0], $_->[1] for @data; __DATA__ 6/26/2010 334 8/12/2010 311.8 8/19/2010 308.4 9/5/2010 300.0 9/9/2010 298.6 9/14/2010 297.2 9/16/2010 293.6

    Prints:

    0 334.0 8 330.1 16 326.3 24 322.4 32 318.5 41 314.6 49 310.7 57 306.7 65 302.7 73 299.0 82 293.6

    I haven't bothered with the graphing code, mostly because that makes it harder to compare results without posting a graph.

    Note that the %s strftime format code is non-standard. I used mktime instead to get epoch seconds then converted that into delta days.

    Update: Fixed interval calculation bug - use $interval instead of kIntervals!

    True laziness is hard work

      {grin} "Simple", he calls it. Thanks, GrandFather - I appreciate the suggestion! I'm going to keep trying, and maybe I'll come up with something a little less complicated yet. It really shouldn't be this difficult...

      As to '%s' being non-standard, point taken. I saw the warning in the POSIX docs, but figured "what the heck". Your approach is definitely more robust.


      --
      "Language shapes the way we think, and determines what we can think about."
      -- B. L. Whorf

        Think about what makes the code difficult for a moment. You have non-uniformly sampled data and you want to fill in the gaps. That means that you have to account for possible special cases at the start and end of the data and you have to ensure you are using the correct points in between. There is a limit to how much you can make that processing simple.

        If you focus on the interpolation loop in my code you will see that:

        1. the start point is handled uniformly with the interpolated points.
        2. raw data is thrown away while the epoch for the current point to be calculated is greater than the second raw data point in the list. That gets us the appropriate pair of points to interpolate between.
        3. the last point is by definition the last point in the raw list and is handled as a special case.

        Where's the problem? {grin}

        True laziness is hard work
Re: Interpolating data slope for multiple points
by Marshall (Canon) on Sep 17, 2010 at 05:41 UTC
    This is a Perl forum, but I generated your graph in less than a minute with Excel. File|Open import data using space delimiter, select data, graph as a line graph (connect each data point with a straight line). I will point out that it is certainly not "cheating" to have Perl drive the Excel program to automate the above! But of course that will take longer than one minute to code. Another technique is to make an Excel macro and have Perl drive that macro command rather than the details of it.

    BTW, your progress is amazing! And very consistent! Great!

    To do this yourself, I would convert date/time to say epoch time. calculate min,max of X and Y scale. Then decide how many "buckets" you will have in each direction. Place each data point to closest X,Y coordinate that your scaling system allows. Calculate intermediate y points along the x axis between 2 points with y=mx+b. A line graph is better here than a bar.

    Your data is so consistent that I don't see any need, but in some cases you would want to make a single line that represents the "best fit" of all the data points. One technique is least squares approximation. There is a way to get Excel and other spreadsheet programs to do that.

    Update:
    Oh, I see that this is some CGI thing. Ok, some code that shows a bit of a different way is below..hope it helps..

    #!/usr/bin/perl -w use strict; use Time::Local; use Data::Dumper; my @data; my @graph; my $one_day = 60*60*24; #one day in seconds while (<DATA>) { next if /^\s*$/; #skip blank lines my ($date, $weight) = split; my $epoch = epoch($date); push (@data, [$epoch, $weight]); } # input data is sorted already sorted # But the algortihm depends upon sorted date information # so I did that to make sure # @data = sort{$a->[0] <=> $b->[0]} @data; # axis x will be adjusted to #days from the date of first data # axis y is in weight my ($x_base_epoch, $y1) = @{shift(@data)}; my $x1_day =0; foreach my $r_xy(@data) { my ($x2, $y2) = @$r_xy; $x2 -= $x_base_epoch; my $x2_day = int($x2/$one_day); # slope is delta weight/ delta days my $slope = ($y2-$y1)/($x2_day-$x1_day); # fill in missing data points... # by linear interpolation for (; $x1_day< $x2_day; $x1_day++) # x interval not tested for # other than 1 { my $y = $slope*($x1_day-$x2_day) + $y2; push @graph, [$x1_day,$y]; } $y1= $y2; } #fix-up for last data point my ($fx,$fy) = @{$data[-1]}; push @graph, [($fx-$x_base_epoch)/$one_day, $fy]; ### data to graph is in @graph ### ### I leave that part to the OP ### print "$_->[0] $_->[1]\n" foreach @graph; sub epoch { my $date = shift; my ($month, $day, $year) = split(m|/|,$date); my $time = timelocal(0,0,0, $day, $month-1, $year-1900); return $time; } =prints 0 334 1 333.527659574468 2 333.055319148936 3 332.582978723404 4 332.110638297872 ..... 74 298.95 75 298.6 76 298.32 77 298.04 78 297.76 79 297.48 80 297.2 81 295.4 82 293.6 =cut __DATA__ 6/26/2010 334 8/12/2010 311.8 8/19/2010 308.4 9/5/2010 300.0 9/9/2010 298.6 9/14/2010 297.2 9/16/2010 293.6
    Update to Update:

    Using a different x axis increment than "one" can get to problematic depending upon what you are trying to do. Often one would want to show the actual data points with some special character and the points inbetween with another character. So if you just want to show say weekly progress, then you have to decide what that would mean in terms of the graphical representation - eg the "weekly Monday point" may not represent any factual point at all. Generating the "weight" for each day is very efficient. If you want something like a "Monday" value it would not be ridiculous to generate all points in the week or year and then just print every 7th value.

      Hints in the first two lines of the OP's sample code imply that leveraging Excel may not be appropriate:

      #!/usr/bin/perl -wT use CGI::Carp qw/fatalsToBrowser warningsToBrowser/;
      True laziness is hard work
        Thanks! Missed that part...I updated my post with another example of code for this job. The OP's statement: I've lost a lot of weight lately, and wanted to plot it on a graph. seemed at first glance to imply something something different than a cgi application that dynamically creates graphs. So as the devil's advocate, this may still very well be a case where update data locally, generate cool image locally and use Perl to automagically upload cool image to website may still be a good way to go.

      {laugh} Excel would be a bit of overkill, but thank you for the suggestion. As far as the progress goes, it hasn't even been all that difficult: I've simply been experimenting with what it's like to feel different degrees of hunger, just being curious about it. I realized that it's been years since I've actually felt hungry (yeah... sounds crazy, I know), and I know that it won't kill me to experience it. As a side benefit, food now tastes absolutely amazing, and I get really full on a dollar's worth of salad greens. It feels fantastic.

      Regarding your code: bravo! That's pretty much what I was looking for - it's very readable, and the algorithm even makes sense to me. I've been playing around with my own code in the meantime, and came up with something more-or-less similar (except I handle the _end_ case as the special one.)

      use POSIX qw/strftime/; use strict; $|++; use constant Intervals => 20; my ($prev_time, $prev_val, $tmin, $tmax, $last_val, @curve); open my $data, "weightdata.txt" or die "weightdata.txt: $!\n"; while (<$data>){ chomp; next unless m{^\s*([\d/]+)\s+([\d.]+)}; my($m, $d, $y) = split /\//, $1; my $time_point = strftime("%s", 0, 0, 0, $d, $m - 1, $y - 1900); if (defined $prev_time){ my $slope = ($2 - $prev_val) / ($time_point - $prev_time); push @curve, [ $prev_time, $time_point, $prev_val, $slope ]; } ($prev_time, $prev_val) = ($time_point, $2); ($tmax, $last_val) = ($time_point, $2); } close $data; $tmin = $curve[0][0]; my $interval = ($tmax - $tmin) / (Intervals - 1); my ($prev_t, $val, $slope); for my $t (0 .. Intervals - 2){ my $calc_t = $t * $interval + $tmin; shift @curve unless $calc_t >= $curve[0][0] && $calc_t <= $curve[0 +][1]; ($prev_t, $val, $slope) = @{$curve[0]}[0, 2, 3]; printf "%2d: Weight on %s was %.1f\n", $t + 1, strftime("%F",local +time($calc_t)), ($calc_t - $prev_t) * $slope + $val; } printf "%2d: Weight on %s was %.1f\n", Intervals, strftime("%F",localt +ime($tmax)), $last_val;

      I've even looked at the intervals carefully - they're consistent - and the change in weight from one plotted point to another (yep, looks right.) I just might be getting the hang of this process. :)

      Thanks very much for your response; I'm definitely finding all this highly educational.


      --
      "Language shapes the way we think, and determines what we can think about."
      -- B. L. Whorf
Re: Interpolating data slope for multiple points
by SuicideJunkie (Vicar) on Sep 17, 2010 at 14:02 UTC

    A little while ago, I needed to find the rotation angle of a flat object in an image, so sampling the edge and finding the line was the way to go. What I eventually went with was:

    use strict; use warnings; use Statistics::LineFit; use Math::Trig; ... my $lineFit = Statistics::LineFit->new(); my @x = map {$_->{col}} @edgeSamples; my @y = map {$_->{row}} @edgeSamples; unless ($lineFit->setData(\@x, \@y)) { print "Invalid data sample for regression; could not calculate + skew for $inFile\n"; next; } ... my ($intercept, $slope) = $lineFit->coefficients(); my $angle = atan($slope); $angle *= 180/3.14159265359; ...

    Now, that does linear interpolation, but I would expect mass loss to be more of a noisy exponential decay towards the final value than linear. If it turns out to actually be a problem you could try resorting to piecewise linear interpolation, or use a fancy high-order plotting package.

      Thanks for your reply. I was actually trying to learn and understand how this process works, but it's good to know there are Magic Black Boxes that can do it. Assuming I can understand the math in the docs, anyway. :)


      --
      "Language shapes the way we think, and determines what we can think about."
      -- B. L. Whorf