in reply to Perl Drag Racing - sum/delta list

We start out:
sub decode2 { my $total = shift; my $delta; my $diff; foreach (@_) { $delta += $_; $diff += $delta; } my $original= ($total-$diff)/(@_+1); my $value= $original; my @results= map { $value += $_ } @_; return ($original, @results); }
We realize that we can simplify the first loop if we simply subtract the first element from $original later:
sub decode2 { my $total = shift; my $diff; $diff += $_ for @_; my $original= ($total-$diff)/(@_+1) - $_[0]; my $value= $original; my @results= map { $value += $_ } @_; return ($original, @results); }
Now we realize we can rearrange the return if we include the $original in the map:
sub decode2 { my $total = shift; my $diff; $diff += $_ for @_; my $original= ($total-$diff)/(@_+1) - $_[0]; my $next; my @results= map { $next += $_ } $original, @_; return @results; }

This also lets us get rid of the intermediary @results array altogether.

Next we realize that we needn't shift off the total, because rather than summing everything into $diff first and subtracting that from $total later, we can simply turn the addition loop into a subtraction loop. For that, we have to flip the sign on the total. Note that not shifting off the total affects many locations in the code.
sub decode2 { my $dc; $_[0] = -$_[0]; $dc -= $_ for @_; # subtract the sum of all elements from the firs +t my $original= $dc/@_ - $_[1]; my $next; return map { $next += $_ } $original, splice @_, 1; }

$dc is an inaccurate name of course.. the DC component for the values is actually $dc/@_ - but oh well.

Some logical cleanup: rather than declaring a new variable and splicing from @_, we can just modify the first value inplace. And thus, a massive cleanup later, we have
sub decode2 { my ($dc, $next); $_[0] = -$_[0]; $dc -= $_ for @_; # subtract the sum of all elements from the firs +t $_[0] = $dc/@_ - $_[1]; return map { $next += $_ } @_; }
____________
Makeshifts last the longest.

Replies are listed 'Best First'.
Re: Re: Perl Drag Racing - sum/delta list
by BrowserUk (Patriarch) on Aug 15, 2002 at 06:33 UTC

    Aristotle++

    It is so instructive to see a worked example of golfing, especially on something small enough to be able to follow and big enough to allow for some real transformations.

    However, I still don't follow the logic of totalling stuff up and then having to break it down in order to retrieve the first value before applying the deltas, instead of just retaining the first value itself?

    Did I miss something?


    What's this about a "crooked mitre"? I'm good at woodwork!
      Sorry, I don't understand what you're referring to. Could you quote a specific part of the post/code?

      Makeshifts last the longest.

        No biggy Aristotle, It was late and I was tired, but I didn't understand why John M. Dlugosz was calculating his very complicated 'first value', thereby ensuring that he had a overly complicated piece of processesing in order to decode his stream.

        As you had done such a thorough job of golfing his code, I thought maybe you saw the reason that I was missing.

        I see now that you just enjoyed the golf, and didn't bother yourself with the architecture of the course:)


        What's this about a "crooked mitre"? I'm good at woodwork!
Re: Re: Perl Drag Racing - sum/delta list
by John M. Dlugosz (Monsignor) on Aug 16, 2002 at 15:31 UTC
    Hmm, cool, but it doesn't check. I put in (8636 460 17372 242 ...) and got out (28501.5412844037 20325.5412844037 37237.5412844037 20107.5412844037 ...)

    Here is my complete program. Since it failed on a longer test vector, it might be an overflow or something?

Results?
by John M. Dlugosz (Monsignor) on Aug 16, 2002 at 16:01 UTC
    On your first change, you appear to be using $diff with the same meaning. But, subtracting out $original later doesn't do the same thing as calculating the running $delta. I suppose it's a quirk that the example gave the same results!!

    If the total is a+b+c+d, then the given data is (total, b-a, c-b, d-c). This means that the missing "original" a is total- (a+(b-a) + (a+(b-a))+(c-b) + ((a+(b-a))+(c-b))+(d-c))) divided by the number of elements.

    Hmm, cancelling out, I see (b + c + d), but you are computing ( (b-a) + (c-b) + (d-c) ), the sum of the listed deltas. That gives you (d-a) which is the unknown first value (a) subtracted from the last value, which we don't know either.

    It is a total coincedecne that the sum of the deltas happens to be the same as the difference between the first and last result elements.

      You are right, I didn't do the math on paper originally, and due to false laziness, lack of an existing encode() or motivation to write one made me not verify my code with different data, even though I knew that might come back and bite me in the tender parts.

      I fixed the code; this time, no comments, since I don't know how to explain it briefly. I guess it warrants a longer comment in front of the subroutine containing the math.

      sub decode3 { my ($diff, $delta, $next); $diff = $_[0]; $delta = -$_[0]; $diff -= $delta += $_ for @_; $_[0] = $diff/@_; return map { $next += $_ } @_; }

      Your tests verify this one, and the benchmark on my 5.6.1 says it is consistently about 42% faster than the for version and 35% faster than the map one. The main win was removing the intermediary @results. If I put that step back in, it only beats the vanilla versions by 12% and 8%, respectively, which means you can speed up your own versions by 25% if you omit the temporary array.

      The rest of the speed difference is mainly because I avoid a couple off-by-one corrections by not shifting off the total. It is debatable whether the loss of clarity (or the requirement for comments..) is worth the speed gains. Some profiling would be in order to determine the impact of decode()'s speed on overall program perforamce.

      Makeshifts last the longest.

        I think it's avoiding those "off by one" corrections that helps. Because of the interpretive nature, ($x+1) has a certain amount of overhead, compared to the machine INCrement instruction of a register. But "big" instructions like map and foreach have spread the overhead over a lot of work, so it is not noticable.

        I conjecture that map is good because it knows that the output list will be the same size as the input list, and can set things up properly ahead of time, as opposed to push'ing each element as a separate instruction

        Very clever how you initialize $delta to avoid the shift and "+1" corrections. Many thanks for illustrating your techniques.

        —John