John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

Suppose you have a list of numbers, presumably closely grouped. Instead of listing the numbers, you can list the total followed by the delta between each item and the previous (no delta listed for the first item).

It's a slight bit of algebra to figure out how to get the original list back, and I illustrated it with a Perl function. I used a couple foreach loops rather than map, because I wanted to make the math clear to non-Perl programmers.

Then I benchmarked it with a map instead, and found about 8% speedup.

So, how else might this function be improved, in terms of speed? I don't think it's possible without making two passes through the data. But, just like someone who doesn't know how to map is missing something, I wonder if there are any other techniques I can employ here.

sub decode2 { my $total= shift; my $delta= 0; my $diff= 0; foreach (@_) { $delta += $_; $diff += $delta; } my $original= ($total-$diff)/(1+scalar @_); my $value= $original; my @results= map { $value += $_; } @_; return ($original, @results); }
Test vector: input: 5028 1 -21 44 -1
output: 1000 1001 980 1024 1023

Replies are listed 'Best First'.
Re: Perl Drag Racing - sum/delta list
by Aristotle (Chancellor) on Aug 15, 2002 at 00:08 UTC
    We start out:
    sub decode2 { my $total = shift; my $delta; my $diff; foreach (@_) { $delta += $_; $diff += $delta; } my $original= ($total-$diff)/(@_+1); my $value= $original; my @results= map { $value += $_ } @_; return ($original, @results); }
    We realize that we can simplify the first loop if we simply subtract the first element from $original later:
    sub decode2 { my $total = shift; my $diff; $diff += $_ for @_; my $original= ($total-$diff)/(@_+1) - $_[0]; my $value= $original; my @results= map { $value += $_ } @_; return ($original, @results); }
    Now we realize we can rearrange the return if we include the $original in the map:
    sub decode2 { my $total = shift; my $diff; $diff += $_ for @_; my $original= ($total-$diff)/(@_+1) - $_[0]; my $next; my @results= map { $next += $_ } $original, @_; return @results; }

    This also lets us get rid of the intermediary @results array altogether.

    Next we realize that we needn't shift off the total, because rather than summing everything into $diff first and subtracting that from $total later, we can simply turn the addition loop into a subtraction loop. For that, we have to flip the sign on the total. Note that not shifting off the total affects many locations in the code.
    sub decode2 { my $dc; $_[0] = -$_[0]; $dc -= $_ for @_; # subtract the sum of all elements from the firs +t my $original= $dc/@_ - $_[1]; my $next; return map { $next += $_ } $original, splice @_, 1; }

    $dc is an inaccurate name of course.. the DC component for the values is actually $dc/@_ - but oh well.

    Some logical cleanup: rather than declaring a new variable and splicing from @_, we can just modify the first value inplace. And thus, a massive cleanup later, we have
    sub decode2 { my ($dc, $next); $_[0] = -$_[0]; $dc -= $_ for @_; # subtract the sum of all elements from the firs +t $_[0] = $dc/@_ - $_[1]; return map { $next += $_ } @_; }
    ____________
    Makeshifts last the longest.

      Aristotle++

      It is so instructive to see a worked example of golfing, especially on something small enough to be able to follow and big enough to allow for some real transformations.

      However, I still don't follow the logic of totalling stuff up and then having to break it down in order to retrieve the first value before applying the deltas, instead of just retaining the first value itself?

      Did I miss something?


      What's this about a "crooked mitre"? I'm good at woodwork!
        Sorry, I don't understand what you're referring to. Could you quote a specific part of the post/code?

        Makeshifts last the longest.

      Hmm, cool, but it doesn't check. I put in (8636 460 17372 242 ...) and got out (28501.5412844037 20325.5412844037 37237.5412844037 20107.5412844037 ...)

      Here is my complete program. Since it failed on a longer test vector, it might be an overflow or something?

      On your first change, you appear to be using $diff with the same meaning. But, subtracting out $original later doesn't do the same thing as calculating the running $delta. I suppose it's a quirk that the example gave the same results!!

      If the total is a+b+c+d, then the given data is (total, b-a, c-b, d-c). This means that the missing "original" a is total- (a+(b-a) + (a+(b-a))+(c-b) + ((a+(b-a))+(c-b))+(d-c))) divided by the number of elements.

      Hmm, cancelling out, I see (b + c + d), but you are computing ( (b-a) + (c-b) + (d-c) ), the sum of the listed deltas. That gives you (d-a) which is the unknown first value (a) subtracted from the last value, which we don't know either.

      It is a total coincedecne that the sum of the deltas happens to be the same as the difference between the first and last result elements.

        You are right, I didn't do the math on paper originally, and due to false laziness, lack of an existing encode() or motivation to write one made me not verify my code with different data, even though I knew that might come back and bite me in the tender parts.

        I fixed the code; this time, no comments, since I don't know how to explain it briefly. I guess it warrants a longer comment in front of the subroutine containing the math.

        sub decode3 { my ($diff, $delta, $next); $diff = $_[0]; $delta = -$_[0]; $diff -= $delta += $_ for @_; $_[0] = $diff/@_; return map { $next += $_ } @_; }

        Your tests verify this one, and the benchmark on my 5.6.1 says it is consistently about 42% faster than the for version and 35% faster than the map one. The main win was removing the intermediary @results. If I put that step back in, it only beats the vanilla versions by 12% and 8%, respectively, which means you can speed up your own versions by 25% if you omit the temporary array.

        The rest of the speed difference is mainly because I avoid a couple off-by-one corrections by not shifting off the total. It is debatable whether the loss of clarity (or the requirement for comments..) is worth the speed gains. Some profiling would be in order to determine the impact of decode()'s speed on overall program perforamce.

        Makeshifts last the longest.

Re: Perl Drag Racing - sum/delta list
by BrowserUk (Patriarch) on Aug 14, 2002 at 22:53 UTC

    I'm probably missing something but it seems to me you are making life hard by calculating that running total or whatever it is?

    Why not just store the original first value and the deltas?

    #! perl -w use strict; sub decode { my $val; return map { $val += $_ } @_; } my @input = qw( 1000 1 -21 44 -1 ); my @output = decode @input; print "@output", $/; __END__ C:\test>190239 1000 1001 980 1024 1023

    What's this about a "crooked mitre"? I'm good at woodwork!
      Why? It's for a compression archive format. I'm renovating my design and have the following (re "solid packing of small files"):

      The total size of a packet is already known. (thus the total size). If this is a concatenation of subpackets, we need to know where to separate them back out. Instead of putting the length of the subpacket before each, it offers better compression to put this "index" data elsewhere and not "contaminate" the data with values that are not in its normal range. Also, there is an elegance factor of uniformity about putting this index in its own field, not mixed throughout.

      So, how to store the index? The delta-from-previous is smaller than a simple list of lengths, because it can use a shorter encoding form if the sizes are similar. But since the total size is already known, we only need the deltas of all except one, and can leave out one value.

      —John

Re: Perl Drag Racing - sum/delta list
by Abigail-II (Bishop) on Aug 15, 2002 at 10:02 UTC
    While in some cases there's an advantage of storing the increments instead of the values (it can save space - a technique also used in sampling (but there's not much to gain in Perl, except when your numbers would have been BigInts and your increments aren't)), I don't see the advantage of your complicated first value. Why not add an imaginary 0 before the orginal list and store just the differences? (That means, the first value is the first value). The you only need one pass in both the decoding and encoding.

    Abigail

      See my reply to BrowserUk, who asked a similar question.

      Basically, the whole idea is to save space; the more the better (by definition). The numbers, even though "normal" sized, are like BigInts variable-sized in their encoded form, and 0 and other small values have the minimum 1-byte length.

      Since I already know the total size, I can remove one value from the delta list and still figure it out again.

      So the answer to why is "because I can". The requirement is for minimum space and demonstratable (not optimal) logic to deal with it.

      Thanks,
      —John