Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Largest Sum of Consecutive Integers

by OverlordQ (Hermit)
on Aug 30, 2006 at 16:46 UTC ( [id://570420]=perlquestion: print w/replies, xml ) Need Help??

OverlordQ has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks!

A problem has come to my attention. Given an arbitrary sized array of integers in an arbitrary order, whats the sequence of consecutive integers that results in the largest sum.*

Given the array:
array = {3 2 8 9 -25 5 8 4 4 -3 5 3 -10}
The subset with the largest sum would be:
largestsubset = {5 8 4 4 -3 5 3}
This is what I came up with:
#!/usr/bin/perl use strict; use warnings; my @array = qw( 3 2 8 9 -25 5 8 4 4 -3 5 3 -10 ); my $length = scalar(@array); my $sidx; my $eidx; my $max = 0; my $start = 0; while($start < $length) { my $sum = $array[$start]; for(my $end = ($start + 1); $end < $length; $end++) { $sum += $array[$end]; if($sum > $max) { $sidx = $start; $eidx = $end; $max = $sum; } } $start++; }
Which gives you the nice and handy:
Start: 5 End: 11

Now of course that isn't pretty, but it works nicely for smaller sets. Only problem I could see is if the sets grow to an unreasonable size. Are there any more elegant ways of going at this?

One of my ideas was watching the value of sum and if it reached a negative value then break out of the inner loop.




* Yes, this is a homework question if you've guessed, but I already have a viable solution, and am just interested in more efficient algorithms, and not to mention we have to use Java.

Replies are listed 'Best First'.
Re: Largest Sum of Consecutive Integers
by tilly (Archbishop) on Aug 31, 2006 at 00:19 UTC
    I simply can't believe how complicated people's attempts at an O(n) solution are. It is as simple as this:
    #! /usr/bin/perl -w use strict; my @array = @ARGV ? @ARGV : qw(3 2 8 9 -25 5 8 4 4 -3 5 3 -10); my $cur = my $best = my $best_start = my $cur_start = 0; my $best_end = -1; for (0..$#array) { $cur += $array[$_]; if ($cur < 0) { $cur = 0; $cur_start = $_ + 1; } elsif ($best < $cur) { $best = $cur; $best_start = $cur_start; $best_end = $_; } } print qq(Start: $best_start End: $best_end Sum: $best Numbers: @array[$best_start..$best_end] );
    Update: I was asked in chatter about the -1. Yes, it has a purpose, and were I the instructor I would ding the code for not providing a comment explaining it.
      I was asked in chatter to explain why this works. Here is the explanation.

      The sets of consecutive integers are made up of the empty set, and ranges @array[$x..$y] with $x < $y. This algorithm examines a set of sums from sets of consecutive integes and selects the best of that collection. What we need to demonstrate is that we looked at one that was best. I'll actually show that none of the sets we skipped were the longest best one. (We may not choose the longest best one, but if we can show we looked at it, then we'll definitely got the best possible score.) Or, equivalently, that every set we skipped either has a lower sum or a shorter length than some other set.

      We look at the empty set, so all of the sets we skip look like @array[$x..$y]. We have to show that any we skip is not the longest best. Let us divide this into cases:

      1. We did not set $cur_start to $x. Then when $_ was $x-1, $cur was greater than or equal to 0. Which means that there is some $z < $x such that sum(@array[$z..($x-1)]) >= 0. But then sum(@array[$z..$y]) >= sum(@array[$x..$y]) and @array[$x..$y] is not the longest best because there is a longer one that is at least as good.
      2. We did set $cur_start to $x. Then there is some $z with $x <= $z <= $y with sum(@array[$x..$z]) < 0. Now we have two more cases:
        1. $z = $y In this case sum(@array[$x..$y]) < 0 and we are beaten by the empty set.
        2. $z < $y In this case sum(@array[$x..$y]) < sum(@array[$z..$y]) and we are not best.
      And there you have it. If we did not look at a set of consecutive integers, it is not best. Since we took the max of the rest, we must have found the best one.

      The proof for the slight variation of the algorithm that I presented excluding the empty set is similar but simpler. I leave it as an exercise to go through that proof and see why I had to make the code changes that I did.

      Update: I had a slight oversight the first time around. I fixed that by replacing "best" with "longest best".

      Update: The error is my environment not tilly's code!

      With -s command line switch disabled, his code produces the following output for the 'failing test':

      c:\test>perl tilly.pl -1 -2 -3 -4 -5 -6 -7 -8 -9 Start: 0 End: -1 Sum: 0 Numbers:

      which is not totally incorrect.

      Your's needs to be a little more complex though.

      c:\test>tilly -1 -2 -3 -4 -5 -6 -7 -8 -9 Start: 5 End: 11 Sum: 26 Numbers: 5 8 4 4 -3 5 3
      <p

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      As for the interesting case of all negative numbers noted by BrowserUK++, coming from the initial "best" intervall being an empty list with sum zero, which will then serve as an upper limit not to surpass in the loop.

      I think that can be dealt with by

      Update: hiding original approach in readmore tags, since it is incorrect and now obsolete

      (start Update insert)

      So instead of messing that much with tilly's code I now concentrate on the "best" initialization before we enter the loop.

      For my addendum loops through the beginning of @array as long as we are still seeing negative numbers.

      When we finally see a non-negative we proceed with tilly's code from there.

      To be prepared for the case that all elements are negative, we adjust the *_start, *_end, best*, cur* variables heading for the max (= least negative number) instead.

      Which leaves me with some code not so elegant anymore
      Maybe someone else is able to reimplement tilly's original idea also to work for all negative numbers in a more elegant/compact/concise way

      (end update insert)

      Otherwise I think the idea behind the algorithm presented by tilly++ is correct, or so I hope.

      Update: Still not correct as presented. Back to the drawing board...

        Speaking mathematically, the sum of the empty set is a perfectly valid sum of consecutive integers, and its value is 0. But you can change the problem to exclude the empty sum as follows:
        #! /usr/bin/perl -w use strict; my @array = @ARGV ? @ARGV : qw(3 2 8 9 -25 5 8 4 4 -3 5 3 -10); my $cur = my $best_start = my $best_end = my $cur_start = 0; my $best = $array[0]; for (0..$#array) { $cur += $array[$_]; if ($best < $cur) { $best = $cur; $best_start = $cur_start; $best_end = $_; } if ($cur < 0) { $cur = 0; $cur_start = $_ + 1; } } print qq(Start: $best_start End: $best_end Sum: $best Numbers: @array[$best_start..$best_end] );

      Ah, just a matter of finding the right abstraction / realization that allows you to see why each algorithm always finds the right answer. For this one, a simple two-part realization does it:

      (----- the full list -------) [-- max sum --] <neg><pos><- plus ->

      If a subrange has the maximum sum, then splitting that range into two new subranges will always give you two ranges that each have non-negative sums (otherwise the other new subrange would be an even better choice). In particular, if "<pos>" has a negative sum, then "[-- max sum --]" does not have the maximum sum.

      Conversely, any non-empty subrange directly to the left ("<neg>") of the max-sum subrange will have a negative sum (else adding that in would give a better sum).

      - tye        

        Another observation is that you can always add the consecutive postive or negative numbers together, so you only need to deal with the case when the sequence is alternating, +-+-+-+-...+. That may simplify the logic. But tilly's solution is certainly elegant, the only "drawback" is that it's not symmetric, would be more beautiful if it went from both ends.
      That's pretty close to what I was thinking, except I couldn't figure out how to get rid of that inside for loop since I missed the obvious way of getting the sum like in your example.
Re: Largest Sum of Consecutive Integers
by blokhead (Monsignor) on Aug 30, 2006 at 18:53 UTC
    Here's a linear-time algorithm that uses a much different approach than what tye has outlined.. It's probably needlessly dense, but I'm sorry, it came out of my brain in such a form.

    If you want to find the best interval within array[i..j], then there are 3 possibilities.

    • The best interval is completely contained within the left half of array[i..j]
    • The best interval is completely contained within the right half of array[i..j].
    • The best interval includes the midpoint of i..j (so it is made up of the best interval in the left half that includes the midpoint, and the best interval in the right half that includes the midpoint).
    I'll define a couple of thingies here: OK, so I like to write out dynamic programming algorithms in a logic programming paradigm:
    • best(i,j) is the score of the best interval within array[i..j]
    • lbest(i,j) is the score of the best interval within array[i..j] that includes the left endpoint (i)
    • rbest(i,j) is the score of the best interval within array[i..j] that includes the right endpoint (j)
    • total(i,j) is the sum of the elements in array[i..j].
    Here are the rules for the dynamic programming:
    lbest(i,i) = rbest(i,i) = best(i,i) = max{ 0, array[i] } best(i,j) = max{ rbest(i,k) + lbest(k+1,j), best(i,k), best(k+1,j) }, where k = int((j+i)/2) lbest(i,j) = max{ total(i,k) + lbest(k+1,j), lbest(i,k) }, where k = int((j+i)/2) rbest(i,j) = max{ rbest(i,k) + total(k+1,j), rbest(k+1,j) }, where k = int((j+i)/2)
    We can compute total(i,j) without much overhead by keeping track of an accumulation array (sums of the form array[0..j]). This takes O(n) precomputation for the accumulation sums, and we can compute total(i,j) = sum(array[0..j]) - sum(array[0..i-1]) in constant time.

    To compute each lbest(i,j), we split the interval into two intervals of half the size and recurse (with constant computation at this level). So to precompute all the lbest() and rbest() values that we need takes linear time total. Then with all of the lbest() and rbest() precomputed, to compute best(i,j) also just takes two recursive calls for intervals half the size. So the whole algorithm is linear.

    OK, so this is probably a lot more machinery than tye suggests, but it gets the job done. I was formulating this before I saw his reply anyway. I wrote some example code for this, but I doubt anyone cares..

    There's also the usual pain associated with modifying an algorithm that computes score of the best interval (which this does) to make it return the actual interval itself. ;)

    blokhead

Re: Largest Sum of Consecutive Integers
by Limbic~Region (Chancellor) on Aug 30, 2006 at 17:50 UTC
    OverlordQ,
    When you discussed this in the CB earlier, I was too busy thinking of my own solution to listen to what tye and ambrus had to say. I believe they felt an O(N) solution was possible. I believe this is an improvement on your code but doubt it is as efficient as what they had in mind.
    #!/usr/bin/perl use strict; use warnings; use List::Util 'sum'; my @set = (3, 2, 8, 9, -25, 5, 8, 4, 4, -3, 5, 3, -10); pop @set while $set[-1] < 0; shift @set while $set[0] < 0; my @neg = map {$set[$_] < 0 ? $_ : () } 0 .. $#set; my @beg = (0, map {$_ + 1} @neg); my @end = ($#set, map {$_ - 1} @neg); my ($beg, $end, $max, $sum) = (0) x 4; for my $i (@beg) { for my $j (@end) { next if $j < $i; $sum = sum(@set[$i .. $j]); ($beg, $end, $max) = ($i, $j, $sum) if $sum > $max; } } # Adjust accordingly if leading/trailing negatives print "$beg .. $end = $max\n";
    Since this is a homework assignment, I have not provided comments, but I will do so in the future for others that may come across this thread. Improvements are possible but hopefully different algorithms will be presented by others.

    Cheers - L~R

Re: Largest Sum of Consecutive Integers (ends)
by tye (Sage) on Aug 30, 2006 at 18:06 UTC

    The sum of a contiguous subset is the total sum minus the sum of the left-side items that are left out and minus the sum of the right-side items that are left out. So instead of looking at O(N*N) combinations you can look at O(N) left-side possibilities and O(N) right-side possibilities and get the best answer. If an end-sum is not negative, then there is no benefit in omitting those items so only consider removing an end-run of items if it has a negative sum and the lowest sum of the end-runs.

    But then you've got to deal with the special cases when this doesn't work. They aren't terribly tough to figure out but they certainly complicate the code. The trick is realizing these cases, but you'll probably hit them if you consider lists where most (or all) of the numbers are negative.

    Update: But I haven't proven this to be correct, so there may be cases I haven't considered.

    - tye        

      A "fun" challenge would be to come up with a dataset where my above method won't work. I think such exist, but I don't see clearly how to construct one (probably a lot of alternating positive and negative numbers).

      Dealing with the sum of left- and right-side end-runs certainly makes the problem easier to think about. I think I've got a much simpler-to-implement algorithm that will work on any set now.

      I'd construct the list of left-side end-run sums and the list of right-side end-run sums and track the "most negative so far" as you build it. Then you can just iterate one list and look up in the other list and select the maximum sum in O(N).

      If it weren't homework and/or I wasn't busy doing stuff that I get paid for, then I'd probably write up the code to do that. :)

      - tye        

        I just gotta figure out what the heck you just said ;)
Re: Largest Sum of Consecutive Integers
by zigdon (Deacon) on Aug 30, 2006 at 17:54 UTC
    The solution you described is O(@array ** 2) - it's complexity is proportional to the square of the number of items in the array. I think it can be done in O(@array) instead - but perhaps I have a mistake in my logic? Sample code follows:

    Basically, it's similar to what you added at the end - go over the array, adding up the items until you find one that just cancels them out. Then start over from that point. Does anyone spot a mistake here?

    -- zigdon

Re: Largest Sum of Consecutive Integers
by MidLifeXis (Monsignor) on Aug 30, 2006 at 18:23 UTC

    One of my ideas was watching the value of sum and if it reached a negative value then break out of the inner loop.
    Until you get something like this.....
    my @array = qw( -25 -3 -10 -30 -1 -2 -1 -4 -23 );
    (I used to write test cases for my colleges CS club's High School programming contest).

    --MidLifeXis

Re: Largest Sum of Consecutive Integers
by wojtyk (Friar) on Aug 30, 2006 at 19:10 UTC
    Are you sure that's not O(n*log(n))?
    It looks like your basic divide-and-conquer

    Speaking of which, that's actually how I envisioned solving this, only utilizing medians (aka, divide list, find median, then recompute new list indexes based on the median values)

Re: Largest Sum of Consecutive Integers
by madbombX (Hermit) on Aug 30, 2006 at 17:44 UTC
    Have you considered using Set::Array? There are a few functions in there for manipulating arrays that might be of use to you.

    Eric

      the largest sum is always going to be: add up all positive integers in the array and exclude negatives. as adding is communicative, order of the array and sequence of the add operation doesn't make any difference.
      the hardest line to type correctly is: stty erase ^H
        the largest sum is always going to be: add up all positive integers in the array and exclude negatives. as adding is communicative, order of the array and sequence of the add operation doesn't make any difference.
        While this certainly gives the largest sum of a sub(multi)set of the entries of the array, it needn't satisfy the conditions of the problem, since that required the summands to be consecutive terms of the array. (Incidentally, addition is commutative.)
Re: Largest Sum of Consecutive Integers
by rir (Vicar) on Aug 30, 2006 at 18:37 UTC
    You may have a viable solution but it is buggy: the my $max = 0; is an obvious error because the problem is uninteresting without negative numbers.

    Be well,
    rir

Re: Largest Sum of Consecutive Integers
by artist (Parson) on Aug 31, 2006 at 12:13 UTC
    Though, very interesting problem, I am interested in knowing what is the purpose or practical application of such problem.
    --Artist

      Optimization. If you can choose to do a number of tasks, each with a certain profit, and you can choose which point you start and end at, find the start and stop points that will optimize your profit.

      --MidLifeXis

Re: Largest Sum of Consecutive Integers
by Anonymous Monk on Oct 13, 2008 at 19:30 UTC
    Here is your original set

    3 2 8 9 -25 5 8 4 4 -3 5 3 -10

    First step is adding up consequitive positives and consequitive negatives. The reason for adding up consequitive positives is crystal clear. The only reason, you may pick a negative at any time is it may bring a positive value in future if we keep on going to right. Therefore, to go right and reach a positive value, you must pass through all negatives in the sequence which are in a row. I've made this simplification to see the big picture
    Here is the set after this process

    22 -25 21 -3 8 -10

    At this point, I will start from the end of list, and go to the beginning of list while keeping LOCAL BENEFIT and CURRENT MAX. If the end of list is negative, I will skip it and go left. Otherwise, I will assume it is current maximum that I can reach.

    At this point, 8 is assumed to be current maximum, and local benefit is 0.

    At the next step, I will update local benefit by adding 8, and -3, and assign it +5. Then my current position will be 21 with this local benefit +5. The current position will take the local benefit if it is positive, or drop it if it is negative or 0. If the current position accepts the local benefit, it will add itself with the local benefit, so 21 + 5 =26 which is now bigger than current maximum, so current maximum will start from the position of 21. If the local benefit is 0 or negative, it will be assigned to 0, and re-calculated from the current position, and its negative value neighbor on its left. If the local benefit is positive, the difference between current position ,and negative neighbor on left will be added up to local benefit. If you follow the procedure above, and reach the last positive on the left, you have done the complete assetment.

    Here is the algorithm applied to data set.

    22 -25 21 -3 8 -10

    -10 droped firstly from the end.

    current pos : 8 - current max : 8 - local benefit : 0

    STEP 2:

    current pos : 21 - current max : 8 - local benefit : 5

    21 + 5 = 26 > 8, so update current max

    current pos : 21 - current max : 26

    STEP 3:

    current pos : 22-current max : 26-local benefit : -25+26 =1

    22 + 1 = 23 < 26

    so current max doesn't change, and local benefit becomes 23 since 0 + 22 + 1 .

    Here ( 0 + 22 ) is the value adding up to local benefit. However, since there is no value on left, it becomes 0 instead oa negative neighbor.

    The algorithm complexity is linear. The first step is for reduction, and takes linear space and time. However, it is there just reduce the problem size.

    The real solution is also linear since you travse from the end of list to the begining while keeping contant extra storage space.

    Ok guys, there is the outline of my algorithm

      Hi Every One,

      I have improved my algorithm going from back to forward, but this time it also goes from left to right.

      You keep a current max, and current sum

      3 2 8 9 -25 5 8 4 4 -3 5 3 -10

      current sum 3 5 13 22 0 5 13 17 21 18 23 26 16

      current max 3 5 13 22 22 22 22 22 22 22 23 26 26

      When I reach -25. I update current sum and it is negtiave, so I start the current sum from here again from scratch.

      This method can even accept infinite numbers coming from an outside source, so it is on the fly algorithm which uses greedy methods

      Ilteris Murat Derici & David Matula

Re: Largest Sum of Consecutive Integers
by Anonymous Monk on Oct 13, 2008 at 20:27 UTC
    Hi Every One,

    I have improved my algorithm going from back to forward, but this time it also goes from left to right.

    You keep a current max, and current sum

    3 2 8 9 -25 5 8 4 4 -3 5 3 -10

    current sum 3 5 13 22 0 5 13 17 21 18 23 26 16

    current max 3 5 13 22 22 22 22 22 22 22 23 26 26



    When I reach -25. I update current sum and it is negtiave, so I start the current sum from here again from scratch.

    This method can even accept infinite numbers coming from an outside source, so it is on the fly algorithm which uses greedy methods

    Ilteris Murat Derici & David Matula

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://570420]
Approved by Limbic~Region
Front-paged by Limbic~Region
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (5)
As of 2024-03-29 09:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found