http://qs1969.pair.com?node_id=180276

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I've wound up with a bit of a tricky and ugly code that I need to make more flexible. The problem: We have a client who displays their products on our Web site. These products are divided into several categories that are displayed in multiple columns. Here are the rules:

For example, let's say I have categories one through six with these respective heights (number of products):

my @height = qw/ 10 15 25 30 10 13 /;

With that, I might have those categories distributed as follows (pretending that the index starts with 1 instead of zero):

  Column 1 Column 2 Column 3 Column 4
Categories: 1 3 4 5
Categories: 2     6
Height: 25 25 30 23

The following code will accurately determine that, noting that I need to sort the actual values in each column. Further, it takes into account that the 1st and 6th category will always be in columns 1 and 4, respectively. Each for loop is iterating over a subsequent category with the indices being the only allowable values for a category (the second category is $a and cannot possibly be in column 3 because that would leave column 2 empty).

#!/usr/bin/perl -w use strict; use Data::Dumper; my @height = qw/ 10 15 25 30 10 13 /; my @columns = ($height[0],0,0,$height[-1]); my $curr_height = 0; # set this unreasonably high to ensure that subsequent # heights will be lower $curr_height += $_ foreach @height; my @distribution = ([1],[],[],[6]); for my $a (0..1) { $columns[$a] += $height[1]; for my $b (0..2) { $columns[$b] += $height[2]; for my $c (1..3) { $columns[$c] += $height[3]; for my $d (2..3) { $columns[$d] += $height[4]; my $this_height = ( sort @columns )[-1]; my $valid_dist; foreach ( @columns ) { $valid_dist = $_; last if ! $valid_dist; } if ( $valid_dist and $this_height < $curr_height ) { $curr_height = $this_height; push @{$distribution[$a]}, 2; push @{$distribution[$b]}, 3; push @{$distribution[$c]}, 4; push @{$distribution[$d]}, 5; } $columns[$d] -= $height[4]; } $columns[$c] -= $height[3]; } $columns[$b] -= $height[2]; } $columns[$a] -= $height[1]; } print Dumper \@distribution;

That code results in 121 iterations. Some possibilities are skipped by last if ! $valid_dist because the for loops that I have written sometimes allow for invalid combinations (if the 5th category ($d) is in the fourth column, the 4th category cannot be in the second column because that would leave the third column empty).

My real problem: the client dictates changes to the Web site faster than a jackrabbit ... uh, never mind ... you get the idea.

Essentially, this used to be two columns. It's now four. It may turn into three columns. We may have another category added or removed. My code snippet works, but if the client changes his mind again, it would be nice to simply have the code work. Any suggestions?

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Replies are listed 'Best First'.
Re: Puzzle: need a more general algorithm
by japhy (Canon) on Jul 08, 2002 at 19:59 UTC
    Here's the algorithm:
    use constant COL => 4; my @data = qw( 10 15 25 30 10 13 ); my @cols = map [$_], @data; while (@data > COL) { my $i = 0; my $s = $data[$i] + $data[$i+1]; for my $j (1 .. @data-2) { ($i, $s) = ($j, $data[$j] + $data[$j+1]) if $data[$j] + $data[$j+1] < $s; } splice @data, $i, 2, $s; splice @cols, $i, 2, [@{ $cols[$i] }, @{ $cols[$i+1] }]; }

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Greediness does not work in general. What if your categories have sizes 15, 15, 10, 10, 15, 15? Your solution comes out with a suboptimal answer.
          Greediness does not work in general. What if your categories have sizes 15, 15, 10, 10, 15, 15? Your solution comes out with a suboptimal answer.

        That's what backtracking's for. :-)

        Actually, that brings up an interesting question: Ovid, are there any bounds on how optimal the answer has to be? Exactly optimal (sucky if this problem turns out to be NP-Hard)? Within a constant factor of optimal (like, max length no more than 1.5x larger than optimal)?

        --
        The hell with paco, vote for Erudil!
        :wq

•Re: Puzzle: need a more general algorithm
by merlyn (Sage) on Jul 08, 2002 at 20:17 UTC
    Take a look at Knuth's TeX language. This is pretty close to the "line length" issue that gets solved for every paragraph set by his typesetting system. And if anyone has a good general algorithm for doing this, it's got to be Knuth!

    -- Randal L. Schwartz, Perl hacker

•Re: Puzzle: need a more general algorithm
by merlyn (Sage) on Jul 08, 2002 at 20:35 UTC
    One other thing to notice is that a particular distribution can be identified by a binary string of length equal to one less than the number of categories, and a number of 1-bits equal to one less than the number of columns. Each 0-bit denotes that the next category is in the same column as the prior category, while a 1-bit denotes that the next category begins the next column over. Since the first category is forced into the first column, and the last category is forced into the last column, we get two freebies there.

    So the total number of distributions of N categories into M columns is equal to the number of combinations of N-1 things taken M-1 at a time.

    Dunno if this helps, but it should keep you from brute forcing more than you need. {grin}

    In fact, for your particular dataset (6 categories, 4 columns), you shouldn't need to brute force more than (5 items taken 3 at a time which is) 10 tries.

    Wow, that's less than I thought! But it desk checks properly. All you need is a good generating algorithm, and you can brute force this!

    -- Randal L. Schwartz, Perl hacker

      merlyn wrote: All you need is a good generating algorithm and you can brute force this!

      I thought this idea was so intensely cool that I just had to try it out. However, coming up with a "good generating alorithm" escapes me. First, I took the list of possible permutations that dws created and translated it:

      [1][2][3][4,5,6] 1 1 1 0 0 [1][2][3,4][5,6] 1 1 0 1 0 [1][2][3,4,5][6] 1 1 0 0 1 [1][2,3][4][5,6] 1 0 1 1 0 [1][2,3][4,5][6] 1 0 1 0 1 [1][2,3,4][5][6] 1 0 0 1 1 [1,2][3][4][5,6] 0 1 1 1 0 [1,2][3][4,5][6] 0 1 1 0 1 [1,2][3,4][5][6] 0 1 0 1 1 [1,2,3][4][5][6] 0 0 1 1 1

      Then, once I was sure I understood it, I went ahead and hardcoded that so I could manipulate it and look for patterns.

      #!/usr/bin/perl -w use strict; use Data::Dumper; my @categories = qw/ 11100 11010 11001 10110 10101 10011 01110 01101 01011 00111 /; @categories = sort @categories; my @cat2 = sort { $a <=> $b } map { ord pack 'b*', $_ } @categories; print Dumper \@categories, \@cat2;

      Which prints the following:

      $VAR1 = [ '00111', '01011', '01101', '01110', '10011', '10101', '10110', '11001', '11010', '11100' ]; $VAR2 = [ 7, 11, 13, 14, 19, 21, 22, 25, 26, 28 ];

      Needless to say, the list seems arbitrary (even though we know it's not) and try as I might, I can't come up with a method of creating that, much less writing a generalized routine. I thought about trying to discover a pattern in the sequences, but no dice. Later, I tried creating a "picture" of the bits and swapping pairs, but I couldn't come up with a sequence for that, either. I'll start looking into permutators, but I feel like I'm missing something awfully basic here. There are only 10 possible combinations, so I didn't think generating them would be that hard :(

      Cheers,
      Ovid

      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

          '00111', '01011', '01101', '01110', '10011', '10101', '10110', '11001', '11010', '11100'

        So it looks like you're shifting the highest bit up by one, then the next highest bit, then the next, until you've run out of empty bits. How about something like (untested):

        use Bit::Vector; my $joins = 2; my $splits = 3; my $length = $joins+$splits; my $start = '0'x$joins . '1'x$splits; my $vector = Bit::Vector->new_Bin($length, $start); my @combinations = (); for my $pos ($joins-1..$length-1) { # 0-based, right? for my $bit ($splits-1..0) { $vector->bit_flip($pos+$bit); $vector->bit_flip($pos+$bit-1); push @combinations, $vector->to_Bin(); } }

        --
        The hell with paco, vote for Erudil!
        :wq

        As we discussed in that meeting, here's the code snippet I was thinking about to generate the binary strings:
        print map "$_\n", strings_for(6, 4); sub strings_for { my ($cats, $cols) = @_; $cats--; $cols--; my @ret; for (0..(1 << $cats) - 1) { my $bitstring = substr(unpack("B*", pack "N", $_), -$cats); next unless $bitstring =~ tr/1// == $cols; push @ret, $bitstring; } @ret; }

        -- Randal L. Schwartz, Perl hacker

      Counter example:
      1 2 3 4 1 2 3 4 1 2 3 4 5 6 5 6 5 6 1 3 4 5 1 3 4 5 1 3 4 5 2 6 2 6 2 6 1 2 3 6 1 2 3 6 1 2 3 6 4 5 4 5 4 5 1 2 3 6 1 2 3 6 1 2 3 6 4 5 4 5 4 5 ...

      That's twelve tries and there are more still.

        That's twelve tries and there are more still.

        No, there are only 10. Your attempt at a counter example violates the requirement that "The categories must remain in order".

Re: Puzzle: need a more general algorithm
by ferrency (Deacon) on Jul 08, 2002 at 20:03 UTC
    This is a very interesting problem. Broken down to its core, it seems like your problem can be restated like this:

    Given a set of N values, divide the set into M subsets, such that no subset is empty, and the maximum of the sums of the values in the subsets is minimized.

    I think a perfect solution is hard and would be slow. But you can probably get a solution that works relatively well much more quickly.

    One way to look at the problem: For each value (category) determine which subset (column) it belongs in.

    Another way is: For each subset (column), determine which values(categories) belong in it.

    Assuming your number of categories is at least double your number of columns, you might want to see what kind of results you get with this technique:

    • Sort the categories by height, decreasing.
    • Once for each column: put the highest unassigned category into this column.
    • For each column in reverse order, until there are no categories left: put the next highest unassigned category into this column.
    With your sample data:

    my @height = qw/ 10 15 25 30 10 13 /;
    we'd get the following solution:

    column 1: 30
    column 2: 25
    column 3: 15
    column 4: 13
    column 4: +10
    column 3: +10
    
    This is equivalent to what you have, but much more straightforward to calculate. And the solution is more general. However, at this point I have nothing other than an intuition (and a few test cases) that this solution will probably be good enough most of the time.

    If you have many more categories than columns, this will probably stop working well pretty quickly. If you have 3 very large categories and 3 very small categories, this method won't find the solution that places all three small categories in the same column.

    You might also try:

    For each category, in decreasing height order: put that category in the column with the lowest current total height.

    Thanks for the thought-provoking node. I'm looking forward to others' solutions to this problem.

    Update:Dragonchild is absolutely right: I did miss that constraint. Now for some more thought on the subject...

    Alan

      You missed a constraint - the ordering of the categories must be preserved.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Puzzle: need a more general algorithm
by Zaxo (Archbishop) on Jul 08, 2002 at 20:12 UTC

    How about this? Sum the number of products over categories to get the total number of products. Divide that total by the number of columns to get the average height of a column. Shift categories off an array of unused ones and push them onto the current column if the new total would be closer to the average than the old. Keep a running average as a tiebreaker.

    That may have corner cases which it doesn't optimize, but it should be good enough and run a whole lot faster than exhaustive search.

    Update: Here is a corner case that doesn't optimize this way: @heights = qw/ 25 23 24 35 3 40/; $cols = 5; More attention to the running average could fix that, but I'm sure loopholes would remain.

Re: Puzzle: need a more general algorithm
by dragonchild (Archbishop) on Jul 08, 2002 at 19:49 UTC
    Three possibilities:
    1. More columns than categories
    2. Equal
    3. More categories than columns
    The first is insoluble. The second is trivial. The third is interesting.

    In the third, just combine neighboring categories into a "super-category" and re-evaluate.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 08, 2002 at 20:42 UTC

    If you want to generate all possible combinations, you could use this algorithm:

    $categories = 5; $columns = 3; $a = join('#', 0..$categories); @columns = (); &check_columns ($a, @columns); sub check_columns { my $cat = shift; my @columns = @_; my $size = split('#', $cat)-1; foreach (1.. $size ) { my ($first, $second) = $cat =~ /^((?:.*?#){$_})(.*?)$/; $first =~ s/#$//; my @result = ($first, $second, @columns); if ($#result == $columns) { print join(" ", @result), "\n", } else { check_columns(@result); } } }

    It generates this list of indexes, from which you can choose the best line:

    0 1 2 3#4#5 0 1 2#3 4#5 0 1#2 3 4#5 0#1 2 3 4#5 0 1 2#3#4 5 0 1#2 3#4 5 0#1 2 3#4 5 0 1#2#3 4 5 0#1 2#3 4 5 0#1#2 3 4 5

    I used an array of strings instead of an array of arrays because it is easier to debug.

Re: Puzzle: need a more general algorithm
by dws (Chancellor) on Jul 08, 2002 at 20:57 UTC
    O.K., I may have misread this problem at first because "height" is somewhat ambigous. If "height" somehow refers to the aggregation of the data, here's another approach.

    Given the constraints, I count 10 possible groupings:

    [1][2][3][4,5,6] [1][2][3,4][5,6] [1][2][3,4,5][6] [1][2,3][4][5,6] [1][2,3][4,5][6] [1][2,3,4][5][6] [1,2][3][4][5,6] -- the one you show [1,2][3][4,5][6] [1,2][3,4][5][6] [1,2,3][4][5][6]
    So my approach two is to brute force sum the values for each possible combination, and select the one that best meets the criteria about "height".

    Then, assuming that works, I'd generalize an algorithm for generating groupings given number of categories and number of columns. Like the one fglock provides above.

    Is this what you're looking for, or does "height" refer to the number of rows in the table?

      Here's an observation that might lead to a quick way to determine unique bucket combinations.

      The set of bucket sizes above is the union of two sets: the set of all unique arrangements of (1 1 1 3), and the set of all unique arrangements of (1 1 2 2). The sum of the numbers is equal to the number of catagories. This can be generalized to cover different numbers of buckets and different numbers of categories.

Re: Puzzle: need a more general algorithm
by Abigail-II (Bishop) on Jul 09, 2002 at 16:25 UTC
    The problem smells badly to being an NP-complete problem. And those are believed that they cannot be done efficiently. Hence, we might as well backtrack. And which mechanism in Perl is good at backtracking? Right, regular expressions.

    Therefore, I present a solution that will use a regular expression to do the hard work. It will report the minimum height that is needed. Calculating the actual partition is left as an exercise for the reader.

    Abigail

    #!/usr/bin/perl use strict; use warnings 'all'; sub partition ($$); my $columns = 4; my @sizes = qw /10 15 25 30 10 13/; sub partition ($$) { my ($b, $h) = @_; return [(0) x $h] unless $b; return [$b] if $h < 2; map {my $__ = $_; map {[$__ => @$_]} partition $b - $__, $h - 1} 0 + .. $b; } my @r = partition @sizes, $columns; my @regex; foreach my $r (@r) { my $c = 0; push @regex => join ":" => map { $c += my $__ = $_; join ("" => map {"-" x $_} @sizes [$c - $__ .. $c - 1]) . "-*" +} @$r; } my $regex = join "|" => map {"(?:$_)"} @regex; my $try = 1; { exit !print "Minimum required height: $try\n" if join (":" => map {"-" x $try} 1 .. $columns) =~ /$regex/; $try ++; redo; }

      Impressive.

      Once you have the minimum height, calculating the actual partition is equivalent to the text justification problem, with each category being a word, its length being the word's length, and the minimum height being the line length. This has been done before.

      --
      The hell with paco, vote for Erudil!
      :wq

        Actually, once you know the height that will work, a simple greedy algorithm will do (stuff as much as you can in the first column, then the next, and the next, etc). After all, it was given that any solution would do, not the prettiest or something like that.

        Abigail

Re: Puzzle: need a more general algorithm
by dws (Chancellor) on Jul 08, 2002 at 20:25 UTC
    To distribute N categories across M columns as evenly as possible, assuming N >= M, observe that

    1. (N mod M) columns will contain ((N div M) + 1) categories
    2. the remaining (M - (N mod M)) columns, if any, will contain (N div M) categories.

    Given this, categories can be distributed in one pass.

    Update: Oh blast. I may have misread the problem, and confused the "height of the table in rows" with "the aggregate height summed from the @height data".

      It appears that I have confused a few people here. What I need "minimized" is the aggregate height per column as summed from @height data.

      Given the heights of qw/ 10 10 15 20 10 10 /, I could conceivably construct a table with no column having greater than 20 items in it:

      10 15 20 10 10 10

      The following solution, would fail because one of the columns has 25 items:

      10 10 20 10 15 10

      Part of the problem, I suspect, is that I had a bug in my original code because I accidentally posted the wrong version. The @distribution array needs to be reset every iteration. The following is correct:

      #!/usr/bin/perl -w use strict; use Data::Dumper; my @height = qw/ 10 15 25 30 10 13 /; my @columns = ($height[0],0,0,$height[-1]); my $curr_height = 0; # set this unreasonably high to ensure that subsequent # heights will be lower $curr_height += $_ foreach @height; my @distribution; for my $a (0..1) { $columns[$a] += $height[1]; for my $b (0..2) { $columns[$b] += $height[2]; for my $c (1..3) { $columns[$c] += $height[3]; for my $d (2..3) { $columns[$d] += $height[4]; my $this_height = ( sort @columns )[-1]; my $valid_dist; foreach ( @columns ) { $valid_dist = $_; last if ! $valid_dist; } if ( $valid_dist and $this_height < $curr_height ) { $curr_height = $this_height; @distribution = ([1],[],[],[6]); push @{$distribution[$a]}, 2; push @{$distribution[$b]}, 3; push @{$distribution[$c]}, 4; push @{$distribution[$d]}, 5; } $columns[$d] -= $height[4]; } $columns[$c] -= $height[3]; } $columns[$b] -= $height[2]; } $columns[$a] -= $height[1]; } print Dumper \@distribution;

      I hope that's clear, now :)

      I also have to add that there's a heck of a lot more discussion on this than I thought there would be.

      Cheers,
      Ovid

      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

      Keep in mind that this is a weighted distribution: three categories with weight 10 are worth one category with weight 30 (so if N=2, you'd want 3/1 rather than 2/2).

      This looks to me like a constrained (order must not change) variant on the bin packing problem. Bin packing is NP-complete in the general case (IIRC), but the order constraint makes this quite tractable (see merlyn's typesetting comment). This problem has a rather good (n lg n or n^2) solution via dynamic programming: the typesetting problem was on one of my assignments in an advanced algorithms class. (Hey, did Ovid just post a homework problem? ;-b) I'll dig through my old notes when I get home from work and see if I can find it. In the mean time, you (Ovid) might look at Text::Format for ideas.

      Update: D'oh! Another constraint on the text formatting problem that isn't present here is a maximum on line length (bin size), which makes it hard to reject a bogus solution (too much in one bin) quickly. On the other hand, this solution is constrained by number of bins, which the text formatting solution isn't. Hmm.... (Great problem, Ovid!)

      --
      The hell with paco, vote for Erudil!
      :wq

      This seems to blur the lines of the original problem. There might be 103 categories to distribute, but they have to appear in chunks of 10, 15, 25, 30, 10, and 13.

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      This is true if your categories all have equal heights. But an optimal solution for a wide height distribution might place, for example, three categories in one column, and one category in each of the other three columns. A sample dataset for which that would be the optimal solution:

      my @heights = (4, 4, 4, 1, 1, 1); # in 4 columns
      Alan
Puzzle for the puzzle: Re: Puzzle...
by Ovid (Cardinal) on Jul 09, 2002 at 16:49 UTC

    I was sent the following answer to the puzzle with the following challenge:

    • Why does this work
    • How efficient is is. i.e., how fast it calculates group_cats(50, 1..150) -- with the caveat that it might not be that efficient in memory.

    If you think you know who sent this, please do not speculate! I will neither confirm nor deny ...

    use strict; use Carp; use Data::Dumper; $Data::Dumper::Indent = 1; print Dumper [group_cats(3, 1..7)]; { my @sizes; my %ans; sub group_cats { (my $num, @sizes) = @_; %ans = (); my ($size, @partition) = _group_cats($num, 0, $#sizes); return @partition; } sub _group_cats { my $key = join ":", @_; my ($num, $start, $end) = @_; if (not exists $ans{$key}) { if ($num < 1) { $ans{$key} = [0]; } elsif (1 == $num) { my @part = map $sizes[$_], $start..$end; $ans{$key} = [sum(@part), \@part]; } else { my $num_a = int($num/2); my $num_b = $num - $num_a; my $min_mid = $start + $num_a - 1; my $max_mid = $end - $num_b; my $mid = int(($min_mid + $max_mid)/2); my ($last_a, @part_a) = _group_cats($num_a, $start, $mid); my ($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end); my $best = max($last_a, $last_b); my @best_part = (@part_a, @part_b); while ($min_mid < $max_mid) { if ($last_a <= $last_b) { if ($min_mid < $mid) { $min_mid = $mid; } else { $min_mid = $mid + 1; } } else { $max_mid = $mid; } $mid = int(($min_mid + $max_mid)/2); ($last_a, @part_a) = _group_cats($num_a, $start, $mid); ($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end); if (max($last_a, $last_b) < $best) { $best = max($last_a, $last_b); @best_part = (@part_a, @part_b); } } $ans{$key} = [$best, @best_part]; } } return @{$ans{$key}}; } } sub max { my $max = shift; for (@_) { $max = $_ if $max < $_; } $max; } sub sum { my $sum = 0; $sum += $_ for @_; return $sum; }

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

      And here is an optimized version. Add in the question of what optimizations are present, and why they work. Some are more obvious than others. The thoughtful hacker might also wonder at why the $max_mid, $min_mid logic switched.

      Oh right. And this adds the restriction that the heights should be integers. I think that is a reasonable rule for the speedup...

      use strict; { use integer; my @sizes; my %ans; my @align; sub group_cats { (my $num, @sizes) = @_; %ans = (); my $a = 1; for my $i (1..$#sizes) { if ($a + $a < $i) { $a += $a; } $align[$i] = $a; } $align[0] = 0; my ($size, @partition) = _group_cats($num, 0, $#sizes); return @partition; } sub _group_cats { my $key = join ":", @_; my ($num, $start, $end) = @_; if (not exists $ans{$key}) { if ($num < 1) { $ans{$key} = [0]; } elsif (1 == $num) { my @part = @sizes[$start..$end]; my $sum = 0; $sum += $_ for @part; $ans{$key} = [$sum, \@part]; } else { my $num_a = $align[$num]; my $num_b = $num - $num_a; my $min_mid = $start + $num_a - 1; my $max_mid = $end - $num_b; my $mid = $min_mid + $align[$max_mid - $min_mid]; my ($last_a, @part_a) = _group_cats($num_a, $start, $mid); my ($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end); my $best = $last_a < $last_b ? $last_b : $last_a; my @best_part = (@part_a, @part_b); while ($min_mid < $max_mid) { if ($last_b <= $last_a) { if ($max_mid > $mid) { $max_mid = $mid; } else { $max_mid--; } } else { $min_mid = $mid; } $mid = $min_mid + $align[$max_mid - $min_mid]; ($last_a, @part_a) = _group_cats($num_a, $start, $mid); ($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end); my $max = $last_a < $last_b ? $last_b : $last_a; if ($max < $best) { $best = $max; @best_part = (@part_a, @part_b); } } $ans{$key} = [$best, @best_part]; } } return @{$ans{$key}}; } }
      BTW the original question had an straightforward recursive solution that scaled exponentially. I found that too easy, hence the faster solution.
Re: Puzzle: need a more general algorithm
by runrig (Abbot) on Jul 09, 2002 at 01:42 UTC
    I figure you'll always combine the smallest category with the one before or after it (Update: realized that in general, this is not a valid assumption; finding example left as excercise, though maybe finding smallest consecutive two column sum, then generating possibilities summing of columns before and after that might work...), so here's my fairly (in)efficient (O(N**2)) answer:
    #!/usr/bin/perl use strict; use warnings; my $num_columns = 4; my @cat = (15, 15, 10, 10, 15, 15); my @ans = squish(@cat); print "@ans\n"; sub squish { my @arr = @_; my @aoa = (\@arr); for ($num_columns..(@arr-1)) { my @tmp_aoa; push @tmp_aoa, squisher(@$_) for @aoa; @aoa = @tmp_aoa; } my $best_ans; my $best_value; for my $aref (@aoa) { my $max_value; for my $value (@$aref) { $max_value = $value if !defined $max_value or $value > $max_valu +e; } $best_ans = $aref, $best_value = $max_value if !defined $best_value or $max_value < $best_value; } @$best_ans; } sub squisher { my @arr = @_; my $min_col; my $min_value; for (0..$#arr) { $min_value = $arr[$_], $min_col = $_ if !defined $min_value or $arr[$_] < $min_value; } my @arr1 = ($min_col > 0) ? @arr : (); my @arr2 = ($min_col < $#arr) ? @arr : (); splice(@arr1, $min_col-1, 2, $arr1[$min_col-1] + $arr1[$min_col]) if @arr1; splice(@arr2, $min_col, 2, $arr2[$min_col] + $arr2[$min_col+1]) if @arr2; return ((@arr1 ? \@arr1 : ()), (@arr2 ? \@arr2 : ())); }
    Update: mildly tested...
Re: Puzzle: need a more general algorithm
by Aristotle (Chancellor) on Jul 09, 2002 at 13:18 UTC

    Sounds like a case for Branch and Bound.

    I would partition the problem space by adding the next free category to either last used column or the next empty column. The root problem would be "category 1 in column 1", so the children would "category 2 in column 1" and "category 2 in column 2", etc. The low boundary for each solution is the height of the highest column, and the solution's high boundary is maximum(sum of heights of free categories, lower boundary). There is a constraint (free categories > empty columns) that has to be fulfilled to assure there will not be empty columns.

    It's probably best to generate the solution tree breadth-first as you will descend far down the left side very quickly if the global lower and upper boundaries have not been narrowed down. Breadth-first would garantuee that you limit them soon. On the other hand, for larger problems the breadth-first approach will also require more memory to hold the solution tree.

    I tried to write code yesterday but it was 3am and my mind wouldn't cooperate. Now I've just gotten up way waay too late into the day and it still doesn't. When I'm feeling a bit fresher I will update here with some code (or capitulation *g*). The tricky problem is coming up with a convenient data structure - I tried anonymous arrays for the columns, but it's inconvenient to have to deep copy to generate the child problems.

    Update: sorry, would take too much time right now. :-( I will have a spare evening in a few days and may I'll get back to it then. Sigh..

    Makeshifts last the longest.

Re: Puzzle: need a more general algorithm
by FoxtrotUniform (Prior) on Jul 09, 2002 at 02:56 UTC

    Note: I'm thinking this up as I go along, so take it with an appropriately sized grain of salt.

    Once you've got your data in an appropriate arrangement (for example, the position Ovid gives us:

    1 3 4 5 2 6

    You can perform at most two operations on each bucket:

    1. Shift the "bottom" element one stack to the left
    2. Shift the "top" element one stack to the right

    In this case, there are only two legal operations:

    1. Shift element 2 to the second column
    2. Shift element 5 to the third column

    My first idea was another greedy approach: start with the data in a valid (though almost certainly not optimal) state, and perform the operation that gives the most benefit. (In other words, hill climbing.) Since our score is the longest column we have so far, we're only going to climb hills (lower our score, which in this case is good) by operating on the longest column. Unfortunately, this leads us to local maxima, not global ones. Note: I haven't proven this, it's just something that tends to be true of hill climbing algorithms. I'm a bit hopeful about this because the domain is finite, and you might get decent results by picking a few random starting points, hill climbing on each of them for a small number of iterations, and picking the best one, but on the other hand....

    --
    The hell with paco, vote for Erudil!
    :wq

Re: Puzzle: need a more general algorithm
by dws (Chancellor) on Jul 10, 2002 at 06:48 UTC
    I've been chewing on this problem for a day now. This is clearly problem with two logical parts. Part 1 is to generate all legal mappings of N columns of data into M buckets, given the constraints that no bucket can be empty, and that the columns need to stay ordered. The second part is apply these mappings to the input data, and select a mapping that yields a "best fit."

    I focused on the first part, looking for a quicker, simpler solution. I think I have one. Here it is. Given a number of columns and a number of buckets, the code below calculates all legal mappings of columns to buckets, and returns these in a hash, where the key is a printable string, and the value is an anonymous array.

    { # map columns to buckets. key is string, value is anonymous array. my %c2bMap; sub c2bMappings { my($buckets, $columns) = @_; die "bogus args" unless $buckets > 1 && $columns > $buckets; %c2bMap = (); _genFrom(0, (0) x ($columns - $buckets), 1 .. ($buckets - 1)); return \%c2bMap; } sub _genFrom { my @c2bMap = @_; return if exists $c2bMap{"@c2bMap"}; print "@c2bMap\n"; #DEBUG $c2bMap{"@c2bMap"} = \@c2bMap; foreach my $i ( 2 .. $#c2bMap ) { my $n = $c2bMap[$i] - 1; if ( $c2bMap[$i - 2] == $n && $c2bMap[$i - 1] == $n ) { local $c2bMap[$i-1] = $c2bMap[$i]; _genFrom(@c2bMap); } } } } c2bMappings(4,6); __END__ 0 0 0 1 2 3 0 0 1 1 2 3 0 1 1 1 2 3 0 1 1 2 2 3 0 1 2 2 2 3 0 1 2 2 3 3 0 1 2 3 3 3 0 1 1 2 3 3 0 0 1 2 2 3 0 0 1 2 3 3
    Recursion only happens with valid mappings. Note the selective localization of an element of the array that the code is about to recurse on.

Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 09, 2002 at 12:10 UTC

    I think this code is usable:

    use strict; my @height = qw/ 10 15 25 30 10 13 /; my $columns = 3; my @Best = (); my $Min_height = 1E9; &check_columns ( [ 0 .. $#height ] ); print "best result: \n"; # print join(" ", map { join('#', @$_) } @Best), "\n"; foreach my $y (0..$Table_height) { foreach my $x (0..$columns) { my $data = ${$Best[$x]}[$y]; print defined $data ? $data : " "; print " "; } print "\n"; } # check column height combinations sub check_columns { my $pcat = shift; foreach my $p (0 .. $#{@$pcat} - 1) { my @result = ( [ @$pcat[0 .. $p] ], [ @$pcat[$p + 1 .. $#{@$pc +at}] ], @_); if ($#result == $columns) { # @result is an array of arrays # print join(" ", map { join('#', @$_) } @result), "\n"; my $max_height = 0; foreach my $j (@result) { my $height = 0; # foreach my $i (@$j) { print "$i:$height[$i] "; }; pr +int "\n"; foreach my $i (@$j) { $height += $height[$i] }; $max_height = $height if $height > $max_height; } # print "max_height: $max_height\n"; if ($max_height < $Min_height) { $Min_height = $max_height; @Best = @result; } } else { check_columns(@result); } } }

    It prints:

    best result: 0 2 3 4 1 5

    You can un-comment the print lines to check how it works.

    @Best is an array-of-arrays.

    Update: show table.

      By the way, $columns = 2 gives:

      best result: 0 3 4 1 5 2

      $columns = 4 gives:

      best result: 0 1 2 3 4 5
Re: Puzzle: need a more general algorithm
by lemming (Priest) on Jul 10, 2002 at 04:47 UTC

    Ok, this is that straight forward recursive method. By keeping track of the subsets we've already looked at, it's not too bad with the 50, 1-150 set.

    Update: Pass data by reference as suggested. A couple other small changes. Added another indice for array totals.

    #!/usr/bin/perl use strict; use warnings; use constant DEBUG => 0; # my $columns = 6; # my @data = qw( 10 13 25 30 10 15 1 4 25); my $columns = 50; my @data = (1..150); die "Not enough catagories for columns\n" if $columns > @data; our $mega_height = sum(\@data); my %key_h; my %add_h; my $best_r = get_best($columns, \%add_h, \%key_h, \@data); printit($best_r); exit; sub get_best { my ($columns, $add_r, $key_r, $data_r) = @_; my $max_stack = @$data_r - $columns + 1; my $max_height = $mega_height; my $fed_key = join("-", @$data_r, $columns); print "[", join(",", @$data_r),"]-",$columns,"\n" if DEBUG; my $best_r; foreach my $stack ( 1 .. $max_stack ) { my @arr; $arr[0] = [ @$data_r[0..$stack-1] ]; my $tmp_r; if ($columns == 2) { # We only have one more column to fill push(@arr, [ @$data_r[$stack..@$data_r-1] ]); } elsif (@$data_r - $stack == $columns - 1 ) { # One cat per column left map push(@arr, [ $_ ]), @$data_r[$stack..@$data_r-1]; } else { my $key = join("-", @$data_r[$stack..@$data_r-1], $columns - 1); # See if we've done this before if ( defined( $key_r->{$key} )) { $tmp_r = $key_r->{$key}; } else { $tmp_r = get_best( $columns - 1, $add_r, $key_r, [@$data_r[$stack..@$data_r-1]] ); } push ( @arr, @$tmp_r ); } my $cur_height = 0; foreach my $col_r (@arr ) { my $height; my $ckey = join(",",@$col_r); if ( defined( $add_r->{$ckey} )) { $height = $add_r->{$ckey}; } else { $height = sum($col_r); $add_r->{$ckey} = $height; } $cur_height = $height if $cur_height < $height; } printit(\@arr) if DEBUG; if ( $cur_height < $max_height or !defined($best_r)) { $best_r = \@arr; $max_height = $cur_height; } } $key_r->{$fed_key} = $best_r; return $best_r; } sub sum { my ($col_r) = @_; my $height = 0; foreach my $bit ( @$col_r ) { $height += $bit; } return $height; } sub printit { my ($arr_r) = @_; my $start = 1; my $max = 0; foreach my $col_r (@$arr_r) { print ", " unless $start; $start = 0 if ( $start ); my $height = sum($col_r); print "[ ", join(", ", @$col_r), " ]"; $max = $height if $height > $max; } print " => $max\n"; }
      Very good. Pre-memoization you are exponential. With memoization your memory usage scales like O(n**3), and your performance is O(n**4). If you passed the array by reference, and added 2 indices, you would drop a factor of n off of both.

      By contrast the efficient algorithm is sub-exponential pre-memoization, and is sub-quadratic after.

      Which shows that good algorithms are a big win, but being straightforward, and then applying well-known speedups to that, can still result in a usable algorithm.

Re: Puzzle: need a more general algorithm
by zaimoni (Beadle) on Jul 09, 2002 at 09:55 UTC
    Thinking vaguely...

    Notation:

    • Columns: N
    • Category count: Q

    Assuming the problem makes sense (Q>N), yet another way to think about the problem space is to enumerate the cells like this:

    1 3 5 7
    2 4 6 8

    Then, if we index the categories by 0...Q-1 (fine because the category order must be preserved), we can consider the location of a category to be L(category index). The nature of this location is yet to be determined. I see that the categories always fill up from the top down.

    Now, implementation details are critical...so I may be off on a wild goose chase. I'm envisioning this target (X)HTML table as a single row with a single cell for each column. Then the top-down ordering is automatic, and we're just calculating where to put the table cell delimiters. In this case, the location L corresponds simply to the column, and I can forget about rows entirely for the analysis. Then

    L(0)<=L(1)<=...<=L(Q-2)<=L(Q-1)

    In particular, L(0)=1 and L(Q-1)=N. In some way, we need to track the mininum category index for each column 2...N.

    Merlyn pointed out the above, in his comment about binary representation.

    Untested pseudo-code follows:

    my $ColumnCount = 4; # or whatever we need it to be my $CategoryCount = 6; # or whatever we need it to be my %Weight = (); # TODO: initialize %Weight hash with values from 0 to Q-1 my %WeightCache = (); # to be filled with weights # TODO: initialize weight cache to handle references to # hashes with indexes 1..Q-1 # brute-force would iterate across combinations of 1...Q-1; # for 6 categories and 4 columns, this is 5 choose 3 i.e. 10 sub WeightSpan { my ($LowBound,$HighBound) = @_; my $WeightSum = 0; $WeightSum += $Weight{$LowBound++} while $LowBound<$HighBound; return $WeightSum; } sub IncrementCombination { my ($LowBound,$HighBound,@Combination) = @_; my $Idx = 0; while($Combination[$Idx-1]=$HighBound-$Idx) { return () if 1==$Idx+scalar @Combination; $Idx--; }; $Combination[$Idx-1]++; $Combination[++$Idx -1]++ while $Idx; return @Combination; } sub MaximumHeight { my ($LowBound,$HighBound,@Combination) = @_; my @IndexSpan = (); # TODO: auto-init @IndexSpan: 0...scalar @Combination-1 return max(map {$WeightCache{$Combination[$_]}{$Combination[$_+1]} ||= &WeightSpan($Combination[$_],$Combination[$_+1])} @IndexSpan); } sub CrunchBestCombination { my ($LowBound,$HighBound,$N) = @_; my @Combination = (); # TODO: auto-init: fill @Combination with 1...$N-1 my @BestCombination = @LowestBounds; my $BestHeight = &MaximumHeight($LowBound, $HighBound, @Combination); &IncrementCombination($LowBound, $HighBound, @Combination); while(@Combination) { my $CurrentMaxHeight = &MaximumHeight($LowBound,$HighBound,@Combination); if ($CurrentMaxHeight<$BestHeight) { $BestHeight = $CurrentMaxHeight; @BestCombination = @Combination; } @Combination = &IncrementCombination($LowBound, $HighBound, @Combination); } return ($BestHeight,@BestCombination); } my @TargetCombination = &CrunchBestCombination(1, $ColumnCount-1, $CategoryCount); my $BestHeight = shift(@TargetCombination); # TODO: translate @TargetCombination into </td><td> # cell breaks

    Obviously, all TODO comments should be implemented before the code has any chance of working. Also, a number of the example functions can be made more succinct.

Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 09, 2002 at 17:52 UTC

    Another way to do it (works!)

    use strict; my @height = qw/ 10 15 25 30 10 13 /; my $columns = 3; my @Best = (); my $Min_height = 1E9; my $Table_height = 0; &check_columns; foreach my $y (0..$Table_height) { foreach my $x (0..$columns) { my $data = ${$Best[$x]}[$y]; print defined $data ? $data : " "; print " "; } print "\n"; } sub check_columns { my $pcat = ( shift or [ 0 .. $#height ] ); my @a = @$pcat; my @b = (); map { push @b, shift @a; my @result = ( [@a], [@b], @_); if ($#result == $columns) { my $max_height = 0; my $table_height = 0; foreach my $j (@result) { my $height = eval join '+' => @height[@$j]; $max_height = $height if $height > $max_height; $table_height = $#{@$j} if $table_height < $#{@$j}; } if ($max_height < $Min_height) { $Min_height = $max_height; @Best = @result; $Table_height = $table_height; } } else { check_columns(@result); } } @a; }

    Update: I should have used for (@a) instead of map.

      This is definitely not optimized.

      More than 20 minutes have passed and (50, 1..150) didn't finish yet...

      Update: got Out of memory

•Re: Puzzle: need a more general algorithm
by merlyn (Sage) on Jul 12, 2002 at 00:07 UTC
Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 16, 2002 at 13:41 UTC

    This is the implementation I liked best (so far):

    # find subgroups with *almost* optimal sum distribution. # performance is roughly linear on N and Columns. use strict; my $benchmark; sub splitter { my ($columns, $plist) = @_; return $plist unless --$columns; my $sum1 = 0; my $sum2 = eval join "+" => @$plist; my @tmp1 = (); my @tmp2 = @$plist; my @best = (1e9, undef, undef); while ($#tmp2 > 0) { $benchmark++; push @tmp1, shift @tmp2; $sum1 += $tmp1[-1]; my $mean_ratio = $sum1 * $columns / ($sum2 - $sum1); $mean_ratio = 1 / $mean_ratio if $mean_ratio < 1; last unless $best[0] > $mean_ratio; @best = ($mean_ratio, [@tmp1], [@tmp2]); } return $plist if $best[0] == 1e9; # underflow return $best[1], $columns > 0 ? splitter($columns, $best[2]) : $best[2]; } my @list = (10, 15, 25, 30, 10, 13); for my $columns (3, 4, 5) { $benchmark = 0; print "[", map(" [@$_]", splitter($columns, \@list ) ), " ]\n\n"; print "Inner loop = $benchmark\n\n"; } my @list = (1..150); for my $columns (10, 50, 80, 100) { $benchmark = 0; print "[", map(" [@$_]", splitter($columns, \@list ) ), " ]\n\n"; print "Inner loop = $benchmark\n\n"; }

    Although this does not always give an exact solution, it has some advantages:

  • it is pretty fast
  • it is not memory-hungry
  • it works fine with random numbers
  • time is linearly proportional to number of columns and to list size
  • I think it might be possible to find a better solution in time O(N * columns * log(N * columns)), by making the inner loop recursive.

    update: it might be easy to rewrite "splitter" to don't use recursion, since it calls itself from outside the while loop

      Did you miss Re: Balance columns?

      It gets correct answers, works fine with random numbers (as long as they are all non-negative), is linear in memory consumption and the running time is O(N*log(N)).