Ovid has asked for the wisdom of the Perl Monks concerning the following question:
I've wound up with a bit of a tricky and ugly code that I need to
make more flexible. The problem: We have a client who displays their products
on our Web site. These products are divided into several categories that are
displayed in multiple columns. Here are the rules:
 Six categories distributed across four columns
 The categories must remain in order
 No column may be empty
 A category and its list of products must not span columns
 The total height of the resulting table must be the lowest possible amount
(multiple solutions may satisfy this, but it doesn't matter which is chosen)
For example, let's say I have categories one through six with these
respective heights (number of products):
my @height = qw/ 10 15 25 30 10 13 /;
With that, I might have those categories distributed as follows (pretending
that the index starts with 1 instead of zero):

Column 1 
Column 2 
Column 3 
Column 4 
Categories: 
1 
3 
4 
5 
Categories: 
2 


6 
Height: 
25 
25 
30 
23 
The following code will accurately determine that, noting that I need to
sort the actual values in each column. Further, it takes into account that the
1st and 6th category will always be in columns 1 and 4, respectively. Each
for loop is iterating over a subsequent category with the indices being the
only allowable values for a category (the second category is $a and
cannot possibly be in column 3 because that would leave column 2 empty).
#!/usr/bin/perl w
use strict;
use Data::Dumper;
my @height = qw/ 10 15 25 30 10 13 /;
my @columns = ($height[0],0,0,$height[1]);
my $curr_height = 0;
# set this unreasonably high to ensure that subsequent
# heights will be lower
$curr_height += $_ foreach @height;
my @distribution = ([1],[],[],[6]);
for my $a (0..1) {
$columns[$a] += $height[1];
for my $b (0..2) {
$columns[$b] += $height[2];
for my $c (1..3) {
$columns[$c] += $height[3];
for my $d (2..3) {
$columns[$d] += $height[4];
my $this_height = ( sort @columns )[1];
my $valid_dist;
foreach ( @columns ) {
$valid_dist = $_;
last if ! $valid_dist;
}
if ( $valid_dist and $this_height < $curr_height ) {
$curr_height = $this_height;
push @{$distribution[$a]}, 2;
push @{$distribution[$b]}, 3;
push @{$distribution[$c]}, 4;
push @{$distribution[$d]}, 5;
}
$columns[$d] = $height[4];
}
$columns[$c] = $height[3];
}
$columns[$b] = $height[2];
}
$columns[$a] = $height[1];
}
print Dumper \@distribution;
That code results in 121 iterations. Some possibilities are skipped by
last if ! $valid_dist because the for loops that I have written
sometimes allow for invalid combinations (if the 5th category ($d) is
in the fourth column, the 4th category cannot be in the second
column because that would leave the third column empty).
My real problem: the client dictates changes to the Web site
faster than a jackrabbit ... uh, never mind ... you get the idea.
Essentially, this used to be two columns. It's now four. It may turn into
three columns. We may have another category added or removed. My
code snippet works, but if the client changes his mind again, it would
be nice to simply have the code work. Any suggestions?
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.
Re: Puzzle: need a more general algorithm
by japhy (Canon) on Jul 08, 2002 at 19:59 UTC

use constant COL => 4;
my @data = qw( 10 15 25 30 10 13 );
my @cols = map [$_], @data;
while (@data > COL) {
my $i = 0;
my $s = $data[$i] + $data[$i+1];
for my $j (1 .. @data2) {
($i, $s) = ($j, $data[$j] + $data[$j+1])
if $data[$j] + $data[$j+1] < $s;
}
splice @data, $i, 2, $s;
splice @cols, $i, 2, [@{ $cols[$i] }, @{ $cols[$i+1] }];
}
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker, who'd like a job (NYCarea)
s++=END;++y(;P)}y js++=;shajsj<++y(pq)}?print:??;  [reply] [d/l] 

Greediness does not work in general. What if your categories have sizes 15, 15, 10, 10, 15, 15? Your solution comes out with a suboptimal answer.
 [reply] 

Greediness does not work in general. What if your
categories have sizes 15, 15, 10, 10, 15, 15? Your solution
comes out with a suboptimal answer.
That's what backtracking's for. :)
Actually, that brings up an interesting question:
Ovid, are there any bounds on how optimal the answer
has to be? Exactly optimal (sucky if this problem turns
out to be NPHard)? Within a constant factor of
optimal (like, max length no more than 1.5x larger than
optimal)?

The hell with paco, vote for Erudil!
:wq
 [reply] 
•Re: Puzzle: need a more general algorithm
by merlyn (Sage) on Jul 08, 2002 at 20:17 UTC

Take a look at Knuth's TeX language. This is pretty close to the "line length" issue that gets solved for every paragraph set by his typesetting system. And if anyone has a good general algorithm for doing this,
it's got to be Knuth!
 Randal L. Schwartz, Perl hacker  [reply] 
•Re: Puzzle: need a more general algorithm
by merlyn (Sage) on Jul 08, 2002 at 20:35 UTC

One other thing to notice is that a particular distribution can be identified by a binary string of length equal to one less than the number of categories, and a number of 1bits equal to one less than the number of columns. Each 0bit denotes that the next category is in the same column as the prior category, while a 1bit denotes that the next category begins the next column over. Since the first category is forced into the first column, and the last category is forced into the last column, we get two freebies there.
So the total number of distributions of N categories into M columns is equal to the number of combinations of N1 things taken M1 at a time.
Dunno if this helps, but it should keep you from brute forcing more than you need. {grin}
In fact, for your particular dataset (6 categories, 4 columns), you shouldn't need to brute force more than (5 items taken 3 at a time which is) 10 tries.
Wow, that's less than I thought! But it desk checks properly. All you need is a good generating algorithm,
and you can brute force this!
 Randal L. Schwartz, Perl hacker  [reply] 

merlyn wrote: All you need is a good generating algorithm and you can brute force this!
I thought this idea was so intensely cool that I just had to try it out. However, coming up with a "good generating alorithm" escapes me. First, I took the list of possible permutations that dws created and translated it:
[1][2][3][4,5,6]
1 1 1 0 0
[1][2][3,4][5,6]
1 1 0 1 0
[1][2][3,4,5][6]
1 1 0 0 1
[1][2,3][4][5,6]
1 0 1 1 0
[1][2,3][4,5][6]
1 0 1 0 1
[1][2,3,4][5][6]
1 0 0 1 1
[1,2][3][4][5,6]
0 1 1 1 0
[1,2][3][4,5][6]
0 1 1 0 1
[1,2][3,4][5][6]
0 1 0 1 1
[1,2,3][4][5][6]
0 0 1 1 1
Then, once I was sure I understood it, I went ahead and hardcoded that so I could manipulate it and look for patterns.
#!/usr/bin/perl w
use strict;
use Data::Dumper;
my @categories = qw/
11100
11010
11001
10110
10101
10011
01110
01101
01011
00111
/;
@categories = sort @categories;
my @cat2 =
sort { $a <=> $b }
map { ord pack 'b*', $_ }
@categories;
print Dumper \@categories, \@cat2;
Which prints the following:
$VAR1 = [
'00111',
'01011',
'01101',
'01110',
'10011',
'10101',
'10110',
'11001',
'11010',
'11100'
];
$VAR2 = [
7,
11,
13,
14,
19,
21,
22,
25,
26,
28
];
Needless to say, the list seems arbitrary (even though we know it's not) and try as I might, I can't come up with a method of creating that, much less writing a generalized routine. I thought about trying to discover a pattern in the sequences, but no dice. Later, I tried creating a "picture" of the bits and swapping pairs, but I couldn't come up with a sequence for that, either. I'll start looking into permutators, but I feel like I'm missing something awfully basic here. There are only 10 possible combinations, so I didn't think generating them would be that hard :(
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.  [reply] [d/l] [select] 

'00111',
'01011',
'01101',
'01110',
'10011',
'10101',
'10110',
'11001',
'11010',
'11100'
So it looks like you're shifting the highest bit up by
one, then the next highest bit, then the next, until you've
run out of empty bits. How about something like (untested):
use Bit::Vector;
my $joins = 2;
my $splits = 3;
my $length = $joins+$splits;
my $start = '0'x$joins . '1'x$splits;
my $vector = Bit::Vector>new_Bin($length, $start);
my @combinations = ();
for my $pos ($joins1..$length1) { # 0based, right?
for my $bit ($splits1..0) {
$vector>bit_flip($pos+$bit);
$vector>bit_flip($pos+$bit1);
push @combinations, $vector>to_Bin();
}
}

The hell with paco, vote for Erudil!
:wq  [reply] [d/l] [select] 




As we discussed in that meeting, here's the code snippet I was thinking about to generate the binary strings:
print map "$_\n", strings_for(6, 4);
sub strings_for {
my ($cats, $cols) = @_;
$cats; $cols;
my @ret;
for (0..(1 << $cats)  1) {
my $bitstring = substr(unpack("B*", pack "N", $_), $cats);
next unless $bitstring =~ tr/1// == $cols;
push @ret, $bitstring;
}
@ret;
}
 Randal L. Schwartz, Perl hacker  [reply] [d/l] 

1 2 3 4 1 2 3 4 1 2 3 4
5 6 5 6 5 6
1 3 4 5 1 3 4 5 1 3 4 5
2 6 2 6 2 6
1 2 3 6 1 2 3 6 1 2 3 6
4 5 4 5 4 5
1 2 3 6 1 2 3 6 1 2 3 6
4 5 4 5 4 5
...
That's twelve tries and there are more still.
 [reply] [d/l] 

 [reply] 

Re: Puzzle: need a more general algorithm
by ferrency (Deacon) on Jul 08, 2002 at 20:03 UTC

This is a very interesting problem. Broken down to its core, it seems like your problem can be restated like this:
Given a set of N values, divide the set into M subsets, such that no subset is empty, and the maximum of the sums of the values in the subsets is minimized.
I think a perfect solution is hard and would be slow. But you can probably get a solution that works relatively well much more quickly.
One way to look at the problem: For each value (category) determine which subset (column) it belongs in.
Another way is: For each subset (column), determine which values(categories) belong in it.
Assuming your number of categories is at least double your number of columns, you might want to see what kind of results you get with this technique:
 Sort the categories by height, decreasing.
 Once for each column: put the highest unassigned category into this column.
 For each column in reverse order, until there are no categories left: put the next highest unassigned category into this column.
With your sample data:
my @height = qw/ 10 15 25 30 10 13 /;
we'd get the following solution:
column 1: 30
column 2: 25
column 3: 15
column 4: 13
column 4: +10
column 3: +10
This is equivalent to what you have, but much more straightforward to calculate. And the solution is more general. However, at this point I have nothing other than an intuition (and a few test cases) that this solution will probably be good enough most of the time.
If you have many more categories than columns, this will probably stop working well pretty quickly. If you have 3 very large categories and 3 very small categories, this method won't find the solution that places all three small categories in the same column.
You might also try:
For each category, in decreasing height order: put that category in the column with the lowest current total height.
Thanks for the thoughtprovoking node. I'm looking forward to others' solutions to this problem.
Update:Dragonchild is absolutely right: I did miss that constraint. Now for some more thought on the subject...
Alan  [reply] [d/l] 

You missed a constraint  the ordering of the categories must be preserved.
 We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
 [reply] 
Re: Puzzle: need a more general algorithm
by Zaxo (Archbishop) on Jul 08, 2002 at 20:12 UTC

How about this? Sum the number of products over categories to get the total number of products. Divide that total by the number of columns to get the average height of a column. Shift categories off an array of unused ones and push them onto the current column if the new total would be closer to the average than the old. Keep a running average as a tiebreaker.
That may have corner cases which it doesn't optimize, but it should be good enough and run a whole lot faster than exhaustive search.
Update: Here is a corner case that doesn't optimize this way: @heights = qw/ 25 23 24 35 3 40/; $cols = 5; More attention to the running average could fix that, but I'm sure loopholes would remain.
 [reply] [d/l] 
Re: Puzzle: need a more general algorithm
by dragonchild (Archbishop) on Jul 08, 2002 at 19:49 UTC

Three possibilities:
 More columns than categories
 Equal
 More categories than columns
The first is insoluble. The second is trivial. The third is interesting.
In the third, just combine neighboring categories into a "supercategory" and reevaluate.
 We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.  [reply] 
Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 08, 2002 at 20:42 UTC

If you want to generate all possible combinations, you
could use this algorithm:
$categories = 5;
$columns = 3;
$a = join('#', 0..$categories);
@columns = ();
&check_columns ($a, @columns);
sub check_columns {
my $cat = shift;
my @columns = @_;
my $size = split('#', $cat)1;
foreach (1.. $size ) {
my ($first, $second) = $cat =~ /^((?:.*?#){$_})(.*?)$/;
$first =~ s/#$//;
my @result = ($first, $second, @columns);
if ($#result == $columns) {
print join(" ", @result), "\n",
}
else {
check_columns(@result);
}
}
}
It generates this list of indexes, from
which you can choose the best
line:
0 1 2 3#4#5
0 1 2#3 4#5
0 1#2 3 4#5
0#1 2 3 4#5
0 1 2#3#4 5
0 1#2 3#4 5
0#1 2 3#4 5
0 1#2#3 4 5
0#1 2#3 4 5
0#1#2 3 4 5
I used an array of strings instead of an array of
arrays because it is easier to debug.
 [reply] [d/l] [select] 
Re: Puzzle: need a more general algorithm
by dws (Chancellor) on Jul 08, 2002 at 20:57 UTC

O.K., I may have misread this problem at first because "height" is somewhat ambigous. If "height" somehow refers to the aggregation of the data, here's another approach.
Given the constraints, I count 10 possible groupings:
[1][2][3][4,5,6]
[1][2][3,4][5,6]
[1][2][3,4,5][6]
[1][2,3][4][5,6]
[1][2,3][4,5][6]
[1][2,3,4][5][6]
[1,2][3][4][5,6]  the one you show
[1,2][3][4,5][6]
[1,2][3,4][5][6]
[1,2,3][4][5][6]
So my approach two is to brute force sum the values for each possible combination, and select the one that best meets the criteria about "height".
Then, assuming that works, I'd generalize an algorithm for generating groupings given number of categories and number of columns. Like the one fglock provides above.
Is this what you're looking for, or does "height" refer to the number of rows in the table?
 [reply] [d/l] 

Here's an observation that might lead to a quick way to determine unique bucket combinations.
The set of bucket sizes above is the union of two sets: the set of all unique arrangements of (1 1 1 3), and the set of all unique arrangements of (1 1 2 2). The sum of the numbers is equal to the number of catagories.
This can be generalized to cover different numbers of buckets and different numbers of categories.
 [reply] 
Re: Puzzle: need a more general algorithm
by AbigailII (Bishop) on Jul 09, 2002 at 16:25 UTC

The problem smells badly to being an NPcomplete problem.
And those are believed that they cannot be done efficiently.
Hence, we might as well backtrack. And which mechanism in
Perl is good at backtracking? Right, regular expressions.
Therefore, I present a solution that will use a regular
expression to do the hard work. It will report the minimum
height that is needed. Calculating the actual partition is
left as an exercise for the reader.
Abigail
#!/usr/bin/perl
use strict;
use warnings 'all';
sub partition ($$);
my $columns = 4;
my @sizes = qw /10 15 25 30 10 13/;
sub partition ($$) {
my ($b, $h) = @_;
return [(0) x $h] unless $b;
return [$b] if $h < 2;
map {my $__ = $_; map {[$__ => @$_]} partition $b  $__, $h  1} 0
+ .. $b;
}
my @r = partition @sizes, $columns;
my @regex;
foreach my $r (@r) {
my $c = 0;
push @regex => join ":" => map {
$c += my $__ = $_;
join ("" => map {"" x $_} @sizes [$c  $__ .. $c  1]) . "*"
+} @$r;
}
my $regex = join "" => map {"(?:$_)"} @regex;
my $try = 1;
{ exit !print "Minimum required height: $try\n" if
join (":" => map {"" x $try} 1 .. $columns) =~ /$regex/;
$try ++;
redo;
}
 [reply] [d/l] 

Impressive.
Once you have the minimum height, calculating the
actual partition is equivalent to the text justification
problem, with each category being a word, its length
being the word's length, and the minimum height being the
line length. This has been done
before.

The hell with paco, vote for Erudil!
:wq
 [reply] 

Actually, once you know the height that will work, a
simple greedy algorithm will do (stuff as much as you
can in the first column, then the next, and the next,
etc). After all, it was given that any solution would
do, not the prettiest or something like that.
Abigail
 [reply] 
Re: Puzzle: need a more general algorithm
by dws (Chancellor) on Jul 08, 2002 at 20:25 UTC

To distribute N categories across M columns as evenly as possible, assuming N >= M, observe that
 (N mod M) columns will contain ((N div M) + 1) categories
 the remaining (M  (N mod M)) columns, if any, will contain (N div M) categories.
Given this, categories can be distributed in one pass.
Update: Oh blast. I may have misread the problem, and confused the "height of the table in rows" with "the aggregate height summed from the @height data".
 [reply] 

It appears that I have confused a few people here. What I need "minimized" is the aggregate height per column as summed from @height data.
Given the heights of qw/ 10 10 15 20 10 10 /, I could conceivably construct a table with no column having greater than 20 items in it:
10 15 20 10
10 10
The following solution, would fail because one of the columns has 25 items:
10 10 20 10
15 10
Part of the problem, I suspect, is that I had a bug in my original code because I accidentally posted the wrong version. The @distribution array needs to be reset every iteration. The following is correct:
#!/usr/bin/perl w
use strict;
use Data::Dumper;
my @height = qw/ 10 15 25 30 10 13 /;
my @columns = ($height[0],0,0,$height[1]);
my $curr_height = 0;
# set this unreasonably high to ensure that subsequent
# heights will be lower
$curr_height += $_ foreach @height;
my @distribution;
for my $a (0..1) {
$columns[$a] += $height[1];
for my $b (0..2) {
$columns[$b] += $height[2];
for my $c (1..3) {
$columns[$c] += $height[3];
for my $d (2..3) {
$columns[$d] += $height[4];
my $this_height = ( sort @columns )[1];
my $valid_dist;
foreach ( @columns ) {
$valid_dist = $_;
last if ! $valid_dist;
}
if ( $valid_dist and $this_height < $curr_height ) {
$curr_height = $this_height;
@distribution = ([1],[],[],[6]);
push @{$distribution[$a]}, 2;
push @{$distribution[$b]}, 3;
push @{$distribution[$c]}, 4;
push @{$distribution[$d]}, 5;
}
$columns[$d] = $height[4];
}
$columns[$c] = $height[3];
}
$columns[$b] = $height[2];
}
$columns[$a] = $height[1];
}
print Dumper \@distribution;
I hope that's clear, now :)
I also have to add that there's a heck of a lot more discussion on this than I thought there would be.
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.  [reply] [d/l] [select] 

Keep in mind that this is a weighted distribution: three
categories with weight 10 are worth one category with
weight 30 (so if N=2, you'd want 3/1 rather than 2/2).
This looks to me like a constrained (order must not
change) variant on the bin packing problem.
Bin packing is NPcomplete in the general case (IIRC), but
the order constraint makes this quite tractable (see
merlyn's typesetting comment). This
problem has a rather good (n lg n or n^2) solution via
dynamic programming: the typesetting problem was on one of
my assignments in an advanced algorithms class. (Hey, did
Ovid just post a homework problem? ;b) I'll dig through
my old notes when I get home from work and see if I can
find it. In the mean time, you (Ovid) might look at
Text::Format for ideas.
Update: D'oh! Another constraint on the text
formatting problem that isn't present here is a
maximum on line length (bin size), which makes it hard to
reject a bogus solution (too much in one bin) quickly. On
the other hand, this solution is constrained by number of
bins, which the text formatting solution isn't. Hmm....
(Great problem, Ovid!)

The hell with paco, vote for Erudil!
:wq
 [reply] 

This seems to blur the lines of the original problem. There might be 103 categories to distribute, but they have to appear in chunks of 10, 15, 25, 30, 10, and 13.
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker, who'd like a job (NYCarea)
s++=END;++y(;P)}y js++=;shajsj<++y(pq)}?print:??;
 [reply] 

This is true if your categories all have equal heights. But an optimal solution for a wide height distribution might place, for example, three categories in one column, and one category in each of the other three columns. A sample dataset for which that would be the optimal solution:
my @heights = (4, 4, 4, 1, 1, 1); # in 4 columns
Alan
 [reply] [d/l] 
Puzzle for the puzzle: Re: Puzzle...
by Ovid (Cardinal) on Jul 09, 2002 at 16:49 UTC

I was sent the following answer to the puzzle with the following challenge:
 Why does this work
 How efficient is is. i.e., how fast it
calculates group_cats(50, 1..150)  with the caveat that it might not be that efficient in memory.
If you think you know who sent this, please do not speculate! I will neither confirm nor deny ...
use strict;
use Carp;
use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper [group_cats(3, 1..7)];
{
my @sizes;
my %ans;
sub group_cats {
(my $num, @sizes) = @_;
%ans = ();
my ($size, @partition) = _group_cats($num, 0, $#sizes);
return @partition;
}
sub _group_cats {
my $key = join ":", @_;
my ($num, $start, $end) = @_;
if (not exists $ans{$key}) {
if ($num < 1) {
$ans{$key} = [0];
}
elsif (1 == $num) {
my @part = map $sizes[$_], $start..$end;
$ans{$key} = [sum(@part), \@part];
}
else {
my $num_a = int($num/2);
my $num_b = $num  $num_a;
my $min_mid = $start + $num_a  1;
my $max_mid = $end  $num_b;
my $mid = int(($min_mid + $max_mid)/2);
my ($last_a, @part_a) = _group_cats($num_a, $start, $mid);
my ($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end);
my $best = max($last_a, $last_b);
my @best_part = (@part_a, @part_b);
while ($min_mid < $max_mid) {
if ($last_a <= $last_b) {
if ($min_mid < $mid) {
$min_mid = $mid;
}
else {
$min_mid = $mid + 1;
}
}
else {
$max_mid = $mid;
}
$mid = int(($min_mid + $max_mid)/2);
($last_a, @part_a) = _group_cats($num_a, $start, $mid);
($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end);
if (max($last_a, $last_b) < $best) {
$best = max($last_a, $last_b);
@best_part = (@part_a, @part_b);
}
}
$ans{$key} = [$best, @best_part];
}
}
return @{$ans{$key}};
}
}
sub max {
my $max = shift;
for (@_) {
$max = $_ if $max < $_;
}
$max;
}
sub sum {
my $sum = 0;
$sum += $_ for @_;
return $sum;
}
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.  [reply] [d/l] 

use strict;
{
use integer;
my @sizes;
my %ans;
my @align;
sub group_cats {
(my $num, @sizes) = @_;
%ans = ();
my $a = 1;
for my $i (1..$#sizes) {
if ($a + $a < $i) {
$a += $a;
}
$align[$i] = $a;
}
$align[0] = 0;
my ($size, @partition) = _group_cats($num, 0, $#sizes);
return @partition;
}
sub _group_cats {
my $key = join ":", @_;
my ($num, $start, $end) = @_;
if (not exists $ans{$key}) {
if ($num < 1) {
$ans{$key} = [0];
}
elsif (1 == $num) {
my @part = @sizes[$start..$end];
my $sum = 0;
$sum += $_ for @part;
$ans{$key} = [$sum, \@part];
}
else {
my $num_a = $align[$num];
my $num_b = $num  $num_a;
my $min_mid = $start + $num_a  1;
my $max_mid = $end  $num_b;
my $mid = $min_mid + $align[$max_mid  $min_mid];
my ($last_a, @part_a) = _group_cats($num_a, $start, $mid);
my ($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end);
my $best = $last_a < $last_b ? $last_b : $last_a;
my @best_part = (@part_a, @part_b);
while ($min_mid < $max_mid) {
if ($last_b <= $last_a) {
if ($max_mid > $mid) {
$max_mid = $mid;
}
else {
$max_mid;
}
}
else {
$min_mid = $mid;
}
$mid = $min_mid + $align[$max_mid  $min_mid];
($last_a, @part_a) = _group_cats($num_a, $start, $mid);
($last_b, @part_b) = _group_cats($num_b, $mid + 1, $end);
my $max = $last_a < $last_b ? $last_b : $last_a;
if ($max < $best) {
$best = $max;
@best_part = (@part_a, @part_b);
}
}
$ans{$key} = [$best, @best_part];
}
}
return @{$ans{$key}};
}
}
BTW the original question had an straightforward recursive solution that scaled exponentially. I found that too easy, hence the faster solution.  [reply] [d/l] 
Re: Puzzle: need a more general algorithm
by runrig (Abbot) on Jul 09, 2002 at 01:42 UTC

I figure you'll always combine the smallest category with the one before or after it (Update: realized that in general, this is not a valid assumption; finding example left as excercise, though maybe finding smallest consecutive two column sum, then generating possibilities summing of columns before and after that might work...), so here's my fairly (in)efficient (O(N**2)) answer: #!/usr/bin/perl
use strict;
use warnings;
my $num_columns = 4;
my @cat = (15, 15, 10, 10, 15, 15);
my @ans = squish(@cat);
print "@ans\n";
sub squish {
my @arr = @_;
my @aoa = (\@arr);
for ($num_columns..(@arr1)) {
my @tmp_aoa;
push @tmp_aoa, squisher(@$_) for @aoa;
@aoa = @tmp_aoa;
}
my $best_ans;
my $best_value;
for my $aref (@aoa) {
my $max_value;
for my $value (@$aref) {
$max_value = $value if !defined $max_value or $value > $max_valu
+e;
}
$best_ans = $aref, $best_value = $max_value
if !defined $best_value or $max_value < $best_value;
}
@$best_ans;
}
sub squisher {
my @arr = @_;
my $min_col;
my $min_value;
for (0..$#arr) {
$min_value = $arr[$_], $min_col = $_
if !defined $min_value or $arr[$_] < $min_value;
}
my @arr1 = ($min_col > 0) ? @arr : ();
my @arr2 = ($min_col < $#arr) ? @arr : ();
splice(@arr1, $min_col1, 2, $arr1[$min_col1] + $arr1[$min_col])
if @arr1;
splice(@arr2, $min_col, 2, $arr2[$min_col] + $arr2[$min_col+1])
if @arr2;
return ((@arr1 ? \@arr1 : ()), (@arr2 ? \@arr2 : ()));
}
Update: mildly tested...  [reply] [d/l] 
Re: Puzzle: need a more general algorithm
by Aristotle (Chancellor) on Jul 09, 2002 at 13:18 UTC

Sounds like a case for Branch and Bound.
I would partition the problem space by adding the next free category to either last used column or the next empty column. The root problem would be "category 1 in column 1", so the children would "category 2 in column 1" and "category 2 in column 2", etc. The low boundary for each solution is the height of the highest column, and the solution's high boundary is maximum(sum of heights of free categories, lower boundary). There is a constraint (free categories > empty columns) that has to be fulfilled to assure there will not be empty columns.
It's probably best to generate the solution tree breadthfirst as you will descend far down the left side very quickly if the global lower and upper boundaries have not been narrowed down. Breadthfirst would garantuee that you limit them soon. On the other hand, for larger problems the breadthfirst approach will also require more memory to hold the solution tree.
I tried to write code yesterday but it was 3am and my mind wouldn't cooperate. Now I've just gotten up way waay too late into the day and it still doesn't. When I'm feeling a bit fresher I will update here with some code (or capitulation *g*). The tricky problem is coming up with a convenient data structure  I tried anonymous arrays for the columns, but it's inconvenient to have to deep copy to generate the child problems.
Update: sorry, would take too much time right now. :( I will have a spare evening in a few days and may I'll get back to it then. Sigh..
Makeshifts last the longest.
 [reply] 
Re: Puzzle: need a more general algorithm
by dws (Chancellor) on Jul 10, 2002 at 06:48 UTC

I've been chewing on this problem for a day now. This is clearly problem with two logical parts. Part 1 is to generate all legal mappings of N columns of data into M buckets, given the constraints that no bucket can be empty, and that the columns need to stay ordered. The second part is apply these mappings to the input data, and select a mapping that yields a "best fit."
I focused on the first part, looking for a quicker, simpler solution. I think I have one. Here it is. Given a number of columns and a number of buckets, the code below calculates all legal mappings of columns to buckets, and returns these in a hash, where the key is a printable string, and the value is an anonymous array.
{
# map columns to buckets. key is string, value is anonymous array.
my %c2bMap;
sub c2bMappings {
my($buckets, $columns) = @_;
die "bogus args" unless $buckets > 1 && $columns > $buckets;
%c2bMap = ();
_genFrom(0, (0) x ($columns  $buckets), 1 .. ($buckets  1));
return \%c2bMap;
}
sub _genFrom {
my @c2bMap = @_;
return if exists $c2bMap{"@c2bMap"};
print "@c2bMap\n"; #DEBUG
$c2bMap{"@c2bMap"} = \@c2bMap;
foreach my $i ( 2 .. $#c2bMap ) {
my $n = $c2bMap[$i]  1;
if ( $c2bMap[$i  2] == $n && $c2bMap[$i  1] == $n ) {
local $c2bMap[$i1] = $c2bMap[$i];
_genFrom(@c2bMap);
}
}
}
}
c2bMappings(4,6);
__END__
0 0 0 1 2 3
0 0 1 1 2 3
0 1 1 1 2 3
0 1 1 2 2 3
0 1 2 2 2 3
0 1 2 2 3 3
0 1 2 3 3 3
0 1 1 2 3 3
0 0 1 2 2 3
0 0 1 2 3 3
Recursion only happens with valid mappings.
Note the selective localization of an element of the array that the code is about to recurse on.
 [reply] [d/l] 
Re: Puzzle: need a more general algorithm
by FoxtrotUniform (Prior) on Jul 09, 2002 at 02:56 UTC

Note: I'm thinking this up as I go along, so take
it with an appropriately sized grain of salt.
Once you've got your data in an appropriate arrangement
(for example, the position Ovid gives us:
1 3 4 5
2 6
You can perform at most two operations on each bucket:
 Shift the "bottom" element one stack to the left
 Shift the "top" element one stack to the right
In this case, there are only two legal operations:
 Shift element 2 to the second column
 Shift element 5 to the third column
My first idea was another greedy approach: start with the
data in a valid (though almost certainly not optimal) state,
and perform the operation that gives the most benefit. (In
other words, hill climbing.) Since our score is
the longest column we have so far, we're only going to climb
hills (lower our score, which in this case is good) by
operating on the longest column. Unfortunately, this leads
us to local maxima, not global ones. Note: I haven't
proven this, it's just something that tends to be true of
hill climbing algorithms. I'm a bit hopeful about this
because the domain is finite, and you might get decent
results by picking a few random starting points, hill
climbing on each of them for a small number of iterations,
and picking the best one, but on the other hand....

The hell with paco, vote for Erudil!
:wq  [reply] [d/l] 
Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 09, 2002 at 12:10 UTC

use strict;
my @height = qw/ 10 15 25 30 10 13 /;
my $columns = 3;
my @Best = ();
my $Min_height = 1E9;
&check_columns ( [ 0 .. $#height ] );
print "best result: \n";
# print join(" ", map { join('#', @$_) } @Best), "\n";
foreach my $y (0..$Table_height) {
foreach my $x (0..$columns) {
my $data = ${$Best[$x]}[$y];
print defined $data ? $data : " ";
print " ";
}
print "\n";
}
# check column height combinations
sub check_columns {
my $pcat = shift;
foreach my $p (0 .. $#{@$pcat}  1) {
my @result = ( [ @$pcat[0 .. $p] ], [ @$pcat[$p + 1 .. $#{@$pc
+at}] ], @_);
if ($#result == $columns) {
# @result is an array of arrays
# print join(" ", map { join('#', @$_) } @result), "\n";
my $max_height = 0;
foreach my $j (@result) {
my $height = 0;
# foreach my $i (@$j) { print "$i:$height[$i] "; }; pr
+int "\n";
foreach my $i (@$j) { $height += $height[$i] };
$max_height = $height if $height > $max_height;
}
# print "max_height: $max_height\n";
if ($max_height < $Min_height) {
$Min_height = $max_height;
@Best = @result;
}
}
else {
check_columns(@result);
}
}
}
It prints:
best result:
0 2 3 4
1 5
You can uncomment the print lines to check
how it works.
@Best is an arrayofarrays.
Update: show table.
 [reply] [d/l] [select] 

best result:
0 3 4
1 5
2
$columns = 4 gives:
best result:
0 1 2 3 4
5
 [reply] [d/l] [select] 
Re: Puzzle: need a more general algorithm
by lemming (Priest) on Jul 10, 2002 at 04:47 UTC

Ok, this is that straight forward recursive method. By keeping track of the subsets we've already looked at, it's not too bad with the 50, 1150 set.
Update: Pass data by reference as suggested. A couple other small changes. Added another indice for array totals.
#!/usr/bin/perl
use strict;
use warnings;
use constant DEBUG => 0;
# my $columns = 6;
# my @data = qw( 10 13 25 30 10 15 1 4 25);
my $columns = 50;
my @data = (1..150);
die "Not enough catagories for columns\n"
if $columns > @data;
our $mega_height = sum(\@data);
my %key_h; my %add_h;
my $best_r = get_best($columns, \%add_h, \%key_h, \@data);
printit($best_r);
exit;
sub get_best {
my ($columns, $add_r, $key_r, $data_r) = @_;
my $max_stack = @$data_r  $columns + 1;
my $max_height = $mega_height;
my $fed_key = join("", @$data_r, $columns);
print "[", join(",", @$data_r),"]",$columns,"\n" if DEBUG;
my $best_r;
foreach my $stack ( 1 .. $max_stack ) {
my @arr;
$arr[0] = [ @$data_r[0..$stack1] ];
my $tmp_r;
if ($columns == 2) {
# We only have one more column to fill
push(@arr, [ @$data_r[$stack..@$data_r1] ]);
}
elsif (@$data_r  $stack == $columns  1 ) {
# One cat per column left
map push(@arr, [ $_ ]), @$data_r[$stack..@$data_r1];
}
else {
my $key = join("", @$data_r[$stack..@$data_r1],
$columns  1);
# See if we've done this before
if ( defined( $key_r>{$key} )) {
$tmp_r = $key_r>{$key};
}
else {
$tmp_r = get_best( $columns  1, $add_r, $key_r,
[@$data_r[$stack..@$data_r1]] );
}
push ( @arr, @$tmp_r );
}
my $cur_height = 0;
foreach my $col_r (@arr ) {
my $height;
my $ckey = join(",",@$col_r);
if ( defined( $add_r>{$ckey} )) {
$height = $add_r>{$ckey};
}
else {
$height = sum($col_r);
$add_r>{$ckey} = $height;
}
$cur_height = $height if $cur_height < $height;
}
printit(\@arr) if DEBUG;
if ( $cur_height < $max_height or !defined($best_r)) {
$best_r = \@arr;
$max_height = $cur_height;
}
}
$key_r>{$fed_key} = $best_r;
return $best_r;
}
sub sum {
my ($col_r) = @_;
my $height = 0;
foreach my $bit ( @$col_r ) {
$height += $bit;
}
return $height;
}
sub printit {
my ($arr_r) = @_;
my $start = 1;
my $max = 0;
foreach my $col_r (@$arr_r) {
print ", " unless $start;
$start = 0 if ( $start );
my $height = sum($col_r);
print "[ ", join(", ", @$col_r), " ]";
$max = $height if $height > $max;
}
print " => $max\n";
}
 [reply] [d/l] 

Very good. Prememoization you are exponential. With memoization your memory usage scales like O(n**3), and your performance is O(n**4). If you passed the array by reference, and added 2 indices, you would drop a factor of n off of both.
By contrast the efficient algorithm is subexponential prememoization, and is subquadratic after.
Which shows that good algorithms are a big win, but being straightforward, and then applying wellknown speedups to that, can still result in a usable algorithm.
 [reply] 
Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 09, 2002 at 17:52 UTC

use strict;
my @height = qw/ 10 15 25 30 10 13 /;
my $columns = 3;
my @Best = ();
my $Min_height = 1E9;
my $Table_height = 0;
&check_columns;
foreach my $y (0..$Table_height) {
foreach my $x (0..$columns) {
my $data = ${$Best[$x]}[$y];
print defined $data ? $data : " ";
print " ";
}
print "\n";
}
sub check_columns {
my $pcat = ( shift or [ 0 .. $#height ] );
my @a = @$pcat;
my @b = ();
map {
push @b, shift @a;
my @result = ( [@a], [@b], @_);
if ($#result == $columns) {
my $max_height = 0;
my $table_height = 0;
foreach my $j (@result) {
my $height = eval join '+' => @height[@$j];
$max_height = $height if $height > $max_height;
$table_height = $#{@$j} if $table_height < $#{@$j};
}
if ($max_height < $Min_height) {
$Min_height = $max_height;
@Best = @result;
$Table_height = $table_height;
}
}
else {
check_columns(@result);
}
} @a;
}
Update: I should have used for (@a) instead of map.
 [reply] [d/l] [select] 

This is definitely not optimized.
More than 20 minutes have passed and
(50, 1..150) didn't finish yet...
Update: got Out of memory
 [reply] [d/l] 
Re: Puzzle: need a more general algorithm
by zaimoni (Beadle) on Jul 09, 2002 at 09:55 UTC

Thinking vaguely...
Notation:
 Columns: N
 Category count: Q
Assuming the problem makes sense (Q>N), yet another way to think about the problem space is to enumerate the cells like this:
1 3 5 7 2 4 6 8
Then, if we index the categories by 0...Q1 (fine because the category order must be preserved), we can consider the location of a category to be L(category index).
The nature of this location is yet to be determined.
I see that the categories always fill up from the top down.
Now, implementation details are critical...so I may be off on a wild goose chase. I'm envisioning this target (X)HTML table as a single row with a single cell for each column. Then the topdown ordering is automatic, and we're just calculating where to put the table cell delimiters. In this case, the location L corresponds simply to the column, and I can forget about rows entirely for the analysis. Then
L(0)<=L(1)<=...<=L(Q2)<=L(Q1)
In particular, L(0)=1 and L(Q1)=N. In some way, we need to track the mininum category index for each column 2...N.
Merlyn pointed out the above, in his comment about binary representation.
Untested pseudocode follows:
my $ColumnCount = 4; # or whatever we need it to be
my $CategoryCount = 6; # or whatever we need it to be
my %Weight = ();
# TODO: initialize %Weight hash with values from 0 to Q1
my %WeightCache = (); # to be filled with weights
# TODO: initialize weight cache to handle references to
# hashes with indexes 1..Q1
# bruteforce would iterate across combinations of 1...Q1;
# for 6 categories and 4 columns, this is 5 choose 3 i.e. 10
sub WeightSpan
{
my ($LowBound,$HighBound) = @_;
my $WeightSum = 0;
$WeightSum += $Weight{$LowBound++}
while $LowBound<$HighBound;
return $WeightSum;
}
sub IncrementCombination
{
my ($LowBound,$HighBound,@Combination) = @_;
my $Idx = 0;
while($Combination[$Idx1]=$HighBound$Idx)
{
return () if 1==$Idx+scalar @Combination;
$Idx;
};
$Combination[$Idx1]++;
$Combination[++$Idx 1]++ while $Idx;
return @Combination;
}
sub MaximumHeight
{
my ($LowBound,$HighBound,@Combination) = @_;
my @IndexSpan = ();
# TODO: autoinit @IndexSpan: 0...scalar @Combination1
return max(map
{$WeightCache{$Combination[$_]}{$Combination[$_+1]}
= &WeightSpan($Combination[$_],$Combination[$_+1])}
@IndexSpan);
}
sub CrunchBestCombination
{
my ($LowBound,$HighBound,$N) = @_;
my @Combination = ();
# TODO: autoinit: fill @Combination with 1...$N1
my @BestCombination = @LowestBounds;
my $BestHeight = &MaximumHeight($LowBound, $HighBound,
@Combination);
&IncrementCombination($LowBound, $HighBound,
@Combination);
while(@Combination)
{
my $CurrentMaxHeight =
&MaximumHeight($LowBound,$HighBound,@Combination);
if ($CurrentMaxHeight<$BestHeight)
{
$BestHeight = $CurrentMaxHeight;
@BestCombination = @Combination;
}
@Combination = &IncrementCombination($LowBound,
$HighBound, @Combination);
}
return ($BestHeight,@BestCombination);
}
my @TargetCombination = &CrunchBestCombination(1,
$ColumnCount1, $CategoryCount);
my $BestHeight = shift(@TargetCombination);
# TODO: translate @TargetCombination into </td><td>
# cell breaks
Obviously, all TODO comments should be implemented before the code has any chance of working. Also, a number of the example functions can be made more succinct.  [reply] [d/l] [select] 
•Re: Puzzle: need a more general algorithm
by merlyn (Sage) on Jul 12, 2002 at 00:07 UTC

 [reply] 
Re: Puzzle: need a more general algorithm
by fglock (Vicar) on Jul 16, 2002 at 13:41 UTC

# find subgroups with *almost* optimal sum distribution.
# performance is roughly linear on N and Columns.
use strict;
my $benchmark;
sub splitter {
my ($columns, $plist) = @_;
return $plist unless $columns;
my $sum1 = 0; my $sum2 = eval join "+" => @$plist;
my @tmp1 = (); my @tmp2 = @$plist;
my @best = (1e9, undef, undef);
while ($#tmp2 > 0) {
$benchmark++;
push @tmp1, shift @tmp2;
$sum1 += $tmp1[1];
my $mean_ratio = $sum1 * $columns / ($sum2  $sum1);
$mean_ratio = 1 / $mean_ratio if $mean_ratio < 1;
last unless $best[0] > $mean_ratio;
@best = ($mean_ratio, [@tmp1], [@tmp2]);
}
return $plist if $best[0] == 1e9; # underflow
return $best[1],
$columns > 0 ? splitter($columns, $best[2]) : $best[2];
}
my @list = (10, 15, 25, 30, 10, 13);
for my $columns (3, 4, 5) {
$benchmark = 0;
print "[", map(" [@$_]", splitter($columns, \@list ) ), " ]\n\n";
print "Inner loop = $benchmark\n\n";
}
my @list = (1..150);
for my $columns (10, 50, 80, 100) {
$benchmark = 0;
print "[", map(" [@$_]", splitter($columns, \@list ) ), " ]\n\n";
print "Inner loop = $benchmark\n\n";
}
Although this does not always give an
exact solution, it has some advantages:
it is pretty fast
it is not memoryhungry
it works fine with random numbers
time is linearly proportional to number of columns and
to list size
I think it might be possible to find a better
solution in time O(N * columns * log(N * columns)),
by making the inner loop recursive.
update: it might be easy to rewrite
"splitter" to don't use recursion, since it
calls itself from outside the while loop
 [reply] [d/l] 

 [reply] 

