in reply to Computer science Problem - single dimension bin packing

What are you trying to optimize? Are you trying to get an even distribution of data across a fixed number of drives? Efficiently fill drives of fixed size? The latter is (as AppleFritter points out) the knapsack problem.

For case 1), a simple and quite effective solution is to go from largest object to smallest, putting it in the least filled bin.

#!/usr/bin/perl use warnings; use strict; use Data::Dumper; my %bins = ( A => 0, B => 0, C => 0, D => 0, ); my %dir; my $int; while(<DATA>) { chomp; my ($size, $name) = split /\s+/; $name .= ++$int if $dir{$name}; # Disabiguation; all your names ar +e identical $dir{$name} = $size; } for my $name (sort {$dir{$b} <=> $dir{$a}} keys %dir) { my ($smallest) = sort {$bins{$a} <=> $bins{$b}} keys %bins; $bins{$smallest} += $dir{$name}; } print Dumper(\%bins);
output:
$VAR1 = { 'A' => '9885811019', 'D' => '9885704251', 'C' => '9885960464', 'B' => '9885704533' };

For case 2), you would put it in the smallest available space in which it fits.

#!/usr/bin/perl use warnings; use strict; use Data::Dumper; my $max = 10**10; my $new_bin = 'A'; my %bins; my %dir; my $int; while(<DATA>) { chomp; my ($size, $name) = split /\s+/; $name .= ++$int if $dir{$name}; # Disabiguation; all your names ar +e identical $dir{$name} = $size; } DIR_LOOP: for my $name (sort {$dir{$b} <=> $dir{$a}} keys %dir) { for my $bin_name (sort {$bins{$b} <=> $bins{$a}} keys %bins) { if ($bins{$bin_name} + $dir{$name} < $max) { $bins{$bin_name} += $dir{$name}; next DIR_LOOP; } } $bins{$new_bin++} = $dir{$name}; } print Dumper(\%bins);
outputs
$VAR1 = { 'A' => '9999831421', 'D' => '9543689031', 'C' => '9999660101', 'B' => '9999999714' };

Update: Added code


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Replies are listed 'Best First'.
Re^2: Computer science Problem - single dimension bin packing
by ikegami (Patriarch) on Aug 14, 2014 at 17:08 UTC

    What are you trying to optimize?

    Neither. It's one of

    • If he has an unlimited number of tapes: Minimize the number of tapes needed (bin packing problem).
    • If he has an limited number of tapes: Maximize the value of what will fit on the tapes he has (multiple knapsack problem).

    Efficiently fill drives of fixed size? The latter is (as AppleFritter points out) the knapsack problem.

    No, it's not, as I already pointed out. In the knapsack problem, you have one bin (tape), and you're trying the maximize what you can put on it. Stuff is left behind.

      Stuff is left behind.

      Then add another bin (tape) and start filling it with leftovers. Repeat until no more bins (tapes) or leftovers.

      The first few bins would probably be most optimally filled for most datasets but if the goal is to use the least amount of tapes this should work... or am I missing something?

      -- FloydATC

      Time flies when you don't know what you're doing

        Then add another bin (tape) and start filling it with leftovers. Repeat until no more bins (tapes) or leftovers.

        That wouldn't optimize anything. I think we all agree that there is something that he wants to optimize.

      Except that the original problem description does not include the size of the tapes, which means it doesn't clearly map to either of your suggestions. Thus my attempt to gather more data.


      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re^2: Computer science Problem - single dimension bin packing
by davis (Vicar) on Aug 15, 2014 at 12:11 UTC

    Hi kennethk. First of all, sorry for not stating my intention concisely. I thought I had, but on re-reading I realise it's not clear.
    However, as you correctly inferred, I'd like to (approximately) evenly distribute the data across a fixed number of drives (4, in this case). It looks like your first chunk of code does this, and produces far better results than my first attempt at placing the largest items first.

    Embarrassingly, I've just looked at my code (which I didn't post) and realised that I in fact sorted in the wrong order, and it placed the smallest item first. The second lesson I should learn is that I should at least include the code that was faulty, as someone would've spotted it instantly. In any case I think I prefer your code slightly to my (now working) code, so I'll use that.

    Thanks everyone (and sorry), as always some interesting responses, although I'm so far behind on the CS terms!


    davis

      Coincidentally, I got the same result when I first tested my code. Since I knew you'd already done better with random ordering, I found my bug quickly.


      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.