Re: Computer science Problem - single dimension bin packing

What are you trying to optimize? Are you trying to get an even distribution of data across a fixed number of drives? Efficiently fill drives of fixed size? The latter is (as AppleFritter points out) the knapsack problem.

For case 1), a simple and quite effective solution is to go from largest object to smallest, putting it in the least filled bin.

#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my %bins = (
    A => 0,
    B => 0,
    C => 0,
    D => 0,
);

my %dir;
my $int;
while(<DATA>) {
    chomp;
    my ($size, $name) = split /\s+/;
    $name .= ++$int if $dir{$name}; # Disabiguation; all your names ar
+e identical
    $dir{$name} = $size;
}

for my $name (sort {$dir{$b} <=> $dir{$a}} keys %dir) {
    my ($smallest) = sort {$bins{$a} <=> $bins{$b}} keys %bins;
    $bins{$smallest} += $dir{$name};
}

print Dumper(\%bins);
[download]

output:

$VAR1 = {
          'A' => '9885811019',
          'D' => '9885704251',
          'C' => '9885960464',
          'B' => '9885704533'
        };
[download]

For case 2), you would put it in the smallest available space in which it fits.

#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my $max = 10**10;
my $new_bin = 'A';
my %bins;

my %dir;
my $int;
while(<DATA>) {
    chomp;
    my ($size, $name) = split /\s+/;
    $name .= ++$int if $dir{$name}; # Disabiguation; all your names ar
+e identical
    $dir{$name} = $size;
}

DIR_LOOP: for my $name (sort {$dir{$b} <=> $dir{$a}} keys %dir) {
    for my $bin_name (sort {$bins{$b} <=> $bins{$a}} keys %bins) {
        if ($bins{$bin_name} + $dir{$name} < $max) {
            $bins{$bin_name} += $dir{$name};
            next DIR_LOOP;
        }
    }
    $bins{$new_bin++} = $dir{$name};
}

print Dumper(\%bins);
[download]

outputs

$VAR1 = {
          'A' => '9999831421',
          'D' => '9543689031',
          'C' => '9999660101',
          'B' => '9999999714'
        };
[download]

Update: Added code

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Comment on Re: Computer science Problem - single dimension bin packing Select or Download Code

Replies are listed 'Best First'.
Re^2: Computer science Problem - single dimension bin packing by ikegami (Patriarch) on Aug 14, 2014 at 17:08 UTC
What are you trying to optimize? Neither. It's one of If he has an unlimited number of tapes: Minimize the number of tapes needed (bin packing problem). If he has an limited number of tapes: Maximize the value of what will fit on the tapes he has (multiple knapsack problem). Efficiently fill drives of fixed size? The latter is (as AppleFritter points out) the knapsack problem. No, it's not, as I already pointed out. In the knapsack problem, you have one bin (tape), and you're trying the maximize what you can put on it. Stuff is left behind.	[reply]
Re^3: Computer science Problem - single dimension bin packing by FloydATC (Deacon) on Aug 14, 2014 at 18:29 UTC
Stuff is left behind. Then add another bin (tape) and start filling it with leftovers. Repeat until no more bins (tapes) or leftovers. The first few bins would probably be most optimally filled for most datasets but if the goal is to use the least amount of tapes this should work... or am I missing something? -- FloydATC Time flies when you don't know what you're doing	[reply]
Re^4: Computer science Problem - single dimension bin packing by ikegami (Patriarch) on Aug 14, 2014 at 19:18 UTC
Then add another bin (tape) and start filling it with leftovers. Repeat until no more bins (tapes) or leftovers. That wouldn't optimize anything. I think we all agree that there is something that he wants to optimize.	[reply]
Re^5: Computer science Problem - single dimension bin packing by FloydATC (Deacon) on Aug 14, 2014 at 20:13 UTC
Re^3: Computer science Problem - single dimension bin packing by kennethk (Abbot) on Aug 14, 2014 at 18:03 UTC
Except that the original problem description does not include the size of the tapes, which means it doesn't clearly map to either of your suggestions. Thus my attempt to gather more data. #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.	[reply]
Re^2: Computer science Problem - single dimension bin packing by davis (Vicar) on Aug 15, 2014 at 12:11 UTC
Hi kennethk. First of all, sorry for not stating my intention concisely. I thought I had, but on re-reading I realise it's not clear. However, as you correctly inferred, I'd like to (approximately) evenly distribute the data across a fixed number of drives (4, in this case). It looks like your first chunk of code does this, and produces far better results than my first attempt at placing the largest items first. Embarrassingly, I've just looked at my code (which I didn't post) and realised that I in fact sorted in the wrong order, and it placed the smallest item first. The second lesson I should learn is that I should at least include the code that was faulty, as someone would've spotted it instantly. In any case I think I prefer your code slightly to my (now working) code, so I'll use that. Thanks everyone (and sorry), as always some interesting responses, although I'm so far behind on the CS terms! davis	[reply]
Re^3: Computer science Problem - single dimension bin packing by kennethk (Abbot) on Aug 15, 2014 at 14:45 UTC
Coincidentally, I got the same result when I first tested my code. Since I knew you'd already done better with random ordering, I found my bug quickly. #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.	[reply]