jkhunt has asked for the wisdom of the Perl Monks concerning the following question:

I am currently trying to place packages into directories with a 4GB limit each. Once a directory hits its limit, my script creates a new one.

The problem I am having is that I don't know how to check the directories I have already filled for leftover space that could hold additional packages. Here is what I am working with:

pkgA => 2GB
pkgB => 1.5GB
pkgC => 1GB
pkgD => 0.75GB
pkgE => 0.5GB
In my current algorithm, pkgA and pkgB are placed in directory 1, for a total of 3.5GB. When I get to pkgC, it would push directory 1 over the 4GB limit, so directory 2 is created.

What I am not sure about is how to compare directory 1 and directory 2 when placing the remaining packages: pkgD has to go in directory 2, but pkgE would still fit in directory 1.

Can someone help?

Replies are listed 'Best First'.
Re: Help with Space Efficiency algorithm
by MidLifeXis (Monsignor) on Dec 27, 2007 at 18:36 UTC

    Look up the knapsack problem; yours is an instance of that classic NP-complete problem.

    Update: Also see the Bin Packing Problem, which is probably a closer match to what you describe.

    Update II (can you tell I enjoy algorithms? :-): Sorting the data in descending order of size, then repeatedly putting the largest remaining item that is smaller than the space left in your "bin" into that bin (opening a new bin when nothing fits), usually gives "good enough" results.
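
    A minimal sketch of that heuristic in Perl, using the sizes from the original post (the %pkg hash and variable names are just illustrations, not jkhunt's actual code):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %pkg   = (pkgA => 2, pkgB => 1.5, pkgC => 1, pkgD => 0.75, pkgE => 0.5);
        my $limit = 4;    # GB per directory

        # Work from a list sorted in descending order of size.
        my @items = sort { $pkg{$b} <=> $pkg{$a} } keys %pkg;

        my @bins;
        while (@items) {
            my %bin = (free => $limit, items => []);

            # Repeatedly take the largest remaining item that still fits.
            while (1) {
                my ($i) = grep { $pkg{ $items[$_] } <= $bin{free} } 0 .. $#items;
                last unless defined $i;
                my $name = splice @items, $i, 1;
                push @{ $bin{items} }, $name;
                $bin{free} -= $pkg{$name};
            }
            die "'$items[0]' is larger than the limit\n" unless @{ $bin{items} };
            push @bins, \%bin;
        }

        my $n = 0;
        for my $bin (@bins) {
            printf "directory %d: %s (%.2f GB free)\n",
                ++$n, join(', ', @{ $bin->{items} }), $bin->{free};
        }

    With the sizes from the original post this packs pkgA, pkgB and pkgE into directory 1 and pkgC and pkgD into directory 2 -- exactly the placement the OP was after.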

    --MidLifeXis

      MidLifeXis,
      Your advice may not help jkhunt. It sounds as though the packages might arrive over time and not all be available up front for packing. That turns the question into:

      How can I tell how much free space is left in each of a list of directories?

      use Cwd;
      use constant LIMIT => 4 * 1024 ** 3;    # 4GB, in bytes

      my @dirs;    # perhaps found with File::Find

      for my $dir (@dirs) {
          my $avail = LIMIT - dir_size($dir);
          print "'$dir' has $avail bytes free\n";
      }

      sub dir_size {
          my ($dir) = @_;
          my $orig_dir = getcwd;
          chdir $dir or die "Unable to chdir to '$dir': $!";
          opendir(my $dh, '.') or die "Unable to open a dir handle for '$dir': $!";
          my $size = 0;
          while (my $file = readdir $dh) {
              next if ! -f $file;
              $size += -s _;    # reuse the stat buffer from -f
          }
          closedir $dh;
          chdir $orig_dir or die "Unable to restore dir to '$orig_dir': $!";
          return $size;
      }

      If, on the other hand, the packages are all available up front, I too would recommend a "good enough" heuristic rather than hunting for the perfect solution.

      Cheers - L~R

        I would agree with your assessment. If the data is known ahead of time, then you can use the standard bin packing solutions.

        If you are getting the new data over time, then adding each new item to the directory with the smallest amount of free space that is still larger than the item (a best-fit strategy) may be appropriate. I think that would give the best packing over time.

        @spaces      = map { { dir => $_, size => dir_size($_) } } @dirs;
        $small_space = (sort by_dir_size grep { larger_than_needed($_) } @spaces)[0];
        store_in_space($newdata, $small_space);

        The code is pseudocode, but it's almost Perl.
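
        A runnable version of that best-fit idea, reusing Limbic~Region's dir_size() and LIMIT from above (choose_dir is a made-up helper name, and how the package actually gets moved is left out):

            # Best fit: pick the fullest directory that can still hold the
            # new item; an undef return means it's time to open a new one.
            sub choose_dir {
                my ($need, @dirs) = @_;
                my ($best) =
                    sort { $a->{free} <=> $b->{free} }    # least free space first...
                    grep { $_->{free} >= $need }          # ...that still fits the item
                    map  { { dir => $_, free => LIMIT - dir_size($_) } } @dirs;
                return $best ? $best->{dir} : undef;
            }

            # e.g. my $dir = choose_dir(-s $package_file, @dirs);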

        --MidLifeXis

        Thank you MidLifeXis, but all of my packages are available up front. I am moving them from one location to another for space efficiency, so the bin packing solution looks like the best fit for my problem. Does anyone have a Perl example of a bin packing solution? I have a hash of pkgA through pkgE with their sizes. Can someone help?
Re: Help with Space Efficiency algorithm
by afresh1 (Hermit) on Dec 28, 2007 at 23:00 UTC
    I had good luck fitting stuff onto DVDs using Algorithm::Bucketizer. It did the work, so my script was short.
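
    For reference, a short sketch of the module in use (the MB figures are made-up stand-ins for the OP's sizes; check the module's POD for the full interface):

        use strict;
        use warnings;
        use Algorithm::Bucketizer;

        # Buckets sized to the 4GB directory limit, working in MB.
        my $bucketizer = Algorithm::Bucketizer->new(bucketsize => 4096);

        my %pkg = (pkgA => 2048, pkgB => 1536, pkgC => 1024, pkgD => 768, pkgE => 512);
        $bucketizer->add_item($_, $pkg{$_}) for sort keys %pkg;

        for my $bucket ($bucketizer->buckets()) {
            printf "directory %d: %s\n", $bucket->serial(), join ', ', $bucket->items();
        }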

    l8rZ,
    --
    andrew
Re: Help with Space Efficiency algorithm
by LighthouseJ (Sexton) on Dec 28, 2007 at 13:04 UTC
    I looked at this independently, from the perspective of fitting files onto a DVD. I ended up brute-forcing it programmatically (which of course takes a long time, since it has to check every possible arrangement), but it produced the best result possible. Brute force might not be bad here, since you have only about 5 items; that's only around 5! (I think) cases to try.
    Sure, heuristics like ordering by size or randomly grouping things would save time, but there's no shortcut to the best result.
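
    For what it's worth, a small brute-force sketch along those lines: it exhaustively tries every assignment of items to discs and keeps whichever uses the fewest (the sizes are made-up, and this is only sane for a handful of items):

        use strict;
        use warnings;

        my $limit = 4096;    # MB per disc -- illustrative
        my %size  = (a => 2048, b => 1536, c => 1024, d => 768, e => 512);
        my @names = keys %size;

        my (@best, $best_count);

        # Recursively place item $i on every existing disc it fits, or on a
        # new disc. Each disc is [ free_space, item, item, ... ].
        sub assign {
            my ($i, $discs) = @_;
            if ($i > $#names) {
                if (!defined $best_count || @$discs < $best_count) {
                    $best_count = @$discs;
                    @best       = map { [@$_] } @$discs;
                }
                return;
            }
            return if defined $best_count && @$discs >= $best_count;    # prune
            my $name = $names[$i];
            for my $disc (@$discs) {
                next if $disc->[0] < $size{$name};
                $disc->[0] -= $size{$name};
                push @$disc, $name;
                assign($i + 1, $discs);
                pop @$disc;                    # backtrack
                $disc->[0] += $size{$name};
            }
            # ...or start a new disc with this item on it.
            assign($i + 1, [ @$discs, [ $limit - $size{$name}, $name ] ]);
        }

        assign(0, []);
        for my $disc (@best) {
            my ($free, @items) = @$disc;
            print "disc: @items ($free MB to spare)\n";
        }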
    "The three principal virtues of a programmer are Laziness, Impatience, and Hubris. See the Camel Book for why." -- `man perl`
      > there's no shortcut to the best result.

      Laziness -- brute force (exhaustive combination) is perhaps the easiest to code (although First Fit Decreasing is also pretty easy).

      Impatience -- exhaustive combination dies a horrible flaming death somewhere in the high teens, unless your computer is a lot faster than mine. First Fit Decreasing needs a sort, but that isn't a big deal until you get into too many DVDs to want to hand process.

      Hubris -- exhaustive combination will achieve the optimal result, but first fit decreasing will come within 11% of it. What price glory?

        Heh. Well, as I see it, Laziness applies to execution, not ease of programming, and Impatience is more a direct result of taking shortcuts, i.e. using a sorting algorithm. I'm not looking for glory, just the best answer.

        I don't know about you, but I don't get the luxury of living 11% away from correct.

        "The three principal virtues of a programmer are Laziness, Impatience, and Hubris. See the Camel Book for why." -- `man perl`