in reply to Help with Space Efficency algorithim

Look up the knapsack problem. Your task is an instance of this classic NP-complete problem.

Update: Also see the Bin Packing Problem, which is probably more appropriate for your problem.

Update II (can you tell I enjoy algorithms? :-): Sorting the data in descending order of size, then repeatedly putting the largest remaining item that still fits into the space left in your "bin" (repeat as necessary), usually gives "good enough" results; a sketch is below.
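
A minimal Perl sketch of that heuristic, as first-fit decreasing (the %pkg sizes and $capacity here are made-up numbers, just one way to code up the idea above):

    use strict;
    use warnings;

    # Hypothetical input: package name => size, and a fixed bin capacity
    my %pkg = ( A => 700, B => 400, C => 300, D => 250, E => 150 );
    my $capacity = 1000;

    my @bins; # each bin is { free => $bytes, items => [@names] }
    for my $name ( sort { $pkg{$b} <=> $pkg{$a} } keys %pkg ) {
        my $size = $pkg{$name};
        my ($bin) = grep { $_->{free} >= $size } @bins; # first bin with room
        if ( !$bin ) {
            $bin = { free => $capacity, items => [] };
            push @bins, $bin;
        }
        push @{ $bin->{items} }, $name;
        $bin->{free} -= $size;
    }

    for my $i ( 0 .. $#bins ) {
        printf "bin %d: %s (%d free)\n", $i + 1,
            join( ', ', @{ $bins[$i]{items} } ), $bins[$i]{free};
    }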

--MidLifeXis

Re^2: Help with Space Efficency algorithim
by Limbic~Region (Chancellor) on Dec 27, 2007 at 19:38 UTC
    MidLifeXis,
    Your advice may not help jkhunt. It sounds as though the packages may be placed over time and are not all available up front for packing. That makes the question:

    How can I tell how much free space is left in each of a list of directories?

    use strict;
    use warnings;
    use Cwd;

    use constant LIMIT => 4 * 1024 * 1024 * 1000;

    my @dirs; # perhaps found with File::Find

    for my $dir (@dirs) {
        my $avail = LIMIT - dir_size($dir);
        print "'$dir' has $avail free\n";
    }

    sub dir_size {
        my ($dir) = @_;
        my $orig_dir = getcwd;
        chdir $dir or die "Unable to chdir to '$dir': $!";
        opendir(my $dh, '.') or die "Unable to open a dir handle for '$dir': $!";
        my $size = 0;
        while (my $file = readdir($dh)) {
            next if ! -f $file;
            $size += -s _; # the _ shortcut reuses the stat info from -f
        }
        closedir $dh;
        chdir $orig_dir or die "Unable to restore dir to '$orig_dir': $!";
        return $size;
    }

    If, on the other hand, the packages are all available up front, I too would recommend a "good enough" algorithm rather than searching for the perfect solution.

    Cheers - L~R

      I would agree with your assessment. If the data is known ahead of time, then you can use the standard bin packing solutions.

      If you are getting the new data over time, adding each new item to the directory with the smallest amount of free space that is still larger than the size of the new data may be appropriate. I think this would give the best packing over time.

      my @spaces = map { { dir => $_, free => LIMIT - dir_size($_) } } @dirs;
      my ($small_space) = sort { $a->{free} <=> $b->{free} }
                          grep { $_->{free} >= $size_needed } @spaces;
      store_in_space($newdata, $small_space);

      code is still pseudocode around store_in_space, $newdata, and $size_needed, but otherwise it's Perl

      --MidLifeXis

        In a way, your approach works nicely: it is easy to comprehend, which makes it appealing from a maintenance perspective. However, that best-fit strategy tends to create significant fragmentation once your average file size exceeds about a third of the bin size (not proven, but it seems about right). With a larger average file size, your best bet to avoid fragmentation is to pick the largest free space that will fit, i.e. worst-fit; a sketch follows.
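
        A possible sketch of that worst-fit pick as a small function (the dir/free field names follow the pseudocode above):

            # Worst-fit: among spaces with enough room, take the one with the
            # MOST free space. Returns undef if nothing fits.
            sub pick_worst_fit {
                my ( $size_needed, @spaces ) = @_;
                my ($target) = sort { $b->{free} <=> $a->{free} }
                               grep { $_->{free} >= $size_needed } @spaces;
                return $target;
            }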

        Another optimization is to use something like DBM::Deep to store the bins and how much space is left in each, so you don't have to recalculate it every time. DBM::Deep also supports locking and transactions, so multiple processes won't clobber each other.
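
        A minimal sketch of that caching idea, assuming the LIMIT constant and dir_size() from the snippet earlier in the thread ('bins.db' is a made-up filename, and $dir and $bytes stand in for the directory and file size at hand; this sketch leans on DBM::Deep's file locking rather than full transactions):

            use DBM::Deep;

            my $db = DBM::Deep->new(
                file      => 'bins.db',
                locking   => 1, # flock-based locking across processes
                autoflush => 1,
            );

            $db->lock;
            # Cache each directory's free space the first time we see it
            $db->{$dir} = LIMIT - dir_size($dir) unless exists $db->{$dir};
            $db->{$dir} -= $bytes; # after storing a $bytes-sized file in $dir
            $db->unlock;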


        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      Thank you MidLifeXis, but all of my packages are available up front. I am moving them from one location to another for space efficiency. The bin packing solution seems like the best fit for my issue. Does anyone have a Perl example of a bin packing solution? I have a hash with packages A through E and their sizes. Can someone help?