in reply to file chunk algorithm
What you are doing is equivalent to bin-packing, which is an NP-complete problem. Your code employs a simple "next-fit" algorithm, which is somewhat inefficient but easy to code. Better heuristics exist, such as first fit, best fit, and first-fit/best-fit decreasing.
One good way to do this (first fit decreasing) is to construct a hash of your files and their associated sizes, sort it in decreasing order of size, and pack the biggest items first. Keep track of each bin's free space, and place each item into the first bin in which it fits. Add a new bin only when no existing bin has room.
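For instance, with a 10 MB bin limit and files of 7, 5, 4, and 2 MB arriving in that order, next-fit needs three bins ({7}, {5, 4}, {2}), while first fit decreasing packs them into two ({7, 2}, {5, 4}).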
Update: Couldn't resist the lure of an algorithm implementation...
use Data::Dumper;
use File::Find;

my %filesize;
my %bins;
my $num_bins     = 0;
my $max_bin_size = 10*1000*1000;
my $top_dir      = '.';

# Record the size of every file under $top_dir
find(sub { $filesize{$File::Find::name} = -s }, $top_dir);

# First fit decreasing: consider the biggest files first
for my $file (sort { $filesize{$b} <=> $filesize{$a} } keys %filesize) {
    my $fsize = $filesize{$file};
    my $bin;
    # Scan existing bins in creation order; take the first with room
    for (sort { $a <=> $b } keys %bins) {
        if ($bins{$_}{size} + $fsize < $max_bin_size) {
            $bin = $_;
            last;
        }
    }
    $bin = $num_bins++ if not defined $bin;   # no fit found: open a new bin
    $bins{$bin}{size} += $fsize;
    push @{$bins{$bin}{files}}, $file;
}
print Dumper(\%bins);
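To get best-fit decreasing instead, replace the inner first-fit scan with one that keeps the fullest bin that still has room, rather than stopping at the first fit. A minimal sketch reusing the %bins structure above (the best_fit_bin helper name is mine, not part of the code above):

use strict;
use warnings;

# Best fit: among bins with enough room for the item, return the key of
# the fullest one, or undef if the item fits in no existing bin.
sub best_fit_bin {
    my ($fsize, $bins, $max_bin_size) = @_;
    my $best;
    for my $candidate (keys %$bins) {
        next if $bins->{$candidate}{size} + $fsize >= $max_bin_size;
        $best = $candidate
            if !defined $best
            or $bins->{$candidate}{size} > $bins->{$best}{size};
    }
    return $best;
}

Calling my $bin = best_fit_bin($fsize, \%bins, $max_bin_size); in place of the inner loop leaves the rest of the code unchanged; since best fit examines every bin anyway, the order in which keys %bins is visited no longer matters.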
MeowChow
Replies are listed 'Best First'.
- Re: Re: file chunk algorithm by thealienz1 (Pilgrim) on Feb 26, 2001 at 20:32 UTC
- by tilly (Archbishop) on Feb 27, 2001 at 03:47 UTC
- by MeowChow (Vicar) on Feb 26, 2001 at 22:57 UTC