What you are doing is equivalent to bin-packing, which is an NP-complete problem. Your code employs a simple "next-fit" algorithm, which is somewhat inefficient, but easy to code. Better heuristics exist, such as first fit, best fit, and their "decreasing" variants (first-fit decreasing and best-fit decreasing).
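For comparison, here is a rough sketch of what a next-fit packer looks like (this is my guess at the shape of your approach, not your actual code; it assumes a %filesize hash mapping each file to its size, like the one built in the code further down). Notice that it only ever checks the most recently opened bin, which is why it tends to waste space:

# minimal next-fit sketch: take files in whatever order they come,
# and open a new bin as soon as the current one can't hold the next file
my $max_bin_size = 10*1000*1000;
my @bins = ({ size => 0, files => [] });
for my $file (keys %filesize) {
    my $fsize = $filesize{$file};
    # next-fit never revisits earlier bins -- only the last one is checked
    if ($bins[-1]{size} + $fsize > $max_bin_size) {
        push @bins, { size => 0, files => [] };
    }
    $bins[-1]{size} += $fsize;
    push @{ $bins[-1]{files} }, $file;
}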
One good way to do this (first fit decreasing) is to build a hash mapping each file to its size, sort the files in decreasing order of size, and pack the biggest items first. Keep track of each bin's free space, and add each item to the first bin in which it fits. Open a new bin only when no existing bin has room.
Update: Couldn't resist the lure of an algorithm implementation...
use Data::Dumper;
use File::Find;

my %filesize;
my %bins;
my $num_bins     = 0;
my $max_bin_size = 10*1000*1000;
my $top_dir      = '.';

# record the size of every file under $top_dir (skip directories)
find(sub { $filesize{$File::Find::name} = -s if -f }, $top_dir);

# first fit decreasing: place the largest files first
for my $file (sort { $filesize{$b} <=> $filesize{$a} } keys %filesize) {
    my $fsize = $filesize{$file};
    my $bin;
    # look for the first existing bin with enough room
    # (sort numerically so "first" really means the earliest-opened bin)
    for (sort { $a <=> $b } keys %bins) {
        if ($bins{$_}{size} + $fsize < $max_bin_size) {
            $bin = $_;
            last;
        }
    }
    # no bin had room, so open a new one
    $bin = $num_bins++ if not defined $bin;
    $bins{$bin}{size} += $fsize;
    push @{$bins{$bin}{files}}, $file;
}

print Dumper(\%bins);