Yes, it does eat up memory when run on a large number of files. You may also be hitting an issue with Data::Dumper, which snarfs memory like it's going out of style on huge data structures.
Try switching over to an "on-line" bin-packing algorithm as shown below (this one is a basic first-fit), but first try replacing just the print Dumper ... with the mydump routine below.
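Concretely, the swap is just the final dump call; something along these lines (the Dumper line is my guess at what the earlier code looked like, since it isn't shown here):

# Before (presumed): Data::Dumper builds the entire dump string in memory first
use Data::Dumper;
print Dumper(\%bins);

# After: mydump streams one bin at a time, so there is no giant intermediate string
mydump(\%bins);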
update: I've confirmed that this was a Data::Dumper issue, at least on my system. When I switched my first response's code to use the mydump routine on my 9 GB drive, with 160,000 files and a $max_bin_size of 1 MB (~10,000 bins in total), memory use did not exceed 50 MB. With Data::Dumper doing the dump, well, even 700 MB of RAM couldn't stop the inevitable. (A quick way to check memory use on your own system is sketched after the code.)
use strict;
use warnings;
use File::Find;

my %filesize;
my %bins;
my $num_bins     = 0;
my $max_bin_size = 10*1000*1000;    # each bin holds up to ~10 MB
my $top_dir      = '.';

$| = 1;    # unbuffer STDOUT so the progress lines show up immediately

find(\&packer, $top_dir);

# Called by File::Find for every entry under $top_dir; packs each one
# into the first bin that still has room (on-line first-fit).
sub packer {
    print "finding: $_\n";
    my $file  = $File::Find::name;
    my $fsize = -s;    # size of the current entry ($_)
    my $bin;

    # First-fit: take the first existing bin with enough room left.
    for (keys %bins) {
        if ($bins{$_}{size} + $fsize < $max_bin_size) {
            $bin = $_;
            last;
        }
    }

    # No existing bin had room, so start a new one.
    $bin = $num_bins++ if not defined $bin;

    $bins{$bin}{size} += $fsize;
    push @{$bins{$bin}{files}}, $file;
}

# Prints the bins one at a time instead of building one huge dump string
# the way Data::Dumper does.
sub mydump {
    my $bins = shift;
    while (my ($bnum, $bstruct) = each %$bins) {
        print "Bin Number: $bnum ($bstruct->{size} bytes)\n";
        print " $_\n" for @{$bstruct->{files}};
    }
}

mydump(\%bins);
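If you want to watch memory use on your own tree, one option on Linux is to print the process's peak memory at exit. This is just a sketch and assumes a /proc filesystem; on other systems, running the script under GNU time (/usr/bin/time -v) and reading the "Maximum resident set size" line works too.

# Sketch only (Linux assumption): report peak memory from /proc when the script exits.
END {
    if (open my $fh, '<', "/proc/$$/status") {
        while (<$fh>) {
            print if /^Vm(Peak|HWM):/;   # peak virtual size and peak resident set size
        }
        close $fh;
    }
}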