Try switching over to an "on-line" bin-packing algorithm like the one shown below (this one is a basic first-fit). But first, try replacing just the print Dumper ... line with the mydump routine below.
Update: I've confirmed that this was a Data::Dumper issue, at least on my system. When I switched the code from my first response to use the mydump routine on my 9 GB drive, with 160,000 files and a $max_bin_size of 1 MB (~10,000 total bins), memory use did not exceed 50 MB. With Data::Dumper doing the dump, well, even 700 MB of RAM couldn't stop the inevitable.
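For contrast, the dump being replaced was just a Data::Dumper call along these lines (a sketch; the exact line is elided above). Dumper serializes the entire structure into one giant string in memory before anything gets printed, which is exactly what hurts with ~10,000 bins:

    use Data::Dumper;
    print Dumper(\%bins);   # builds the full dump string in RAM, then prints it

The mydump routine below avoids that by printing one line at a time as it walks the hash.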
use strict;
use warnings;
use File::Find;

my %bins;
my $num_bins     = 0;
my $max_bin_size = 10 * 1000 * 1000;   # 10 MB per bin
my $top_dir      = '.';

$| = 1;   # autoflush, so the progress output appears immediately

find(\&packer, $top_dir);

# On-line first-fit: drop each file into the first bin with room,
# opening a new bin when none fits.
sub packer {
    print "finding: $_\n";
    return unless -f;   # only pack plain files, not directories
    my $file  = $File::Find::name;
    my $fsize = -s;
    my $bin;
    for (keys %bins) {
        if ($bins{$_}{size} + $fsize < $max_bin_size) {
            $bin = $_;
            last;
        }
    }
    $bin = $num_bins++ if not defined $bin;
    $bins{$bin}{size} += $fsize;
    push @{ $bins{$bin}{files} }, $file;
}

# Print each bin one line at a time instead of serializing the
# whole structure into memory first, as Data::Dumper would.
sub mydump {
    my $bins = shift;
    while (my ($bnum, $bstruct) = each %$bins) {
        print "Bin Number: $bnum ($bstruct->{size} bytes)\n";
        print "    $_\n" for @{ $bstruct->{files} };
    }
}

mydump(\%bins);
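One caveat about the loop in packer: keys %bins returns bins in arbitrary hash order, so "first fit" here means the first bin the hash happens to yield, not the first bin opened. If you want the classic first-fit behavior of always trying the oldest bin first, one option is to sort the keys numerically (a sketch, assuming the rest of packer is unchanged):

    # replaces the plain `for (keys %bins)` loop inside packer
    for my $b (sort { $a <=> $b } keys %bins) {
        if ($bins{$b}{size} + $fsize < $max_bin_size) {
            $bin = $b;
            last;
        }
    }

Either way the packing stays on-line: each file is placed as find() encounters it, with no second pass over the file list.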
MeowChow
s aamecha.s a..a\u$&owag.print