Try switching over to an "on-line" bin-packing algorithm like the one shown below (this one is a basic first-fit). But first, try replacing just the print Dumper ... line with the mydump routine below.
Update: I've confirmed that this was a Data::Dumper issue, at least on my system. When I switched the code from my first response to use the mydump routine on my 9 GB drive, with 160,000 files and a $max_bin_size of 1 MB (~10,000 total bins), memory use did not exceed 50 MB. With Data::Dumper doing the dump, well, even 700 MB of RAM couldn't stop the inevitable.
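For contrast, the dump being replaced was just a Data::Dumper call along these lines (a sketch; the exact line is elided above). Dumper serializes the entire structure into one giant string in memory before anything gets printed, which is exactly what hurts with ~10,000 bins:

    use Data::Dumper;
    print Dumper(\%bins);   # builds the full dump string in RAM, then prints it

The mydump routine below avoids that by printing one line at a time as it walks the hash.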
use strict;
use warnings;
use File::Find;

my %bins;
my $num_bins     = 0;
my $max_bin_size = 10 * 1000 * 1000;   # 10 MB per bin
my $top_dir      = '.';

$| = 1;   # autoflush, so the progress output appears immediately

find(\&packer, $top_dir);

# On-line first-fit: drop each file into the first bin with room,
# opening a new bin when none fits.
sub packer {
    print "finding: $_\n";
    return unless -f;   # only pack plain files, not directories
    my $file  = $File::Find::name;
    my $fsize = -s;
    my $bin;
    for (keys %bins) {
        if ($bins{$_}{size} + $fsize < $max_bin_size) {
            $bin = $_;
            last;
        }
    }
    $bin = $num_bins++ if not defined $bin;
    $bins{$bin}{size} += $fsize;
    push @{ $bins{$bin}{files} }, $file;
}

# Print each bin one line at a time instead of serializing the
# whole structure into memory first, as Data::Dumper would.
sub mydump {
    my $bins = shift;
    while (my ($bnum, $bstruct) = each %$bins) {
        print "Bin Number: $bnum ($bstruct->{size} bytes)\n";
        print "    $_\n" for @{ $bstruct->{files} };
    }
}

mydump(\%bins);
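One caveat about the loop in packer: keys %bins returns bins in arbitrary hash order, so "first fit" here means the first bin the hash happens to yield, not the first bin opened. If you want the classic first-fit behavior of always trying the oldest bin first, one option is to sort the keys numerically (a sketch, assuming the rest of packer is unchanged):

    # replaces the plain `for (keys %bins)` loop inside packer
    for my $b (sort { $a <=> $b } keys %bins) {
        if ($bins{$b}{size} + $fsize < $max_bin_size) {
            $bin = $b;
            last;
        }
    }

Either way the packing stays on-line: each file is placed as find() encounters it, with no second pass over the file list.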
MeowChow
s aamecha.s a..a\u$&owag.print