Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I need to zip a bunch of files in directories, but I have to make sure that the zip file(s) will not exceed a certain size. I am trying to group these files into groups of, let's say, about .5GB, but I am having a lot of problems here: my code is going into an infinite loop on line 42. Bottom line: my logic is not working. It would be nice if someone could take a look at this code to see if there is a better way of doing what I am trying to do here. Now, the code:
#!/usr/bin/perl -w
use strict;
use File::Find;
use File::Basename;
use File::stat;
use File::Slurp qw(read_dir);

my @directories = qw( /all_files_a /all_files_b);
my @files;

my $min_size = 1024;     # bytes => 1K
my $max_size = 10485760; # bytes =>10MB

for my $d (@directories) {
    push @files, grep { -f && -s _ >= $min_size && -s _ <= $max_size && /\.pdf/} read_dir( $d, prefix => 1 );
}

my $limit = 5_000_000;

print "Getting file sizes...\n";
$_ = -s $_ foreach (@files);

#Create zip with $limit of files per zip
print "Creating zips...\n";
my $zip_number = 1;
my $total_size = 0;
my @other_group;
while(@files or @other_group) {
    #Find biggest file that will fit
    my $next_index = 0;
    $next_index++ while( ($next_index <= $#files) && ($total_size + $files[$next_index] > $limit) );

    #If there was a file that will fit, add it to @other_group
    if($next_index <= $#files) {
        push @other_group, splice(@files, $next_index, 0);
        $total_size += $other_group[0];
    }
    #Otherwise, zip this group, and start a new group
    else {
        # print this for testing, these files will be zipped in the $zip_number.zip file
        foreach my $testfiles (@other_group) {
            print "\n$testfiles - $zip_number.zip\n";
        }

        #Clear this group info, and increment zip number
        @other_group = ();
        $total_size = 0;
        $zip_number++;
    }
}
Thanks for the help!!!

Re: Grouping files before zipping!
by thundergnat (Deacon) on Sep 08, 2011 at 19:52 UTC

    There are a few problems. The reason it is going into an infinite loop: there is nothing incrementing the $next_index variable. You increment while finding the first file small enough, but then never do so again.

    Another issue is that by the time you try to use the @files array, it is an array of file sizes, not file names. Fine to find a group of file sizes under a limit, but no way to actually access the file names to add them to the zip file.

    Thirdly, you are pushing elements onto the end of an array, then using the first array element in calculations. You probably should use the element just pushed instead.

    Update: Oh yeah, the splice length also needs to be nonzero, or it won't modify the array.
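A minimal sketch of the second and third points, using made-up records rather than real files: each entry keeps its name next to its size, and the running total reads the element that was just pushed with `[-1]`. The file names, the sizes and the `$wrong_total` variable are only there for illustration.

```perl
use strict;
use warnings;

# Made-up records: the name stays available because it travels with the size.
my @files = (
    { name => 'a.pdf', size => 300 },
    { name => 'b.pdf', size => 200 },
    { name => 'c.pdf', size => 100 },
);

my @group;
my $total_size  = 0;
my $wrong_total = 0;

while (@files) {
    push @group, shift @files;
    $total_size  += $group[-1]{size};   # the element just pushed
    $wrong_total += $group[0]{size};    # always the first element: counts a.pdf three times
}

print "correct total: $total_size, total using [0]: $wrong_total\n";
print "files in group: ", join(', ', map { $_->{name} } @group), "\n";
```

Reading `$group[0]` only gives the right answer while the group holds a single element, which is why the mistake is easy to miss in a quick test.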

    Putzing around a bit... (note: I do not particularly recommend this, just fixed to be a working example. No guarantees about correct logic.)

    #!/usr/bin/perl -w
    use strict;
    use File::Find;
    use File::Basename;
    use File::stat;
    use File::Slurp qw(read_dir);

    my @directories = qw( /all_files_a /all_files_b);
    my @files;

    my $min_size = 1024;     # bytes => 1K
    my $max_size = 10485760; # bytes =>10MB

    for my $d (@directories) {
        # read_dir with prefix => 1 already returns the path including $d,
        # so the name is stored as-is
        push @files, { name => $_, size => -s $_ }
            for grep { -f && -s _ >= $min_size && -s _ <= $max_size && /\.pdf/ }
                read_dir( $d, prefix => 1 );
    }

    my $limit = 5_000_000;

    #Create zip with $limit of files per zip
    print "Creating zips...\n";
    my $zip_number = 1;
    my $total_size = 0;
    my @other_group;
    my $next_index = 0;
    while(@files) {
        #Find biggest file that will fit
        $next_index++ while( ($next_index <= $#files) && ($total_size + $files[$next_index]->{size} > $limit) );

        #If there was a file that will fit, add it to @other_group
        if($next_index <= $#files) {
            push @other_group, splice(@files, $next_index, 1);
            $total_size += $other_group[-1]->{size};
            $next_index++;
        }
        #Otherwise, zip this group, and start a new group
        else {
            # print this for testing, these files will be zipped in the $zip_number.zip file
            foreach my $testfiles (@other_group) {
                print $testfiles->{name}," - $zip_number.zip\n";
            }

            #Clear this group info, and increment zip number
            @other_group = ();
            $total_size = 0;
            $zip_number++;
            $next_index = 0;
        }
    }
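For comparison, here is a hedged sketch of a different grouping strategy, first-fit decreasing, which is not what the code above does. It assumes the same `{ name => ..., size => ... }` records; the helper name `group_files` and the demo data are made up. Files larger than the limit are set aside instead of being retried, and every group is returned, including the last one.

```perl
use strict;
use warnings;

# Hypothetical helper: first-fit-decreasing grouping.
# Takes a byte limit and a list of { name => ..., size => ... } records.
# Returns two array refs: the groups, and the files too large for any group.
sub group_files {
    my ($limit, @files) = @_;
    my (@groups, @oversize);

    # Largest files first, so big files claim space before small ones fill gaps.
    for my $file (sort { $b->{size} <=> $a->{size} } @files) {
        if ($file->{size} > $limit) {
            push @oversize, $file;      # can never fit; do not loop on it
            next;
        }

        # First existing group with enough room left, if any.
        my ($group) = grep { $_->{total} + $file->{size} <= $limit } @groups;
        if (!$group) {
            $group = { total => 0, files => [] };
            push @groups, $group;
        }

        push @{ $group->{files} }, $file;
        $group->{total} += $file->{size};
    }

    return (\@groups, \@oversize);
}

# Demo with made-up sizes of 1 MB to 6 MB and a 5 MB limit.
my @demo = map { +{ name => "file$_.pdf", size => $_ * 1_000_000 } } 1 .. 6;
my ($groups, $oversize) = group_files(5_000_000, @demo);

my $zip_number = 1;
for my $group (@$groups) {
    print "$_->{name} - $zip_number.zip\n" for @{ $group->{files} };
    $zip_number++;
}
print "$_->{name} is larger than the limit\n" for @$oversize;
```

Note that this groups by uncompressed size, so the resulting archives should normally come out smaller than the limit rather than right at it.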
      After some research on another way of doing this, I found that it could be done using Archive::Zip. I have some sample code that checks the size of a zip before zipping all the files in a directory; it only needs to be extended to build multiple zip files if the total file size exceeds the permitted volume. Anyone?
      use strict;
      use warnings;
      use Archive::Zip qw/AZ_OK/;
      use File::Temp qw/tempfile/;

      use constant MB => 1024 * 1024;

      my $dir = '/allfiletozip/';
      my @files = do {
          opendir my $fd, $dir or die $!;
          grep -f, map "$dir$_", readdir $fd;
      };

      my $zip = Archive::Zip->new;
      my $total = 0;
      my $limit = 50*MB;

      foreach my $file (@files) {
          # Write each file to a throwaway archive first so that
          # its compressed size is known
          my $temp = Archive::Zip->new;

          my $member = $temp->addFile($file);
          next unless $member->compressedSize;

          my $fh = tempfile();
          $temp->writeToFileHandle($fh) == AZ_OK or die $!;

          $zip->addMember($member);

          $total += $member->compressedSize;
          die "$total bytes exceeds archive size limit" if $total > $limit;
      }

      print "Total archive size: $total bytes\n\n";

      $zip->writeToFileNamed('zipped.zip') == AZ_OK or die $!;

      Thanks!
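One possible way to extend the sample above to several archives, as a rough sketch only: instead of dying when the limit is reached, start a new archive. To keep it simple this assumes the on-disk size from `-s` is an acceptable stand-in for the compressed size (compressed files are usually smaller, though the zip headers add a little). The helper names `plan_archives` and `write_archives` and the `zipped_001.zip` naming scheme are made up; `Archive::Zip->new`, `addFile`, `writeToFileNamed` and `AZ_OK` are the same calls used in the sample.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the entries in order and start a new batch
# whenever adding the next file would push the running total over $limit.
# Each entry is [ $path, $size ]. A single file bigger than $limit still
# gets a batch of its own, so that batch will exceed the limit.
sub plan_archives {
    my ($limit, @entries) = @_;
    my (@plans, @current);
    my $total = 0;

    for my $entry (@entries) {
        my ($path, $size) = @$entry;
        if (@current && $total + $size > $limit) {
            push @plans, [@current];
            @current = ();
            $total   = 0;
        }
        push @current, $path;
        $total += $size;
    }
    push @plans, [@current] if @current;   # do not forget the last batch

    return @plans;
}

# Hypothetical helper: write one zip per batch with Archive::Zip.
# The module is loaded here so the planning code can be tried without it.
sub write_archives {
    my ($prefix, @plans) = @_;
    require Archive::Zip;

    my $n = 0;
    for my $plan (@plans) {
        my $zip = Archive::Zip->new;
        $zip->addFile($_) for @$plan;

        my $name = sprintf '%s_%03d.zip', $prefix, ++$n;
        $zip->writeToFileNamed($name) == Archive::Zip::AZ_OK()
            or die "Could not write $name\n";
    }
    return $n;
}

# Example use (the directory is the one from the post, the prefix is a placeholder):
#   my @entries = map { [ $_, -s $_ ] } grep { -f } glob('/allfiletozip/*');
#   write_archives('zipped', plan_archives(50 * 1024 * 1024, @entries));
```

If the limit has to hold exactly, the compressed sizes measured with the temporary-archive trick from the sample could be fed to `plan_archives` instead of the `-s` values.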
Re: Grouping files before zipping!
by ambrus (Abbot) on Sep 08, 2011 at 19:19 UTC
Re: Grouping files before zipping!
by afoken (Chancellor) on Sep 09, 2011 at 05:43 UTC
    I have to make sure that the zip file(s) will not exceed a certain size

    If you don't need independent zip files, but just a set of reasonably small parts, use InfoZIP and its --split-size option. Other zip implementations have similar options, because split archives are part of the ZIP specification. Other archivers, like RAR and ARJ, have similar functions.
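A small sketch of driving this from Perl, assuming Info-ZIP's `zip` 3.0 or later is on the PATH (`-s` is the short form of `--split-size`). The helper name `split_zip_command` and the way the script takes its arguments are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for a split archive.
# $split_size is passed through to zip unchanged, e.g. '500m'.
sub split_zip_command {
    my ($archive, $split_size, @paths) = @_;
    return ('zip', '-s', $split_size, '-r', $archive, @paths);
}

# Runs only when called as: script.pl ARCHIVE SIZE PATH...
if (@ARGV >= 3) {
    my @cmd = split_zip_command(@ARGV);
    # List form of system() avoids shell quoting problems with odd file names.
    system(@cmd) == 0 or die "zip failed with status $?\n";
}
```

The parts of a split archive are not usable on their own; all of them are needed to extract anything.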

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Grouping files before zipping!
by Voronich (Hermit) on Sep 08, 2011 at 19:04 UTC
    without counting I'm going to guess:

    while(@files or @other_group) {
    is line 42. What do you think that does?
    Me
Re: Grouping files before zipping!
by ambrus (Abbot) on Sep 08, 2011 at 19:08 UTC
    splice(@files, $next_index, 0)

    Wouldn't that always return an empty list and not modify the array at all?
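A quick check with a throwaway list (the file names are made up) seems to confirm it: a LENGTH of 0 removes and returns nothing, while a LENGTH of 1 removes and returns one element.

```perl
use strict;
use warnings;

my @files = qw(a.pdf b.pdf c.pdf);

# LENGTH 0 and no replacement list: the array is untouched, the result is empty.
my @none = splice(@files, 1, 0);
print "removed ", scalar(@none), " element(s), ", scalar(@files), " left\n";

# LENGTH 1: the element at index 1 is removed and returned.
my @one = splice(@files, 1, 1);
print "removed @one, left with @files\n";
```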