justin423 has asked for the wisdom of the Perl Monks concerning the following question:

What am I missing? I am trying to zip a few hundred PDFs in batches keyed by the first 7 letters of the filename, to keep each zip a manageable size.

It is zipping all of them into just one file and I know it must be something simple that I am missing.

#!/usr/bin/perl
use IO::Compress::Zip qw(:all);

$path = '/DATA/DOCUMENTS/';
opendir my $dh, $path;
my @files = readdir $dh;
foreach my $files (@files) {
    print "$files\n";
    $zipfilename  = substr( $files, 7 );
    $zipfilename1 = $path . $zipfilename;
    zip [ glob("$zipfilename1*.pdf") ] => "$zipfilename1.zip"
        or die "Cannot create zip file: $ZipError";
}
closedir $dh;

Replies are listed 'Best First'.
Re: Zipping the contents of a directory by filename
by choroba (Cardinal) on May 29, 2025 at 15:32 UTC
    Read substr's documentation carefully. substr $files, 7 doesn't return the first 7 letters; it returns the string from the 8th character to the end.

    my $s = '123456789';
    say substr $s, 7;    # 89
    say substr $s, 0, 7; # 1234567
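    Applied to the original task, the grouping step could be sketched as below. This is only a sketch; group_by_prefix is a hypothetical helper, not part of the original post:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Group a list of PDF names by the first 7 characters of each name,
    # i.e. substr($file, 0, 7) rather than substr($file, 7).
    sub group_by_prefix {
        my @files = @_;
        my %groups;
        for my $file (@files) {
            my $prefix = substr $file, 0, 7;    # offset 0, length 7
            push @{ $groups{$prefix} }, $file;
        }
        return %groups;
    }

    # Each group would then get its own archive, e.g. with IO::Compress::Zip:
    #   zip $groups{$prefix} => "$path$prefix.zip" or die $ZipError;
    ```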
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      that was it... I knew it was something small...

        Caltrops are small-ish, but then who looks down?!

        --hsm

        "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Re: Zipping the contents of a directory by filename
by Fletch (Bishop) on May 29, 2025 at 15:48 UTC

    If you're wanting to do something with files and extensions File::Basename is probably a safer, platform-independent mechanism than blithely using substr. See also Path::Tiny which provides similar {base,dir}name methods.
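    For instance, a minimal fileparse sketch (the path here is made up for illustration):

    ```perl
    use strict;
    use warnings;
    use File::Basename qw(fileparse);

    # fileparse splits a path into name, directory, and suffix portably,
    # instead of slicing the string with substr at a fixed offset.
    my ($name, $dir, $suffix) =
        fileparse('/DATA/DOCUMENTS/ABC1234_report.pdf', qr/\.[^.]*\z/);

    print "$name\n";    # ABC1234_report
    print "$dir\n";     # /DATA/DOCUMENTS/
    print "$suffix\n";  # .pdf
    ```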

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Zipping the contents of a directory by filename
by karlgoethebier (Abbot) on Jun 01, 2025 at 06:14 UTC

    ”…a few hundred PDF's…”

    One more thing occurred to me: PDFs can hardly be compressed at all. I just tried it: a 27k PDF becomes a 25k PDF, for example, which is not really interesting. And since there are so many of them, performance probably plays a role. In my opinion it would be more natural to use tar, or rather Archive::Tar. I haven't measured it, but it probably performs much better. As an aside and a reminder: there are age-old comparisons of the performance of cp, rsync and tar. As far as I remember, tar has always performed better than zip for operations on many files. I know there is no compression involved; this is just a general statement about the good performance of tar.
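    A minimal Archive::Tar sketch along those lines (the prefix and file names are hypothetical):

    ```perl
    use strict;
    use warnings;
    use Archive::Tar;

    # Bundle one prefix group of PDFs into a single uncompressed tar archive.
    my $tar = Archive::Tar->new;
    $tar->add_files( glob 'ABC1234*.pdf' );    # hypothetical prefix
    $tar->write('ABC1234.tar')                 # pass COMPRESS_GZIP for .tar.gz
        or die "Cannot write tar: ", $tar->error;
    ```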

    Minor update: struck out irrelevant content.

      This depends greatly on the content of the PDF. I just gzipped a few random PDFs and got between 4.2% and 51.6% reduction.

      > ls -lh ?.pdf
      -rwxrwx--- 1 root vboxsf 4.0M Jun  2 15:01 1.pdf*
      -rwxrwx--- 1 root vboxsf 1.9M Jun  2 15:01 2.pdf*
      -rwxrwx--- 1 root vboxsf 1.2M Jun  2 15:01 3.pdf*
      -rwxrwx--- 1 root vboxsf 340K Jun  2 15:01 4.pdf*
      -rwxrwx--- 1 root vboxsf  69K Jun  2 15:01 5.pdf*
      -rwxrwx--- 1 root vboxsf 416K Jun  2 15:01 6.pdf*
      > gzip -v ?.pdf
      1.pdf:    19.5% -- replaced with 1.pdf.gz
      2.pdf:    23.2% -- replaced with 2.pdf.gz
      3.pdf:    51.6% -- replaced with 3.pdf.gz
      4.pdf:     4.2% -- replaced with 4.pdf.gz
      5.pdf:     8.3% -- replaced with 5.pdf.gz
      6.pdf:     6.2% -- replaced with 6.pdf.gz
      > ls -lh ?.pdf.gz
      -rwxrwx--- 1 root vboxsf 3.3M Jun  2 15:01 1.pdf.gz*
      -rwxrwx--- 1 root vboxsf 1.4M Jun  2 15:01 2.pdf.gz*
      -rwxrwx--- 1 root vboxsf 557K Jun  2 15:01 3.pdf.gz*
      -rwxrwx--- 1 root vboxsf 326K Jun  2 15:01 4.pdf.gz*
      -rwxrwx--- 1 root vboxsf  63K Jun  2 15:01 5.pdf.gz*
      -rwxrwx--- 1 root vboxsf 390K Jun  2 15:01 6.pdf.gz*

        Interesting. Here is a benchmark, to complete the picture:

        hyperfine --runs 10000 \
            'gzip -k -c a.pdf -c b.pdf c.pdf > out.gz' \
            'tar cf out.tar a.pdf b.pdf c.pdf'
        # truncated output
        Summary
          'tar cf out.tar a.pdf b.pdf c.pdf' ran
            1.72 ± 0.57 times faster than 'gzip -k -c a.pdf -c b.pdf c.pdf > out.gz'

        Possibly too few files, but tar is significantly faster.