in reply to Zipping the contents of a directory by filename

“…a few hundred PDF's…”

One more thing occurred to me: PDFs can hardly be compressed at all. I just tried it, and a 27k PDF became a 25k PDF, for example, which is hardly worth the effort. And since there are so many of them, performance probably matters a bit. In my opinion it would be more natural to use tar, or rather Archive::Tar. I haven't measured it, but it probably performs much better. As an aside and a reminder: there are those age-old comparisons of the performance of cp, rsync and tar. As far as I remember, tar has always performed better than zip for operations on many files. I know, no compression is involved there, but take it as a general statement about tar's good performance.

Minor update: struck out irrelevant content.
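
For anyone who wants to try the tar route from the command line first, here is a minimal sketch. It assumes a tar that accepts `-c`, `-f` and `-t` (GNU tar and bsdtar both do); the function name `bundle_pdfs` and its arguments are made up for illustration. It shells out to tar rather than using Archive::Tar, so treat it as a quick experiment and not as the Perl solution.

```shell
# Hypothetical helper: pack every *.pdf in a directory into one
# uncompressed tar archive. Both arguments are illustrative names.
bundle_pdfs() {
    src_dir=$1    # directory that holds the PDFs
    archive=$2    # path of the tar file to create

    # Change into the directory in a subshell so member names are stored
    # relative to it (./a.pdf, ./b.pdf, ...). No -z or -j is passed, so
    # nothing is compressed. If no PDF matches, tar fails with an error.
    ( cd "$src_dir" && tar -cf - ./*.pdf ) > "$archive"
}
```

Usage would look like `bundle_pdfs ./invoices invoices.tar`, followed by `tar -tf invoices.tar` to list what ended up in the archive.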

Re^2: Zipping the contents of a directory by filename
by jeffenstein (Hermit) on Jun 02, 2025 at 13:09 UTC

    This depends greatly on the content of the PDF. I just gzipped a few random PDFs and got between 4.2% and 51.6% reduction.

    > ls -lh ?.pdf
    -rwxrwx--- 1 root vboxsf 4.0M Jun  2 15:01 1.pdf*
    -rwxrwx--- 1 root vboxsf 1.9M Jun  2 15:01 2.pdf*
    -rwxrwx--- 1 root vboxsf 1.2M Jun  2 15:01 3.pdf*
    -rwxrwx--- 1 root vboxsf 340K Jun  2 15:01 4.pdf*
    -rwxrwx--- 1 root vboxsf  69K Jun  2 15:01 5.pdf*
    -rwxrwx--- 1 root vboxsf 416K Jun  2 15:01 6.pdf*
    > gzip -v ?.pdf
    1.pdf:   19.5% -- replaced with 1.pdf.gz
    2.pdf:   23.2% -- replaced with 2.pdf.gz
    3.pdf:   51.6% -- replaced with 3.pdf.gz
    4.pdf:    4.2% -- replaced with 4.pdf.gz
    5.pdf:    8.3% -- replaced with 5.pdf.gz
    6.pdf:    6.2% -- replaced with 6.pdf.gz
    > ls -lh ?.pdf.gz
    -rwxrwx--- 1 root vboxsf 3.3M Jun  2 15:01 1.pdf.gz*
    -rwxrwx--- 1 root vboxsf 1.4M Jun  2 15:01 2.pdf.gz*
    -rwxrwx--- 1 root vboxsf 557K Jun  2 15:01 3.pdf.gz*
    -rwxrwx--- 1 root vboxsf 326K Jun  2 15:01 4.pdf.gz*
    -rwxrwx--- 1 root vboxsf  63K Jun  2 15:01 5.pdf.gz*
    -rwxrwx--- 1 root vboxsf 390K Jun  2 15:01 6.pdf.gz*
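
To check your own PDFs the same way without replacing them with .gz files, something like the following might do. It is only a sketch: `gzip_saving` is a made-up helper name, and the percentage it prints is computed from plain byte counts, so it should be close to what `gzip -v` reports but is not guaranteed to match it digit for digit.

```shell
# Hypothetical helper: print the percentage of bytes gzip would save
# for one file, leaving the file itself untouched.
gzip_saving() {
    gs_file=$1

    # $(( ... )) strips the padding some wc implementations add.
    gs_orig=$(( $(wc -c < "$gs_file") ))
    # gzip -c writes to stdout, so the original file is not replaced.
    gs_packed=$(( $(gzip -c "$gs_file" | wc -c) ))

    awk -v o="$gs_orig" -v p="$gs_packed" \
        'BEGIN { if (o > 0) printf "%.1f\n", (o - p) * 100 / o; else print "0.0" }'
}
```

A loop such as `for f in ./*.pdf; do printf '%s: %s%%\n' "$f" "$(gzip_saving "$f")"; done` then gives a per-file overview before you decide whether compression is worth the CPU time.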

      Interesting. Here is a benchmark, to complete the picture:

      hyperfine --runs 10000 'gzip -k -c a.pdf -c b.pdf c.pdf > out.gz' 'tar cf out.tar a.pdf b.pdf c.pdf'

      # truncated output
      Summary
        'tar cf out.tar a.pdf b.pdf c.pdf' ran
          1.72 ± 0.57 times faster than 'gzip -k -c a.pdf -c b.pdf c.pdf > out.gz'

      Three files are possibly too few to be conclusive, but tar comes out roughly 1.7 times faster here.