justin423 has asked for the wisdom of the Perl Monks concerning the following question:

What am I missing? I am trying to zip a few hundred PDFs in batches keyed by the first 7 letters of the filename, to keep each zip a manageable size.

It is zipping all of them into just one file and I know it must be something simple that I am missing.

#!/usr/bin/perl
use IO::Compress::Zip qw(:all);

$path = '/DATA/DOCUMENTS/';
opendir my $dh, $path;
my @files = readdir $dh;
foreach my $files (@files) {
    print "$files\n";
    $zipfilename  = substr( $files, 7 );
    $zipfilename1 = $path . $zipfilename;
    zip [ glob("$zipfilename1*.pdf") ] => "$zipfilename1.zip"
        or die "Cannot create zip file: $ZipError";
}
closedir $dh;

Replies are listed 'Best First'.
Re: Zipping the contents of a directory by filename
by choroba (Cardinal) on May 29, 2025 at 15:32 UTC
    Read substr's documentation carefully. substr $files, 7 doesn't return the first 7 letters; it returns the string from the 8th character to the end.

    my $s = '123456789';
    say substr $s, 7;    # 89
    say substr $s, 0, 7; # 1234567
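    Applied to the original task, the grouping step could be sketched as below. This is only a sketch; group_by_prefix is a hypothetical helper, not part of the original post:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Group a list of PDF names by the first 7 characters of each name,
    # i.e. substr($file, 0, 7) rather than substr($file, 7).
    sub group_by_prefix {
        my @files = @_;
        my %groups;
        for my $file (@files) {
            my $prefix = substr $file, 0, 7;    # offset 0, length 7
            push @{ $groups{$prefix} }, $file;
        }
        return %groups;
    }

    # Each group would then get its own archive, e.g. with IO::Compress::Zip:
    #   zip $groups{$prefix} => "$path$prefix.zip" or die $ZipError;
    ```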
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      that was it... I knew it was something small...

        Caltrops are small-ish, but then who looks down?!

        --hsm

        "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Re: Zipping the contents of a directory by filename
by Fletch (Bishop) on May 29, 2025 at 15:48 UTC

    If you're wanting to do something with files and extensions File::Basename is probably a safer, platform-independent mechanism than blithely using substr. See also Path::Tiny which provides similar {base,dir}name methods.
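    For instance, a minimal fileparse sketch (the path here is made up for illustration):

    ```perl
    use strict;
    use warnings;
    use File::Basename qw(fileparse);

    # fileparse splits a path into name, directory, and suffix portably,
    # instead of slicing the string with substr at a fixed offset.
    my ($name, $dir, $suffix) =
        fileparse('/DATA/DOCUMENTS/ABC1234_report.pdf', qr/\.[^.]*\z/);

    print "$name\n";    # ABC1234_report
    print "$dir\n";     # /DATA/DOCUMENTS/
    print "$suffix\n";  # .pdf
    ```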

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Zipping the contents of a directory by filename
by karlgoethebier (Abbot) on Jun 01, 2025 at 06:14 UTC

    ”…a few hundred PDF's…”

    One more thing occurred to me: PDFs can hardly be compressed at all. I just tried it: a 27k PDF becomes a 25k PDF, for example, which is not really interesting. And since there are so many of them, performance probably plays a role. In my opinion it would be more natural to use tar, or rather Archive::Tar. I haven't measured it, but it probably performs much better. As an aside and a reminder: there are age-old comparisons of the performance of cp, rsync and tar. As far as I remember, tar has always performed better than zip for operations on many files. I know there is no compression involved; this is just a general statement about the good performance of tar.
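    A minimal Archive::Tar sketch along those lines (the prefix and file names are hypothetical):

    ```perl
    use strict;
    use warnings;
    use Archive::Tar;

    # Bundle one prefix group of PDFs into a single uncompressed tar archive.
    my $tar = Archive::Tar->new;
    $tar->add_files( glob 'ABC1234*.pdf' );    # hypothetical prefix
    $tar->write('ABC1234.tar')                 # pass COMPRESS_GZIP for .tar.gz
        or die "Cannot write tar: ", $tar->error;
    ```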

    Minor update: struck out irrelevant content.

      This depends greatly on the content of the PDF. I just gzipped a few random PDFs and got between 4.2% and 51.6% reduction.

      > ls -lh ?.pdf
      -rwxrwx--- 1 root vboxsf 4.0M Jun  2 15:01 1.pdf*
      -rwxrwx--- 1 root vboxsf 1.9M Jun  2 15:01 2.pdf*
      -rwxrwx--- 1 root vboxsf 1.2M Jun  2 15:01 3.pdf*
      -rwxrwx--- 1 root vboxsf 340K Jun  2 15:01 4.pdf*
      -rwxrwx--- 1 root vboxsf  69K Jun  2 15:01 5.pdf*
      -rwxrwx--- 1 root vboxsf 416K Jun  2 15:01 6.pdf*
      > gzip -v ?.pdf
      1.pdf:    19.5% -- replaced with 1.pdf.gz
      2.pdf:    23.2% -- replaced with 2.pdf.gz
      3.pdf:    51.6% -- replaced with 3.pdf.gz
      4.pdf:     4.2% -- replaced with 4.pdf.gz
      5.pdf:     8.3% -- replaced with 5.pdf.gz
      6.pdf:     6.2% -- replaced with 6.pdf.gz
      > ls -lh ?.pdf.gz
      -rwxrwx--- 1 root vboxsf 3.3M Jun  2 15:01 1.pdf.gz*
      -rwxrwx--- 1 root vboxsf 1.4M Jun  2 15:01 2.pdf.gz*
      -rwxrwx--- 1 root vboxsf 557K Jun  2 15:01 3.pdf.gz*
      -rwxrwx--- 1 root vboxsf 326K Jun  2 15:01 4.pdf.gz*
      -rwxrwx--- 1 root vboxsf  63K Jun  2 15:01 5.pdf.gz*
      -rwxrwx--- 1 root vboxsf 390K Jun  2 15:01 6.pdf.gz*

        Interesting. Here is a benchmark, to complete the picture:

        hyperfine --runs 10000 \
            'gzip -k -c a.pdf -c b.pdf c.pdf > out.gz' \
            'tar cf out.tar a.pdf b.pdf c.pdf'
        # truncated output
        Summary
          'tar cf out.tar a.pdf b.pdf c.pdf' ran
            1.72 ± 0.57 times faster than 'gzip -k -c a.pdf -c b.pdf c.pdf > out.gz'

        Possibly too few files, but tar is significantly faster.