my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();
and later test using exists $skip_for{$ext}
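As a minimal sketch of that idiom (the helper name should_skip is made up for illustration and is not part of the original script):

```perl
use strict;
use warnings;

# Only the keys matter: the hash slice assignment leaves every value undef,
# so membership has to be tested with exists, not with truth.
my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

# Hypothetical helper, purely for demonstration.
sub should_skip {
    my ($ext) = @_;
    return exists $skip_for{$ext} ? 1 : 0;
}

print "png: ", (should_skip('png') ? "skip" : "keep"), "\n";
print "txt: ", (should_skip('txt') ? "skip" : "keep"), "\n";
```

Testing $skip_for{$ext} for truth would always fail here, because the stored values are undef.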
Next note: you can just use $File::Find::name rather than "$File::Find::dir/$_"
Then we have a case of redundant syntax: in \&{ sub { ... } } the sub{ ... } already gives you a reference. Then your &{} goes and dereferences it, only to feed it back to the \ which makes a reference from the result again. You can drop the surrounding \&{} and simply write sub { ... } here.
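A small sketch of that equivalence, with made-up variable names; both forms end up as an ordinary code reference:

```perl
use strict;
use warnings;

# Roundabout: make a reference, dereference it, take a reference again.
my $roundabout = \&{ sub { return "called with @_" } };

# Direct: sub { ... } already yields a code reference.
my $direct = sub { return "called with @_" };

print ref($roundabout), " ", ref($direct), "\n";
print $direct->('x'), "\n";
```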
I am a bit puzzled by this:
my ($nil,$ext) = $file =~ /^(.*?)\.(.*?)$/gs;
If you throw away the first capture, why capture at all?
my ($ext) = $file =~ /^.*?\.(.*?)$/gs;
which is better written as
my ($ext) = $file =~ /[.]([^.]+)$/gs;
(In words: I want as many non-dot characters as there are in front of the end of the string, update: but only if there's a dot in the filename.)
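To see how that pattern behaves, here is a quick sketch. The helper name extension_of and the sample file names are invented for the demonstration, and the /g and /s flags are left off because they make no difference to this pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the last dot,
# or undef when the name contains no dot at all.
sub extension_of {
    my ($file) = @_;
    my ($ext) = $file =~ /[.]([^.]+)$/;
    return $ext;
}

for my $file (qw( photo.jpeg archive.tar.gz README )) {
    my $ext = extension_of($file);
    printf "%-15s %s\n", $file, defined $ext ? $ext : '(none)';
}
```

A failed match in list context returns the empty list, so $ext simply stays undefined for a name like README.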
The $ext = '' unless defined $ext; can be avoided if you put the $skip{$ext} inside an if(/match here/)
Also, since you're not interested in the individual lines of your input, and splitting the input into lines costs effort, it would be better to unconditionally read large fixed-size chunks of X bytes instead.
The next point is a critique of the overall manoeuvre. Why would one first fetch a list of directories and then go and read each directory manually, when that same first search already hands you all the file names on a silver platter? (And why are you counting something when you never use that count? :-))
And lastly, rather than hardcode the directory in the script, it's preferable to take the directories as parameters from the command line.
So here's an updated version:
#!/usr/bin/perl -w
use strict;
use Fcntl;
use File::Find;
my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();
find(
sub {
next if -d or /^[.]/;
next if /[.]([^.]+)$/ and exists $skip_for{$1};
my $content = "";
# gobble and mangle 64k chunks at a time
sysopen FH, $_, O_RDWR;
s/\r//g, $content .= $_ while sysread FH, $_, 65536;
# go back to top of file
sysseek FH, 0, 0;
syswrite FH, $content, length $content;
# the file still has its original length,
# because we didn't clobber it with an open FH, ">file"
# so we need to fix that
truncate FH, length $content; # not tell(): perlfunc warns against mixing it with sysread/syswrite/sysseek
close FH;
},
@ARGV ? @ARGV : "." # NB: @ARGV || "." would evaluate @ARGV in scalar context and pass its element count
);
A further improvement might be to use some Getopt:: module to allow the user to change the %skip_for rules.
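One way that could look, as a rough sketch using Getopt::Long. The option names --skip and --no-defaults and the helper parse_skip_options are my own invention here, not anything the original script defines:

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical options:
#   --skip ext[,ext]   add extensions to skip (may be repeated)
#   --no-defaults      start from an empty list instead of the image types
sub parse_skip_options {
    my ($args) = @_;    # array ref; recognised switches are removed from it
    my @extra;
    my $defaults = 1;
    GetOptionsFromArray(
        $args,
        'skip=s'    => \@extra,
        'defaults!' => \$defaults,
    ) or die "usage: $0 [--skip ext[,ext]]... [--no-defaults] [dir...]\n";

    my %skip_for;
    @skip_for{qw( gif jpg jpeg png )} = () if $defaults;
    @skip_for{ map { lc $_ } map { split /,/, $_ } @extra } = ();
    return \%skip_for;
}

my $skip_for = parse_skip_options(\@ARGV);
print "skipping: ", join(' ', sort keys %$skip_for), "\n";
```

Whatever is left in @ARGV afterwards would be the directories to hand to find.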
Update: I must have been asleep as well. Kudos to Zaxo for pointing out my regex would return the whole filename for extensionless files. Also, I need to go flagellate myself for a while:
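# warn needs its own parens, otherwise return is evaluated as one of its arguments and runs first
sysopen FH, $_, O_RDWR
or (warn("Couldn't open $File::Find::name: $!\n"), return);
while (1) {
    my $got = sysread FH, $_, 65536;
    defined $got
        or (warn("Couldn't read $File::Find::name: $!\n"), return);
    last unless $got; # 0 means end of file; testing only defined() would loop forever
    s/\r//g;
    $content .= $_;
}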
and, of course,
return if -d or /^[.]/;
return if /[.]([^.]+)$/ and exists $skip_for{$1};
since this is a sub, not a for loop. I feel stupid now. Oh well, guess we can feel stupid together. :-)
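For reference, here is how the read/strip/rewrite part might look pulled out into a standalone function. This is only a sketch under my own assumptions: the name strip_cr, the lexical filehandle and the return values are not from the script above, and the demo at the end runs on a throwaway temporary file.

```perl
use strict;
use warnings;
use Fcntl;                      # exports O_RDWR by default
use File::Temp qw( tempfile );  # only needed for the demo below

# Hypothetical helper: removes every \r from the file at $path, in place.
# Returns 1 on success, 0 (after a warning) on failure.
sub strip_cr {
    my ($path) = @_;
    sysopen my $fh, $path, O_RDWR
        or (warn("Couldn't open $path: $!\n"), return 0);

    # Gobble and mangle 64k chunks at a time.
    my $content = '';
    while (1) {
        my $got = sysread $fh, my $chunk, 65536;
        defined $got
            or (warn("Couldn't read $path: $!\n"), return 0);
        last if $got == 0;      # end of file
        $chunk =~ s/\r//g;
        $content .= $chunk;
    }

    # Back to the top, overwrite, then cut off the stale tail.
    sysseek $fh, 0, 0
        or (warn("Couldn't rewind $path: $!\n"), return 0);
    my $written = syswrite $fh, $content;
    defined $written && $written == length $content
        or (warn("Couldn't write $path: $!\n"), return 0);
    truncate $fh, length $content
        or (warn("Couldn't truncate $path: $!\n"), return 0);
    close $fh
        or (warn("Couldn't close $path: $!\n"), return 0);
    return 1;
}

# Demo on a temporary file.
my ($tmp, $name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "one\r\ntwo\r\n";
close $tmp;
strip_cr($name) or die "strip_cr failed\n";
printf "%d bytes left\n", -s $name;
```

Inside the find callback this would be called as strip_cr($_) after the two return checks.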
Makeshifts last the longest.