
Without going into the golfish ways of doing this, but sticking to script form, there's a whole lot of things you can improve.

my %skip = ( 'gif' => 1, 'jpg' => 1, 'jpeg' => 1, 'png' => 1 );
I would prefer to write this like so:

my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

and later test using exists $skip_for{$ext}
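To make that concrete, here is a minimal sketch of the slice-as-set idiom. The helper name should_skip is made up for illustration. Note that the values are all undef, which is why the test has to be exists rather than a plain truth test.

```perl
use strict;
use warnings;

# Build a lookup set: only the keys matter, the values stay undef.
my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

# Hypothetical helper, not part of the original script.
sub should_skip {
    my ($ext) = @_;
    return exists $skip_for{$ext} ? 1 : 0;
}

for my $ext (qw( png txt )) {
    printf "%-4s %s\n", $ext, should_skip($ext) ? "skip" : "process";
}
```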

Next note: you can just use $File::Find::name rather than "$File::Find::dir/$_"
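A quick way to convince yourself, sketched on a throwaway directory (the directory layout and variable names are invented for the demo, and it assumes File::Find's default behaviour of chdir'ing into each directory):

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw( tempdir );

# Throwaway tree: <tmp>/sub/file.txt
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/sub" or die "Couldn't mkdir: $!\n";
open my $out, '>', "$dir/sub/file.txt" or die "Couldn't create file: $!\n";
close $out;

my $seen = 0;
my @mismatches;
find(
    sub {
        return if -d;
        $seen++;
        # For plain files the two spellings should agree.
        push @mismatches, $File::Find::name
            unless $File::Find::name eq "$File::Find::dir/$_";
    },
    $dir
);

print "files seen: $seen, mismatches: ", scalar @mismatches, "\n";
```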

Then we have a case of redundant syntax: in \&{ sub { ... } } the sub{ ... } already gives you a reference. Then your &{} goes and dereferences it, only to feed it back to the \ which makes a reference from the result again. You can drop the surrounding \&{} and simply write sub { ... } here.
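If you want to see that the wrapper buys nothing, a small sketch like this will do (the variable names are just for the demo):

```perl
use strict;
use warnings;

my $plain   = sub { return "called with @_" };
my $wrapped = \&{ sub { return "called with @_" } };

# Both are CODE references and are called the same way.
print ref($plain), " ", ref($wrapped), "\n";
print $plain->("a"), "\n";
print $wrapped->("a"), "\n";
```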

I am a bit puzzled by this:

my ($nil,$ext) = $file =~ /^(.*?)\.(.*?)$/gs;

If you throw away the first capture, why capture at all?

my ($ext) = $file =~ /^.*?\.(.*?)$/gs;

which is better written as

my ($ext) = $file =~ /[.]([^.]+)$/gs;

(In words: I want as many non-dot characters as there are in front of the end of the string, update: but only if there's a dot in the filename.)
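Here is a small sketch of how that last pattern behaves on a few made-up filenames. The helper name extension_of is hypothetical, and I left off the /gs flags since they make no difference for this particular pattern.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex above.
sub extension_of {
    my ($file) = @_;
    my ($ext) = $file =~ /[.]([^.]+)$/;
    return $ext;    # undef when there is no extension
}

for my $file (qw( photo.jpeg archive.tar.gz README notes. )) {
    my $ext = extension_of($file);
    print "$file: ", defined $ext ? $ext : "(none)", "\n";
}
```

Only the part after the last dot counts, and a name without a dot (or with nothing after the dot) leaves $ext undefined.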

The $ext = '' unless defined $ext; can be avoided if you put the $skip{$ext} inside an if(/match here/)

Also, since you're not interested in the individual lines of your input, and splitting the input into lines costs effort, it would be better to unconditionally slurp large chunks of X bytes instead.
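As a rough sketch of what chunked reading looks like on its own (the helper slurp_without_cr and the temp file are invented for the demo, and the tiny chunk size is only there to force several reads):

```perl
use strict;
use warnings;
use Fcntl;
use File::Temp qw( tempfile );

# Hypothetical helper: return a file's contents with every \r removed,
# reading fixed-size chunks instead of lines.
sub slurp_without_cr {
    my ($path, $chunk_size) = @_;
    sysopen my $fh, $path, O_RDONLY or die "Couldn't open $path: $!\n";
    my $content = "";
    while (1) {
        my $got = sysread $fh, my $buf, $chunk_size;
        die "Couldn't read $path: $!\n" unless defined $got;
        last if $got == 0;    # end of file
        $buf =~ s/\r//g;
        $content .= $buf;
    }
    close $fh;
    return $content;
}

# Demo on a throwaway file.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "one\r\ntwo\r\nthree\r\n";
close $tmp_fh;

print slurp_without_cr($tmp_name, 4);
```

Since a \r is a single character, it doesn't matter where the chunk boundaries fall.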

The next point is a maneuver critique. Why would one first fetch a list of directories and then go and read each directory manually, when that same first search already hands you all the file names on a silver platter? (And why are you counting something when you never use that count? :-))

And lastly, rather than hardcode the directory in the script, it's preferable to take the directories as parameters from the command line.

So here's an updated version:

#!/usr/bin/perl -w
use strict;
use Fcntl;
use File::Find;

my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

find(
    sub {
        next if -d or /^[.]/;
        next if /[.]([^.]+)$/ and exists $skip_for{$1};

        my $content = "";

        # gobble and mangle 64k chunks at a time
        sysopen FH, $_, O_RDWR;
        s/\r//g, $content .= $_ while sysread FH, $_, 65536;

        # go back to top of file
        sysseek FH, 0, 0;
        syswrite FH, $content, length $content;

        # the file still has its original length,
        # because we didn't clobber it with an open FH, ">file"
        # so we need to fix that
        # (tell is not reliable after sysseek/syswrite, so use the known length)
        truncate FH, length $content;

        close FH;
    },
    @ARGV ? @ARGV : "." # NB: not ||, which forces @ARGV into scalar context
);

A further improvement might be to use some Getopt:: module to allow the user to change the %skip_for rules.
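One way that could look, as a sketch only: it uses Getopt::Long, and the --skip option name and the parse_args helper are my own invention, not anything from the original script.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical helper: parse "--skip ext" options (repeatable, or comma
# separated) and return the skip set plus the remaining arguments.
sub parse_args {
    my @args = @_;
    my @skip;
    GetOptionsFromArray( \@args, 'skip=s' => \@skip )
        or die "Usage: $0 [--skip ext[,ext...]] [dir ...]\n";

    # Fall back to the defaults from the script when nothing was given.
    @skip = qw( gif jpg jpeg png ) unless @skip;

    my %skip_for;
    @skip_for{ map { lc $_ } map { split /,/, $_ } @skip } = ();
    return ( \%skip_for, @args );
}

my ($skip_for, @dirs) = parse_args(qw( --skip bmp,tiff --skip ico htdocs ));
print join(",", sort keys %$skip_for), " | @dirs\n";
```

In the real script you would pass @ARGV to the helper and hand whatever is left over to find.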

Update: I must have been asleep as well. Kudos to Zaxo for pointing out my regex would return the whole filename for extensionless files. Also, I need to go flagellate myself for a while:

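sysopen FH, $_, O_RDWR
    or warn("Couldn't open $File::Find::name: $!\n"), return;
while (1) {
    my $got = sysread FH, $_, 65536;
    # warn(...) needs its parens, else return would run before warn does
    defined $got or warn("Couldn't read $File::Find::name: $!\n"), return;
    last unless $got;    # 0 means end of file
    s/\r//g;
    $content .= $_;
}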

and, of course,

return if -d or /^[.]/;
return if /[.]([^.]+)$/ and exists $skip_for{$1};

since this is a sub, not a for loop. I feel stupid now. Oh well, guess we can feel stupid together. :-)
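For the record, here is how the pieces might fit together with the corrections folded in. Treat it as a sketch under a few assumptions of mine: lexical filehandles instead of the bareword FH, a made-up helper strip_cr, no handling of short writes, and a demo that runs on a throwaway directory so nothing real gets modified.

```perl
use strict;
use warnings;
use Fcntl;
use File::Find;
use File::Temp qw( tempdir );

my %skip_for;
@skip_for{qw( gif jpg jpeg png )} = ();

# Hypothetical helper: strip every \r from one file in place.
# $label is only used in messages. Returns true on success.
sub strip_cr {
    my ($path, $label) = @_;

    sysopen my $fh, $path, O_RDWR
        or warn("Couldn't open $label: $!\n"), return;

    my $content = "";
    while (1) {
        my $got = sysread $fh, my $buf, 65536;
        defined $got or warn("Couldn't read $label: $!\n"), return;
        last if $got == 0;    # end of file
        $buf =~ s/\r//g;
        $content .= $buf;
    }

    # Rewrite from the top, then cut off the now-surplus tail.
    sysseek($fh, 0, 0)
        and defined syswrite($fh, $content)
        and truncate($fh, length $content)
        or warn("Couldn't rewrite $label: $!\n"), return;

    return close $fh;
}

sub wanted {
    return if -d or /^[.]/;
    return if /[.]([^.]+)$/ and exists $skip_for{$1};
    strip_cr($_, $File::Find::name);
}

# Demo: one file that should be cleaned, one that should be skipped.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( page.html logo.png )) {
    open my $out, '>', "$dir/$name" or die "Couldn't create $name: $!\n";
    binmode $out;
    print {$out} "a\r\nb\r\n";
    close $out;
}

find( \&wanted, $dir );

printf "%s: %d bytes\n", $_, -s "$dir/$_" for qw( page.html logo.png );
```

To use it as a real script, drop the demo part and call find with the directories from the command line instead of $dir.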

Makeshifts last the longest.

In reply to Re: Recursive File Substitution by Aristotle
in thread Recursive File Substitution by mt2k
