patgas has asked for the wisdom of the Perl Monks concerning the following question:

So besides reading PM and playing Perl golf all day at work, I occasionally get to use Perl to actually do some work. Like today, for example... A coworker wanted a bunch of HTML files in a directory renamed to be ASP files. Simple enough, I just:

#!/usr/bin/perl -w use strict; my $dir = shift; opendir DIR, $dir or die "Can't open $dir: $!"; foreach my $file ( grep /\.html|\.htm$/i, readdir DIR ) { ( my $newfile = $file ) =~ s/\.html|\.htm$/\.asp/i; rename "$dir\\$file", "$dir\\$newfile"; print "Changing $dir\\$file to $newfile\n"; } closedir DIR;

And later, when she asked if it were possible to search and replace text in a whole bunch of files, I proudly smiled and said, "Absolutely." So then I:

#!/usr/bin/perl -w use strict; my $dir = shift; opendir DIR, $dir or die "Can't open $dir: $!"; foreach my $file ( grep /\.asp$/i, readdir DIR ) { my $target = "<html>\n" . "<head>\n" . "<title>[^>]*<\/title>\n" . "<meta http-equiv=\"Content-Type\" content=\"text\/ht +ml; charset=iso-8859-1\">\n" . "<\/head>\n\n" . "<body bgcolor=[\"]#FFFFFF[\"]>"; my $target2 = "<HTML>\n" . "<HEAD>\n" . "<TITLE>[^>]*<\/TITLE>\n" . "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text\/h +tml; charset=iso-8859-1\">\n" . "<\/HEAD>\n" . "<BODY BGCOLOR=#FFFFFF>"; my $replace = "<!-- #include file=\"popupHeader\.asp\" -->"; print "Changing $dir\\$file..."; open FILE, "$dir\\$file" or die "Can't open $dir\\$file: $!"; my $text; { local $/ = undef; $text = <FILE>; } $text =~ s/$target/$replace/; $text =~ s/$target2/$replace/; close FILE; open FILE, ">$dir\\$file"; print FILE $text; close FILE; print "Done.\n"; } closedir DIR;

Now I was pretty pleased with myself that I've become acquainted enough with Perl to be able to write such awe-inspiring (at least to my coworker) scripts in about 20 minutes. What I'd like to know now is, how can I improve them? What did I do wrong? Is there a way to simplify them, so next time it comes up, it'll only take 5 minutes to write them?

"We're experiencing some Godzilla-related turbulence..."

Replies are listed 'Best First'.
My pattern for "in-place edit" for many files
by merlyn (Sage) on Sep 28, 2001 at 20:30 UTC
    My pattern for "search and replace many files" is to first grab the filenames, usually with File::Find, and then use "inplace editing mode" to do the reading and writing while I concentrate on the editing part.
    use File::Find; { my @starters = @ARGV ? @ARGV : qw(/home/merlyn); @ARGV = (); find sub { push @ARGV, $File::Find::name if /\.asp$/; # or whatever condition + here }, @starters; } { local $/; # slurp it local $^I = ".bak"; while (<>) { # $_ now contains the entire file, alter at will s/foo/bar/g; print; } }

    -- Randal L. Schwartz, Perl hacker

      This question is a little bit offtopic maybe, but this is puzzling me since I have seen it in a couple of other nodes too. Why are you using two different code blocks in this example? Has it something to do with the lexical scope of the variables?

      Thank You very much! Regards,
      C-Keen

        I've found that limiting the scope of my temporaries has reduced my debugging time. So when I need a "scratch variable" for a few steps, or need to set something dangerously global like $/ for a few steps, I pop open a block and do the work there.

        Doing so makes it very clear to the reader that this variable doesn't need a long lifetime, and even ensures that the action won't collide with a larger-scoped variable. It's a slick trick, and works nicely.

        I've also recently started using the idiom of a one-iteration foreach loop to "name" a value rather than setting up a temporary, like:

        for ($some->complicated(@value)->{$to_fetch}) { print; # print it print LOGFILE $_; # and to the logfile substr($_, 0, 1, ""); # remove its first character }
        It's a nice idiom, and is also the explanation of why Larry intentially allows foreach to be spelled "f-o-r". And the "backwards foreach" makes it even easier:
        do_this($_), do_that($_, @moreargs) for $some{$complicated}{$value};

        -- Randal L. Schwartz, Perl hacker

      I like this, but in the version I just added to my toolbox I made one small change: die if there's nothing to replace in @ARGV. Otherwise this script just sits there waiting for STDIN. Being not all that wise about the diamond operator it took me a while to figure out what was going on.
      my @starters = @ARGV ? @ARGV : ('./beginning_includes'); #current di +rectory @ARGV = (); find sub { push @ARGV, $File::Find::name if $File::Find::name =~ /.txt$/i; # +or whatever condition here }, @starters; die "nothing to replace" unless @ARGV; }
      Another possibility would be to use File::Searcher.

      UPDATE: Doesn't work too well on windows though. After mucking around a bit I decided to keep using simpler search and replace with $^I.

Use Perl's in-place file-handling capabilities
by petdance (Parson) on Sep 28, 2001 at 20:32 UTC
    For one, you don't need both of those patterns at the top, if you just replace
    $text =~ s/$target/$replace/; $text =~ s/$target2/$replace/;
    with
    $text =~ s/$target/$replace/i; # Case-insensitive

    Second, I'd look at letting Perl's -i and -p flags do all your editing in place for you. It takes care of all the globbing, the reading the file, etc etc etc.

    For instance, your code could simply be (rough and untested):

    #!perl -i~ -p BEGIN { $/ = undef; $target = blah blah blah; $replace = blah blah blah; } s/$target/$replace/ig;
    And you'd just call it as:
    myprog.pl *.asp

    Perl has some very powerful, very common idioms to do this most common of text transformations. Take advantage of them when you can.

    xoxo,
    Andy
    --
    <megaphone> Throw down the gun and tiara and come out of the float! </megaphone>

      I originally tried using the case-insensitive substitution, but it didn't seem to catch all the files. The only other difference I could find in the files are the " "s surrounding the body background color, but I wasn't 100% sure if my regex would match if it had the quotes or not. This was a one-shot deal, and I needed it to work as quickly as possible, even at the expense of researching the perfect regex.

      "We're experiencing some Godzilla-related turbulence..."

Re: One-shot code critique
by runrig (Abbot) on Sep 28, 2001 at 20:39 UTC
    A couple of things, first is just a matter of preference, but on this:
    open FILE, "$dir\\$file" or die "Can't open $dir\\$file: $!";
    I find the double backslashes ugly, and on Windows you can pretend you have a Unix file system, and just use a single forward slash, which looks alot better IMHO.

    Second, correct me if I'm wrong, but I don't see any variables in your $target, $target2, or $replace variables, which makes them constant strings, and as such I would define them outside the loop, and on your substitution statement I would use the '/o' modifier (see perldoc perlre).

    Last, you're checking the status of your opendir and your first open, which is good, but your not checking the status of your lastopen, which is bad.

    Update: Also, use 'here docs', e.g.:

    my $target = <<ENDHTML; <html> <head> <title>[^>]*</title> <meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1"> </head> <body bgcolor=["]#FFFFFF["]> ENDHTML
Re: One-shot code critique
by FoxtrotUniform (Prior) on Sep 28, 2001 at 21:04 UTC
    foreach my $file ( grep /\.asp$/i, readdir DIR ) {

    Why not use glob("$dir/*.asp") here instead of readdir?

    (I expect to learn something here)

    --
    :wq

      Besides TIMTOWTDI a number of operating systems have quite low limits to the number of files that get returned by a glob so if the target dir is large some files will just get omitted. Silently. Ouch.

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        Eek. So noted. Can you give some examples, or a pointer to more info? Sure be nice to know if my OS/shell was broken like this....

        (See? I told you I'd learn something.)

        --
        :wq