in reply to Massive File Editing

Three suggestions:
  1. Use File::Find together with File::Glob: one for walking the subdirectories, the other for matching the .shtml files. Much less coding on your own.
  2. When you read a file, call read with a big buffer so the whole .shtml file comes in with one call. This improves performance; don't handle it line by line, that's too slow. Memory is not an issue in your case (unless you have huge .shtml files). When you s///, use the /g modifier.
  3. Don't consider +< in this case, as your new content is shorter than the old one. (I am not saying you cannot use it, but using it here requires more coding effort; see the sketch after the code below.)
use File::Find;
use File::Glob ':glob';
use strict;

find(\&wanted, "c:/perl58/bin"); # replace with your directory

sub wanted {
    if ((-d $File::Find::name) && ($_ ne ".") && ($_ ne "..")) {
        my @shtml_files = bsd_glob("*.shtml");
        foreach my $shtml_file (@shtml_files) {
            print $shtml_file;
            open(SHTMLFILE, "<", $shtml_file);
            my $buffer;
            read(SHTMLFILE, $buffer, 10000); # give some big number, which exceeds the size of all your .shtml files
            close(SHTMLFILE);
            $buffer =~ s/<a href="main\.php\?page=(.*?)"/<a href="main.php?id=$1"/g;
            open(SHTMLFILE, ">", $shtml_file);
            print SHTMLFILE $buffer; # write the modified content back
            close(SHTMLFILE);
        }
    }
}
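For reference, here is a minimal sketch of what the +< route would look like (assuming, as above, that the whole file fits in memory); the extra seek and truncate calls are the coding effort I mean:

open(my $fh, '+<', $shtml_file) or die "Cannot open $shtml_file: $!";
my $buffer = do { local $/; <$fh> };   # slurp the whole file
$buffer =~ s/<a href="main\.php\?page=(.*?)"/<a href="main.php?id=$1"/g;
seek($fh, 0, 0);                       # rewind to the beginning
print $fh $buffer;
truncate($fh, tell($fh));              # discard leftover old bytes, since the new content is shorter
close($fh);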

Re: Re: Massive File Editing
by chromatic (Archbishop) on Dec 15, 2002 at 06:24 UTC

    For each file or directory you find with File::Find, you're finding all of the files in its containing directory with File::Glob. You don't really need File::Glob.

      I know this. The purpose of using File::Glob here is to reduce coding effort, so you don't need to match file name patterns on your own. When File::Glob can do this for you, what is the point of reinventing it?

      I would agree that File::Glob would be wasteful here if File::Find were improved to support patterns and return only those entries whose names match a certain pattern (ideally a regexp). But why didn't the Perl community do that? The reason is simple: because File::Glob already exists, there is no point in reimplementing the same functionality in another module.

      If you look at this from an OO point of view, it makes a lot of sense. Although File::Find and File::Glob may handle different but tightly related tasks in some programs, they should still be abstracted as two separate classes.

        I'm not suggesting reinventing File::Glob. I'm suggesting that you only want to edit each file once. That's not what your code does.

        For every file or directory your code finds, it processes every .shtml file in the current directory.

        There are two ways to fix it. One, don't use File::Glob. Two, only use File::Glob if the current file from File::Find is a directory.
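        A minimal sketch of the first fix, dropping File::Glob and matching the name in the wanted callback instead (the directory and substitution are the ones from the code above):

        use strict;
        use File::Find;

        # Match .shtml files directly in the wanted callback; no File::Glob
        # needed, and each file is processed exactly once.
        find(sub {
            return unless -f $_ && /\.shtml$/;
            my $shtml_file = $_;
            local $/;                                  # slurp mode
            open(my $in, '<', $shtml_file) or return;
            my $buffer = <$in>;
            close($in);
            $buffer =~ s/<a href="main\.php\?page=(.*?)"/<a href="main.php?id=$1"/g;
            open(my $out, '>', $shtml_file) or return;
            print $out $buffer;
            close($out);
        }, "c:/perl58/bin");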

        if File::Find were improved to support patterns, ... But why didn't the Perl community do that? The reason is simple: because File::Glob already exists, there is no point in reimplementing the same functionality in another module

        Nope. File::Find doesn't have any filtering mechanism explicitly built into it, for a very good reason that has nothing to do with File::Glob. Basically, the callback mechanism used by File::Find is about a million times more powerful than any filtering technique they could have provided. If you want to do pattern matching on the names, just do it in the wanted function:

        #!perl -l
        use File::Find;
        find sub { print $_ if -f and /\.txt$/i }, @INC;

        will print all of the .txt files in every directory reachable from the paths in @INC, for example.

        And if this really is not sufficient (I'd be surprised), then take a look at File::Find::Rule.
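        A quick sketch of the File::Find::Rule style, assuming that module is installed from CPAN (the directory is the one from the code above):

        use File::Find::Rule;
        # Collect every .shtml file under the tree in one declarative call.
        my @shtml_files = File::Find::Rule->file->name('*.shtml')->in('c:/perl58/bin');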

        HTH

        --- demerphq
        my friends call me, usually because I'm late....