in reply to Massive File Editing

Three suggestions:
  1. Use File::Find together with File::Glob: one for walking the subdirectories, the other for matching the .shtml files. Much less coding on your own.
  2. When you read a file, call read with a big buffer so the whole .shtml file comes in with one call. This improves performance; don't handle it line by line, that's too slow. Memory is not an issue in your case (unless you have huge .shtml files). When you s///, use the /g modifier.
  3. Don't consider +< in this case, as your new content is shorter than the old one. (I am not saying you cannot use it, but using it here requires more coding effort; see the sketch after the code below.)
use File::Find;
use File::Glob ':glob';
use strict;

find(\&wanted, "c:/perl58/bin"); # replace with your directory

sub wanted {
    if ((-d $File::Find::name) && ($_ ne ".") && ($_ ne "..")) {
        my @shtml_files = bsd_glob("*.shtml");
        foreach my $shtml_file (@shtml_files) {
            print $shtml_file;
            open(SHTMLFILE, "<", $shtml_file);
            my $buffer;
            read(SHTMLFILE, $buffer, 10000); # give some big number, which exceeds the size of all your .shtml files
            close(SHTMLFILE);
            $buffer =~ s/<a href="main\.php\?page=(.*?)"/<a href="main.php?id=$1"/g;
            open(SHTMLFILE, ">", $shtml_file);
            print SHTMLFILE $buffer; # write the modified content back
            close(SHTMLFILE);
        }
    }
}
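For reference, here is a minimal sketch of what the +< route would look like (assuming, as above, that the whole file fits in memory); the extra seek and truncate calls are the coding effort I mean:

open(my $fh, '+<', $shtml_file) or die "Cannot open $shtml_file: $!";
my $buffer = do { local $/; <$fh> };   # slurp the whole file
$buffer =~ s/<a href="main\.php\?page=(.*?)"/<a href="main.php?id=$1"/g;
seek($fh, 0, 0);                       # rewind to the beginning
print $fh $buffer;
truncate($fh, tell($fh));              # discard leftover old bytes, since the new content is shorter
close($fh);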

Re: Re: Massive File Editing
by chromatic (Archbishop) on Dec 15, 2002 at 06:24 UTC

    For each file or directory you find with File::Find, you're finding all of the files in its containing directory with File::Glob. You don't really need File::Glob.

      I know this. The purpose of using File::Glob here is to reduce coding effort, so you don't need to match file name patterns on your own. When File::Glob can do this for you, what is the point of reinventing it?

      I would agree that File::Glob would be wasteful here if File::Find were improved to support patterns and return only those entries whose names match a certain pattern (ideally a regexp). But why didn't the Perl community do that? The reason is simple: because File::Glob already exists, there is no point in reimplementing the same functionality in another module.

      If you look at this from an OO point of view, it makes a lot of sense. Although File::Find and File::Glob may handle different but tightly related tasks in some programs, they should still be abstracted as two separate classes.

        I'm not suggesting reinventing File::Glob. I'm suggesting that you only want to edit each file once. That's not what your code does.

        For every file or directory your code finds, it processes every .shtml file in the current directory.

        There are two ways to fix it. One, don't use File::Glob. Two, only use File::Glob if the current file from File::Find is a directory.
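        A minimal sketch of the first fix, dropping File::Glob and matching the name in the wanted callback instead (the directory and substitution are the ones from the code above):

        use strict;
        use File::Find;

        # Match .shtml files directly in the wanted callback; no File::Glob
        # needed, and each file is processed exactly once.
        find(sub {
            return unless -f $_ && /\.shtml$/;
            my $shtml_file = $_;
            local $/;                                  # slurp mode
            open(my $in, '<', $shtml_file) or return;
            my $buffer = <$in>;
            close($in);
            $buffer =~ s/<a href="main\.php\?page=(.*?)"/<a href="main.php?id=$1"/g;
            open(my $out, '>', $shtml_file) or return;
            print $out $buffer;
            close($out);
        }, "c:/perl58/bin");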

        if File::Find were improved to support patterns, ... But why didn't the Perl community do that? The reason is simple: because File::Glob already exists, there is no point in reimplementing the same functionality in another module

        Nope. File::Find doesn't have any filtering mechanism explicitly built into it, for a very good reason that has nothing to do with File::Glob. Basically, the callback mechanism used by File::Find is about a million times more powerful than any filtering technique they could have provided. If you want to do pattern matching on the names, just do it in the wanted function:

        #!perl -l
        use File::Find;
        find sub { print $_ if -f and /\.txt$/i }, @INC;

        will print all of the .txt files in every directory reachable from the paths in @INC, for example.

        And if this really is not sufficient (I'd be surprised), then take a look at File::Find::Rule.
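        A quick sketch of the File::Find::Rule style, assuming that module is installed from CPAN (the directory is the one from the code above):

        use File::Find::Rule;
        # Collect every .shtml file under the tree in one declarative call.
        my @shtml_files = File::Find::Rule->file->name('*.shtml')->in('c:/perl58/bin');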

        HTH

        --- demerphq
        my friends call me, usually because I'm late....