Hi Kage,
I know the others are suggesting moving towards File::Find or some other function, which is a great idea if you can install the modules on the end host. I recently had a similar problem where I could not easily install modules (long story) and had to write the function from scratch. Instead, I fed an ls -laR listing into the program, parsed out the files I wanted, and then modified those files.

Example:

    while (<>) {
        # Try to match the expected input line format (from "ls -l" output):
        # a regular file ("-" permissions), then size, date and filename
        if (/^\-.+ ([0-9]+) ([A-Za-z]+ [ ]?[0-9]+ [ ]?[:0-9]+) (.+)$/) {
            # Set some defaults to avoid potentially problematic missing fields
            my $file1 = "FULLNAME";
            my $file2 = "BASENAME";
            my $fext  = "NO EXTENSION";
            # Set file size, date and complete filename variables
            my $fsize = $1;
            my $date  = $2;
            $file1    = $3;
            # Split the filename into base name and extension at the last "."
            if ($file1 =~ /^([\.]?.+)\.(.+)$/) {
                $file2 = $1;
                $fext  = $2;
            }
            # ... test $fext here and edit the file if it matches
        }
    }
Then, for your example, you would test whether the file extension is shtml and, if it is, open the file and read it. Whether to read the file in as a single glob (all at once) or line by line really depends on two issues:

1) How many times do you plan to run this? Let's be honest: if you're only going to run it once, you don't need a perfectly efficient piece of code, even though I hate to admit that.

2) How many files you'll be reading in, and how large they are.
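To make the glob-versus-line trade-off concrete, here is a minimal sketch of both strategies. The sub names slurp_file and count_matching_lines are hypothetical, not part of any module; the slurp version relies on the standard trick of localizing $/ (the input record separator) to undef.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string ("slurp").
# Simple, but the entire file sits in memory, so it suits a modest
# number of reasonably small files.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                  # undef record separator: read everything at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Hypothetical helper: process a file one line at a time and count the
# lines matching $re. Memory use stays flat however large the file is,
# which matters when you are reading many big files.
sub count_matching_lines {
    my ($path, $re) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $re;
    }
    close($fh);
    return $count;
}
```

With the slurped string you can run one s///g over the whole file; with the line loop you would apply the edit to each line and write it back out as you go.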

Then, when you hit a matching line, you could either do an s/// or just replace the substring directly. I like to cheat with a sanity test first, since substitution operations always seem to do bad things to my data.

    if ($_ =~ /<a href="main\.php\?page=/) { s/main\.php\?page=/main.php?id=/g }
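The sanity-test-then-substitute idea can be wrapped in a small sub so it is easy to check on sample lines. This is a sketch: rewrite_link_line is a hypothetical name, and it assumes the intended replacement turns main.php?page=... into main.php?id=.... Note that "." and "?" must be escaped in the pattern, because unescaped they are regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite old-style links in one line of HTML.
# The cheap sanity-test match runs first, so s/// only touches lines that
# clearly contain the link pattern. Returns the (possibly modified) line
# and the number of replacements made.
# Assumption: the target form is main.php?id=...
sub rewrite_link_line {
    my ($line) = @_;
    my $changes = 0;
    # Escaped "." and "?" match literally instead of acting as metacharacters
    if ($line =~ /<a href="main\.php\?page=/) {
        $changes = ($line =~ s/main\.php\?page=/main.php?id=/g) || 0;
    }
    return ($line, $changes);
}
```

In a line-by-line loop you would call my ($new, $n) = rewrite_link_line($line) and print $new to the output file, keeping a running total of $n to see how much was changed.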
Then opening each file should be pretty straightforward (no more directory recursion, woo!). If you have problems with the output of the ls command, you may have to embed the directory in each path yourself: ls -laR prints a "dirname:" header line before each directory's entries, so remember the latest header and prepend it to each filename. Other than that, it should be simple. I had to write this to deal with a terabyte file system for a lawsuit, so my solution may require more work than you are willing to put in.
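Embedding the directory can be done by tracking the header lines as you read the listing. A minimal sketch, assuming GNU/BSD-style long-listing lines; files_from_listing is a hypothetical name. Only lines starting with "-" (regular files) are collected, so directories, symlinks and devices are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: walk an "ls -laR" listing read from $fh and return
# the full paths of regular files whose extension equals $want_ext.
# ls -laR prints a "dir:" header before each directory's entries, so the
# most recent header is remembered and prepended to every filename.
sub files_from_listing {
    my ($fh, $want_ext) = @_;
    my $dir = '.';
    my @paths;
    while (my $line = <$fh>) {
        chomp $line;
        # Regular-file entry: size, date, then filename
        if ($line =~ /^\-.+ ([0-9]+) ([A-Za-z]+ [ ]?[0-9]+ [ ]?[:0-9]+) (.+)$/) {
            my $name = $3;
            my $ext  = $name =~ /^([\.]?.+)\.(.+)$/ ? $2 : 'NO EXTENSION';
            push @paths, "$dir/$name" if $ext eq $want_ext;
        }
        # Directory header such as "./docs:" (but not a permissions line)
        elsif ($line =~ /^(\S.*):$/ && $line !~ /^[-dlbcps][-rwxsStT]{9}/) {
            $dir = $1;
        }
    }
    return @paths;
}
```

On a Unix-like system you could save the listing first (ls -laR /var/www > listing.txt, where /var/www is only an example path) and pass an opened handle on that file, or read it straight from a pipe with open(my $fh, '-|', 'ls', '-laR', '/var/www').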

Dave -- Saving the world one node at a time


In reply to Re: Massive File Editing by Zapawork
in thread Massive File Editing by Kage
