I've written scripts before that do something similar with files in a local directory tree. In those cases, it was pretty simple to keep a stack of directories still to be investigated, separate from the list of files I was building, and just keep processing until the stack was empty. I think your problem is similar.
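
For reference, the directory version of that idea can be sketched roughly as below. This is only an illustration, not the script I actually used: the names list_files, @dir_stack and @files are made up for the example, and it deliberately does not follow symlinked directories.

```perl
use strict;
use warnings;

# Walk a directory tree without recursion: @dir_stack holds directories
# still to be investigated, @files collects the plain files found so far.
# (list_files is a hypothetical name chosen for this sketch.)
sub list_files {
    my ($top) = @_;
    my @dir_stack = ($top);
    my @files;

    while (@dir_stack) {
        my $dir = pop @dir_stack;
        my $dh;
        unless (opendir $dh, $dir) {
            warn "Cannot open $dir: $!";
            next;
        }
        while (defined(my $entry = readdir $dh)) {
            next if $entry eq '.' or $entry eq '..';
            my $path = "$dir/$entry";
            if (-d $path && ! -l $path) {
                push @dir_stack, $path;    # investigate later
            }
            elsif (-f $path) {
                push @files, $path;        # part of the result
            }
        }
        closedir $dh;
    }
    return @files;
}

# Usage: perl list_files.pl some/directory
if (@ARGV) {
    print "$_\n" for list_files($ARGV[0]);
}
```

The loop ends when the stack is empty, which is the same termination condition the URL version relies on.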

It might look something like this:

    while (@url_stack) {
        my $url = pop @url_stack;
        open URL $url;    # placeholder: fetch the page however you like
        while (my $line = <URL>) {
            # split a line into words
            my @words = split / /, $line;

            # For each word in the line, see if it's a URL. If it is,
            # push it onto the stack and substitute the local path.
            foreach my $word (@words) {
                if ($word =~ m{^http://}) {
                    push @url_stack, $word;
                    # remote_path and local_path are placeholders
                    $word =~ s/remote_path/local_path/;
                }
            }

            # join all the words together into a new line
            my $output_line = join ' ', @words;

            # write that line into the local version of the file
            print LOCAL_VERSION $output_line;
        }
    }

As this is intended to be a pseudo-code snippet, I'm obviously leaving a lot out, like opening the output file, etc., but I think the basic premise is sound.
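
To make the premise a bit more concrete, here is a small runnable sketch of the same loop. Everything specific in it is an assumption for the sake of the example: the %pages hash stands in for the remote site so it runs without a network, the name mirror_pages and the example.com URLs and /var/www/mirror path are invented, and the fetch step is passed in as a callback so you could swap in a real HTTP fetch. It also adds a %seen hash, which the pseudo-code lacks, so pages that link to each other don't loop forever.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the remote site: URL => page text.
my %pages = (
    'http://example.com/index.html' => "see http://example.com/a.html now\n",
    'http://example.com/a.html'     => "back to http://example.com/index.html\n",
);

# Crawl from $start, replacing a leading $remote with $local in every
# word that looks like a URL. Returns a hash ref: URL => rewritten text.
sub mirror_pages {
    my ($start, $remote, $local, $fetch) = @_;
    my @url_stack = ($start);
    my (%seen, %out);

    while (@url_stack) {
        my $url = pop @url_stack;
        next if $seen{$url}++;            # skip pages already processed
        my $text = $fetch->($url);
        next unless defined $text;        # nothing to fetch at this URL

        my @lines;
        for my $line (split /^/, $text) { # keeps each line's newline
            my @words = split / /, $line, -1;
            for my $word (@words) {
                next unless $word =~ m{^http://};
                (my $link = $word) =~ s/\s+\z//;  # drop trailing newline
                push @url_stack, $link;
                $word =~ s/^\Q$remote\E/$local/;  # $word aliases @words
            }
            push @lines, join ' ', @words;
        }
        $out{$url} = join '', @lines;
    }
    return \%out;
}

my $mirrored = mirror_pages(
    'http://example.com/index.html',
    'http://example.com', '/var/www/mirror',
    sub { $pages{ $_[0] } },
);
print $mirrored->{'http://example.com/index.html'};
# prints: see /var/www/mirror/a.html now
```

Splitting on single spaces is as crude here as in the pseudo-code: it will miss URLs inside attributes such as href="...", so for real HTML a proper parser would be the safer choice. A real crawler would also want to push only URLs that live under the remote site.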

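That said, I'm sure there is an easier way to do it: w3mir, for example. You might also look into wget's options to make sure you're not missing something there; --convert-links (-k), for instance, rewrites the links in downloaded pages so they point at the local copies. Good luck!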


In reply to Re: Need direction on mass find/replacement in HTML files. by starX
in thread Need direction on mass find/replacement in HTML files. by kevin4truth
