Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
On a server, I have 500,000 HTML documents. Wherever my name occurs, I'd like to wrap it in a link to my homepage.
On the surface, this seems like a simple search-and-replace hack job. However, we're dealing with HTML, so I can't just do $html =~ s/(Mike Judge)/<a href="$URL">$1<\/a>/gi; . My name could appear in page titles or comments, or could already be enclosed within a link, so a simple (or even cleverly crafted) regular expression won't work.
I've spent some time perusing CPAN this evening, and while HTML::TokeParser (or HTML::PullParser) looks like the right path, neither seems to offer a way to manipulate the plain text in an HTML file while still preserving the HTML structure.
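One way to act on the token idea using only core Perl is sketched below. It is a minimal sketch under stated assumptions, not a drop-in tool: link_name is a hypothetical helper, and a simple regex tokenizer stands in for a real parser such as HTML::TokeParser. It splits the page into comments, tags, and text runs, copies comments and tags back verbatim, and substitutes only in text that is not inside <title>, <script>, <style>, or an existing <a> element. It assumes reasonably well-formed markup, a trusted URL, and a name that begins and ends with word characters (so the \b anchors work).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each occurrence of $name in a link to $url, but
# only in ordinary text, never inside tags, comments, <title>, <script>,
# <style>, or an existing <a>...</a>.
# ASSUMPTION: a regex tokenizer is good enough for well-formed pages; a real
# parser (e.g. HTML::TokeParser) copes better with malformed markup.
sub link_name {
    my ($html, $name, $url) = @_;
    my $pat = qr/\b(\Q$name\E)\b/i;   # case-insensitive, whole words only
    my %skip;                         # open depth of elements we must not touch
    my $out = '';

    # Each token is a comment, a tag, a run of text, or a stray '<'.
    while ($html =~ /\G(<!--.*?-->|<[^>]*>|[^<]+|<)/gcs) {
        my $tok = $1;
        if ($tok =~ /^<!--/) {
            # Comment: copied through unchanged.
        }
        elsif ($tok =~ m{^<(/?)(a|title|script|style)\b}i) {
            # Track entering/leaving elements whose text is off limits.
            my ($close, $tag) = ($1, lc $2);
            $skip{$tag} += $close ? -1 : 1;
            $skip{$tag} = 0 if $skip{$tag} < 0;
        }
        elsif ($tok !~ /^</ && !grep { $_ > 0 } values %skip) {
            # Plain text outside any skipped element: link the name,
            # keeping the original capitalisation via $1.
            $tok =~ s/$pat/<a href="$url">$1<\/a>/g;
        }
        $out .= $tok;
    }
    return $out;
}

my $page = '<title>Mike Judge</title><p>Hello, Mike Judge.</p>'
         . '<a href="/x">Mike Judge</a><!-- Mike Judge -->';
print link_name($page, 'Mike Judge', 'http://example.com/'), "\n";
# prints: <title>Mike Judge</title><p>Hello, <a href="http://example.com/">Mike Judge</a>.</p><a href="/x">Mike Judge</a><!-- Mike Judge -->
```

Because tags and comments are copied through unchanged, the page structure survives and only text runs are touched. Occurrences split by markup (for example Mike <b>Judge</b>) or written with entities such as &nbsp; are not matched by this sketch. HTML::TokeParser's start and end tag tokens also carry the original tag text, so the same skip logic should carry over to it for messier pages.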
This seems like a common problem that's been solved before. Does anyone have suggestions for searching and replacing within half a million HTML files?
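For the half-million-file part, a hedged sketch of a traversal with the core File::Find module follows. rewrite_tree is a hypothetical name; it takes any function that maps an HTML string to an edited string, rewrites a file only when that function actually changed it, and writes through a temporary file plus rename so that, on a POSIX filesystem, an interrupted run leaves each page either fully old or fully new. It assumes every page fits in memory and that files can be treated as raw bytes.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical driver: run $transform (a code ref that takes an HTML string
# and returns the edited string) over every *.html / *.htm file under $root.
# ASSUMPTION: each page fits in memory; files are read and written as raw
# bytes, so $transform must not depend on a particular character decoding.
sub rewrite_tree {
    my ($root, $transform) = @_;
    my $changed = 0;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path && $path =~ /\.html?\z/i;

            open my $in, '<:raw', $path or die "read $path: $!";
            my $html = do { local $/; <$in> };
            close $in;

            my $new = $transform->($html);
            return if $new eq $html;    # leave unchanged pages untouched

            # Write a temporary sibling (keeping the original permissions) and
            # rename it over the original, so no page is left half-written.
            my $tmp  = "$path.tmp$$";
            my $mode = (stat $path)[2] & 07777;
            open my $out, '>:raw', $tmp or die "write $tmp: $!";
            print {$out} $new;
            close $out or die "close $tmp: $!";
            chmod $mode, $tmp;
            rename $tmp, $path or die "rename $tmp -> $path: $!";
            $changed++;
        },
    }, $root);
    return $changed;    # number of files rewritten
}
```

Passing the transform in as a code ref keeps the traversal separate from the text-editing logic, so a tag-aware replacer can be plugged in unchanged. Given the scale, a trial run on a copy of the document tree is probably worthwhile before touching the live files.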
(As always, I appreciate the monks' kindness!)
Re: Search and replacing across 500,000 HTML documents
by PodMaster (Abbot) on Apr 22, 2004 at 10:25 UTC

Re: Search and replacing across 500,000 HTML documents
by matija (Priest) on Apr 22, 2004 at 10:24 UTC

Re: Search and replacing across 500,000 HTML documents
by Anonymous Monk on Apr 22, 2004 at 13:45 UTC

Re: Search and replacing across 500,000 HTML documents
by pizza_milkshake (Monk) on Apr 22, 2004 at 20:16 UTC

Re: Search and replacing across 500,000 HTML documents
by Anonymous Monk on Apr 23, 2004 at 03:10 UTC

Re: Search and replacing across 500,000 HTML documents
by Anonymous Monk on Apr 23, 2004 at 19:33 UTC