I have 500,000 HTML documents on a server. Wherever my name occurs, I'd like to wrap it in a link to my homepage.
On the surface, this looks like a simple search-and-replace job. However, we're dealing with HTML, so I can't just do: $html =~ s/Mike Judge/\Q$URL\E/i . My name could appear in page titles, in comments, or already inside a link, so a simple (or even cleverly crafted) regular expression won't work.
I've spent some time perusing CPAN this evening. HTML::TokeParser (or HTML::PullParser) looks like the right path, but as far as I can tell neither offers a ready-made way to modify the plain text in an HTML file while still preserving the HTML structure.
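The closest I can picture is a token pass-through, roughly sketched below and not tested against real pages: walk the tokens with HTML::TokeParser, echo each token's original source text unchanged, and rewrite only the text tokens that sit outside elements such as <a> and <title>. The sub name link_name, the example.com URL, and the skip list are placeholders I made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Placeholder values for illustration only.
my $NAME = 'Mike Judge';
my $URL  = 'http://www.example.com/';

# Elements whose text is left alone (an assumed list; adjust as needed).
my %SKIP = map { $_ => 1 } qw(a title script style textarea);

sub link_name {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot set up parser";
    $p->unbroken_text(1);    # ask for each text run as a single token

    my %depth;               # how many of each skipped element are open
    my $out = '';

    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S') {          # ["S", $tag, $attr, $attrseq, $text]
            $depth{ $t->[1] }++ if $SKIP{ $t->[1] };
            $out .= $t->[4];
        }
        elsif ($type eq 'E') {       # ["E", $tag, $text]
            $depth{ $t->[1] }-- if $SKIP{ $t->[1] } && $depth{ $t->[1] };
            $out .= $t->[2];
        }
        elsif ($type eq 'T') {       # ["T", $text, $is_data]
            my $text = $t->[1];
            # Link the name only when no skipped element is currently open.
            $text =~ s{(\Q$NAME\E)}{<a href="$URL">$1</a>}gi
                unless grep { $_ } values %depth;
            $out .= $text;
        }
        else {                       # comments, declarations, processing instructions
            $out .= $t->[-1];        # original source text is the last element
        }
    }
    return $out;
}

print link_name('<title>Mike Judge</title><p>By Mike Judge. <!-- Mike Judge --></p>'), "\n";
```

Because every non-text token is copied back verbatim, the markup should come through as it was. This sketch would still miss a name split across a line break or separated by an entity such as &nbsp;, and it trusts the end tags to be present.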
This seems like a common problem that's been solved before. Does anyone have suggestions for searching and replacing within half a million HTML files?
(As always, I appreciate the monks' kindness!)
In reply to Search and replacing across 500,000 HTML documents by Anonymous Monk