Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to run s/// on only the plaintext in a variable containing HTML. If I naively do $html =~ s/PERL/Perl/g, lots of links and HTML elements break, because the substitution also hits tag and attribute contents. I just want the text to be replaced.
I looked on CPAN and found HTML::FormatText, which converts the HTML to plaintext. But it won't convert back to HTML after I do the substitution. Can I use HTML::Parser, HTML::TokeParser, HTML::TokeParser::Simple, or HTML::TreeBuilder to identify the plaintext, manipulate it how I see fit, and reassemble the structure into the original HTML, with plaintext modifications? If so, how?
I looked at Sean M. Burke's article in TPJ, and it mentions the as_text() method, but I'm at my wits' end as to how to put the modified text back into the HTML document. Can someone provide an example? I just want to apply a regex to the plaintext portion of $html. How can I do this? Thanks in advance, -jc
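For illustration, here is a minimal sketch of one way to do what the question describes, assuming HTML::Parser (API version 3) is installed from CPAN. It is not taken from any reply in this thread. The helper name replace_in_text and the sample $html are made up for the example. The idea is to copy every non-text event through verbatim and run the substitution only on text events, so tags, attributes, and comments are never touched. Text inside <script> and <style> is skipped via the is_cdata flag.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: apply $callback to each run of plain text in $html
# and return the reassembled document. Everything else is copied as-is.
sub replace_in_text {
    my ($html, $callback) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,

        # Text events: 'text' is the original source text (entities are
        # not decoded), 'is_cdata' is true inside <script>, <style>, etc.
        text_h => [
            sub {
                my ($text, $is_cdata) = @_;
                $text = $callback->($text) unless $is_cdata;
                $out .= $text;
            },
            'text, is_cdata',
        ],

        # All other events (tags, comments, declarations): pass through.
        default_h => [
            sub { $out .= $_[0] if defined $_[0] },
            'text',
        ],
    );

    # Report each block of text in one piece, so a match cannot be split
    # across two text events.
    $parser->unbroken_text(1);

    $parser->parse($html);
    $parser->eof;

    return $out;
}

# Example usage with made-up input.
my $html = '<p><a href="/PERL/faq.html" title="PERL FAQ">The PERL FAQ</a></p>';
my $fixed = replace_in_text($html, sub {
    my ($text) = @_;
    $text =~ s/PERL/Perl/g;
    return $text;
});
print "$fixed\n";
```

Because the handler sees the raw source text, entities such as &amp; arrive undecoded, and a phrase broken up by inline markup (for example PE<b>RL</b>) will not match, since each piece is a separate text event.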
Re: Manipulating plaintext within HTML
by Ovid (Cardinal) on Jun 26, 2003 at 19:37 UTC
by Anonymous Monk on Jun 26, 2003 at 20:33 UTC