I'm trying to run s/// on the plaintext in a variable containing HTML. If I naively do $html =~ s/PERL/Perl/g, lots of links and HTML elements break, because the substitution also hits tag names and attribute values. I just want the text to be replaced.
I looked on CPAN and found HTML::FormatText, which converts the HTML to plaintext. But it won't convert back to HTML after I do the substitution. Can I use HTML::Parser, HTML::TokeParser, HTML::TokeParser::Simple, or HTML::TreeBuilder to identify the plaintext, manipulate it how I see fit, and reassemble the structure into the original HTML, with plaintext modifications? If so, how?
I looked at Sean M. Burke's article in TPJ, which mentions the as_text() method, but I'm at my wits' end as to how to put the modified text back into the HTML document. Can someone provide an example? I just want to apply a regex to the plain-text portion of $html. How can I do this? Thanks in advance, -jc
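For what it's worth, here is a minimal sketch of one way this might be done with HTML::Parser (API version 3), assuming the whole document fits in $html. The idea is to copy every non-text event through verbatim and run the substitution only on text events. The name rewrite_text and the sample markup are made up for illustration; they are not from any module.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: applies $edit to each run of plain text in $html
# and passes all markup (tags, comments, declarations) through unchanged.
sub rewrite_text {
    my ($html, $edit) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Text events: edit the text, unless it is the raw content of
        # an element such as <script> or <style> (is_cdata is true there).
        text_h => [
            sub {
                my ($text, $is_cdata) = @_;
                $text = $edit->($text) unless $is_cdata;
                $out .= $text;
            },
            'text, is_cdata',
        ],
        # Everything else: append the original source text as-is.
        default_h => [ sub { $out .= $_[0] }, 'text' ],
    );

    # Deliver each run of text in one piece instead of in chunks.
    $p->unbroken_text(1);

    $p->parse($html);
    $p->eof;
    return $out;
}

# Made-up sample input: note the PERL inside the href attribute.
my $html  = '<p>PERL is fun. <a href="/PERL/faq.html">PERL FAQ</a></p>';
my $fixed = rewrite_text($html, sub {
    my ($t) = @_;
    $t =~ s/PERL/Perl/g;
    return $t;
});

print "$fixed\n";
# prints: <p>Perl is fun. <a href="/PERL/faq.html">Perl FAQ</a></p>
```

Two caveats with this sketch: the text handler sees the raw source, so entities such as &amp;amp; are not decoded before the regex runs, and a word split across tags (for example PE&lt;b&gt;RL&lt;/b&gt;) will not match because each text run is edited separately.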
In reply to Manipulating plaintext within HTML by Anonymous Monk