I tried messing around with some HTML::Parser code, as well as hstrip, but neither got me where I need to be. I also tried HTML::TagFilter and HTML::TreeBuilder, with the same level of success: none. merlyn also has an article on something similar, but his approach removes the tags themselves and leaves only the text. Close to what I need, but not quite there.
The glitch here is that I need to keep the color="#RRGGBB" attribute in the tag but drop everything else, leaving just the font tag with its color attribute and value. The other sticky point is that quoting varies: many people use single quotes around attribute values and some use none at all, so a simple regex would have to be quite smart to cope with every case (and would likely be rife with errors).
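A minimal sketch of why a real parser sidesteps the quoting problem, assuming HTML::Parser 3.x is installed: its start handler receives the attributes as a hash reference, with tag and attribute names lowercased and the quotes already removed, whether the source used double quotes, single quotes, or none. The helper name `font_colors` is hypothetical and only here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect the color attribute of every <font> start tag,
# as HTML::Parser sees it after it has dealt with the quoting.
sub font_colors {
    my ($html) = @_;
    my @colors;
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'tagname' is lowercased; 'attr' is a hashref of lowercased
        # attribute name => unquoted value.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                push @colors, $attr->{color}
                    if $tag eq 'font' && exists $attr->{color};
            },
            'tagname, attr',
        ],
    );
    $p->parse($html);
    $p->eof;    # flush any buffered input
    return @colors;
}

# Double-quoted, single-quoted (and upper-case), and unquoted attributes.
my $html = q{<font color="#000000" face="Arial"></font>}
         . q{<FONT COLOR='#FF0000' size=1></FONT>}
         . q{<font size=2 color=#00FF00></font>};
print join(' ', font_colors($html)), "\n";
```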
Doing this purely with regexes is going to be prone to failure, especially since tags can be improperly nested, so I can't just yank everything from <font .*?> to </font> and work on what's left.
Here's an example of what my input could look like, and what I need for final output:
Input:
<font color="#000000" face="Arial,Helvetica" size="1"> Some text </font>
Desired output:
<font color="#000000"> Some text </font>
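One possible HTML::Parser approach to the transformation above, sketched under the assumption that HTML::Parser 3.x is available (the helper name `keep_font_color` is hypothetical): every <font> start tag is rebuilt from its parsed attributes with only color kept, while all other tags, text, and end tags are copied through verbatim from the source. Because the handler sees one start tag at a time, improperly nested font elements do not confuse it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: rewrite every <font> start tag so only its color
# attribute survives; everything else passes through unchanged.
sub keep_font_color {
    my ($html) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Events without a specific handler (text, end tags, comments, ...)
        # are copied through as their original source text.
        default_h => [ sub { $out .= shift }, 'text' ],
        start_h   => [
            sub {
                my ($tag, $attr, $text) = @_;
                if ($tag ne 'font') {
                    $out .= $text;    # other start tags: leave untouched
                    return;
                }
                if (defined $attr->{color}) {
                    # Attribute values arrive entity-decoded, so re-escape
                    # the characters that matter inside a double-quoted value.
                    (my $color = $attr->{color}) =~ s/&/&amp;/g;
                    $color =~ s/"/&quot;/g;
                    $out .= qq{<font color="$color">};
                }
                else {
                    # Keep a bare <font> so the original </font> stays paired.
                    $out .= '<font>';
                }
            },
            'tagname, attr, text',
        ],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}

my $in = '<font color="#000000" face="Arial,Helvetica" size="1"> Some text </font>';
print keep_font_color($in), "\n";
```

Emitting a bare <font> when there is no color (rather than dropping the tag) is a design choice in this sketch: it keeps each closing </font> matched without having to track nesting.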
Can any monk lend a hand?
In reply to Stripping font "face" values from font tags by Anonymous Monk