The short and skinny: I need to strip the face="foo" attribute from <font face="foo,bar,blort">My Foo</font> tags found in a document I have slurped into a scalar.

I tried messing around with some HTML::Parser code, as well as hstrip, but they didn't get me where I need to be. I also tried HTML::TagFilter and HTML::TreeBuilder with the same level of success: none. merlyn also has an article on something similar, but it removes the tags themselves, leaving only the text values. Close to what I need, but not quite there.

The glitch here is that I need to keep the color="#RRGGBB" attribute in the tag and drop anything else that appears in there, leaving just the font tag with its color attribute and value. The other sticky point is that many people use single quotes around attribute values and some use none, so a simple regex would have to be quite smart to figure this out (and would likely be rife with errors).

Doing this exclusively with regexes is going to be prone to failure, especially since tags can be improperly nested, so I can't just yank everything from <font .*?> to </font> and work on the remainder.

Here's an example of what my input could look like, and what I need for final output:

Input:  <font color="#000000" face="Arial,Helvetica" size="1"> Some text </font>
Output: <font color="#000000"> Some text </font>

Can any monk lend a hand?
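
One way this might be approached, offered as a minimal sketch rather than a definitive answer: let HTML::Parser (api_version 3) do the tokenizing, rebuild each font start tag with only its color attribute, and pass everything else through untouched. The sub name strip_font_attrs is hypothetical, and the sketch assumes that a font tag without a color attribute should become a bare <font>. Because the parser handles attribute quoting, single-quoted and unquoted values should not need special treatment, and because only start tags are rewritten, improperly nested tags are not a concern.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(encode_entities);

# Hypothetical helper: returns $html with every <font ...> start tag
# reduced to <font color="...">. All other markup and text is copied
# through exactly as it appeared in the input.
sub strip_font_attrs {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                # tagname and the attribute names arrive lowercased;
                # attribute values arrive with entities decoded.
                my ($tagname, $attr, $text) = @_;
                if ($tagname eq 'font') {
                    # Assumption: no color attribute means a bare <font>.
                    $out .= defined $attr->{color}
                        ? sprintf('<font color="%s">',
                                  encode_entities($attr->{color}, '<>&"'))
                        : '<font>';
                }
                else {
                    $out .= $text;    # any other start tag: original source
                }
            },
            'tagname, attr, text',
        ],
        # End tags, text, comments, and declarations pass through as-is.
        default_h => [ sub { $out .= $_[0] }, 'text' ],
    );

    $parser->parse($html);
    $parser->eof;    # flush any text the parser is still buffering
    return $out;
}

print strip_font_attrs(
    q{<font color="#000000" face="Arial,Helvetica" size="1"> Some text </font>}
), "\n";
```

Note that the rewritten tag always comes out as lowercase font with a double-quoted value, whatever the input looked like, while closing tags keep their original case.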


In reply to Stripping font "face" values from font tags by Anonymous Monk
