Hello All, I'm trying to simplify a regular expression that extracts currencies from HTML files. Here is a snippet of the HTML that wraps the currency and f/x rate:
<TR BGCOLOR="#F0F0DF">
<TD ALIGN=LEFT VALIGN=TOP CLASS="mrktdata1"><FONT FACE="arial,helvetica" SIZE="-1" COLOR="#000000">Falkland Island Pound (*FKP)</FONT></TD>
<TD ALIGN=RIGHT VALIGN=TOP CLASS="mrktdata1"><FONT FACE="arial,helvetica" SIZE="-1" COLOR="#000000">1.4409 </FONT></TD>
</TR></TR>
Here's the expression I'm currently using to parse out the currencies:
while ($content =~ /(\**\s?\w+\.?\s+\S*\.?\s*\w*\.?\/?\s*\w*\.?\/?\s*\w*\.?\/?\s*\(\*?[A-Z]{2,4}\))/g) {
    print RAW "$1\n";
}
Some sample output:
Falkland Island Pound (*FKP)
South African Rand/fin (ZAR)
It works, but as you can see my expression is rather cumbersome. Any ideas on how I can group things, condense it, or otherwise simplify it? I've tried using variations of \S to catch odd characters like '&' or '/', but that really slows it down. Is there some way I can group all the repeating sections? I've attempted a couple of grouping schemes and tried putting the pieces into a character class, but without much success. I'm also trying to keep it somewhat generic and flexible so that it still catches name changes, etc.
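One way the repeating sections might be grouped is sketched below: a single character class for the characters allowed in a word, repeated inside a non-capturing group, with the /x modifier so the pattern can be laid out and commented. The helper name extract_currencies, the shortened sample HTML, and the choice of '.', '&' and '/' as the only punctuation allowed inside a name are assumptions made for illustration, not something taken from the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every "Name (CODE)" string found in the HTML.
sub extract_currencies {
    my ($content) = @_;
    my $currency_re = qr{
        (                            # capture the whole name-plus-code string
            \** \s?                  # optional leading asterisks and one space
            [\w.&/]+                 # first word of the name
            (?: \s+ [\w.&/]+ )*      # further words, separated by whitespace
            \s* \( \*? [A-Z]{2,4} \) # bracketed code, with an optional asterisk
        )
    }x;
    my @found;
    while ($content =~ /$currency_re/g) {
        push @found, $1;
    }
    return @found;
}

# Shortened sample rows (assumed shape, based on the snippet above).
my $sample = '<TD><FONT SIZE="-1">Falkland Island Pound (*FKP)</FONT></TD>'
           . '<TD><FONT SIZE="-1">South African Rand/fin (ZAR)</FONT></TD>';
print "$_\n" for extract_currencies($sample);
```

Because whitespace and the word class cannot match the same character, the repeated group should not cause the kind of backtracking slowdown that a bare \S can. Any other punctuation that turns up in a name would have to be added to the class.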

Finally, is there a nice way I can also read in the rate (1.4409 in the sample) on the same pass through the file? Thanks.
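For picking up the rate on the same pass, a rough sketch is to extend the pattern across the tags that separate the two cells and add a capture group for the number. The helper name extract_rates and the shortened sample row are hypothetical, and the sketch assumes that only tags and whitespace sit between the closing parenthesis of the code and the rate, as in the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [name, code, rate] array references.
sub extract_rates {
    my ($content) = @_;
    my $row_re = qr{
        > \s* ( [^<>()]+? ) \s*      # name: text between a tag and the bracketed code
        \( \*? ( [A-Z]{2,4} ) \)     # code, without the optional asterisk
        (?: \s* < [^>]* > )* \s*     # skip the closing and opening tags in between
        ( \d+ (?: \. \d+ )? )        # rate, such as 1.4409
    }x;
    my @rows;
    while ($content =~ /$row_re/g) {
        push @rows, [ $1, $2, $3 ];
    }
    return @rows;
}

# Shortened sample row (assumed shape, based on the snippet above).
my $sample = '<TR><TD><FONT SIZE="-1">Falkland Island Pound (*FKP)</FONT></TD> '
           . '<TD ALIGN=RIGHT><FONT SIZE="-1">1.4409 </FONT></TD></TR>';
for my $row (extract_rates($sample)) {
    printf "%s\t%s\t%s\n", @$row;
}
```

Anchoring the name on the preceding '>' means the name can contain any character except angle brackets and parentheses, so no list of allowed punctuation is needed here. A row with other text between the code and the rate would simply not match.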

In reply to simplifying an expression by Anonymous Monk
