Hello monks!

I'm a new user to this site, and have only been perl-ing for about two months, so please bear with me. Big picture, I'm writing a script for work that takes data from our webpage and generates gnuplot bar graphs. Small picture, I've found that there are blank rows in some of the HTML tables (there is way too much red tape to change the FORTRAN program that makes the HTML), and I now need to change my regular expressions to catch those blank rows without catching single blank cells. I am not allowed to install modules (like HTML::TableExtract). I've pulled the HTML into text files, and I have regexes that pull out the necessary bits, but they get all fouled up on the occasional table that includes a blank row. Here is a sample of the text I am searching:

<TR VALIGN="TOP"><TD><FONT SIZE="-1"> * </FONT></TD> <TD><FONT SIZE="-1"> MHS </FONT></TD> <TD><FONT SIZE="-1">125370</FONT></TD> <TD><FONT SIZE="-1">129114</FONT></TD> <TD><FONT SIZE="-1">131645</FONT></TD> <TD><FONT SIZE="-1">129546</FONT></TD> <TD><FONT SIZE="-1">515675</FONT></TD></TR> <TR VALIGN="TOP"><TD><FONT SIZE="-1"> * </FONT></TD> <TD><FONT SIZE="-1"> AIRS </FONT></TD> <TD><FONT SIZE="-1">626462</FONT></TD> <TD><FONT SIZE="-1">567621</FONT></TD> <TD><FONT SIZE="-1">614791</FONT></TD> <TD><FONT SIZE="-1">574009</FONT></TD> <TD><FONT SIZE="-1">2382883</FONT></TD></TR> <TR VALIGN="TOP"><TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD></TR>

I want to catch the third of those chunks. The following are some of the regexes I've tried, all of which (I think) should be equivalent for all intents and purposes. I apologize for their ugliness. None of them pulls out the section I need:

while (<FILE>) {
    if( m#(<TR VALIGN="TOP"><TD></FONT></TD>.*</TR>)#sg ){
        push(@search1, $+);
    }
    if( m#(<TR VALIGN="TOP"><TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD> <TD><FONT SIZE="-1"></FONT></TD></TR>)#mg ){
        push(@search2, $+);
    }
    if( m#(<TR VALIGN="TOP"><TD><FONT SIZE="-1"></FONT></TD>\s*\n*\s*<TD><FONT SIZE="-1"></FONT></TD>\s*\n*\s*<TD><FONT SIZE="-1"></FONT></TD>\s*\n*\s*<TD><FONT SIZE="-1"></FONT></TD>\s*\n*\s*<TD><FONT SIZE="-1"></FONT></TD>\s*\n*\s*<TD><FONT SIZE="-1"></FONT></TD>\s*\n*\s*\n*\s*<TD><FONT SIZE="-1"></FONT></TD></TR>)#mg ){
        push(@search3, $+);
    }
    if( m#(<TR VALIGN="TOP"><TD><FONT SIZE="-1"></FONT></TD>.*<TD><FONT SIZE="-1"></FONT></TD>.*<TD><FONT SIZE="-1"></FONT></TD>.*<TD><FONT SIZE="-1"></FONT></TD>.*<TD><FONT SIZE="-1"></FONT></TD>.*<TD><FONT SIZE="-1"></FONT></TD>.*<TD><FONT SIZE="-1"></FONT></TD></TR>)#sg ){
        push(@search4, $+);
    }
}

I've tried variations on //m and //s, but none of them matches. I would deeply appreciate any suggestions for a solution, or revelations as to why I am wrong. Thanks in advance,

elvenwonder
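
One possible explanation, offered as a sketch under stated assumptions rather than a definitive diagnosis: `while (<FILE>)` hands the regex a single line per iteration, so if a table row is spread over several lines in the file (an assumption, suggested by the `\n*` in the third attempt), no pattern can ever see the whole row. The /s and /m modifiers only change what `.`, `^` and `$` mean inside the string being matched; they do not make the loop read more text. The sketch below reads the whole text at once and matches a row whose cells are all empty. The sample layout (one cell per line, three cells per row) and the names `$blank_cell`, `$blank_row`, `slurp_file` and `find_blank_rows` are illustrative, not taken from the original script. Only core Perl is used, so no modules need installing.

```perl
use strict;
use warnings;

# Sample text. The line layout (one cell per line) is an assumption made
# for illustration; the second row has a single blank cell on purpose.
my $html = <<'HTML';
<TR VALIGN="TOP"><TD><FONT SIZE="-1"> * </FONT></TD>
<TD><FONT SIZE="-1"> MHS </FONT></TD>
<TD><FONT SIZE="-1">125370</FONT></TD></TR>
<TR VALIGN="TOP"><TD><FONT SIZE="-1"> * </FONT></TD>
<TD><FONT SIZE="-1"></FONT></TD>
<TD><FONT SIZE="-1">626462</FONT></TD></TR>
<TR VALIGN="TOP"><TD><FONT SIZE="-1"></FONT></TD>
<TD><FONT SIZE="-1"></FONT></TD>
<TD><FONT SIZE="-1"></FONT></TD></TR>
HTML

# One empty cell; \s* tolerates spaces and newlines inside and after it.
my $blank_cell = qr{<TD><FONT SIZE="-1">\s*</FONT></TD>\s*};

# A blank row: the opening tag, nothing but empty cells, the closing tag.
my $blank_row = qr{<TR VALIGN="TOP">\s*(?:$blank_cell)+</TR>};

# Hypothetical helper: read an entire file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;               # undef record separator: <$fh> returns everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Return every blank row found in the given text.
sub find_blank_rows {
    my ($text) = @_;
    my @rows = $text =~ /($blank_row)/g;
    return @rows;
}

# Whole-text match: the regex can see across line boundaries.
my @blank = find_blank_rows($html);

# Per-line match, as in while (<FILE>): each line is tested on its own.
my $per_line = grep { /$blank_row/ } split /^/, $html;

print "whole text: ", scalar(@blank), " blank row(s)\n";
print "per line:   $per_line blank row(s)\n";
```

With a real file, `find_blank_rows(slurp_file($path))` would replace the here-document. Because every cell in the pattern must be empty, the row with one blank cell is skipped, and the per-line loop finds nothing because no single line contains both `<TR VALIGN="TOP">` and `</TR>`. Separately, the first attempt in the post looks for `<TD></FONT></TD>`, a sequence that does not occur in the sample shown, since every `<TD>` there is followed by `<FONT SIZE="-1">`.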


In reply to Specific Regex with Multilines (/s and /m): Why Doesn't This Work? by elvenwonder
