Hello ww,

The $errmsg regex is working fine, as the following demonstrates:

#! perl -w

use strict;
use 5.018;

my $start  = qr{<table align="left" border="0" cellspacing="0" cellpadding="1"};
my $end    = qr{</table>};
my $errmsg = qr{Result</td><td bgcolor=".{7}">Error:.*?(?=</td>)};

while (<DATA>)
{
    /$errmsg/ && say if /$start/ .. /$end/;
}

__DATA__
<html><body><small>
<table align="left" border="0" cellspacing="0" cellpadding="1">
<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 404 Not Found</td></tr>
</table>
<table align="left" border="0" cellspacing="0" cellpadding="1">
<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed</td></tr>
</table>
<table align="left" border="0" cellspacing="0" cellpadding="1">
<tr><td foo bar baz> abcde </td></tr>
</table>
<tr><td bgcolor="INVALID">Result</td><td bgcolor="#db4930">Error: 404 Not Found</td></tr>
</small></body></html>

Output:

13:55 >perl 1458_SoPW.pl
<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 404 Not Found</td></tr>
<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed</td></tr>

13:55 >

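The main limitation of the above approach is that it fails to handle nested tables: the flip-flop range /$start/ .. /$end/ ends at the first </table> it sees, so the closing tag of an inner table terminates the region early.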
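A minimal sketch of one way around that, assuming each relevant tag appears on some input line the regexes can see: replace the flip-flop with an explicit count of open tables, so an inner </table> no longer closes the region. The find_errors sub and the sample HTML are hypothetical, made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start  = qr{<table align="left" border="0" cellspacing="0" cellpadding="1"};
my $errmsg = qr{Result</td><td bgcolor=".{7}">Error:.*?(?=</td>)};

# Hypothetical helper: return every line inside a target table (at any
# nesting depth) that contains an error cell.
sub find_errors {
    my @lines = @_;
    my @hits;
    my $depth = 0;    # number of tables currently open inside the region

    for my $line (@lines) {
        # Outside the region, only a target table can open it.
        next if $depth == 0 && $line !~ $start;

        push @hits, $line if $line =~ $errmsg;

        # Count opening and closing table tags on this line (any attributes).
        my $opens  = () = $line =~ /<table\b/g;
        my $closes = () = $line =~ m{</table>}g;
        $depth += $opens - $closes;
        $depth = 0 if $depth < 0;    # guard against stray closing tags
    }
    return @hits;
}

# Made-up sample: an inner table precedes the error row.
my $html = <<'END_HTML';
<table align="left" border="0" cellspacing="0" cellpadding="1">
<tr><td>
<table border="1"><tr><td>inner</td></tr></table>
</td></tr>
<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 404 Not Found</td></tr>
</table>
<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: outside</td></tr>
END_HTML

# split /^/ breaks the string into lines, keeping each trailing newline.
print for find_errors(split /^/, $html);
```

On this sample the original flip-flop would stop at the inner table's </table> on the third line and miss the error row; the counter keeps that row in scope and prints only it, not the row that follows the outer </table>.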

As AnomalousMonk and tye have indicated, the problem almost certainly lies in the logic used to split the input into “paragraphs.” If I were debugging this, I’d begin by printing out the value of $item immediately before the line if ( $item =~ /$errmsg/ ) {.
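A minimal sketch of that debugging step. The original splitting code is not shown here, so this assumes, hypothetically, that the input is split on blank lines; the input text is likewise made up. Bracketing each $item in the STDERR dump shows exactly where each "paragraph" starts and ends. In this sample the first item has a newline between Result</td> and <td, so $errmsg, which expects them to be adjacent, cannot match it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern copied from the demonstration above.
my $errmsg = qr{Result</td><td bgcolor=".{7}">Error:.*?(?=</td>)};

# Hypothetical input and splitting rule: a blank-line split stands in for
# the original paragraph-splitting logic, which is not shown.
my $input = <<'END_INPUT';
<tr><td bgcolor="#db4930">Result</td>
<td bgcolor="#db4930">Error: 404 Not Found</td></tr>

<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 500</td></tr>
END_INPUT

my @items = split /\n\s*\n/, $input;
my @matched;

for my $i (0 .. $#items) {
    my $item = $items[$i];

    # Debug print immediately before the match: the brackets expose
    # leading/trailing whitespace and any embedded newlines.
    print STDERR "item $i: [$item]\n";

    if ($item =~ /$errmsg/) {
        push @matched, $i;
        print "match in item $i\n";
    }
}
```

Running it prints "match in item 1" on STDOUT, while the STDERR dump shows item 0 split across two lines, which is why it never matches.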

Hope that helps,

Athanasius <°(((><contra mundum
Iustus alius egestas vitae, eros Piratica,


In reply to Re: Example of inconsistent regex matching by Athanasius
in thread Example of brainfog (Was: inconsistent regex matching) by ww
