ww has asked for the wisdom of the Perl Monks concerning the following question:
SOLVED. See reply to self below
There perhaps should be a question mark in the title.
Caveat: Fri nite & Sat am brainlock? Maybe. But...
I'm trying to extract from the output of a linkchecker all chunks which report errors. My problem? This minimal test:
    #! /usr/bin/perl -w
    use strict;
    use 5.018;

    # test errmsg match
    # sample (and partial; see the chunking in the next code) errmsgs from file:
    # Result</td><td bgcolor="#db4930">Error: 404....</td> ( or 301 etc.)
    # Result</td><td bgcolor="#db4930">Error: SSLError: [Errno 1] _ssl.c:504:....</td>

    my $errmsg = qr[Result</td><td bgcolor=".{7}">Error:.*?(?=</td>)];

    my @data_sample = (
        '<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 404 Not Found</td></tr>',
        '<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed</td></tr>',
        '<tr><td foo bar baz> abcde </td></tr>'
    );

    my $data_line;
    for $data_line(@data_sample) {
        if ( $data_line =~ /$errmsg/ ) {
            say "\t FOUND IT: $data_line \n";
        } else {
            say "\t NO MATCH ON $data_line \n";
        }
    }

produces the expected results:

    C:\>test_err_finder.pl
         FOUND IT: <tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 404 Not Found</td></tr>

         FOUND IT: <tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed</td></tr>

         NO MATCH ON <tr><td foo bar baz> abcde </td></tr>

    C:\>
whereas the selfsame regex ($errmsg) does not when used here:
    #!/usr/bin/perl -w
    use strict;
    use 5.018;

    # find linkchecker error reports in html report, linkchecker-out20151120.html

    $/ = '<table align="left" border="0" cellspacing="0" cellpadding="1"';
    my ($fh, $item, @erritems);
    my $trterminator = qr[</tr>\n</table></td></tr></table>];

    # errmsg from file:
    # Result</td><td bgcolor="#db4930">Error: 404....</td> ( or 301 etc.)
    # Result</td><td bgcolor="#db4930">Error: SSLError: [Errno 1] _ssl.c:504:....</td>
    my $errmsg = qr[Result</td><td bgcolor=".{7}">Error:.*?(?=</td>)];
    my $eot = qr[</small></body></html>];

    open ($fh, "<", 'linkchecker-out20151120.html') or die "Can't open, $!";

    while (<$fh> ) {
        if ( $_ =~ /$eot/ ) {
            last;
        } else {
            $_ = <$fh>;
            $item = $_;
            $item =~ s/\n//gs;
            $item .= "\n\n";
        }
        if ( $item =~ /$errmsg/ ) {
            push @erritems, $item;
        }
    }

    say "Errors id'ed in LinkChecker output, 'linkchecker-out20151120.html'\n";
    for $_(@erritems) {
        print $/;
        say $_;
    }
Many "print ($var);" debugging items have been removed here, but all point to consistency between the actual file contents and the minimal test above. Thus, contrary to the usual wise advice to include data, I'm omitting it for now, since even individual chunks run to about 0.5KB and even three samples (out of approximately 1000 chunks) would extend this verbose query to "TL;DR" status.
I'm hoping fresh eyes or greater wisdom will spot what I'm missing.
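One thing worth checking in the second script: `while (<$fh>)` already reads a record into `$_`, and the `$_ = <$fh>;` in the else branch then reads the next one, so only every second chunk ever reaches the `$errmsg` test. Whether that is the whole problem is an assumption, since the data is not shown. A minimal sketch that reads each record exactly once follows. The in-memory sample data and the helper name `find_error_chunks` are hypothetical stand-ins, not taken from the real report, and the `$eot` check is left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.018;

# Record separator copied from the script above.
my $sep = '<table align="left" border="0" cellspacing="0" cellpadding="1"';

# Hypothetical stand-in for the report file: three chunks, two with errors.
my $html = join '',
    "<html><body>\n",
    $sep, qq{>\n<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 404 Not Found</td></tr>\n</table>\n},
    $sep, qq{>\n<tr><td>Result</td><td>Valid: 200 OK</td></tr>\n</table>\n},
    $sep, qq{>\n<tr><td bgcolor="#db4930">Result</td><td bgcolor="#db4930">Error: 301 Moved</td></tr>\n</table>\n},
    "</small></body></html>\n";

my $errmsg = qr[Result</td><td bgcolor=".{7}">Error:.*?(?=</td>)];

# Hypothetical helper: return every chunk of $text that matches $errmsg.
sub find_error_chunks {
    my ($text) = @_;
    local $/ = $sep;                       # read one chunk per record
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my @erritems;
    while ( my $item = <$fh> ) {           # the only read in the loop
        chomp $item;                       # strips the trailing separator
        $item =~ s/\n//g;                  # flatten the chunk to one line
        push @erritems, $item if $item =~ $errmsg;
    }
    close $fh;
    return @erritems;
}

my @found = find_error_chunks($html);
say scalar(@found), " error chunk(s) found";
say for @found;
```

With the sample data this reports two chunks, the 404 and the 301. Pointing the `open` at the real report file instead of `\$text` is the only change needed to try it on actual linkchecker output.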