Corion's suggestions are most helpful, especially the encouragement to use modules to grab the section you need.

You're almost there in coding it yourself, but be aware (I suspect you are, but it's worth mentioning) that the pattern match may stop capturing the section if the site's HTML structure changes.

Given that the matching may eventually break (although the page's structure is consistent), consider the following:

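    use Modern::Perl;
    use LWP::Simple 'get';

    my $vendor       = 'Rockwell Automation';
    my $search_start = "<p><b>$vendor</b><br />";    # opens the vendor's section
    my $search_end   = '<p>&nbsp;</p>';               # spacer paragraph that closes it
    my $url = 'http://www.us-cert.gov/control_systems/ics-cert/archive.html';

    # get returns undef on failure, so stop here rather than match against undef
    my $html = get($url) // die "Could not fetch $url\n";

    # \Q...\E treats the markers as literal text; /s lets . match newlines
    my ($section) = $html =~ /(\Q$search_start\E.*?\Q$search_end\E)/s;
    print $section;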

Output:

<p><b>Rockwell Automation</b><br /> Rockwell Automation ControlLogix Multiple PLC Vulnerabilities (UPDATE), <a href="/control_systems/pdf/ICS-Alert-12-020-02A.pdf">ICS-ALERT-12-020-02A</a> (February 14, 2012)</p>
<p>Rockwell Automation ControlLogix PLC Multiple Vulnerabilities, <a href="/control_systems/pdf/ICS-Alert-12-020-02.pdf">ICS-ALERT-12-020-02</a> (January 20, 2012)</p>
<p>Rockwell Automation FactoryTalk RNADiagReceiver, <a href="/control_systems/pdf/ICSA-12-088-01.pdf">ICSA-12-088-01</a> (March 28, 2012)</p>
<p>Rockwell Automation FactoryTalk RNADiagReceiver (UPDATE), <a href="/control_systems/pdf/ICSA-12-088-01A.pdf">ICSA-12-088-01A</a> (April 06, 2012)</p>
<p>Rockwell Automation FactoryTalk RNADiagReceiver, <a href="/control_systems/pdf/ICS-ALERT-12-017-01.pdf">ICS-ALERT-12-017-01</a> (January 17, 2012)</p>
<p>Rockwell FactoryTalk Diag Viewer Memory Corruption, <a href="/control_systems/pdf/ICSA-11-175-01.pdf">ICSA-11-175-01</a> (June 24, 2011)</p>
<p>Rockwell-PLC5, <a href="/control_systems/pdf/ICSA-10-070-02.pdf">ICSA-10-070-02</a> (March 11, 2010)</p>
<p>Rockwell RSLinx EDS, <a href="/control_systems/pdf/ICSA-11-161-01.pdf">ICSA-11-161-01</a> (June 10, 2011)</p>
<p>Rockwell RSLogix, <a href="/control_systems/pdf/ICS-ALERT-11-256-05.pdf">ICS-ALERT-11-256-05</a> (September 13, 2011)</p>
<p>Rockwell RSLogix (UPDATE), <a href="/control_systems/pdf/ICS-ALERT-11-256-05A.pdf">ICS-ALERT-11-256-05A</a>&nbsp; (September 19, 2011)</p>
<p>Rockwell RSLogix Denial-of-Service Vulnerability, <a href="/control_systems/pdf/ICSA-11-273-03.pdf">ICSA-11-273-03</a> (September 30, 2011)</p>
<p>Rockwell RSLogix Denial-of-Service Vulnerability (UPDATE), <a href="/control_systems/pdf/ICSA-11-273-03A.pdf">ICSA-11-273-03A</a> (October 06, 2011)</p>
<p>RSLinx, <a href="/control_systems/pdf/ICSA-10-070-01.pdf">ICSA-10-070-01</a> (March 11, 2010)</p>
<p>RSLinx (UPDATE), <a href="/control_systems/pdf/ICSA-10-070-01A.pdf">ICSA-10-070-01A</a> (May 03, 2010)</p>
<p>&nbsp;</p>
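
Since the markers will silently stop matching if the page layout changes, it may be worth checking for that case before using the result. The following is only a minimal sketch, not a tested scraper: `extract_section` and `parse_entries` are hypothetical helper names, the inline `$sample` string stands in for the downloaded page, and the entry regex assumes every item looks like the output above (a link followed by a date in parentheses).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between two literal markers
# (markers included), or undef when the markers are not found.
sub extract_section {
    my ( $html, $start, $end ) = @_;
    return unless defined $html;
    my ($section) = $html =~ /(\Q$start\E.*?\Q$end\E)/s;
    return $section;
}

# Hypothetical helper: collects href, advisory id and date from each entry.
# Assumes the "<a href=...>ID</a> (date)" layout seen in the output above.
sub parse_entries {
    my ($section) = @_;
    my @entries;
    while ( $section =~ m{<a href="([^"]+)">([^<]+)</a>(?:&nbsp;|\s)*\(([^)]+)\)}g ) {
        push @entries, { href => $1, id => $2, date => $3 };
    }
    return @entries;
}

# Small inline sample standing in for the fetched page (an assumption).
my $sample = join '',
    '<p><b>Rockwell Automation</b><br /> ',
    'Rockwell-PLC5, <a href="/control_systems/pdf/ICSA-10-070-02.pdf">ICSA-10-070-02</a> (March 11, 2010)</p> ',
    '<p>RSLinx, <a href="/control_systems/pdf/ICSA-10-070-01.pdf">ICSA-10-070-01</a> (March 11, 2010)</p> ',
    '<p>&nbsp;</p>';

my $section = extract_section( $sample, '<p><b>Rockwell Automation</b><br />', '<p>&nbsp;</p>' );

if ( defined $section ) {
    printf "%s  %s  %s\n", @{$_}{qw(id date href)} for parse_entries($section);
}
else {
    warn "Section markers not found; the page structure may have changed\n";
}
```

Returning undef instead of printing an empty string makes a layout change visible to the caller, which can then warn or fall back to a proper HTML parser.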

Hope this helps!

Update: Strings in regex now quoted. Thanks, aitap, for the suggestion.
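
To illustrate why the quoting matters, here is a small sketch with a made-up vendor name containing regex metacharacters. The name `Acme (Europe)` and the one-line `$html` are assumptions for the demo only; they do not come from the page.

```perl
use strict;
use warnings;

# Hypothetical vendor name: the parentheses are regex metacharacters.
my $vendor = 'Acme (Europe)';
my $html   = "<p><b>$vendor</b><br /> item</p>";

# Unquoted: "(Europe)" is read as a capture group, so the literal
# parentheses in the text are never matched.
my $plain = $html =~ /<b>$vendor<\/b>/ ? 'match' : 'no match';

# Quoted: \Q...\E escapes the metacharacters, so the name matches literally.
my $quoted = $html =~ /<b>\Q$vendor\E<\/b>/ ? 'match' : 'no match';

print "unquoted: $plain\n";     # unquoted: no match
print "quoted:   $quoted\n";    # quoted:   match
```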


In reply to Re: Parsing and searching HTML code by Kenosis
in thread Parsing and searching HTML code by jayto
