Others have suggested HTML::LinkExtor. Here is a way to do it using HTML::TreeBuilder::XPath, which is very handy if you also need to extract other information from the file.
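use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("/path/to/file.html");
$tree->eof;    # parse_file already signals eof; repeating it is harmless

# Match only anchors that have an href, so attr('href') is never undef
my @links = $tree->findnodes('//a[@href]');
for my $link (@links) {
    print $link->attr('href'), "\n";
}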
That will print every link in the document. If you only want the links that sit inside table cells, narrow the XPath expression:
# Only anchors that are direct children of a <td>
my @links = $tree->findnodes('//td/a');
for my $link (@links) {
    print $link->attr('href'), "\n";
}
Output:
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0001.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0002.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0003.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0004.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0005.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0006.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0007.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0008.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365.txt
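To illustrate the "other information" point, here is a minimal sketch that pulls the link text along with each href. The helper name table_links and the sample HTML are made up for the example, and it parses a string with parse() rather than a file, so the explicit eof is required here.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not from the original post): returns a list of
# [href, link text] pairs for anchors that are direct children of a <td>.
sub table_links {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;    # needed after parse(); parse_file() does this for you

    my @pairs = map { [ $_->attr('href'), $_->as_trimmed_text ] }
                $tree->findnodes('//td/a[@href]');

    $tree->delete;    # release the tree once we have plain data
    return @pairs;
}

# Made-up sample input: one link inside the table, one outside it
my $html = '<table><tr><td><a href="/docs/one.txt">First</a></td></tr></table>'
         . '<p><a href="/elsewhere">outside the table</a></p>';

print join("\t", @$_), "\n" for table_links($html);
```

Only the anchor inside the table cell should come back. The same pattern works for any other attribute or child element you want to collect alongside the href.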