in reply to Re^2: problem HTML::FormatText::WithLinks::AndTables
in thread problem HTML::FormatText::WithLinks::AndTables
HTML::HTML5::Parser parses the HTML into a DOM tree. It preserves all elements and all attributes. (The example I gave earlier showed filtering by the class="thead".)
Once the HTML is parsed, it's returned as an XML::LibXML::Document object, so you can manipulate it with more or less the same object-oriented DOM API that desktop web browsers (Internet Explorer, Firefox, Chrome, etc.) support, just in Perl instead of JavaScript.
For example:
    // JavaScript
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        alert(links[i].href);
    }
    # Perl
    use HTML::HTML5::Parser;

    # $url is assumed to already hold the address of the page to fetch
    my $document = HTML::HTML5::Parser->load_html(location => $url);
    my @links = $document->getElementsByTagName('a');
    for (my $i = 0; $i < @links; $i++) {
        warn($links[$i]{href});
    }
The majority of HTML parsing modules work along the same lines.
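As a rough illustration of the class="thead" filtering mentioned above, here is a minimal sketch using only DOM-style calls. The sample markup and the variable names are made up for the example, and it parses a string rather than fetching a URL; it is one way to do it, not necessarily what the earlier example did.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;

# Hypothetical sample markup, standing in for a fetched page.
my $html = <<'HTML';
<!DOCTYPE html>
<title>Demo</title>
<table>
  <tr><td class="thead">Name</td><td class="thead">Price</td></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
HTML

# parse_string returns an XML::LibXML::Document, as described above.
my $document = HTML::HTML5::Parser->new->parse_string($html);

# Keep only the cells whose class attribute is exactly "thead".
# Cells without a class attribute return undef, hence the // '' fallback.
my @header_cells = grep { ($_->getAttribute('class') // '') eq 'thead' }
                   $document->getElementsByTagName('td');

# textContent gives the text inside each matching cell.
my @header_text = map { $_->textContent } @header_cells;
print "$_\n" for @header_text;
```

Walking the elements with getElementsByTagName and grep avoids XPath here; the parser places elements in the XHTML namespace, so an XPath version would need that namespace registered first.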
Re^4: problem HTML::FormatText::WithLinks::AndTables
by kevind0718 (Scribe) on Mar 12, 2013 at 00:48 UTC
by tobyink (Canon) on Mar 12, 2013 at 08:02 UTC