HTML::HTML5::Parser parses the HTML into a DOM tree, preserving all elements and all attributes. (The example I gave earlier showed filtering on class="thead".)
Once the HTML is parsed, it's returned as an XML::LibXML::Document object, so you can manipulate it with more or less the same object-oriented DOM API that desktop web browsers such as Internet Explorer, Firefox and Chrome support, just in Perl instead of JavaScript.
For example:
    // JavaScript
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        alert(links[i].href);
    }
    # Perl
    use HTML::HTML5::Parser;

    # $url is the address of the page to fetch and parse
    my $document = HTML::HTML5::Parser->load_html(location => $url);
    my @links = $document->getElementsByTagName('a');
    for (my $i = 0; $i < @links; $i++) {
        warn($links[$i]{href});
    }
The majority of HTML parsing modules work along the same lines.
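To show that attributes such as class survive parsing, here is a rough sketch of one way to pick out cells by class="thead" and list link targets. The sample markup and the "x" namespace prefix are made up for illustration, and it assumes (as I understand the module) that the parser places elements in the XHTML namespace, which is why the XPath query needs a registered prefix. Treat it as a starting point rather than the definitive way to do it.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use XML::LibXML;

# Hypothetical sample markup, invented for this example.
my $html = <<'HTML';
<!DOCTYPE html>
<title>Demo</title>
<table>
  <tr><td class="thead">Name</td><td class="thead">Price</td></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
<p><a href="http://example.com/">Example</a></p>
HTML

# parse_string returns an XML::LibXML::Document, like load_html does.
my $document = HTML::HTML5::Parser->new->parse_string($html);

# Assumption: elements live in the XHTML namespace, so XPath
# expressions need a prefix bound to that namespace URI.
my $xpc = XML::LibXML::XPathContext->new($document);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

# Text of every table cell whose class attribute is exactly "thead".
my @headers = map { $_->textContent }
              $xpc->findnodes('//x:td[@class="thead"]');

# href attribute of every link, via the DOM method used earlier.
my @hrefs = map { $_->getAttribute('href') }
            $document->getElementsByTagName('a');

print "Header: $_\n" for @headers;
print "Link: $_\n"   for @hrefs;
```

Using getAttribute instead of the hash-style $node->{href} access is just a matter of taste here; both read the same attribute.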
In reply to Re^3: problem HTML::FormatText::WithLinks::AndTables
by tobyink
in thread problem HTML::FormatText::WithLinks::AndTables
by kevind0718