
"There still isn't one single package that does XSLT 2.0"

There's XML::Saxon::XSLT2 (again, I'm the developer of it). It's a Perl wrapper around the Java Saxon library, using Inline::Java. It's a bit of a pain to install, and the interface between Java and Perl has the potential to be flaky, but right now it's your only option if you need XSLT 2.0 in Perl.

I'd love to see some competitors to it spring up, I really would. I only wrote it because there was literally no other choice for XSLT 2.0 in Perl, not out of any love for Java programming. ;-)
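For anyone who hasn't tried XML::Saxon::XSLT2, here is a minimal sketch of what a call looks like. It assumes the module and its Java prerequisites (Inline::Java plus a Saxon jar) are already installed, and that `new` accepts the stylesheet as a string and `transform` accepts the source document as a string. Check the module's own documentation before relying on this. The stylesheet uses `xsl:for-each-group`, an XSLT 2.0 instruction that XSLT 1.0 processors such as XML::LibXSLT can't handle. The `<book>` input is made-up sample data.

```perl
use 5.010;
use strict;
use warnings;
use XML::Saxon::XSLT2;    # assumes Inline::Java and Saxon are set up

# xsl:for-each-group and current-grouping-key() exist only in XSLT 2.0.
my $xslt = <<'XSLT';
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each-group select="//book" group-by="@lang">
      <xsl:value-of select="current-grouping-key(), count(current-group())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
XSLT

# Hypothetical sample input: three books in two languages.
my $xml = <<'XML';
<books>
  <book lang="en"/>
  <book lang="fr"/>
  <book lang="en"/>
</books>
XML

# Assumption: new() takes the stylesheet string; the base URL is omitted.
my $trans  = XML::Saxon::XSLT2->new($xslt);
my $output = $trans->transform($xml);
print $output;
```

Each output line should hold a language code followed by the number of books in that group.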

"I do not want to have a war between the monks, but please enlighten me more on why to use HTML5 instead of TreeBuilder"

Two main reasons:

1. If you want to use XML::LibXML, which, as I said, is a very good DOM implementation (with XPath, XML Schema, RELAX NG, etc.), then HTML::HTML5::Parser integrates with it out of the box: it returns ordinary XML::LibXML::Document objects.
2. It follows the parsing algorithm from the W3C HTML5 working drafts, which lets it deal with tag soup in much the same way desktop browsers do. (It currently passes the majority of the html5lib test suite. html5lib is an HTML parsing library for Python and Ruby, and is pretty much the de facto reference implementation of the HTML5 parsing algorithm.) If you want to deal with random content off the Web, that's kinda important, because far more people test their content in desktop browsers than in HTML::TreeBuilder.
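The XML::LibXML integration can be sketched briefly. The HTML string here is made-up sample data. `parse_string` hands back an XML::LibXML::Document, so the usual XML::LibXML::XPathContext machinery works on it directly. One wrinkle, assuming the parser follows the HTML5 spec here: elements land in the XHTML namespace, so XPath queries need a registered prefix rather than a bare `//td`.

```perl
use 5.010;
use strict;
use warnings;
use HTML::HTML5::Parser;
use XML::LibXML;

# Hypothetical sample input.
my $html = '<table><tr><td>Hello</td><td>World</td></tr></table>';

# The result is a normal XML::LibXML::Document, not a special tree class.
my $doc = HTML::HTML5::Parser->new->parse_string($html);

# HTML5 parsing places elements in the XHTML namespace, so bind a prefix
# for it before running XPath queries.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(h => 'http://www.w3.org/1999/xhtml');

# The parser inserts the implied <tbody>, but //h:td still finds the cells.
my @cells = map { $_->textContent } $xpc->findnodes('//h:td');
say join ' ', @cells;
```

From here, anything else XML::LibXML offers (DOM editing, XML Schema or RELAX NG validation, serialisation) applies to `$doc` unchanged.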

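Here's a practical example. Open the following HTML in a desktop web browser. Note that, somewhat counter-intuitively, the paragraph containing the emphasised text is rendered above the "Hello World" greeting. (A `<p>` isn't allowed directly inside a `<table>`, so browsers move it out and insert it just before the table; the HTML5 parsing algorithm calls this "foster parenting".)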
```
<table>
<tr><td>Hello World</td></tr>
<p>This will be rendered <em>before</em> the greeting.</p>
</table>
```
Now run this test script:
```
use 5.010;
use HTML::TreeBuilder;
use HTML::HTML5::Parser;

my $string = do { local $/; <DATA> }; # slurp the whole DATA section

say "HTML::HTML5::Parser...";
say HTML::HTML5::Parser
    -> load_html(string => $string)
    -> textContent;

say "HTML::TreeBuilder...";
say HTML::TreeBuilder
    -> new_from_content($string)
    -> as_text;

__DATA__
<table>
<tr><td>Hello World</td></tr>
<p>This will be rendered <em>before</em> the greeting.</p>
</table>
```
Note that HTML::HTML5::Parser returns the content in the same order as your web browser; HTML::TreeBuilder does not.

That said, there are plenty of good things about HTML::TreeBuilder too, and if neither of the above applies to you, it's a good option. It's stable, mature and well understood by many Perl programmers. I don't really have anything bad to say about it.


In reply to Re^3: extracting data from HTML by tobyink
in thread extracting data from HTML by Jurassic Monk
