Re^3: can't extract node with HTML::TreeBuilder::XPath

Re^4: can't extract node with HTML::TreeBuilder::XPath by Anonymous Monk on Aug 01, 2012 at 03:34 UTC
"I second that. A specs compatible HTML::TreeBuilder::XPath that works with the xpaths extracted with a common browser would definitely be a simplification...."

I was being sarcastic :) HTML::HTML5::Parser isn't documented much better than HTML::TreeBuilder; you have to read the source just the same.

FYI, HTML::TreeBuilder::XPath just tacks an XPath 1.0 engine onto a TreeBuilder tree. Common browser add-ons commonly modify the DOM, and it's usually only the @class and @id attributes you're interested in, not absolute paths.

htmltreexpather.pl works with the actual tree that HTML::TreeBuilder builds, no browser required :)
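To illustrate the point about @class and @id, here is a minimal sketch of a relative XPath query with HTML::TreeBuilder::XPath. The markup, the id "main" and the class "price" are hypothetical, invented only for this example. It is not taken from the original question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, made up for this illustration.
my $html = <<'HTML';
<html><body>
  <div id="main">
    <table><tr><td class="price">42</td></tr></table>
  </div>
</body></html>
HTML

# Build the tree that HTML::TreeBuilder itself produces.
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Relative XPath keyed on @id and @class. It does not spell out the
# full chain of ancestors, so it does not care which intermediate
# elements a browser may have added to its own DOM.
my @cells  = $tree->findnodes('//div[@id="main"]//td[@class="price"]');
my @prices = map { $_->as_text } @cells;

print "$_\n" for @prices;

# HTML::TreeBuilder trees should be deleted explicitly when done.
$tree->delete;
```

An absolute path copied from a browser describes the browser's DOM, which may differ from the TreeBuilder tree, so an expression anchored on attributes tends to be the more robust choice.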
Re^5: can't extract node with HTML::TreeBuilder::XPath by tobyink (Canon) on Aug 01, 2012 at 06:35 UTC
Or you could read the HTML5 specification, which it almost perfectly complies with. That's the whole point of it: it doesn't need to document how it parses HTML, because it parses it per spec, the same way almost every modern browser does.

`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
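As a rough sketch of what parsing per spec means in practice, the snippet below feeds a hypothetical table fragment to HTML::HTML5::Parser and queries the result with XML::LibXML. It assumes that the parser returns an XML::LibXML document whose elements live in the XHTML namespace, and that it adds the implied tbody the HTML5 tree-construction rules call for. The markup and the prefix "x" are made up for the example.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical fragment with no explicit <tbody>.
my $html = '<table><tr><td>42</td></tr></table>';

# parse_string returns an XML::LibXML::Document.
my $doc = HTML::HTML5::Parser->new->parse_string($html);

# Elements are assumed to be in the XHTML namespace, so XPath 1.0
# needs a registered prefix ("x" is an arbitrary choice here).
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( x => 'http://www.w3.org/1999/xhtml' );

# A browser-style absolute path, including the implied tbody.
my @cells = $xpc->findnodes('/x:html/x:body/x:table/x:tbody/x:tr/x:td');

print scalar(@cells), " cell(s) matched\n";
```

If the absolute path with tbody matches here but not against an HTML::TreeBuilder tree, that difference in tree shape is the likely reason a browser-extracted XPath fails.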
Re^6: can't extract node with HTML::TreeBuilder::XPath by Anonymous Monk on Aug 01, 2012 at 07:15 UTC
"Or you could read the HTML5 specification which it almost perfectly complies with. That's the whole point of it - it doesn't need to document how it parses HTML, because it parses it per spec, and the same way as almost every modern browser."

How could anyone know to read that? Because you mention it here on PerlMonks? The only way to even get a hint that it complies with some HTML5 spec is to read the source. The only mention in the documentation is the remark that "foobar" is not a real HTML element name (as found in the HTML5 spec). In short, nowhere in your module documentation do you actually tell anyone to go read w3.... for the algorithm.
Re^7: can't extract node with HTML::TreeBuilder::XPath by tobyink (Canon) on Aug 01, 2012 at 10:04 UTC
Re^8: can't extract node with HTML::TreeBuilder::XPath by Anonymous Monk on Aug 01, 2012 at 11:00 UTC