
What does HTML::TreeBuilder do? Who knows!?

I KNOW! It tells you to read the source, how awful :)
I second that. A spec-compliant HTML::TreeBuilder::XPath that works with the XPaths extracted with a common browser would definitely be a simplification....

Re^4: can't extract node with HTML::TreeBuilder::XPath
by Anonymous Monk on Aug 01, 2012 at 03:34 UTC

    I second that. A spec-compliant HTML::TreeBuilder::XPath that works with the XPaths extracted with a common browser would definitely be a simplification....

    I was being sarcastic :) HTML::HTML5::Parser isn't documented much better than HTML::TreeBuilder -- you have to read the source just the same

    FYI, HTML::TreeBuilder::XPath just tacks an XPath 1.0 engine onto a TreeBuilder tree -- browser addons commonly modify the DOM, so an absolute path copied from a browser may not match that tree -- and it's usually only the @class and @id attributes you're interested in, not absolute paths

    htmltreexpather.pl works with the actual tree that HTML::TreeBuilder builds, no browser required :)
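
    To make the @class/@id point concrete, here is a minimal sketch. The sample HTML, the "results" id and the "price" class are made up for illustration; the browser-style path assumes the browser added a tbody to its DOM, which is a common case but not guaranteed. Whether that absolute path matches depends on the tree HTML::TreeBuilder actually builds, so the script just prints the counts for you to compare.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page; note the source contains no <tbody>.
my $html = <<'HTML';
<html><body>
<table id="results">
<tr><td class="price">42</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Browser-style absolute path (assumed): includes a tbody step
# that a browser may have added to its own DOM.
my @absolute = $tree->findnodes('/html/body/table/tbody/tr/td');

# Path anchored on @id and @class: independent of intermediate elements.
my @anchored = $tree->findnodes('//table[@id="results"]//td[@class="price"]');

printf "absolute path matched %d node(s)\n", scalar @absolute;
printf "anchored path matched %d node(s)\n", scalar @anchored;
print  "price: ", $anchored[0]->as_text, "\n" if @anchored;

# Free the tree explicitly, as HTML::TreeBuilder recommends.
$tree->delete;
```

    The anchored expression only cares that a td with the right class sits somewhere under the table with the right id, so it should survive differences between the browser's DOM and the parser's tree.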

      Or you could read the HTML5 specification which it almost perfectly complies with. That's the whole point of it - it doesn't need to document how it parses HTML, because it parses it per spec, and the same way as almost every modern browser.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        Or you could read the HTML5 specification which it almost perfectly complies with. That's the whole point of it - it doesn't need to document how it parses HTML, because it parses it per spec, and the same way as almost every modern browser.

        How could anyone know to read that? Because you mention it here on perlmonks? The only way to even get a hint that it complies with some HTML5 spec is to read the source -- the only mention in the documentation is where "foobar" is not a real HTML element name (as found in the HTML5 spec) -- in short, nowhere in your module documentation do you actually tell anyone to go read w3.... for the algorithm