
What does HTML::TreeBuilder do? Who knows!?

I KNOW! It tells you to read the source, how awful :)

htmltreexpather.pl works rather well for spitting out XPaths that HTML::TreeBuilder::XPath will accept :)


Re^3: can't extract node with HTML::TreeBuilder::XPath
by saunderson (Novice) on Jul 30, 2012 at 11:27 UTC

    What does HTML::TreeBuilder do? Who knows!?

    I KNOW! It tells you to read the source, how awful :)
    I second that. A spec-compatible HTML::TreeBuilder::XPath that works with the XPaths extracted by a common browser would definitely be a simplification...

      I second that. A spec-compatible HTML::TreeBuilder::XPath that works with the XPaths extracted by a common browser would definitely be a simplification...

      I was being sarcastic :) HTML::HTML5::Parser isn't documented much better than HTML::TreeBuilder; you have to read the source just the same.

      FYI, HTML::TreeBuilder::XPath just tacks an XPath 1.0 engine onto a TreeBuilder tree. Common browser add-ons work on the browser's DOM, which the browser commonly modifies, so the absolute paths they report often don't match TreeBuilder's tree. It's usually only the @class and @id attributes you're interested in, not absolute paths.
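      A minimal sketch of that advice, assuming HTML::TreeBuilder::XPath is installed from CPAN. The sample page and the helper `first_text_by_xpath` are hypothetical, made up for illustration. The query anchors on `@id` and `@class` and uses `//` steps instead of an absolute `/html/body/...` path.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page, not taken from the original question.
my $html = <<'HTML';
<html><body>
<div id="content">
  <table class="prices">
    <tr><td class="item">Widget</td><td class="price">9.99</td></tr>
  </table>
</div>
</body></html>
HTML

# Hypothetical helper: parse $source and return the text of the first
# node matching $xpath, or undef if nothing matches.
sub first_text_by_xpath {
    my ($source, $xpath) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($source);
    my ($node) = $tree->findnodes($xpath);    # list context: the matching nodes
    my $text = $node ? $node->as_text : undef;
    $tree->delete;                            # release the parse tree
    return $text;
}

# Anchored on @id/@class; the '//' step matches whether or not the tree
# has a <tbody> between <table> and <tr>.
my $price = first_text_by_xpath($html,
    '//div[@id="content"]//td[@class="price"]');
print defined $price ? "$price\n" : "no match\n";
```

      Browsers insert a `<tbody>` between `<table>` and `<tr>` when they build their DOM, which is one way a path copied from a browser can fail against a TreeBuilder tree that lacks one; the `//` step sidesteps that either way.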

      htmltreexpather.pl works with the actual tree that HTML::TreeBuilder builds, no browser required :)
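      The idea of generating paths from the tree TreeBuilder actually builds, rather than from a browser, can be sketched as follows. This is not htmltreexpather.pl's code: `xpath_of` is a hypothetical helper that walks up from an element via `parent` and counts earlier same-tag siblings from `content_list` to build each `tag[n]` step. It again assumes HTML::TreeBuilder::XPath is installed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use Scalar::Util qw(blessed refaddr);

# Hypothetical helper: absolute XPath for $elem, computed from the parsed
# tree itself. Text nodes are skipped; only element siblings are counted.
sub xpath_of {
    my ($elem) = @_;
    my @steps;
    while (my $parent = $elem->parent) {
        my $tag = $elem->tag;
        my $pos = 1;
        for my $sib ($parent->content_list) {
            next unless blessed($sib) && $sib->isa('HTML::Element');
            last if refaddr($sib) == refaddr($elem);
            $pos++ if $sib->tag eq $tag;
        }
        unshift @steps, sprintf('%s[%d]', $tag, $pos);
        $elem = $parent;
    }
    unshift @steps, $elem->tag;    # the root element, normally "html"
    return '/' . join '/', @steps;
}

# Hypothetical sample input.
my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<html><body><div>a</div><div><p>x</p><p id="t">y</p></div></body></html>');
my $target = $tree->look_down(_tag => 'p', id => 't');
my $path   = xpath_of($target);
print "$path\n";

# Round trip: the generated path should select the element it came from.
my ($found) = $tree->findnodes($path);
print $found && refaddr($found) == refaddr($target) ? "round trip ok\n" : "mismatch\n";
```

      The round-trip check is a cheap way to confirm that a generated path really selects the element it was built from. Absolute paths like this are still brittle, so trimming them down to an @id or @class anchor is usually worth it.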

        Or you could read the HTML5 specification, with which it almost perfectly complies. That's the whole point of it: it doesn't need to document how it parses HTML, because it parses it per spec, the same way almost every modern browser does.

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'