mascip has asked for the wisdom of the Perl Monks concerning the following question:

XPath queries are "fairly slow" (compared to walking around in a tree, for example), and I read here that XSH could possibly make them a lot faster.

Explanation by Randal L. Schwartz :
"The XSH version (even considering the compilation of the XSH language into Perl before running) takes about one seventh the time of the HTX (HTML::TreeBuilder::XPath) version.
I was surprised at the difference, so I did a bit of exploration and found that the HTX version was spending almost all of its time building the huge Perl data structure to represent the DOM. Because XSH doesn't need to do that (the DOM is in C-side data structures), we get a tremendous savings in time, not to mention quicker queries later."

Would it be possible to get this speed by "plugging" (simple words for simple understanding) WWW::Mechanize::Firefox into XSH?

Thank you =o)

PS: I guess my alternative solution would be to use trees most of the time, and XPath only when I really need it, for example to retrieve an element's position on the screen (see node 951038).

Re: Naive idea : using XSH with WWW::Mechanize::Firefox, to make faster XPath queries ?
by Anonymous Monk on Feb 06, 2012 at 11:26 UTC

    No. Adding another HTML/XML parser on top of Mozilla/Gecko will only slow things down and increase memory usage.

    XSH derives its speed from using libxml, a C library, and from not creating the elements as Perl objects.

      Understood, thank you!
      I guess this node could be deleted, then.

Re: Naive idea : using XSH with WWW::Mechanize::Firefox, to make faster XPath queries ?
by Anonymous Monk on Feb 06, 2012 at 11:28 UTC

      Thanks for pointing at this library.
      But since HTML::TreeBuilder::LibXML seems to parse only the HTML content, I guess I lose the advantage of WWW::Mechanize::Firefox, which is that it takes JavaScript into account.

      Just out of curiosity: the tree's look_down() and the XPath queries then seem to work on exactly the same information (the HTML source code), so I am wondering whether there is any advantage in using XPath queries rather than look_down(). Are they not exactly equivalent? I guess the XPath queries take fewer lines to write, though.
      Please tell me if I'm wrong :o)

        i lose the advantage of WWW::Mechanize::Firefox
        You don't. Navigate with Mechanize, parse/query with TreeBuilder.
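
        A minimal sketch of that split, not a tested recipe. It assumes Firefox is running with the MozRepl extension that WWW::Mechanize::Firefox talks to, and that HTML::TreeBuilder::LibXML is the TreeBuilder flavour in use. The URL and the XPath expression are placeholders chosen for illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use HTML::TreeBuilder::LibXML;

# Navigate with Firefox, so that the page's JavaScript gets to run.
# Assumption: a Firefox instance with MozRepl enabled is already up.
my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://example.com/');    # placeholder URL

# content() returns the current state of the page as a string,
# which is what gets handed to the second parser.
my $html = $mech->content;

# Parse that snapshot once with the libxml-backed tree builder.
my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($html);
$tree->eof;

# Run the XPath queries against the libxml tree, not against Firefox.
for my $link ( $tree->findnodes('//a[@href]') ) {
    printf "%s -> %s\n", $link->as_text, $link->attr('href');
}
```

        Note that the tree is a snapshot: if the page changes afterwards, fetch $mech->content and parse again. Anything that needs the live browser, such as an element's position on the screen, still has to go through Mechanize.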