http://qs1969.pair.com?node_id=11148402


in reply to Re: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR
in thread WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR

After adding HTML::Tree and parsing some stuff in pure Perl land, I think that IS actually the right approach:

  1. Use WWW::Mechanize::Chrome for JS rendering, JS interactions and high-level XPath queries
  2. Slurp HTML chunks and process them on the Perl side as much as possible
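
A minimal sketch of step 2, assuming the chunk has already been fetched from the browser in a single call (for instance via `$mech->content`). The helper name `rows_from_html` and the sample table are made up for illustration; only HTML::TreeBuilder from the HTML::Tree distribution is needed, so it runs without Chrome.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # part of the HTML::Tree distribution

# Hypothetical helper: turn an HTML chunk into a list of rows,
# each row being an array ref of trimmed cell texts.
sub rows_from_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @rows;
    for my $tr ( $tree->look_down( _tag => 'tr' ) ) {
        # Collect both header and data cells below this row.
        my @cells = map { $_->as_trimmed_text }
                    $tr->look_down( _tag => qr/^t[dh]$/ );
        push @rows, \@cells if @cells;
    }
    $tree->delete;    # release the parse tree
    return \@rows;
}

# In the real script the chunk would come from the browser once, e.g.
#   my $html = $mech->content;
# A literal stands in here so the sketch runs on its own.
my $html = <<'HTML';
<table>
  <tr><th>col A</th><th>col B</th></tr>
  <tr><td>a1</td><td>b1</td></tr>
</table>
HTML

print join( "\t", @$_ ), "\n" for @{ rows_from_html($html) };
```

The point of the design is that the browser is asked once for a string, and every TR/TD lookup after that is a local method call on an in-memory tree.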


Re^3: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR (updated)
by LanX (Saint) on Nov 27, 2022 at 10:38 UTC
    That's one approach.

    But as I said, I think putting the logic into a more elaborate XPath, so that the heavy lifting happens inside the browser, would fix your performance issue without needing HTML::Tree.

    IMHO your code forces the Perl side of W::M::C to do a lot of its own filtering and to create thousands of proxy objects. These Perl objects also tunnel requests back and forth to the browser for most method calls.

    Hence many potential bottlenecks.

    update

    As an illustration, this XPath, run in Chrome's dev console on https://meta.wikimedia.org/wiki/Wikipedia_article_depth, returns 1016 strings at once:

    //table[3]//tr//td//text()

    Disclaimer: I don't have W::M::C installed and my XPath-fu is rusty, so I'm pretty sure there are even better ways to do it.
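
A rough sketch of how the same idea might look from the Perl side: build one JavaScript expression that evaluates the XPath in the page and hands back all matched strings as a single JSON string. The helper name `xpath_strings_js` is hypothetical. The commented-out call assumes `eval_in_page` on a WWW::Mechanize::Chrome object returns the value of the expression; that part is untested here and needs a live browser.

```perl
use strict;
use warnings;
use JSON::PP;    # core module, used to quote the XPath and decode the result

# Hypothetical helper: build a JavaScript expression that evaluates an
# XPath in the page and returns the matched nodes' text as one JSON string.
sub xpath_strings_js {
    my ($xpath) = @_;
    # A JSON string doubles as a quoted JavaScript string literal.
    my $quoted = JSON::PP->new->allow_nonref->encode($xpath);
    return <<"JS";
(function (xp) {
  var snap = document.evaluate(xp, document, null,
                               XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  var out = [];
  for (var i = 0; i < snap.snapshotLength; i++) {
    out.push(snap.snapshotItem(i).textContent);
  }
  return JSON.stringify(out);
})($quoted)
JS
}

my $js = xpath_strings_js('//table[3]//tr//td//text()');
print $js;

# Assumed usage with a live browser ($mech is a WWW::Mechanize::Chrome object):
#   my ($json, $type) = $mech->eval_in_page($js);
#   my $strings = JSON::PP->new->decode($json);    # array ref of cell texts
```

This way only one round trip crosses the Perl/browser boundary, instead of one proxy object and several calls per TD.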

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      True.