PerlMonks  

Re^2: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR

by ait (Hermit)
on Nov 27, 2022 at 10:27 UTC


in reply to Re: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR
in thread WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR

After adding HTML::Tree and parsing some stuff in pure Perl land, I think that IS actually the right approach:

  1. Use W::M::Chrome for JS rendering, JS interactions and high-level xpath
  2. Slurp HTML chunks and process them on the Perl side as much as possible
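
A minimal sketch of step 2, assuming the chunk has already been pulled out of the browser as a plain string (for example from `$mech->content`). The sample table and the helper name `rows_from_html` are made up for illustration; only HTML::TreeBuilder (part of the HTML::Tree distribution) is used, so no browser round trips happen while walking the rows.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # ships with the HTML::Tree distribution

# Hypothetical chunk; in a real script this string would come from the browser.
my $html = <<'HTML';
<table>
  <tr><td>Alice</td><td>42</td></tr>
  <tr><td>Bob</td><td>17</td></tr>
</table>
HTML

# Return one array ref of cell texts per <tr> found in the chunk.
sub rows_from_html {
    my ($chunk) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($chunk);
    my @rows;
    for my $tr ( $tree->look_down( _tag => 'tr' ) ) {
        # All TDs below this TR, reduced to whitespace-trimmed text.
        push @rows, [ map { $_->as_trimmed_text } $tr->look_down( _tag => 'td' ) ];
    }
    $tree->delete;        # free the parse tree explicitly
    return @rows;
}

print join( "\t", @$_ ), "\n" for rows_from_html($html);
```

Everything after the single fetch of the HTML string is ordinary in-process Perl, which is where the speed-up would come from.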

Replies are listed 'Best First'.
Re^3: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR (updated)
by LanX (Saint) on Nov 27, 2022 at 10:38 UTC
    That's one approach.

    But as I said, I think putting the logic into a more elaborate xpath, so that the heavy lifting happens inside the browser, would fix your performance issue without needing HTML::Tree.

    IMHO your code forces the Perl side of W:M:C to do a lot of its own filtering and to create thousands of proxy objects. For most method calls, these Perl objects also tunnel requests back and forth to the browser.

    Hence many potential bottlenecks.

    update

    As an illustration, this xpath, run in Chrome's dev console for https://meta.wikimedia.org/wiki/Wikipedia_article_depth, returns 1016 strings at once:

    //table[3]//tr//td//text()
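
One way this could look from the Perl side is sketched below. It is untested against a real browser and rests on assumptions: that `$mech` is a WWW::Mechanize::Chrome object, that its `eval_in_page` method hands a JavaScript array back by value as a Perl array reference, and the helper name `xpath_strings` is invented here. The idea is to let the browser evaluate the xpath and ship back plain strings in a single call instead of one proxy object per node.

```perl
use strict;
use warnings;
use JSON::PP ();    # core module, used only to quote the xpath for JavaScript

# Hypothetical helper: evaluate $xpath inside the page and return the
# text content of every matching node as a list of plain strings.
sub xpath_strings {
    my ( $mech, $xpath ) = @_;

    # Encode the xpath as a JavaScript string literal.
    my $quoted = JSON::PP->new->allow_nonref->encode($xpath);

    my $js = sprintf( <<'JS', $quoted );
(function (xp) {
  var snap = document.evaluate(xp, document, null,
      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  var out = [];
  for (var i = 0; i < snap.snapshotLength; i++) {
    out.push(snap.snapshotItem(i).textContent);
  }
  return out;
})(%s)
JS

    # Assumption: the first return value is the array, passed by value.
    my ($value) = $mech->eval_in_page($js);
    return @{ $value || [] };
}

# Usage, given an already navigated $mech:
#   my @cells = xpath_strings( $mech, '//table[3]//tr//td//text()' );
```

If the array does not come back by value in your version, returning `JSON.stringify(out)` from the JavaScript and decoding it in Perl would be a fallback.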

    Disclaimer: I don't have W:M:C installed and my xpath-fu is rusty, so I'm pretty sure there are even better ways to do it.

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      True.
