PerlMonks  

Re^2: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR

by LanX (Saint)
on Nov 25, 2022 at 13:32 UTC ( [id://11148379]=note )


in reply to Re: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR
in thread WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR

> If you don't need JavaScript

Even if ...

supposing communication overhead or an implementation loop are causing a bottleneck ...

... he could also try to fetch the whole table as html once using WWW::Mechanize::Chrome and do the parsing with Mojo::UserAgent

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery


Replies are listed 'Best First'.
Re^3: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR
by marto (Cardinal) on Nov 25, 2022 at 13:53 UTC

    " ... he could also try to fetch the whole table as html once using WWW::Mechanize::Chrome and do the parsing with Mojo::UserAgent"

    I've used this workaround in the past for sites that need a special sign-in, or that bounce back anything not detected as a 'real' browser, purely so I don't have to make a lot of code changes :) As the location of the bottleneck is not yet understood, this may not resolve the performance issue.

      > As the location of the bottleneck is not yet understood this may not resolve the issue of performance.

      but may help narrow down the underlying problem.

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

Re^3: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR
by ait (Hermit) on Nov 25, 2022 at 14:30 UTC

    I will try this, thank you!

    I noticed that fetching the TRs of the table seems pretty fast with WWW::Mechanize::Chrome and xpath. What seems absurd is that fetching the TDs relative to a single TR takes so long, and that the time is proportional to the total number of TRs. That doesn't make sense unless there's a bug somewhere in the WWW::Mechanize::Chrome xpath implementation.

      I can't look into it now, so here is some general advice:

      • Try the XPath inside the browser's dev console.
      • Try logging what mechanize does under the hood.

      Back when I used W:M:FF I was able (and sometimes needed) to send and eval JS and fetch the result as JSON.

      All this will help you specify a feature request (if needed) for W:M:C
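
A rough sketch of the "eval JS and fetch the result as JSON" idea with W:M:C, untested against the OP's page. It assumes the `eval` method of WWW::Mechanize::Chrome returns the value of the JavaScript expression, and the URL and the `table tr` selector are placeholders. The helper names `rows_js` and `decode_rows` are made up for illustration. All cells are collected inside the page and come back in a single round trip:

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->allow_nonref;

# Build a JavaScript expression that gathers the cell text of every
# matching row inside the page and returns it as one JSON string.
sub rows_js {
    my ($selector) = @_;
    my $sel = $json->encode($selector);    # quoted, escaped JS string literal
    return "JSON.stringify(Array.from(document.querySelectorAll($sel),"
         . " tr => Array.from(tr.cells, cell => cell.textContent.trim())))";
}

# Turn the JSON text returned by the browser into an array of arrays.
sub decode_rows {
    my ($text) = @_;
    return $json->decode($text);
}

# Only talks to Chrome when a URL is given on the command line.
if ( my $url = shift @ARGV ) {
    require WWW::Mechanize::Chrome;
    my $mech = WWW::Mechanize::Chrome->new( headless => 1 );
    $mech->get($url);
    my ( $value, $type ) = $mech->eval( rows_js('table tr') );
    print join( "\t", @$_ ), "\n" for @{ decode_rows($value) };
}
```

If this is fast while the per-TR xpath calls are slow, that would point at the per-call overhead rather than at the page itself.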

      HTH :)

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

        Thank you!
        I'm on a tight schedule to deliver, so I think I'm going with extracting the HTML of the table with W::M::Chrome and parsing it with HTML-Tree.
        I'll post my solution in the OP so that it can help others in the future.
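
A minimal sketch of that plan, not the actual solution: the HTML string below is a stand-in for whatever is fetched once from the browser (for example via `$mech->content`), and `table_rows` is a hypothetical helper name. The parsing side uses HTML::TreeBuilder from HTML-Tree, so each TR's TDs are found locally without further calls to Chrome:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Stand-in for the table HTML fetched once with W::M::Chrome
# (assumption: the real markup will differ).
my $html = <<'HTML';
<table id="data">
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
HTML

# Return an array ref of rows, each row an array ref of cell texts.
sub table_rows {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @rows;
    for my $tr ( $tree->look_down( _tag => 'tr' ) ) {
        # look_down called on a row only searches that row's subtree
        my @cells = map { $_->as_text } $tr->look_down( _tag => 'td' );
        push @rows, \@cells if @cells;
    }
    $tree->delete;    # release the parse tree explicitly
    return \@rows;
}

my $rows = table_rows($html);
print join( "\t", @$_ ), "\n" for @$rows;
```

Header rows that contain only TH cells are skipped here because they produce no TD matches.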

        If I get a good payday from this small gig, I'll look into the W::M::Chrome code myself and see if I can at least contribute in some way...
