Thanks a lot, Corion! For future reference, I will explain how your solution solved my problem (4 years later).

I am scraping a page on which ->content() is sooo slow. I tried replacing it with

my $sel = $mech->selector('body', single => 1);
my $content = $sel->{innerHTML};

and that second line (a simple assignment) was still just as slow. I copied $content to a file, which weighs almost 1MB. That doesn't seem that big. Is 1MB too big for an assignment?

Anyway, I chose a more restrictive selector and now it's blazing fast. If you come across the same problem, this solution might save you too.

Re^9: WWW::Mechanize::Firefox delayed returns / slow
by Corion (Patriarch) on Apr 09, 2014 at 17:26 UTC

    Thank you very much for sharing how you made your script faster!

    That "simple assignment" is not so simple behind the scenes. It involves converting the DOM in the Firefox process to a text string and then copying that text string to the Perl process. The network connection between Firefox and Perl is not as fast as it could be, unfortunately.

      I was assuming that ->{innerHTML} was already a string, but you're saying that it's an object, and it's the stringification that takes a long time? But... I didn't ask for it to be stringified yet.

        No - all properties are just tied wrappers for JavaScript objects living on the other side in Firefox. Every access and function call has to travel from Perl to Firefox and back, and that is slow.

        If you are interested in the nasty details, I think that MozRepl::RemoteObject goes into them somewhat, and I gave a talk on the subject, albeit in German only. I'm not sure whether the Google translation of the slides is usable...
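
To make the "tied wrapper" point above concrete, here is a minimal sketch in core Perl only. It assumes nothing about how MozRepl::RemoteObject is really implemented: the RemoteProxy package, its round-trip counter and the sample HTML are all made up for illustration. It only shows that reading a key of a tied hash runs code (here FETCH), so something that looks like a plain assignment can hide an expensive call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a remote object. This is NOT the actual
# MozRepl::RemoteObject code, just an imitation of the idea.
package RemoteProxy;

my $round_trips = 0;    # counts simulated trips "to Firefox and back"

sub TIEHASH {
    my ($class, %remote) = @_;
    return bless { remote => {%remote} }, $class;
}

# Called on every read of a key in the tied hash.
sub FETCH {
    my ($self, $key) = @_;
    $round_trips++;     # in the real setup this would be network traffic
    return $self->{remote}{$key};
}

sub round_trips { return $round_trips }

package main;

# Pretend this is the element returned by a selector call.
tie my %element, 'RemoteProxy', innerHTML => '<p>hello</p>' x 1000;
my $sel = \%element;

my $content = $sel->{innerHTML};    # looks like a plain assignment, calls FETCH
my $length  = length $content;      # works on the local copy, no further FETCH

printf "round trips: %d, bytes copied: %d\n",
    RemoteProxy::round_trips(), $length;
```

In this toy version the trip is free; in the real setup each FETCH serializes the value on the Firefox side and ships it over the connection, which is presumably why a smaller selection came back so much faster.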