WWW::Mechanize::Firefox latency using RemoteObject

by hansendc (Novice)
on Apr 03, 2011 at 16:32 UTC ( [id://897212]=perlquestion )

hansendc has asked for the wisdom of the Perl Monks concerning the following question:

I have some code basically doing this:
$mech->get($url);
foreach my $form ($mech->forms()) {
    print "<form "...;
    foreach my $input ($form->{elements}) {
        print "<input..."
        foreach my $attr ($input->{attributes}) {
            print ...
        }
    }
}
I'm using an existing form, and filtering some bits of it out, and spitting out HTML which is approximately the same on the other end. The _real_ page is too slow to load for me, so I prefer to just load my fake form and submit _it_.

This worked OK with plain ol' WWW::Mechanize. But WWW::Mechanize::Firefox takes many seconds to do the same thing. It isn't CPU bound, though. There just seem to be a lot of transactions back and forth with the browser when I do this. I assume that each of the $foo->{bar} accesses is getting morphed into a few calls into the browser, and that they add up to something significant.

Any thoughts on reducing the number of transactions, reducing the latency, or quicker ways to do this?

Well, I figured it out. Short story: the kernel holds on to small amounts of data in an attempt to increase overall throughput. More details here: http://groups.google.com/group/mozlab/browse_thread/thread/f7389b14bc1426ad?hl=en
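The post does not name the kernel mechanism involved. Assuming it is Nagle's algorithm (an assumption, not something stated here), the usual way to stop the kernel from holding back small writes is to set TCP_NODELAY on the socket. The sketch below uses only core modules and a throwaway local listener, so it runs without a browser; where the option would actually need to be set for the Perl-to-Firefox connection is not covered here.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IPPROTO_TCP TCP_NODELAY);

# Throwaway local listener so the sketch runs on its own (not part of the original setup).
my $server = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # let the kernel pick a free port
    Proto     => 'tcp',
) or die "listen failed: $!";

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";

# Assumption: the delay is Nagle's algorithm. TCP_NODELAY asks the kernel
# to send small writes immediately instead of coalescing them.
$client->setsockopt(IPPROTO_TCP, TCP_NODELAY, 1)
    or die "setsockopt failed: $!";

# Read the option back to confirm it took effect.
my $nodelay = $client->getsockopt(IPPROTO_TCP, TCP_NODELAY);
print "TCP_NODELAY is ", ($nodelay ? "on" : "off"), "\n";
```

The trade-off is more, smaller packets on the wire, which is usually fine for a chatty request/response protocol on localhost.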

Replies are listed 'Best First'.
Re: WWW::Mechanize::Firefox latency using RemoteObject
by Corion (Patriarch) on Apr 03, 2011 at 16:46 UTC

    You can at least avoid some of the loops by using ->xpath or ->selector queries to find relevant elements. For example, to find all INPUT elements for a given form, you can use:

    my @inputs = $mech->selector( 'input,select,textarea', node => $form );

    There is no convenient way to do bulk-fetching for attributes, so you're basically stuck there.

    If all you want is the HTML of elements, you can just print $element->{innerHTML}

      OK, that ->selector() syntax looks interesting.

      Do you see what I'm saying about how long each of the transactions to the browser and back takes, though? I'm curious if you have any thoughts on what's causing that bit. As I said, it seems a wee bit odd that the system is idle instead of cpu or I/O bound.

        This is just as you suspected: each attribute access involves at least one roundtrip over TCP from Perl to Firefox and back. There is little you can do except avoid accessing JavaScript data from Perl, or make bulk requests.

        Likely, the CPU time gets split up between Firefox, Perl, and the kernel (for the network), and I'm not sure how your monitoring accounts for time spent in the kernel.
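To make the difference between per-attribute access and bulk requests concrete, here is a small simulation. It does not talk to Firefox: CountingProxy and fetch_many are hypothetical stand-ins, not part of WWW::Mechanize::Firefox or MozRepl::RemoteObject. Each tied-hash read counts as one roundtrip, and the bulk call counts as a single roundtrip no matter how many attributes it returns.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a remote object: every hash read costs one roundtrip.
package CountingProxy;
our $roundtrips = 0;

sub TIEHASH { my ($class, %data) = @_; return bless {%data}, $class }
sub FETCH   { my ($self, $key) = @_; $roundtrips++; return $self->{$key} }

# Hypothetical bulk call: all requested attributes in one roundtrip.
sub fetch_many {
    my ($self, @keys) = @_;
    $roundtrips++;
    return map { $_ => $self->{$_} } @keys;
}

package main;

tie my %input, 'CountingProxy', name => 'q', type => 'text', value => 'perl';
my @attrs = qw(name type value);

# Per-attribute access: one roundtrip for each key.
my %slow = map { $_ => $input{$_} } @attrs;
my $per_attribute = $CountingProxy::roundtrips;

# Bulk access: one roundtrip for the whole set.
$CountingProxy::roundtrips = 0;
my %fast = tied(%input)->fetch_many(@attrs);
my $bulk = $CountingProxy::roundtrips;

print "per-attribute: $per_attribute roundtrips, bulk: $bulk\n";
```

With a few dozen inputs per form and several attributes per input, the per-attribute count grows multiplicatively, while the bulk count stays at one per request.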

        Update: If you start to optimize your application and try to reduce the accesses to Javascript objects, the object bridge has some counters that might be of help:

        stats => {
            roundtrip => 0, # total number of roundtrips
            fetch     => 0, # number of attribute fetches
            store     => 0, # number of attribute stores
            callback  => 0, # number of callbacks triggered
        },

        use Data::Dumper;
        my $repl = $mech->repl;
        warn Dumper $repl->{stats};
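One way to use such counters while optimizing is to compare them before and after a block of code. The helper below is a hypothetical sketch (stats_delta is not part of the bridge), and it is exercised here with a plain hash standing in for the stats hash, so it runs without a browser. With a live session you would presumably pass $mech->repl->{stats} instead.

```perl
use strict;
use warnings;

# Hypothetical helper: run a block and report how much each counter grew.
sub stats_delta {
    my ($stats, $code) = @_;
    my %before = %$stats;    # snapshot the counters
    $code->();
    return { map { $_ => $stats->{$_} - ($before{$_} // 0) } keys %$stats };
}

# Stand-in for the real stats hash (values are made up for the demo).
my %stats = (roundtrip => 10, fetch => 4, store => 0, callback => 0);

my $delta = stats_delta(\%stats, sub {
    # Simulate the work being measured: three attribute fetches.
    $stats{fetch}     += 3;
    $stats{roundtrip} += 3;
});

print "$_: $delta->{$_}\n" for sort keys %$delta;
```

Wrapping just the form-dumping loop this way shows how many roundtrips that loop alone costs, which makes it easier to tell whether a change actually reduced them.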
Re: WWW::Mechanize::Firefox latency using RemoteObject
by Khen1950fx (Canon) on Apr 03, 2011 at 17:13 UTC
    You could speed things up like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize::Firefox;

    my $mechanize = WWW::Mechanize::Firefox->new( stack_depth => 0 );
    Mechanize keeps a history of every page with its state; also, it makes a clone of each Mech object. If set to 0, it won't keep that history. You can adjust it to meet your needs.
Re: WWW::Mechanize::Firefox latency using RemoteObject
by spx2 (Deacon) on Apr 04, 2011 at 07:29 UTC
    • disable all addons you don't need in the browser
    • disable images
    • disable javascript

    You do that in Edit->Preferences, on the Content tab.

    Alternatively, you can study the requests Firefox makes for a login on that site using Firebug, and then replicate those requests with LWP, for example, using the appropriate headers and GET/POST parameters.

      That won't help much because the slowness likely comes from Perl talking to Firefox and back, and not from pages being slow to load.

        Well, maybe there's some JS doing AJAX requests, which is why I said to disable JavaScript. Or maybe there are some ads loading, which can also be avoided.
