Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Morning, O Multisplendiferous Monks of Much Munging! I humbly come before thee with a question...

So I'm trying out WWW::Mechanize::Firefox. Very, very cool module - the author ought to be decorated and lauded to the skies. Wow. Awesome. I'm having lots of fun, and doing things with it I never thought you could do on the Web - jumpin' JavaScript, I can mechanize JS-generated logins, too cool for words!

But, there's one piece of it that I just can't figure out. Here's the setup: I want to be able to log into a site (yep, I have the credentials), grab a generated string off that first page, then go back and repeat the process 6 more times. However, since it takes a second or two to log in - or more, depending on the connection - the process occasionally fails because I tried to grab the content too fast. Here's what my code looks like right now:

#!/usr/bin/perl -w
use strict;

die "Usage: ", $0 =~ /([^\/]+)$/, " <URL>\n" unless @ARGV;

use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
$mech->get($ARGV[0]);
$mech->activateTab();

my($user, $pass) = qw{SHHH_its_a_secret Yeah_sure_whatever};

for (1..7){
    $mech->submit_form(
        with_fields => {
            username => $user,
            password => $pass,
        }
    );
    sleep 1;
    if ($mech->content =~ /of the day is <br>(.*)/){
        print "$_ => '$1'\n";
    }
    $mech->back;
}

What I'd really like to do is replace that 'sleep 1' with some sort of a gadget that tells me that the login went through - i.e., the page finished loading - and so now I can go ahead and scrape it. The docs show a 'progress_listener' method that's supposed to "set up the callbacks for the 'nsIWebProgressListener' interface to be the Perl subroutines you pass in" - but honest to goodness, I must be too stupid to figure it out. I've tried setting it up just the way it's shown in the docs, with the only difference being the sub that I call:

my $eventlistener = $mech->progress_listener(
    $browser,
    onLocationChange => \&process_content,
);

where 'process_content' is just that last 'if' block, and it does nothing - no errors, no warnings, just no output. Never gets called, as far as I can tell.
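
For reference, the handler is nothing more than that 'if' block wrapped in a sub - a minimal sketch, closing over the $mech declared above:

sub process_content {
    # same regexp check as in the loop above
    if ($mech->content =~ /of the day is <br>(.*)/){
        print "got '$1'\n";
    }
}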

So, any advice on setting up this callback thing - or maybe an explanation of why it's not the right thing, and what the right thing is - would be really appreciated! Thanks to all for listening, and have a great POSIX::strftime("%A",localtime) !

Re: WWW::Mechanize::Firefox - callbacks?
by Khen1950fx (Canon) on Aug 28, 2010 at 23:48 UTC
Re: WWW::Mechanize::Firefox - callbacks?
by dmz (Novice) on Aug 30, 2010 at 17:13 UTC
    This works for vanilla WWW::Mechanize; IIRC, the Firefox and InternetExplorer variants both mirror the methods in WWW::Mechanize. When you initialize the scraper, put a timeout => 15 in there, e.g.
    my $mech = WWW::Mechanize->new(timeout => 15);
    To check that a login (or any other mech action) completed, you can simply wait until the action reports success. This works well for sites that don't time out or go dead under too many requests.
    sleep 1 until $mech->success;
    or, a more robust version can wait for success and handle the case where you get a non-success status:
    sleep 1 until $mech->success or $mech->status;
    if ($mech->status != 200) {    # 200 is the HTTP success code
        # handle errors: sleep if the page timed out, recurse to try again, etc.
    }
    else {
        # do something with the successful response
    }
    I wrapped the latter in a sub that takes the action to perform; on failure it recurses into the same sub until it gets success - roughly like the sketch below.
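    A minimal sketch of that retry-wrapper idea (the sub name, retry count, and back-off delay are illustrative, not from the post above; it also assumes the mech object was built with autocheck => 0 so HTTP errors set the status instead of dying):
    sub do_action {
        my ($mech, $action, $tries_left) = @_;
        $tries_left = 5 unless defined $tries_left;
        die "Giving up after repeated failures\n" unless $tries_left;

        # run the passed-in action, e.g. sub { $_[0]->submit_form( with_fields => {...} ) }
        $action->($mech);
        sleep 1 until $mech->success or $mech->status;

        return $mech if $mech->status == 200;    # success: hand back the mech object

        sleep 5;                                 # non-success: back off, then retry
        return do_action($mech, $action, $tries_left - 1);
    }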
Re: WWW::Mechanize::Firefox - callbacks?
by Corion (Patriarch) on Sep 01, 2010 at 20:41 UTC
Re: WWW::Mechanize::Firefox - callbacks?
by jbernest (Novice) on Apr 30, 2013 at 23:23 UTC

    This strategy works for me. Determine what content will be visible on the webpage only if the login succeeds, then repeat the login step until that content is visible. Once it is, you know the login succeeded and you can do stuff on the next page. Plug in (1) your own url, (2) whatever content should be visible only when the login succeeds, and (3) the code you want to run once the login succeeds. Let me know if this is useful.

    use WWW::Mechanize::Firefox;

    my $mech = WWW::Mechanize::Firefox->new(autoclose => 0);
    $mech->get("your own url");

    my $count   = 0;
    my $retries = 100;
    while ($retries-- and ! $mech->is_visible( xpath => '//*[@value="your own content"]' )) {
        $count++;
        print "count: $count\n";
        sleep 1;
        # Do your login stuff here.
        if ($mech->success()) {
            # Do stuff only once the login step succeeds.
        }
    }
    die "Timeout" unless $retries;