I was recently perusing my (sparse!) posting history on this site and had some fond memories triggered by this script I had written almost ten years ago. In the ten years since then, screen scraping and web automation have made some great advances! WWW::Mechanize was a huge advance, providing such a programmer-friendly API. More recently, WWW::Scripter, in conjunction with plugin scripting engines, has essentially given Perl programmers a pure-Perl "virtual browser" in which to data mine even the most horrendously JavaScript-laden site!

Just today I wrote the following code to download quotes for a portfolio of stocks on Google Finance. Now, I didn't really need to use WWW::Scripter; I could have used WWW::Mechanize to achieve the same thing, of course, but I wanted to experiment with WWW::Scripter a little. Having my script identify itself as Safari and adding random sleeps between actions are just small ways to act more like a real human user logging in and clicking through. Again, no real reason other than to just have some fun.

The fact remains, though, that if I streamlined this script I could have accomplished the act of logging into Google and clicking through to download the quotes in about a dozen lines of code (a rough WWW::Mechanize sketch follows the script below). Furthermore, if Google decided to check whether they were getting hit by a "real browser" with JavaScript checks, a Perl programmer could still get the job done!
#download_quotes.pl
#Downloads portfolio quotes in .csv format from Google Finance.
#Adam Russell (ac.russell@live.com)
#27 February 2011
use strict;
use warnings;
use WWW::Scripter;

#Stand-ins for real account credentials.
use constant GOOGLE_LOGIN    => 'you@example.com';
use constant GOOGLE_PASSWORD => 'your-password';

my $w = WWW::Scripter->new;
$w->agent_alias("Mac Safari");
$w->use_plugin("JavaScript");

#Log in to the Google account.
$w->get("https://www.google.com/accounts/Login");
sleep_rand();
$w->form_name("gaia_loginform");
$w->field("Email", GOOGLE_LOGIN);
$w->field("Passwd", GOOGLE_PASSWORD);
$w->current_form->trigger_event("submit");
sleep_rand();

#Click through to the portfolio and print the .csv download.
$w->get("http://www.google.com/finance");
sleep_rand();
$w->follow_link(id => "nav-p");
sleep_rand();
$w->follow_link(id => "nav-pf");
sleep_rand();
$w->follow_link(id => "download");
print $w->content;

#Sleep for up to three seconds to look a little more like a human user.
sub sleep_rand {
    sleep(rand() * 3);
}
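
For comparison, here is roughly what the streamlined version mentioned above might look like with plain WWW::Mechanize. This is only a sketch: it reuses the form name, field names, and link ids from the script above, the credential constants are placeholders rather than real values, and it drops the random sleeps and the JavaScript engine entirely.

#download_quotes_mech.pl
#A minimal WWW::Mechanize sketch of the same task; form name, field names,
#and link ids are assumed to match the WWW::Scripter script above.
use strict;
use warnings;
use WWW::Mechanize;

#Stand-ins for real account credentials.
use constant GOOGLE_LOGIN    => 'you@example.com';
use constant GOOGLE_PASSWORD => 'your-password';

my $mech = WWW::Mechanize->new();
$mech->agent_alias("Mac Safari");

#Log in to the Google account.
$mech->get("https://www.google.com/accounts/Login");
$mech->submit_form(
    form_name => "gaia_loginform",
    fields    => { Email => GOOGLE_LOGIN, Passwd => GOOGLE_PASSWORD },
);

#Click through to the portfolio and print the .csv download.
$mech->get("http://www.google.com/finance");
$mech->follow_link(id => "nav-p");
$mech->follow_link(id => "nav-pf");
$mech->follow_link(id => "download");
print $mech->content;

Of course, the moment the login or portfolio pages start depending on JavaScript this version stops working, while the WWW::Scripter version with the JavaScript plugin still has a fighting chance.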
<jc> Why do people persist in asking me stupid questions?
<Petruchio> <insert mutually recursive response>
--an exchange from #perlmonks on irc.slashnet.org (2 March 2009, 1345 EST)

Re: Thinking about advances in web automation
by sundialsvc4 (Abbot) on Feb 28, 2011 at 20:21 UTC

    Ahh, the set of available tools is so constantly increasing, and yet we tend to continue to use the ones we saw ... heh ... “ten years ago.” Thanks for the heads-up about WWW::Scripter. Looking forward to more of your experiences with it.