http://qs1969.pair.com?node_id=554930

myuserid7 has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone,

I'm using Mechanize to download some files within multiple spidered pages. Unfortunately Javascript is used to write out the links to the files on each page, so I've been injecting the links manually and then following them:

my $html = $m->content;
$html =~ s#</body#<a href="..">MY LINK</a></body#;
$m->update_html($html);
$m->follow_link(text=>"MY LINK");
$m->save_content(..);
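Per spidered page the whole hack boils down to something like this (just a sketch; @page_urls and $out_file stand in for the spidered page list and the local file name):

for my $page_url (@page_urls) {
    $m->get($page_url);

    # inject a fake link where the Javascript would have written the real one
    my $html = $m->content;
    $html =~ s#</body#<a href="..">MY LINK</a></body#;
    $m->update_html($html);

    # "click" the injected link and save whatever comes back
    $m->follow_link(text=>"MY LINK");
    $m->save_content($out_file);
}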

There must be a better way to do this... any ideas?

thanks.



Re: Better solution with Mechanize?
by polettix (Vicar) on Jun 13, 2006 at 00:26 UTC
    Why don't you use the get() method that WWW::Mechanize inherits (ok, overloads) from LWP::UserAgent? It's as easy as doing:
    $m->get($your_url);
    $m->save_content($in_this_file) if $m->success();
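    Since the real link only exists inside the Javascript, you'd first dig the URL out of the page yourself; something along these lines (the regex is made up, it depends entirely on what the script on your pages actually looks like):

    # placeholder pattern: adapt it to the real document.write() call in the page
    my ($file_url) = $m->content =~ m{document\.write\([^)]*href="([^"]+)"}s;

    $m->get($file_url);
    $m->save_content($in_this_file) if $m->success();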

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
      I would, but I need the Referer header to correctly reflect the origin of the clicked link. AFAIK get() issues a new, unattached request.
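      Unless get() can be talked into sending the right Referer itself, maybe something like this would do (untested; $file_url and $local_file are placeholders):

      $m->add_header( Referer => $m->uri );   # pretend the current page was the origin
      $m->get($file_url);
      $m->save_content($local_file) if $m->success();
      $m->delete_header('Referer');           # stop sending it on later requests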
Re: Better solution with Mechanize?
by bart (Canon) on Jun 13, 2006 at 08:41 UTC
    If you use Win32::IE::Mechanize instead of WWW::Mechanize (which you can of course only do on Windows, because it drives the core of MS Internet Explorer), you'll find that it interprets Javascript and embeds the document.write() output in the "source" it produces. It's a neat way to have this kind of stuff handled automatically.
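    Roughly like this (untested sketch; it assumes the module really is a drop-in, i.e. the same new()/get()/follow_link() calls as WWW::Mechanize, and $page_url is a placeholder):

    use Win32::IE::Mechanize;

    # IE executes the page's Javascript, so the document.write()-ed link
    # shows up as an ordinary link afterwards.
    my $ie = Win32::IE::Mechanize->new( visible => 1 );   # show the IE window while testing
    $ie->get($page_url);
    $ie->follow_link( text => 'MY LINK' );                 # link text is a placeholder too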
      Interesting, thanks. Not an option this time, but I'll keep that in mind.