in reply to Conserving bandwidth with WWW::Mechanize's get()

I'm pretty sure that by default WWW::Mechanize only downloads the web page itself and doesn't download referenced images or style sheets. Is your application a spider, and are you only interested in documents that look like plain text?

Re^2: Conserving bandwidth with WWW::Mechanize's get()
by Scythe (Initiate) on Jun 05, 2008 at 01:34 UTC
    It's not exactly a spider, in that it only grabs one piece of info from a single page.

    I had assumed that it downloaded the complete package because I monitored the volume used in an hour and divided by (60*6), which gave me a rough stab at the volume per page. I then saved the page to disc from Firefox and the values were roughly the same: 60kB or so of .html file, and 90kB of images and other frippery.

    Is this assumption misguided somehow?
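The estimate described above can be written out as a small calculation. This is only a sketch: the helper name bytes_per_request is made up for illustration, 1 kB is assumed to mean 1024 bytes, and the 60 kB and 90 kB figures are the rough sizes quoted in the post.

```perl
use strict;
use warnings;

# Hypothetical helper: average volume per request, given the total traffic
# seen over a period and the number of requests made in that period.
sub bytes_per_request {
    my ($total_bytes, $requests) = @_;
    die "request count must be positive\n" unless $requests > 0;
    return $total_bytes / $requests;
}

my $requests_per_hour = 60 * 6;             # the divisor used in the post
my $html_only   = 60 * 1024;                # ~60 kB of HTML (assumes 1 kB = 1024 bytes)
my $with_assets = (60 + 90) * 1024;         # HTML plus ~90 kB of images etc.

# Hourly totals each hypothesis would predict, for comparison with the meter.
printf "HTML only:   %.1f MB/hour\n", $html_only   * $requests_per_hour / 1024**2;
printf "With assets: %.1f MB/hour\n", $with_assets * $requests_per_hour / 1024**2;

# Going the other way: a measured hourly total back to a per-page figure.
my $measured = $with_assets * $requests_per_hour;   # stand-in for a real reading
printf "Per request: %.0f kB\n", bytes_per_request($measured, $requests_per_hour) / 1024;
```

The two predicted totals differ by a factor of 2.5, so an hourly reading should be able to tell the hypotheses apart, provided nothing else on the machine is using the link during the measurement.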

      WWW::Mechanize will automatically handle redirects, but those should be short messages.

      Using LWP::Debug you can get a trace of all the traffic that Mechanize is generating.

      use WWW::Mechanize;
      use LWP::Debug;

      my $mech = WWW::Mechanize->new();
      LWP::Debug::level("+");
      $mech->get("http://www.cnn.com/");
      print length($mech->content), "\n";
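In libwww-perl 6 and later, LWP::Debug is deprecated and no longer produces a trace. A rough alternative, sketched below, is to attach handlers to the agent with add_handler, which WWW::Mechanize inherits from LWP::UserAgent. The helper name log_traffic and the shape of the log entries are invented for this example; the sizes it records are response bodies only, not headers.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: record every request the agent sends, plus the size of
# the body that comes back. Returns a reference to the log array.
sub log_traffic {
    my ($mech) = @_;
    my @log;

    # Called just before each request goes out (including redirects).
    $mech->add_handler(request_send => sub {
        my ($request) = @_;
        push @log, {
            method => $request->method,
            uri    => $request->uri->as_string,
            bytes  => 0,
        };
        return;    # returning undef lets the request proceed normally
    });

    # Called once each response has been fully received.
    $mech->add_handler(response_done => sub {
        my ($response) = @_;
        # content() is the raw, undecoded body.
        $log[-1]{bytes} = length $response->content if @log;
        return;
    });

    return \@log;
}

# Only touches the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $mech = WWW::Mechanize->new();
    my $log  = log_traffic($mech);
    $mech->get($url);
    printf "%s %s (%d bytes)\n", @{$_}{qw(method uri bytes)} for @$log;
}
```

Run with a URL as the only argument, it prints one line per request. If images and style sheets were being fetched they would show up as extra lines; a plain get() should show only the page itself and any redirects.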