js1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I want to write a Perl script to measure the total size of a web page. Apparently the page is using too much bandwidth (34KB), although by my calculations it should only be 25KB.

I had a look at LWP, but I'm not sure it's the right module for this job because I need to take the headers and GIF files into account. Does anyone know which module I should use?

Thanks for any help.

js1.

Replies are listed 'Best First'.
Re: Size of a webpage
by davido (Cardinal) on Jun 16, 2004 at 08:09 UTC
    You could use the HTTP::Size module. Take a look at its POD for the get_sizes() method. The POD states that method "fetches all of the images then sums the sizes of the original page and image sizes. It returns a total download size."

    Here's an untested example:

    use HTTP::Size;

    my $total = HTTP::Size::get_sizes( 'http://www.perlmonks.org' );
    print "$total\n";

    Hope this helps...

    UPDATE: OK, now I've installed the module and tested the snippet above. It works, and on at least one of the test runs the total size of the PerlMonks front page was 87652 bytes. That will change depending on the amount of chatterbox text, the number and size of Front Paged articles, etc.


    Dave

Re: Size of a webpage
by tachyon (Chancellor) on Jun 16, 2004 at 08:21 UTC

    Have a look at Apache::Dynagzip. You can use it on static as well as dynamic content. By compressing content before you send it, a 50KB page becomes a 5-10KB transmission. Almost all modern browsers will accept compressed content, so it is a win-win. Google, Slashdot and almost all the biggies use compression. You should also look at removing all the extraneous whitespace from the documents you serve; view source on Google, for example, and you'll see that no spare spaces get sent. Compress::LeadingBlankSpaces (which works with Dynagzip) will do this for you.
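
    If you just want a quick feel for how much gzip would save on your particular page, here is a rough, untested sketch using LWP::UserAgent and Compress::Zlib (the URL is a placeholder -- substitute the page you are measuring):

    #!/usr/bin/perl
    use strict;
    use warnings;

    use LWP::UserAgent;
    use Compress::Zlib;

    # Placeholder URL -- substitute the page being measured.
    my $url = 'http://www.example.com/';

    my $ua       = LWP::UserAgent->new;
    my $response = $ua->get($url);
    die "Couldn't fetch $url: ", $response->status_line
        unless $response->is_success;

    my $html    = $response->content;              # body as the server sent it
    my $gzipped = Compress::Zlib::memGzip($html);  # what gzip would send instead

    printf "Uncompressed: %d bytes\n", length $html;
    printf "Gzipped:      %d bytes\n", length $gzipped;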

    cheers

    tachyon

      Be aware that MSIE (5.5 SP1/SP2, and 6 without any SP) corrupts cached compressed pages. A fresh install of XP (IE6), for example, shows corrupted pages the second time you access a site that gzips pages and allows them to be cached (sending Last-Modified headers, etc.).
      See some of the MS articles about this: Q313712 and Q312496.
      Sadly, thanks to this widespread bug, we have to choose between compressing pages and allowing caching.
      Slashdot and Google don't allow caching, I think.
      José

        Hey, this is Perl... If you read the docs you will see:

        It is strongly recommended to use Apache::CompressClientFixup handler in order to avoid compression for known buggy browsers. Apache::CompressClientFixup package can be found on CPAN

        This works with any of the gzip compression modules and does not serve gzip-compressed content to buggy browsers. The articles you cite are referenced in its docs, so you can have your cake and eat it too.
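
        For what it is worth, the idea is simple enough to sketch by hand. The snippet below is an untested illustration of the technique only, not Apache::CompressClientFixup's actual logic (read its docs for the real rules): it refuses to gzip for MSIE 5.5/6.0 user agents, since the service pack level is not visible in the User-Agent string.

        use strict;
        use warnings;

        # Sketch only: skip compression for clients that either do not accept
        # gzip or are suspected of the caching bug (MSIE 5.5 and 6.0 -- see
        # Q313712 and Q312496).
        sub client_can_take_gzip {
            my ( $user_agent, $accept_encoding ) = @_;

            return 0 unless defined $accept_encoding
                         && $accept_encoding =~ /\bgzip\b/i;

            # Conservative: treat every MSIE 5.5/6.0 as buggy, but let Opera
            # (which also claims to be MSIE) through.
            return 0 if defined $user_agent
                     && $user_agent =~ /MSIE (?:5\.5|6\.0)/
                     && $user_agent !~ /Opera/;

            return 1;
        }

        # In a plain CGI script you might then do something like:
        if ( client_can_take_gzip( $ENV{HTTP_USER_AGENT}, $ENV{HTTP_ACCEPT_ENCODING} ) ) {
            print "Content-Encoding: gzip\r\n";
            # ... gzip the body before printing it ...
        }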

        cheers

        tachyon

Re: Size of a webpage
by js1 (Monk) on Jun 16, 2004 at 08:43 UTC

    Unfortunately, that module doesn't take the CSS style sheet or page referrals into account.

    js1.

      Stop complaining! Why not just patch it? It is trivial to patch. See Re: Who wants to help me adjust LinkExtor::Simple? for details of how to make LinkExtor::Simple extract any links you like. Then call the style and extjs methods you just created in the same way $extor->img gets looped over in HTTP::Size.
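
      If you would rather not patch anything, here is a rough, untested sketch of the same idea done by hand with HTML::LinkExtor and LWP::UserAgent: pull the img, link and script URLs out of the page and add up their sizes (the URL is a placeholder):

      #!/usr/bin/perl
      use strict;
      use warnings;

      use LWP::UserAgent;
      use HTML::LinkExtor;

      # Placeholder URL -- replace with the page being measured.
      my $url = 'http://www.example.com/';

      my $ua   = LWP::UserAgent->new;
      my $page = $ua->get($url);
      die "Couldn't fetch $url: ", $page->status_line unless $page->is_success;

      my $total = length $page->content;

      # Collect the URLs of images, link elements (stylesheets, icons) and
      # external scripts.
      my @components;
      my $extor = HTML::LinkExtor->new(
          sub {
              my ( $tag, %attr ) = @_;
              push @components, $attr{src}  if $tag eq 'img'    && $attr{src};
              push @components, $attr{href} if $tag eq 'link'   && $attr{href};
              push @components, $attr{src}  if $tag eq 'script' && $attr{src};
          },
          $url,    # base URL, so relative links come back absolute
      );
      $extor->parse( $page->content );

      # Add each component's size, trusting Content-Length where the server
      # reports it and falling back to fetching the body.
      for my $component (@components) {
          my $head = $ua->head($component);
          next unless $head->is_success;
          my $size = $head->content_length;
          $size = length $ua->get($component)->content unless defined $size;
          $total += $size;
      }

      print "Approximate page weight: $total bytes\n";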

      Although you can easily do it, it is worth remembering that CSS, JS, icons and button images will generally get cached locally, so while they might be part of a page, they are typically reusable elements.

      The ultimate way to do it is to use a logging proxy of some sort. HTTP::Proxy with HTTP::Recorder might be an option.
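
      Here is an untested sketch of the proxy approach: it logs the reported size of every response that passes through it (chunked responses show up as '?' because they carry no Content-Length):

      #!/usr/bin/perl
      use strict;
      use warnings;

      use HTTP::Proxy;
      use HTTP::Proxy::HeaderFilter::simple;

      # Point the browser at localhost:3128 and load the page with an empty
      # cache to see everything it pulls down.
      my $proxy = HTTP::Proxy->new( port => 3128 );

      $proxy->push_filter(
          response => HTTP::Proxy::HeaderFilter::simple->new(
              sub {
                  my ( $self, $headers, $message ) = @_;
                  my $size = $headers->content_length;
                  printf "%8s  %s\n",
                      defined $size ? $size : '?',
                      $message->request->uri;
              }
          ),
      );

      $proxy->start;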

      cheers

      tachyon