in reply to (jeffa) Re: size of page
in thread size of page
update: made a tiny think-o, now fixed ;)
The strategy: if our link-type tag is NOT one of TAGS_IN_NEED (tags like <a href>, which we don't count), and it has a valid link-type URL attribute, then we do a head() and add the size.

    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::LinkExtractor;

    my $url  = shift || 'http://www.google.com';
    my $html = get($url);
    die "Couldn't fetch $url\n" unless defined $html;

    my $Total = length $html;
    print "initial size $Total\n";

    my $LX = HTML::LinkExtractor->new(
        sub {
            my ( $X, $tag ) = @_;

            # skip tags listed in TAGS_IN_NEED (e.g. <a href>), which we don't count
            unless ( grep { $_ eq $tag->{tag} } @HTML::LinkExtractor::TAGS_IN_NEED ) {
                print "$$tag{tag}\n";

                # check every attribute of this tag that may hold a URL
                for my $urlAttr ( @{ $HTML::LinkExtractor::TAGS{ $$tag{tag} } } ) {
                    if ( exists $$tag{$urlAttr} ) {
                        # head() in list context returns the document length as its second element
                        my $size = ( head( $$tag{$urlAttr} ) )[1];
                        $Total += $size if $size;
                        print "adding $size\n" if $size;
                    }
                }
            }
        },
        $url, 0
    );

    $LX->parse( \$html );
    print "The total size of \n$url\n is $Total bytes\n";

    __END__
    # to inspect the module's tables:
    use Data::Dumper;
    use HTML::LinkExtractor;
    print Dumper \@HTML::LinkExtractor::VALID_URL_ATTRIBUTES;
    print Dumper \%HTML::LinkExtractor::TAGS;
    print Dumper \@HTML::LinkExtractor::TAGS_IN_NEED;
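For anyone who wants to poke at the counting logic without hitting the network, here is a minimal sketch of just the "skip some tags, sum the rest" part. It is not the module's API: `%skip_tag` is a hypothetical stand-in for `@HTML::LinkExtractor::TAGS_IN_NEED` (whose real contents may differ), `total_size` and the `$size_of` callback are names made up for illustration (the callback plays the role of head()), and the URLs are invented. The byte counts are the ones from the example run in this post.

```perl
use strict;
use warnings;

# ASSUMPTION: stand-in for @HTML::LinkExtractor::TAGS_IN_NEED;
# the real list lives in the module and may differ.
my %skip_tag = map { $_ => 1 } qw(a);

# Sum the page's own size plus the size of every counted resource.
# $size_of is a hypothetical callback returning a byte count (or undef)
# for a URL; in the real script that job is done by head().
sub total_size {
    my ( $initial, $links, $size_of ) = @_;
    my $total = $initial;
    for my $link (@$links) {
        next if $skip_tag{ $link->{tag} };    # plain hyperlinks aren't fetched with the page
        my $size = $size_of->( $link->{url} );
        $total += $size if $size;             # unknown or zero sizes are ignored
    }
    return $total;
}

# Sizes taken from the example run; the URLs are invented.
my %fake_sizes = (
    'http://example.invalid/a.png' => 24696,
    'http://example.invalid/b.gif' => 43,
);
my @links = (
    { tag => 'img', url => 'http://example.invalid/a.png' },
    { tag => 'img', url => 'http://example.invalid/b.gif' },
    { tag => 'a',   url => 'http://example.invalid/other.html' },
);

print total_size( 2018, \@links, sub { $fake_sizes{ $_[0] } } ), "\n";
```

With those numbers it adds 2018 + 24696 + 43 and leaves the `a` link out, which is the same arithmetic the real script does once head() has reported the sizes.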
Here is an example run through:
    E:\dev\LOOSE>perl sizeapage.pl http://crazyinsomniac.perlmonk.org
    initial size 2018
    img
    adding 24696
    img
    adding 43
    Total size of
    http://crazyinsomniac.perlmonk.org
     is 26757 bytes

    E:\dev\LOOSE>

CAVEAT:
When it's available from CPAN, download the latest version of HTML::LinkExtractor and this caveat goes away :)
Please be aware that if a page uses java/javascript/flash or other dynamic technologies which in turn download stuff not referenced directly on the page, there is no way for this script to figure that out without parsing the java/javascript/flash itself... which isn't very practical.
update: Wed Oct 16 09:05:47 2002 GMT ~ after revisiting this old node of mine, I realized that my little snippet doesn't follow frame/layer tags, so I'm rewriting this, and I'll post it in the Code Catacombs eventually. It may be re-inventing a wheel, but it'll be more to my liking ;) (and more complete)
____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.