While learning a few of the many things about the LWP::* modules, I wrote this program to time and test websites.
The main goal is to report the download speed of one (or more) web pages and, optionally, some other information. In its simplest form the program is run like this:
webtimeload.pl -u www.perlmonks.org
Another goal was to distinguish content served by the website itself from content coming from outside the main site.
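Just to show the idea, independently of the script itself: internal and external links can be told apart by comparing the host part of each absolute link with the host of the requested page. Here is a minimal sketch using the URI module; the URLs are made-up examples, not taken from the program.

use strict;
use warnings;
use URI;

my $page_url = URI->new('http://www.perlmonks.org/index.pl');
my @links    = ( 'http://www.perlmonks.org/favicon.ico',
                 'http://example.com/banner.gif' );

for my $link (@links) {
    my $abs = URI->new_abs( $link, $page_url );   # resolves relative links too
    if ( $abs->host eq $page_url->host ) {
        print "internal: $abs\n";                 # same host as the main page
    }
    else {
        print "external: $abs\n";                 # served from somewhere else
    }
}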
The program can load not only http sites but any protocol supported by LWP itself (via the -p --protocol switch). You can repeat the download with the -c --count switch, sleeping as long as you wish between requests with the -s --sleep switch. You can control the LWP::UserAgent timeout with the -t --timeout switch and print the headers of the main response with the -h --header switch.

Most of the flexibility comes from the -v --verbosity switch: with a value of zero the output is reduced to a single line, useful for post-processing. The next step, -v 1 (the default), gives a short summary of the response and the total download time. Verbosity 2 adds more information about the server response and subtotals for included content. Verbosity 3 adds detailed information (URL, bytes, seconds) for each item downloaded.
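Here is a minimal sketch of the timing idea the whole script is built around: a single LWP request wrapped between two Time::HiRes gettimeofday calls. The URL is just an example.

use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);

my $ua = LWP::UserAgent->new( timeout => 15 );

my $t0      = [gettimeofday];
my $resp    = $ua->get('http://www.perlmonks.org');
my $elapsed = tv_interval( $t0, [gettimeofday] );   # fractional seconds

my $bytes = length( $resp->content ) || 0;
printf "%s %s: %d bytes in %.4f s (%.1f bytes/s)\n",
       $resp->code, $resp->message, $bytes, $elapsed,
       $elapsed ? $bytes / $elapsed : 0;            # guard against division by zero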

TODO (please help me complete it)

UPDATE: new version available below!
UPDATE feb 2013: new version available below!

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);
use Getopt::Long;
use HTML::Parse;
$|++;

# options
my (@url, $count, $sleep, $timeout, $showhead, $protocol, $verbosity);
# other globals
my ($ssep, $bsep, %stat);
$bsep = ('=' x 68)."\n";
$ssep = ('-' x 68)."\n";

&helpme() if ( (! GetOptions(
                    'count=s'     => \$count,
                    'url=s'       => \@url,
                    'sleep=n'     => \$sleep,
                    'timeout=n'   => \$timeout,
                    'header'      => \$showhead,
                    'protocol=s'  => \$protocol,
                    'verbosity=n' => \$verbosity,
               )) or (!$url[0]) );

# some default values
$count   ||= 1;
$sleep   ||= 1;
$timeout ||= 15;
$verbosity = defined $verbosity ? $verbosity : 1;
$protocol ||= 'http';
$protocol =~ s![:/]+!!g;
################################################################################
# LWP::UA initialize
my $ua = LWP::UserAgent->new;
$ua->agent("libwww-perl/5.10.1");
$ua->timeout($timeout);
################################################################################
foreach my $repetition (1..$count) {
    foreach my $u (@url) {
        unless ( $u =~ m/^$protocol/ ){ $u = $protocol.'://'.$u };
        $u =~ s/\/$//;          # removing an eventual / as last char
        $u = lc($u);
        undef %stat;
        %stat = (
            sbody      => 0,     # size of the body skeleton of each frame
            sintlink   => 0,     # size of internal linked content
            sextlink   => 0,     # size of external linked content
            tbody      => 0,     # time taken by the body skeleton of each frame
            tintlink   => 0,     # time taken by internal content
            textlink   => 0,     # time taken by external content
            cintlink   => 0,     # count of internal links
            cextlink   => 0,     # count of external links
            brokenlink => [],    # broken links
            mainurl    => $u,    # url got as arg
            pages      => [$u],  # main page or all the frames
            cache      => {},    # cache for included resources
            respcode   => undef, #
            respmess   => undef, # some responses
            resptitle  => undef, #
            respserver => undef, #
            resplength => undef, #
            resphead   => undef, #
        );
        foreach my $page ( @{$stat{'pages'}} )              { &get_page($page) }
        foreach my $content_link ( keys %{$stat{'cache'}} ) { &time_link($content_link) }
        &report;
    }
    sleep ($sleep) unless $repetition == $count;
}
################################################################################
sub get_page {
    my $urltoget = shift;
    my $t0   = [gettimeofday];
    my $resp = $ua->get($urltoget);
    $stat{tbody} += tv_interval ($t0, [gettimeofday]);
    # add the content_length declared for EACH frame/page loaded
    $stat{resplength} += ($resp->content_length || 0);
    # add the real byte length obtained for EACH frame/page loaded
    $stat{sbody} += (length ($resp->content) || 0);
    # add some more info only for the principal page (not for any frame loaded)
    $stat{respcode}   = $stat{respcode}   || $resp->code;
    $stat{respmess}   = $stat{respmess}   || $resp->message;
    $stat{resptitle}  = $stat{resptitle}  || $resp->title;
    $stat{respserver} = $stat{respserver} || $resp->server;
    $stat{resphead}   = $stat{resphead}   || $resp->headers_as_string;
    # now parse the HTML
    my $parsed_html = parse_html($resp->content);
    foreach my $link_found ( @{ $parsed_html->extract_links(qw(body img src frame)) } ) {
        next if $$link_found[0] =~ /#/;               # brutally skip anchors
        my $uriobj = URI->new( $$link_found[0] );
        my $absurl = $uriobj->abs($urltoget);
        # if it is a frame add it to pages, adding an iteration to this sub
        if ($$link_found[3] eq 'frame') { push @{$stat{'pages'}}, "$absurl"; next }  #? need to stringify $absurl
        # else it is content and we add it to the cache hash
        else { $stat{cache}{ $absurl } = [] }         # will store length and time there later on
    }
}
################################################################################
sub time_link {
    my $url = shift;
    my $t0   = [gettimeofday];
    my $resp = $ua->get($url);
    my $ttaken   = tv_interval ($t0, [gettimeofday]);
    my $bytesrec = length($resp->content);
    if ($resp->is_success()) {
        @{$stat{'cache'}{$url}} = ($bytesrec, $ttaken);
        # official regex from http://search.cpan.org/~gaas/URI-1.58/URI.pm
        my ($scheme, $domain, $path, $query, $fragment) =
            $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
        # tell internal from external content
        if ($stat{mainurl} =~ /$domain/) {
            $stat{sintlink} += $bytesrec;
            $stat{tintlink} += $ttaken;
            $stat{cintlink}++;
        }
        else {
            $stat{sextlink} += $bytesrec;
            $stat{textlink} += $ttaken;
            $stat{cextlink}++;
        }
    }
    else { push @{$stat{'brokenlink'}}, $url; }
}
################################################################################
sub Arrotonda_Mega {
    my ($size, $n) = (shift, 0);
    return "0 bytes" unless defined $size;
    return "0 bytes" unless $size > 0;
    ++$n and $size /= 1024 until $size < 1024;
    return sprintf "%.4f %s", $size, ( qw[ bytes Kb Mb Gb ] )[ $n ];
}
################################################################################
sub report {
    my $totsize = $stat{'sbody'} + $stat{'sintlink'} + $stat{'sextlink'};
    my $tottime = $stat{'tbody'} + $stat{'tintlink'} + $stat{'textlink'};
    ####################
    if ($verbosity == 0){
        print scalar localtime (time),"\t",
              $stat{mainurl},"\t",$stat{respcode},"\t",$totsize,"\t",$tottime,"\t",
              (&Arrotonda_Mega($totsize / $tottime)),"/sec","\n";
    }
    ####################
    elsif ($verbosity == 1){
        print $bsep,$stat{mainurl},"\t",scalar localtime (time),"\n",$ssep,
              $stat{respcode},"\t",$stat{respmess}||'UNDEF',"\t",$stat{resptitle}||'UNDEF',"\n",$ssep,
              "downloaded ",&Arrotonda_Mega($totsize)," ($totsize bytes) in ",
              $tottime," seconds (",&Arrotonda_Mega($totsize / $tottime),"/s)\n",$bsep;
    }
    ####################
    elsif ($verbosity > 1){
        print $bsep,$stat{mainurl},"\t",scalar localtime (time),"\n",$ssep,
              "Response code:             ",$stat{respcode}  ||'UNDEF',"\n",
              "Response message:          ",$stat{respmess}  ||'UNDEF',"\n",
              "Response server:           ",$stat{respserver}||'UNDEF',"\n",
              "Response declared length:  ",$stat{resplength}||'UNDEF',"\n",
              "Response title:            ",$stat{resptitle} ||'UNDEF',"\n",$ssep,
              "main page content (",scalar @{$stat{pages}},"):\t",&Arrotonda_Mega($stat{sbody}),
              " in ",$stat{tbody}," seconds \@ ",
              &Arrotonda_Mega($stat{sbody} / $stat{tbody}),"/s\n";
        if ($verbosity > 2) {
            print $ssep,"\tdetail of loaded pages (url):\n",$ssep,
                  "\t",join ("\n\t", @{$stat{pages}}),"\n",$ssep;
        }
        ### report about extra downloaded content local to $url
        if ($stat{cintlink} > 0) {
            print "included content ($stat{cintlink}):\t",&Arrotonda_Mega($stat{sintlink}),
                  " in ",$stat{tintlink}," seconds \@ ",
                  &Arrotonda_Mega($stat{sintlink} / $stat{tintlink}),"/s\n";
        }
        else { print "no included content found.\n" }
        ### report about extra downloaded content external to $url
        if ($stat{cextlink} > 0) {
            print "external content ($stat{cextlink}):\t",&Arrotonda_Mega($stat{sextlink}),
                  " in ",$stat{textlink}," seconds \@ ",
                  &Arrotonda_Mega($stat{sextlink} / $stat{textlink}),"/s\n";
        }
        else { print "no external content found.\n" }
        ### report about broken links
        if (scalar @{$stat{brokenlink}} > 0) {
            print "broken links found:\t",scalar @{$stat{brokenlink}},"\n",$ssep;
            if ($verbosity > 2){
                print "\tdetail of broken links (url):\n",$ssep,
                      (join ("\n", @{$stat{brokenlink}})),$ssep;
            }
        }
        else { print "no broken links found.\n",$ssep }
        if ( ($verbosity > 2) && keys %{$stat{cache}} > 0 ) {
            print "\tdetail of loaded content (url bytes seconds):\n",$ssep,
                  (map { "\t$_ @{$stat{cache}{$_}}\n" } sort keys %{$stat{cache}} ),$ssep;
        }
        ## total
        print "downloaded ",&Arrotonda_Mega($totsize)," ($totsize bytes) in ",
              $tottime," seconds (",&Arrotonda_Mega($totsize / $tottime),"/s)\n",$bsep;
    }
    # verbosity set to something strange: switch to defaults
    else { $verbosity = 1; &report }
    # eventually print the headers
    if ($showhead) { print "headers received:\n",$ssep,$stat{resphead},"\n",$bsep; }
}
################################################################################
sub helpme {
    print <<EOH;
Usage of $0
$0 -u URL [-u URL2 -u URL3 --verbosity [0-3] --header --sleep n --count n --timeout n --protocol s]
EOH
    exit 0;
}

there are no rules, there are no thumbs..

Re: WebTimeLoad 0.15 scan and time website using LWP::Ua
by Discipulus (Canon) on May 10, 2012 at 13:15 UTC
    Here is a 0.20 version with a minor but important improvement: as explained by Corion while walking in the cloister, you can force the UserAgent to ask for a domain name at a specified IP address.

    I find this feature very handy, so I added it to my code: you can pass a new option
    --force_ip_resolution 1.2.3.4
    to achieve this. At every verbosity level of the program this is indicated after the requested URL.
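    The core of the trick, stripped out of the rest of the script, is simply to request the IP and pass the original name as a Host header, so the web server still serves the right virtual host. A minimal sketch (the IP and hostname below are placeholders); extra name => value pairs passed to LWP::UserAgent's get() become request header fields:

use strict;
use warnings;
use LWP::UserAgent;

my $ua   = LWP::UserAgent->new( timeout => 15 );
my $ip   = '1.2.3.4';            # where we force the resolution
my $host = 'www.example.com';    # the name the server should see

# request the IP directly, but keep the original name in the Host header
my $resp = $ua->get( "http://$ip/", Host => $host );
print $resp->status_line, "\n";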

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);
use Getopt::Long;
use HTML::Parse;
## v 16 fixed the broken links display bug
## problem with URLs like http://213.203.143.54/ada_portale/index.php?option=com_frontpage&Itemid=1&lang=italian
$|++;

# options
my (@url, $count, $sleep, $timeout, $showhead, $protocol, $verbosity, $forced_ip);
# other globals
my ($ssep, $bsep, %stat);
$bsep = ('=' x 68)."\n";
$ssep = ('-' x 68)."\n";

&helpme() if ( (! GetOptions(
                    'count=s'               => \$count,
                    'url=s'                 => \@url,
                    'sleep=n'               => \$sleep,
                    'timeout=n'             => \$timeout,
                    'header'                => \$showhead,
                    'protocol=s'            => \$protocol,
                    'verbosity=n'           => \$verbosity,
                    'force_ip_resolution=s' => \$forced_ip,
               )) or (!$url[0]) );

# some default values
$count   ||= 1;
$sleep   ||= 1;
$timeout ||= 15;
$verbosity = defined $verbosity ? $verbosity : 1;
$protocol ||= 'http';
$protocol =~ s![:/]+!!g;
$forced_ip ||= undef;
################################################################################
# LWP::UA initialize
my $ua = LWP::UserAgent->new;
$ua->agent("libwww-perl/5.10.1");
$ua->timeout($timeout);
################################################################################
foreach my $repetition (1..$count) {
    foreach my $u (@url) {
        unless ( $u =~ m/^$protocol/ ){ $u = $protocol.'://'.$u };
        $u =~ s/\/$//;          # removing an eventual / as last char
        $u = lc($u);
        undef %stat;
        %stat = (
            sbody      => 0,     # size of the body skeleton of each frame
            sintlink   => 0,     # size of internal linked content
            sextlink   => 0,     # size of external linked content
            tbody      => 0,     # time taken by the body skeleton of each frame
            tintlink   => 0,     # time taken by internal content
            textlink   => 0,     # time taken by external content
            cintlink   => 0,     # count of internal links
            cextlink   => 0,     # count of external links
            brokenlink => [],    # broken links
            mainurl    => $u,    # url got as arg
            pages      => [$u],  # main page or all the frames
            cache      => {},    # cache for included resources
            respcode   => undef, #
            respmess   => undef, # some responses
            resptitle  => undef, #
            respserver => undef, #
            resplength => undef, #
            resphead   => undef, #
        );
        foreach my $page ( @{$stat{'pages'}} )              { &get_page($page, $forced_ip) }
        foreach my $content_link ( keys %{$stat{'cache'}} ) { &time_link($content_link) }
        &report;
    }
    sleep ($sleep) unless $repetition == $count;
}
################################################################################
sub get_page {
    my $urltoget  = shift;
    my $forced_ip = shift;
    my $resp;
    my $t0 = [gettimeofday];
    if (defined $forced_ip){
        $urltoget  =~ s/\w+:\/\///g;
        $forced_ip = $protocol.'://'.$forced_ip;
        $resp = $ua->get( $forced_ip, Host => $urltoget );
    }
    else { $resp = $ua->get($urltoget); }
    $stat{tbody} += tv_interval ($t0, [gettimeofday]);
    # add the content_length declared for EACH frame/page loaded
    $stat{resplength} += ($resp->content_length || 0);
    # add the real byte length obtained for EACH frame/page loaded
    $stat{sbody} += (length ($resp->content) || 0);
    # add some more info only for the principal page (not for any frame loaded)
    $stat{respcode}   = $stat{respcode}   || $resp->code;
    $stat{respmess}   = $stat{respmess}   || $resp->message;
    $stat{resptitle}  = $stat{resptitle}  || $resp->title;
    $stat{respserver} = $stat{respserver} || $resp->server;
    $stat{resphead}   = $stat{resphead}   || $resp->headers_as_string;
    # now parse the HTML
    my $parsed_html = parse_html($resp->content);
    foreach my $link_found ( @{ $parsed_html->extract_links(qw(body img src frame)) } ) {
        next if $$link_found[0] =~ /#/;               # brutally skip anchors
        my $uriobj = URI->new( $$link_found[0] );
        my $absurl = $uriobj->abs($urltoget);
        # if it is a frame add it to pages, adding an iteration to this sub
        if ($$link_found[3] eq 'frame') { push @{$stat{'pages'}}, "$absurl"; next }  #? need to stringify $absurl
        # else it is content and we add it to the cache hash
        else { $stat{cache}{ $absurl } = [] }         # will store length and time there later on
    }
}
################################################################################
sub time_link {
    my $url = shift;
    my $t0   = [gettimeofday];
    my $resp = $ua->get($url);
    my $ttaken   = tv_interval ($t0, [gettimeofday]);
    my $bytesrec = length($resp->content);
    if ($resp->is_success()) {
        @{$stat{'cache'}{$url}} = ($bytesrec, $ttaken);
        # official regex from http://search.cpan.org/~gaas/URI-1.58/URI.pm
        my ($scheme, $domain, $path, $query, $fragment) =
            $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
        # tell internal from external content
        if ($stat{mainurl} =~ /$domain/) {
            $stat{sintlink} += $bytesrec;
            $stat{tintlink} += $ttaken;
            $stat{cintlink}++;
        }
        else {
            $stat{sextlink} += $bytesrec;
            $stat{textlink} += $ttaken;
            $stat{cextlink}++;
        }
    }
    else { push @{$stat{'brokenlink'}}, $url; }
}
################################################################################
sub Arrotonda_Mega {
    my ($size, $n) = (shift, 0);
    return "0 bytes" unless defined $size;
    return "0 bytes" unless $size > 0;
    ++$n and $size /= 1024 until $size < 1024;
    return sprintf "%.4f %s", $size, ( qw[ bytes Kb Mb Gb ] )[ $n ];
}
################################################################################
sub report {
    my $totsize = $stat{'sbody'} + $stat{'sintlink'} + $stat{'sextlink'};
    my $tottime = $stat{'tbody'} + $stat{'tintlink'} + $stat{'textlink'};
    my $forcedtxt = defined $forced_ip ? "FORCED ON $forced_ip\t" : '';
    ####################
    if ($verbosity == 0){
        print scalar localtime (time),"\t",
              $stat{mainurl},"\t",$forcedtxt,$stat{respcode},"\t",$totsize,"\t",$tottime,"\t",
              (&Arrotonda_Mega($totsize / $tottime)),"/sec","\n";   # Illegal division by zero at C:\SCRIPTS\webTimeLoad15.pl line 153.
    }
    ####################
    elsif ($verbosity == 1){
        print $bsep,$stat{mainurl},"\t",$forcedtxt,scalar localtime (time),"\n",$ssep,
              $stat{respcode},"\t",$stat{respmess}||'UNDEF',"\t",$stat{resptitle}||'UNDEF',"\n",$ssep,
              "downloaded ",&Arrotonda_Mega($totsize)," ($totsize bytes) in ",
              $tottime," seconds (",&Arrotonda_Mega($totsize / $tottime),"/s)\n",$bsep;
    }
    ####################
    elsif ($verbosity > 1){
        print $bsep,$stat{mainurl},"\t",$forcedtxt,scalar localtime (time),"\n",$ssep,
              "Response code:             ",$stat{respcode}  ||'UNDEF',"\n",
              "Response message:          ",$stat{respmess}  ||'UNDEF',"\n",
              "Response server:           ",$stat{respserver}||'UNDEF',"\n",
              "Response declared length:  ",$stat{resplength}||'UNDEF',"\n",
              "Response title:            ",$stat{resptitle} ||'UNDEF',"\n",$ssep,
              "main page content (",scalar @{$stat{pages}},"):\t",&Arrotonda_Mega($stat{sbody}),
              " in ",$stat{tbody}," seconds \@ ",
              &Arrotonda_Mega($stat{sbody} / $stat{tbody}),"/s\n";
        if ($verbosity > 2) {
            print $ssep,"\tdetail of loaded pages (url):\n",$ssep,
                  "\t",join ("\n\t", @{$stat{pages}}),"\n",$ssep;
        }
        ### report about extra downloaded content local to $url
        if ($stat{cintlink} > 0) {
            print "included content ($stat{cintlink}):\t",&Arrotonda_Mega($stat{sintlink}),
                  " in ",$stat{tintlink}," seconds \@ ",
                  &Arrotonda_Mega($stat{sintlink} / $stat{tintlink}),"/s\n";
        }
        else { print "no included content found.\n" }
        ### report about extra downloaded content external to $url
        if ($stat{cextlink} > 0) {
            print "external content ($stat{cextlink}):\t",&Arrotonda_Mega($stat{sextlink}),
                  " in ",$stat{textlink}," seconds \@ ",
                  &Arrotonda_Mega($stat{sextlink} / $stat{textlink}),"/s\n";
        }
        else { print "no external content found.\n" }
        ### report about broken links
        if (scalar @{$stat{brokenlink}} > 0) {
            print "broken links found:\t",scalar @{$stat{brokenlink}},"\n",$ssep;
            if ($verbosity > 2){
                print "\tdetail of broken links (url):\n",$ssep,
                      #(join ("\n", @{$stat{brokenlink}})),$ssep;
                      (map {"\t".$_."\n"} @{$stat{brokenlink}}),$ssep;
            }
        }
        else { print "no broken links found.\n",$ssep }
        if ( ($verbosity > 2) && keys %{$stat{cache}} > 0 ) {
            print "\tdetail of loaded content (url bytes seconds):\n",$ssep,
                  (map { "\t$_ @{$stat{cache}{$_}}\n" } sort keys %{$stat{cache}} ),$ssep;
        }
        ## total
        print "downloaded ",&Arrotonda_Mega($totsize)," ($totsize bytes) in ",
              $tottime," seconds (",&Arrotonda_Mega($totsize / $tottime),"/s)\n",$bsep;
    }
    # verbosity set to something strange: switch to defaults
    else { $verbosity = 1; &report }
    # eventually print the headers
    if ($showhead) { print "headers received:\n",$ssep,$stat{resphead},"\n",$bsep; }
}
################################################################################
sub helpme {
    print <<EOH;
Usage of $0
$0 -u URL [-u URL2 -u URL3 --verbosity [0-3] --header --sleep n --count n --timeout n --protocol s --force_ip_resolution s]
EOH
    exit 0;
}
    there are no rules, there are no thumbs..
WebTimeLoad 0.23
by Discipulus (Canon) on Feb 22, 2013 at 10:02 UTC
    A new version! 0.23
    The new parameter -r --render saves the content to a temporary file and opens it at the end of the program.
    It creates only a similar page, not an identical one, just to give you an idea of what you are fetching (see the sketch below).
    I also created a decent help text and intercepted some possible "Illegal division by zero.." errors.
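    Stripped down to its essentials, the --render idea is: save the fetched HTML to a temporary file, point the src attributes at a local folder, and let the shell open it. This is only a minimal sketch of that idea; the example URL, the %TEMP% path and the Windows "start" command are assumptions, like in the script itself.

use strict;
use warnings;
use LWP::UserAgent;

my $ua   = LWP::UserAgent->new;
my $resp = $ua->get('http://www.example.com');

my $html = $resp->content;
$html =~ s{src="[^"]*/}{src="./_temp_files/}g;   # crude rewrite of src paths to a local folder

my $out = "$ENV{TEMP}/_temp.html";
open my $fh, '>', $out or die "unable to write $out: $!";
print {$fh} $html;
close $fh;

system qq(start "" "$out");   # open it with the default browser (Windows)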

    enjoy:

    UPDATE January 31 2014: removed $u=lc($u) because it was buggy: you CAN have uppercase characters in URLs.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);
use Getopt::Long;
use HTML::Parse;
## v 16 fixed the broken links display bug
## problem with URLs like http://213.203.143.54/ada_portale/index.php?option=com_frontpage&Itemid=1&lang=italian
# v 20 added force_ip_resolution
# v 22 rendering OK (in time_link and get_page) and %.6d formatting for $tottime in report
# v 23 help added
$|++;

# options
my (@url, $count, $sleep, $timeout, $showhead, $protocol, $verbosity, $forced_ip, $render);
# other globals
my ($ssep, $bsep, %stat);
$bsep = ('=' x 68)."\n";
$ssep = ('-' x 68)."\n";

&helpme() if ( (! GetOptions(
                    'count=n'               => \$count,
                    'url=s'                 => \@url,
                    'sleep=n'               => \$sleep,
                    'timeout=n'             => \$timeout,
                    'header'                => \$showhead,
                    'protocol=s'            => \$protocol,
                    'verbosity=n'           => \$verbosity,
                    'force_ip_resolution=s' => \$forced_ip,
                    'render'                => \$render,
               )) or (!$url[0]) );

# some default values
$count   ||= 1;
$sleep   ||= 1;
$timeout ||= 15;
$verbosity = defined $verbosity ? $verbosity : 1;
$protocol ||= 'http';
$protocol =~ s![:/]+!!g;
$forced_ip ||= undef;
$render    ||= undef;
if ($render) { $count = 1; @url = qq($url[0]) }
################################################################################
# LWP::UA initialize
my $ua = LWP::UserAgent->new;
$ua->agent("libwww-perl/5.10.1");
$ua->timeout($timeout);
################################################################################
foreach my $repetition (1..$count) {
    foreach my $u (@url) {
        unless ( $u =~ m/^$protocol/ ){ $u = $protocol.'://'.$u };
        $u =~ s/\/$//;          # removing an eventual / as last char
        #$u = lc($u);           #### THIS IS BUGGED, dunno why i put it...
        undef %stat;
        %stat = (
            sbody      => 0,     # size of the body skeleton of each frame
            sintlink   => 0,     # size of internal linked content
            sextlink   => 0,     # size of external linked content
            tbody      => 0,     # time taken by the body skeleton of each frame
            tintlink   => 0,     # time taken by internal content
            textlink   => 0,     # time taken by external content
            cintlink   => 0,     # count of internal links
            cextlink   => 0,     # count of external links
            brokenlink => [],    # broken links
            mainurl    => $u,    # url got as arg
            pages      => [$u],  # main page or all the frames
            cache      => {},    # cache for included resources
            respcode   => undef, #
            respmess   => undef, # some responses
            resptitle  => undef, #
            respserver => undef, #
            resplength => undef, #
            resphead   => undef, #
        );
        foreach my $page ( @{$stat{'pages'}} )              { &get_page($page, $forced_ip) }
        foreach my $content_link ( keys %{$stat{'cache'}} ) { &time_link($content_link) }
        &report;
    }
    sleep ($sleep) unless $repetition == $count;
}
################################################################################
sub get_page {
    my $urltoget  = shift;
    my $forced_ip = shift;
    my $resp;
    my $t0 = [gettimeofday];
    if (defined $forced_ip){
        $urltoget  =~ s/\w+:\/\///g;
        $forced_ip = $protocol.'://'.$forced_ip;
        $resp = $ua->get( $forced_ip, Host => $urltoget );
    }
    else { $resp = $ua->get($urltoget); }
    $stat{tbody} += tv_interval ($t0, [gettimeofday]);
    # add the content_length declared for EACH frame/page loaded
    $stat{resplength} += ($resp->content_length || 0);
    # add the real byte length obtained for EACH frame/page loaded
    $stat{sbody} += (length ($resp->content) || 0);
    # add some more info only for the principal page (not for any frame loaded)
    $stat{respcode}   = $stat{respcode}   || $resp->code;
    $stat{respmess}   = $stat{respmess}   || $resp->message;
    $stat{resptitle}  = $stat{resptitle}  || $resp->title;
    $stat{respserver} = $stat{respserver} || $resp->server;
    $stat{resphead}   = $stat{resphead}   || $resp->headers_as_string;
    # now parse the HTML
    my $parsed_html = parse_html($resp->content);
    foreach my $link_found ( @{ $parsed_html->extract_links(qw(body img src frame)) } ) {
        next if $$link_found[0] =~ /#/;               # brutally skip anchors
        my $uriobj = URI->new( $$link_found[0] );
        my $absurl = $uriobj->abs($urltoget);
        # if it is a frame add it to pages, adding an iteration to this sub
        if ($$link_found[3] eq 'frame') { push @{$stat{'pages'}}, "$absurl"; next }  #? need to stringify $absurl
        # else it is content and we add it to the cache hash
        else { $stat{cache}{ $absurl } = [] }         # will store length and time there later on
    }
    if ($render){
        mkdir "$ENV{TEMP}\\_temp_files" || die;
        open RENDER, "> $ENV{TEMP}/_temp.html" || die "unable to write to %TEMP%\\_temp.html";
        (my $localcont = $resp->content) =~ s/src="([^"]*)\//src=".\/_temp_files\//gm;
        print RENDER $localcont;
        close RENDER;
        #system ("start $ENV{TEMP}/_temp.html");
    }
}
################################################################################
sub time_link {
    my $url = shift;
    my $t0   = [gettimeofday];
    my $resp = $ua->get($url);
    my $ttaken   = tv_interval ($t0, [gettimeofday]);
    #$ttaken = sprintf ("%0.6f", $ttaken);
    my $bytesrec = length($resp->content);
    if ($resp->is_success()) {
        @{$stat{'cache'}{$url}} = ($bytesrec, $ttaken);
        # official regex from http://search.cpan.org/~gaas/URI-1.58/URI.pm
        my ($scheme, $domain, $path, $query, $fragment) =
            $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
        # tell internal from external content
        if ($stat{mainurl} =~ /$domain/) {
            $stat{sintlink} += $bytesrec;
            $stat{tintlink} += $ttaken;
            $stat{cintlink}++;
        }
        else {
            $stat{sextlink} += $bytesrec;
            $stat{textlink} += $ttaken;
            $stat{cextlink}++;
        }
        if ($render){
            (my $ele = $url) =~ s/^.*\///;
            # would be better to convert html escapes like in WITH%20SPACES.gif
            open RENDER, "> $ENV{TEMP}\\_temp_files\\$ele" || die "unable to write to %TEMP%\\_temp_files\\$ele";
            binmode RENDER;
            print RENDER $resp->content;
            close RENDER;
        }
    }
    else { push @{$stat{'brokenlink'}}, $url; }
}
################################################################################
sub Arrotonda_Mega {
    my ($size, $n) = (shift, 0);
    return "0 bytes" unless defined $size;
    return "0 bytes" unless $size > 0;
    ++$n and $size /= 1024 until $size < 1024;
    return sprintf "%.4f %s", $size, ( qw[ bytes Kb Mb Gb ] )[ $n ];
}
################################################################################
sub report {
    my $totsize = $stat{'sbody'} + $stat{'sintlink'} + $stat{'sextlink'};
    my $tottime = $stat{'tbody'} + $stat{'tintlink'} + $stat{'textlink'};
    $tottime = sprintf ("%0.6f", $tottime);
    my $forcedtxt = defined $forced_ip ? " \@ $forced_ip\t" : "\t";
    ####################
    if ($verbosity == 0){
        print scalar localtime (time),"\t",
              $stat{mainurl},$forcedtxt,$stat{respcode},"\t",$totsize,"\t",$tottime,"\t",
              ( $tottime ? &Arrotonda_Mega($totsize / $tottime)."/s" : 'NA' ),"\n";
    }
    ####################
    elsif ($verbosity == 1){
        print $bsep,$stat{mainurl},$forcedtxt,scalar localtime (time),"\n",$ssep,
              $stat{respcode},"\t",$stat{respmess}||'UNDEF',"\t",$stat{resptitle}||'UNDEF',"\n",$ssep,
              "downloaded ",&Arrotonda_Mega($totsize)," ($totsize bytes) in ",
              $tottime," seconds (",
              ( $tottime ? &Arrotonda_Mega($totsize / $tottime)."/s)" : 'NA' ),"\n",$bsep;
    }
    ####################
    elsif ($verbosity > 1){
        print $bsep,$stat{mainurl},$forcedtxt,scalar localtime (time),"\n",$ssep,
              "Response code:             ",$stat{respcode}  ||'UNDEF',"\n",
              "Response message:          ",$stat{respmess}  ||'UNDEF',"\n",
              "Response server:           ",$stat{respserver}||'UNDEF',"\n",
              "Response declared length:  ",$stat{resplength}||'UNDEF',"\n",
              "Response title:            ",$stat{resptitle} ||'UNDEF',"\n",$ssep,
              "main page content (",scalar @{$stat{pages}},"):\t",&Arrotonda_Mega($stat{sbody}),
              " in ",$stat{tbody}," seconds \@ ",
              ( $stat{tbody} ? &Arrotonda_Mega($stat{sbody} / $stat{tbody})."/s)" : 'NA' ),"\n";
        if ($verbosity > 2) {
            print $ssep,"\tdetail of loaded pages (url):\n",$ssep,
                  "\t",join ("\n\t", @{$stat{pages}}),"\n",$ssep;
        }
        ### report about extra downloaded content local to $url
        if ($stat{cintlink} > 0) {
            print "included content ($stat{cintlink}):\t",&Arrotonda_Mega($stat{sintlink}),
                  " in ",$stat{tintlink}," seconds \@ ",
                  #&Arrotonda_Mega($stat{sintlink} / $stat{tintlink}),"/s\n";
                  ( $stat{tintlink} ? &Arrotonda_Mega($stat{sintlink} / $stat{tintlink})."/s)" : 'NA' ),"\n";
        }
        else { print "no included content found.\n" }
        ### report about extra downloaded content external to $url
        if ($stat{cextlink} > 0) {
            print "external content ($stat{cextlink}):\t",&Arrotonda_Mega($stat{sextlink}),
                  " in ",$stat{textlink}," seconds \@ ",
                  #&Arrotonda_Mega($stat{sextlink} / $stat{textlink}),"/s\n";
                  ( $stat{textlink} ? &Arrotonda_Mega($stat{sextlink} / $stat{textlink})."/s)" : 'NA' ),"\n";
        }
        else { print "no external content found.\n" }
        ### report about broken links
        if (scalar @{$stat{brokenlink}} > 0) {
            print "broken links found:\t",scalar @{$stat{brokenlink}},"\n",$ssep;
            if ($verbosity > 2){
                print "\tdetail of broken links (url):\n",$ssep,
                      #(join ("\n", @{$stat{brokenlink}})),$ssep;
                      (map {"\t".$_."\n"} @{$stat{brokenlink}}),$ssep;
            }
        }
        else { print "no broken links found.\n",$ssep }
        if ( ($verbosity > 2) && keys %{$stat{cache}} > 0 ) {
            print "\tdetail of loaded content (url bytes seconds):\n",$ssep,
                  (map { "\t$_ @{$stat{cache}{$_}}\n" } sort keys %{$stat{cache}} ),$ssep;
        }
        ## total
        print "downloaded ",&Arrotonda_Mega($totsize)," ($totsize bytes) in ",
              $tottime," seconds (",
              ( $tottime ? &Arrotonda_Mega($totsize / $tottime)."/s)" : 'NA' ),"\n",$bsep;
    }
    # verbosity set to something strange: switch to defaults
    else { $verbosity = 1; &report }
    # eventually print the headers
    if ($showhead) { print "headers received:\n",$ssep,$stat{resphead},"\n",$bsep; }
    if ($render)   { system ("start $ENV{TEMP}/_temp.html"); }
}
################################################################################
sub helpme {
    print <<EOH;
USAGE of $0:
--url URL [-u URL2 -u URL3 -v [0-3] -h -s n -c n -t n -p s -f IP -r]

-c --count                how many times to repeat the check. default is 1
-f --force_ip_resolution  given a valid dotted decimal IP, force the resolution of the url on it.
-h --header               set to 1 to print the headers too. default is 0
-p --protocol             http is assumed unless specified.
-r --render               render the page as it was parsed (giving just an idea of the original)
-s --sleep                seconds between checks. default is 1.
-t --timeout              the timeout for the connection. default is 15 seconds
-u --url                  the url or urls we want to scan
-v --verbosity            from 0 (one line output for each check) to 3. default 1

NOTES:
--render     also forces --count to 1 and considers the first url only. It does not
             replicate the original page because it does not take into account css
             and other things.
--verbosity  0 one line of output containing this info: date, url, response code,
               bytes downloaded, seconds elapsed, speed.
             1 a more descriptive multiline output with the title of the page and
               bytes expressed in human readable form too.
             2 as 1 but with general info about external or internal loaded content
               and broken links.
             3 as 2 with detailed info for each loaded content.
--protocol   http is the default, you can specify ftp too. https is supported only
             if you have the Crypt::SSLeay module installed.

EXAMPLES:
-u www.perl.org -c 10 -s 5
        check it 10 times sleeping 5 seconds between checks
-u www.perl.org -u search.cpan.org -c 10 -s 5
        as above but checks two sites
-u www.perl.org -c 5000 -s 60 -v 0
        a long running check with one line output
-u www.perl.org -h -v 3
        a detailed check with headers dumped
-u www.perl.org -f 127.0.0.1
        try my local copy of this site..
-u 127.0.0.1 -p ftp
        access the ftp running locally.

BUGS:
If your connection is too fast you can experience an 'Illegal division by zero..'
error in the speed calculation..;)
The process of downloading content and following links will anyway be different from
a real browser. You cannot trust the size at byte resolution: there is no cache effect
between checks, no css download nor favicons. You'll get results very near the real
datum and a speed quite realworld sized.
The rendering of the page with the -r option is fairly inaccurate but serves to give
an overall idea of the page.

AUTHOR:
dis cip ulus a t pmonks org

DISCLAIMER:
use at your own risk.
EOH
    exit 0;
}
    there are no rules, there are no thumbs..