After the talk of performance increases/decreases following the OS changes behind PerlMonks.org, I decided to create a tool that could be used to provide some actual statistics.

Basically, it uses LWP::Simple to fetch a page, Time::HiRes to measure how long the fetch took, and MySQL/DBI to catalog the results. It could be used for other sites I guess, but there is PM-specific code (extracting the user count from $content).

Comments/suggestions are very welcome (especially on the regex used for extracting the number of users). A tool for actually displaying some results is forthcoming.

Here is the SQL I used to create the web_load table:

create table web_load (
    date      datetime       not null,
    url       text           not null,
    load_secs float unsigned not null,
    users     int unsigned   not null
);

and here is the Perl code:

#!/usr/bin/perl -w

use DBI;
use LWP::Simple;
use Time::HiRes qw(gettimeofday tv_interval);
use Getopt::Std;

#### commandline config
my (%options);
getopts("w:u:p:h", \%options);
if ($options{h}) {
    print <<'eof';
-w webpage:  Webpage to fetch
-u username: Username for mysql
-p password: Password for mysql
-h:          This help file
eof
    exit;
}

#### config
my ($db_insert_time) = qq{INSERT web_load (date, url, load_secs, users) VALUES(now(),?,?,?)}; # sql insert
my ($url)           = $options{w} || 'http://www.perlmonks.org/index.pl?node_id=131'; # default webpage (perlmonks frontpage)
my ($db_user_name)  = $options{u} || ''; # default mysql username
my ($db_password)   = $options{p} || ''; # default mysql password
my ($db_database)   = 'DBI:mysql:website'; # default mysql database

#### connect to db
my ($DBH) = DBI->connect($db_database, $db_user_name, $db_password, { RaiseError => 1 });

#### record start time, get frontpage, and calculate elapsed time
my ($start_secs) = [gettimeofday];
my ($content) = get($url);
die("Couldn't GET $url\n") unless defined $content;
my ($load_secs) = tv_interval($start_secs);

#### extract users from $content and do some error checking (only for perlmonks)
my ($users) = $content =~ /\((\d+?)\)<br \/><a HREF=/;
die("Couldn't extract users from $url\n") unless defined $users;

#### insert users and load_secs into database
my ($STH) = $DBH->prepare($db_insert_time);
$STH->execute($url, $load_secs, $users);

#### database finish
$STH->finish();
$DBH->disconnect();
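
To run it by hand, an invocation might look something like this (the filename web_load.pl and the monk/xyzzy MySQL credentials are placeholders, not anything the script requires):

perl web_load.pl -u monk -p xyzzy -w 'http://www.perlmonks.org/index.pl?node_id=131'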

Replies are listed 'Best First'.
Re: mySQL Webpage Load Cataloger
by Coruscate (Sexton) on Feb 11, 2003 at 11:31 UTC

    Cool, but not necessarily accurate. The script's measurements will vary based on many factors. Just to name a few:

    • Your internet connection: If you're on a slow modem, you're only really going to measure the slowness of your modem, not the response time of the server.
    • Nodelets: My results are going to vary from yours, as will anyone who has a different nodelet setup. Some nodelets take longer to prepare than others, so someone with nodelets turned off will see a faster load than someone with every nodelet turned on. In this case, the code would be measuring the time it takes to get everything together, rather than the speed of the site.
    • As for the 'log number of users' part, this won't provide any hardcore statistics either, as the level and intensity of user activity depends on just that: what users are doing. For example, say you have 3 entries where 40 users were logged in. The first entry, it was simply 40 people browsing the site content, no chatterbox clients running. The second entry with 40 users hit at a time when 25 of those users weren't really 'users' on the site: just bots hitting the xml tickers and grabbing nodes. The third time an entry of 40 users is marked, 10 users were attempting to load newest nodes, 12 were hitting xml tickers, and 18 were browsing the site. Ouch. And then, say you have an entry where there were only 7 users logged in, but it took *forever* to get the server's response. Why? Because you nailed the site at just the right time: a database backup was in progress, and site responsiveness slows down mega-time. Your code has no idea that a backup is running, so it logs an event of '7 users logged in, 59 seconds to load the page'.
    • So, while it's a cool idea, there are just too many factors that come into play, mainly timing. At one particular second, the server might take 12 hits, while 1 second later it serves only 1 request. It's really just too unpredictable :)

    ++ for the effort however :)



      Well, my hope is that you would cron this script and run it every hour for, say, a month before starting any analysis. While the number of users on the site doesn't necessarily provide a complete model of site load, my guess is that over a month there will be a strong correlation between the number of users recorded and the time it takes to load a page. When creating your graphs etc. at the end of the month you could eliminate outliers. If you notice certain times where the load time is always high (i.e. a time when the database is being backed up), you could eliminate those times too. Over a long enough period of time you should come out with some good data.
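
      For example, an hourly crontab entry might look something like this (the path /home/monk/bin/web_load.pl and the monk/xyzzy credentials are placeholders):

0 * * * * perl /home/monk/bin/web_load.pl -u monk -p xyzzy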

      As far as personal nodelets go, since I am loading the default frontpage, which will (almost) always have the same nodelet configuration, this and other personalization issues aren't really relevant. Even if I were loading my personal frontpage, as long as I didn't make significant changes to the configuration, this program should provide you with good data.

      One thing that I didn't account for is the size of the page loaded. Obviously, the larger the page, the longer it will take to load regardless of the number of users, so a way to compensate for that (and for users who can't saturate the PM bandwidth) is to record the size of $url each time you calculate the load time; when doing the analysis, you can then throw out extremely large frontpages etc. (an example query is sketched after the code below).

      Here is the new MySQL table:

create table web_load (
    date      datetime       not null,
    url       text           not null,
    load_secs float unsigned not null,
    size      int unsigned   not null,
    users     int unsigned   not null
);
      and new code which takes size into account:
#!/usr/bin/perl -w

use DBI;
use LWP::Simple;
use HTTP::Size;
use Time::HiRes qw(gettimeofday tv_interval);
use Getopt::Std;

#### commandline config
my (%options);
getopts("w:u:p:h", \%options);
if ($options{h}) {
    print <<'eof';
-w webpage:  Webpage to fetch
-u username: Username for mysql
-p password: Password for mysql
-h:          This help file
eof
    exit;
}

#### config
my ($db_insert_time) = qq{INSERT web_load (date, url, load_secs, size, users) VALUES(now(),?,?,?,?)}; # sql insert
my ($url)           = $options{w} || 'http://www.perlmonks.org/index.pl?node_id=131'; # default webpage (perlmonks frontpage)
my ($db_user_name)  = $options{u} || ''; # default mysql username
my ($db_password)   = $options{p} || ''; # default mysql password
my ($db_database)   = 'DBI:mysql:website'; # default mysql database

#### connect to db
my ($DBH) = DBI->connect($db_database, $db_user_name, $db_password, { RaiseError => 1 });

#### record start time, get frontpage, and calculate elapsed time
my ($start_secs) = [gettimeofday];
my ($content) = get($url);
die("Couldn't GET $url\n") unless defined $content;
my ($load_secs) = tv_interval($start_secs);

#### extract users from $content and do some error checking (only for perlmonks)
my ($users) = $content =~ /\((\d+?)\)<br \/><a HREF=/;
die("Couldn't extract users from $url\n") unless defined $users;

#### calculate size of $url (if someone knows a better way to do this please tell me)
my ($size) = HTTP::Size::get_size($url);
die("Couldn't get size of $url\n") unless defined $size;

#### insert users and load_secs into database
my ($STH) = $DBH->prepare($db_insert_time);
$STH->execute($url, $load_secs, $size, $users);

#### database finish
$STH->finish();
$DBH->disconnect();
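
      As a rough sketch of the kind of analysis I have in mind (the 100000-byte cutoff is an arbitrary placeholder for 'extremely large'), a query along these lines would summarize load time against the number of logged-in users while throwing out oversized pages:

select users, avg(load_secs) as avg_load, avg(size) as avg_size
from web_load
where size < 100000
group by users
order by users;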