in reply to Need a way to print script output to web page?

"I wrote a script to count the number of hits but I do not want a cheesy hit counter like alot of sites have."

Are you using Apache? Yes? Then how about "data mining" the access logfile? This one-liner:

perl -lane'$h{$F[0]}++}{print scalar keys %h' /path/to/apache/logs/access_log
will parse through the entire space delimited log file, store the first field ($F[0] ... don't worry if that is confusing right now, keep reading) of each line into a hash and finally, when the log file has been exhausted, it prints the total number of unique IP addresses read.

Now then, in case that one-liner just made your ears bleed, here is the same thing in a more drawn out manner, this time as a CGI script that can be accessed via web browser. While we are at it, let's do a little reporting as well. We will show each unique IP address along with its total number of hits, sorted by hits descending:

use strict;
use warnings;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);

print header, start_html;

open my $log, '<', '/path/to/apache/logs/access_log' or die $!;
my %hist;
while (<$log>) {
    my @F = split / /, $_;
    $hist{$F[0]}++;    # first field is the client IP address
}
close $log;

# keys %hist are the distinct IP addresses, not the raw hit total
print "there are ", scalar(keys %hist), " unique IP addresses";

# just run the code and worry how it works later ;)
print table(
    Tr(th[qw(IP Hits)]),
    map Tr(td[ $_, $hist{$_} ]),
        sort { $hist{$b} <=> $hist{$a} } keys %hist,
);
print end_html;

But ... there are problems with this script. Big problems. For one, the access log files are rotated ... that is, we are not seeing a lot of hits from the past. What if we wanted to see hits for a certain date range? What if we wanted to see which pages had the most hits? We can do this by extending the code above because the data is there for the picking ... but we still haven't discussed the worst problem:

IT IS SLOW! On top of that, it uses a lot of RAM. Every time a visitor hits the page that calls this script, they will have to wait about 5-30 seconds for the page to load.
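As for pulling more than IP counts out of the log (hits per page, say), here is a rough sketch of the idea. It assumes the log uses Apache's common or combined format; the sub name hits_per_page and the sample lines are made up for illustration. It still reads every line, so it does nothing for the speed problem.

```perl
use strict;
use warnings;

# Count hits per requested path from common/combined log format lines.
# Returns a hash reference: path => number of hits.
# (hits_per_page is a hypothetical helper, not part of any module.)
sub hits_per_page {
    my @lines = @_;
    my %pages;
    for my $line (@lines) {
        # host ident user [timestamp] "METHOD path protocol" status bytes
        next unless $line =~ m/^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" \d{3} /;
        $pages{$1}++;
    }
    return \%pages;
}

# Made-up sample lines, for illustration only.
my @sample = (
    '10.0.0.1 - - [13/Feb/2004:07:35:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [13/Feb/2004:07:36:10 +0000] "GET /about.html HTTP/1.0" 200 1120',
    '10.0.0.1 - - [13/Feb/2004:07:37:42 +0000] "GET /index.html HTTP/1.0" 304 -',
);

my $pages = hits_per_page(@sample);
for my $path (sort { $pages->{$b} <=> $pages->{$a} || $a cmp $b } keys %$pages) {
    print "$path $pages->{$path}\n";
}
```

Capturing the bracketed timestamp in the same regex would be the starting point for a date range filter.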

So ... how do we fix this? Easy ... you could run a cron job every hour or so that parses the access log file and writes the total hits to a file. We would have to ensure that no visitor tried to access that file while it was being written to, of course ... that can be a pain. There are all kinds of wheels that we can re-invent to report IP hits ...
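If you do want to re-invent that particular wheel, one common way around the "file is being written" problem is to write to a temporary file in the same directory and then rename it over the real one, since a rename within one filesystem replaces the file in a single step. A minimal sketch, assuming a POSIX-like system; the sub names and the example paths are mine, not anything standard:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Count distinct first fields (client addresses) in a log file.
sub count_unique_hosts {
    my ($log_path) = @_;
    open my $log, '<', $log_path or die "cannot read $log_path: $!";
    my %seen;
    while (my $line = <$log>) {
        my ($host) = split ' ', $line;
        $seen{$host}++ if defined $host;
    }
    close $log;
    return scalar keys %seen;
}

# Write $text to $target via a temp file plus rename, so a reader sees
# either the old contents or the new ones, never a half-written file.
sub write_atomically {
    my ($target, $text) = @_;
    my ($fh, $tmp) = tempfile(DIR => dirname($target), UNLINK => 0);
    print {$fh} $text or die "cannot write $tmp: $!";
    close $fh or die "cannot close $tmp: $!";
    chmod 0644, $tmp or die "cannot chmod $tmp: $!";   # temp files start out private
    rename $tmp, $target or die "cannot rename $tmp to $target: $!";
    return;
}

# When run directly with two arguments (e.g. from cron):
#   script.pl /path/to/apache/logs/access_log /some/output/file
if (!caller && @ARGV == 2) {
    my ($log_path, $out_path) = @ARGV;
    write_atomically($out_path, count_unique_hosts($log_path) . "\n");
}
```

The page that shows the count then only has to read a tiny file instead of parsing the whole log on every request.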

But I think the best solution of all is to head ye over to The Home of The Webalizer and RTFM how to use it. That is not only a non-cheesy solution, it is a free, professional solution. Best of luck to you. :)

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Re^2: Need a way to print script output to web page?
by Nkuvu (Priest) on Feb 13, 2004 at 07:35 UTC

    (See update below...)

    I had some semi-serious issues with Webalizer. I'm using OS X, and it came with a nifty Perl install script. But it hosed my httpd.conf (it did, at least, make a backup). Check out the offending section (note that $t is input from a diamond operator):

    if ($t =~ /^y/) {
        `cp /etc/httpd/httpd.conf /etc/httpd/httpd.conf.save`;
        $orig = 'CustomLog "\/private\/var\/log\/httpd\/access_log" common';
        $new = 'CustomLog "\/private\/var\/log\/httpd\/access_log" combined';
        print "Modifying apache configuration...";
        `/usr/bin/sed -e 's/$orig/\#$orig/' -e 's/\#$new/$new/' /etc/httpd/httpd.conf > /etc/httpd/httpd.conf`;
        print " done. \n\nOriginal saved to /etc/httpd/httpd.conf.save\n\n";
        print "Apache needs to be restarted for this to take effect, would you like to do this now? (y/n) ";
        $a = <>;
        if ($a =~ /^y/) {
            print "Restarting apache...";
            print " done\n\n";
        }
    }
    Note a few things. This script needs to be run as root (I did, via sudo). Backticks in void context, then a sed call (sed? From within Perl?!). I believe the sed call is what zeroed out my httpd.conf. Then, the part that I'm just fuming about. "Would you like to restart Apache?" I say yes. Script does nothing. "OK!"
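A likely explanation for the zeroed file: when a command redirects its output with > to the same file it reads from, the shell truncates that file before the command ever opens it, so sed gets empty input. For comparison, here is a rough sketch of doing the same edit in Perl without sed. The sub names are hypothetical, and the patterns only assume CustomLog lines shaped like the ones in the installer:

```perl
use strict;
use warnings;

# Comment out the active "common" CustomLog line and uncomment the
# "combined" one. Works on the whole config text at once.
sub switch_to_combined {
    my ($text) = @_;
    $text =~ s/^(CustomLog\s+"[^"]+"\s+common)[ \t]*$/#$1/mg;
    $text =~ s/^#(CustomLog\s+"[^"]+"\s+combined)[ \t]*$/$1/mg;
    return $text;
}

# Read the file fully, then write the result to a temp file and rename
# it into place, checking every step instead of ignoring failures.
sub edit_config {
    my ($path) = @_;
    open my $in, '<', $path or die "cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;

    my $tmp = "$path.tmp.$$";
    open my $out, '>', $tmp or die "cannot write $tmp: $!";
    print {$out} switch_to_combined($text) or die "cannot write $tmp: $!";
    close $out or die "cannot close $tmp: $!";
    rename $tmp, $path or die "cannot rename $tmp to $path: $!";
    return;
}
```

The original file is never opened for writing, so a failure partway through leaves it untouched.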

    It took me a little over a half hour to debug this. I didn't see any problems because I didn't have to reboot my machine until tonight (I originally installed Webalizer yesterday morning). And I only had to reboot tonight because I installed some other random program -- so I thought that the other program was the culprit (but it has nothing to do with Apache, which is why I was stumped).

    All in all I really like the reports from Webalizer. But be warned if you install it with the OS X script. I'll be contacting the author with this information after this...

    Update: I don't recall exactly which version of Webalizer had this malfunctioning script, but the latest seems to have addressed the problem. The script still has backticks in void context (so no error checking to see if the command failed) but at least it does actually perform the 'apachectl restart' when you tell it to. Just in case you were curious.
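On the backticks in void context: a small sketch of what checking the command could look like, using system in list form and inspecting its return value. run_checked is a name I made up, and the apachectl call is only shown as a comment:

```perl
use strict;
use warnings;

# Run a command (list form, so no shell is involved) and die with a
# message unless it exits with status 0.
sub run_checked {
    my @cmd = @_;
    my $rc = system(@cmd);
    if ($rc == -1) {
        die "failed to start '@cmd': $!\n";
    }
    if ($rc & 127) {
        die sprintf("'%s' died from signal %d\n", "@cmd", $rc & 127);
    }
    if ($rc >> 8) {
        die sprintf("'%s' exited with status %d\n", "@cmd", $rc >> 8);
    }
    return 1;
}

# Example (not run here):
# run_checked('apachectl', 'restart');
```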