I have a great one!! It's just a matter of getting it from my mind to a source file. :P
Seriously though, look at what you are asking for, and at what you didn't provide:
Written in Perl:
This is easy; the only question is what platform? *nix, Mac (which I guess is a *nix now), MS?
Simple, not many features:
Ok, what features do you want? Why?
Neat code, at least use strict, preferably with warnings on:
Ok, first off, neat code. By whose standards? Yours? The Perl community at large? An X-platform developer's? An X-language developer's?
Next, use strict... Hrm, people are either for or against its blind use; I guess in this case it's a relatively straightforward requirement.
With warnings on: why? If this is going to get run out of cron, and it warns on stuff, you are going to get an email every time it runs.
Just ONE script, 1 file:
Again, why? How much functionality do you want? Looking ahead, we can find the data points you think are relevant.
Unique hits per day:
Ok, at least one pass over the data. Assuming no one is using NAT at any point, keeping a hash of hits is simple enough.
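That single pass might be sketched something like this (a rough illustration, not a finished tool; the field positions assume Apache's common log format, and the sample lines are invented):

```perl
use strict;
use warnings;

# One pass over common-log-format lines, keeping a hash of unique
# client hosts per day (ignoring the NAT problem entirely).
my @lines = (
    '1.2.3.4 - - [10/Oct/2023:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '1.2.3.4 - - [10/Oct/2023:14:01:12 -0700] "GET /faq HTTP/1.0" 200 512',
    '5.6.7.8 - - [10/Oct/2023:15:20:00 -0700] "GET / HTTP/1.0" 200 2326',
);

my %seen;    # $seen{$day}{$host}++
for my $line (@lines) {
    my ($host) = $line =~ /^(\S+)/;         # first field: client host
    my ($day)  = $line =~ /\[([^:\]]+)/;    # e.g. "10/Oct/2023"
    next unless defined $host and defined $day;
    $seen{$day}{$host}++;
}

for my $day (sort keys %seen) {
    printf "%s: %d unique hosts\n", $day, scalar keys %{ $seen{$day} };
}
```

For a real log you'd read the lines from a file handle instead of an array, but the hash-of-hashes is the whole trick.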
Referrers:
Provided that data is in the log as well, we now either A) need to enhance the first hash to keep ticks on who came from where as we count them, or B) keep a second data structure with referrer counts.
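Option B might look like this (assuming the combined log format, where the referrer is the first quoted field after the status code and byte count; the sample lines are invented):

```perl
use strict;
use warnings;

# Option B: a second hash keyed on referrer (combined log format assumed).
my @lines = (
    '1.2.3.4 - - [10/Oct/2023:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 "http://example.com/" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Oct/2023:14:00:00 -0700] "GET /faq HTTP/1.0" 200 512 "http://example.com/" "Mozilla/5.0"',
    '9.9.9.9 - - [10/Oct/2023:15:00:00 -0700] "GET / HTTP/1.0" 200 2326 "-" "Mozilla/5.0"',
);

my %ref_count;
for my $line (@lines) {
    # Match: ...request" status bytes "referrer" "agent"
    if ($line =~ /" (\d{3}) (\S+) "([^"]*)" "/) {
        my $referrer = $3;
        $ref_count{$referrer}++ unless $referrer eq '-';    # "-" means no referrer
    }
}

printf "%5d  %s\n", $ref_count{$_}, $_ for sort keys %ref_count;
```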
Search engine keywords:
What search engine? Do all of their log entries appear the same (i.e. are they all in the same format)? Does the web server know when it's being accessed by a browser as opposed to a search engine?
a nice graph as well possibly:
So now we have to map out all the data points in some format. What format would you like that in? Plain text via ASCII art, GIF, JPEG, PNG, etc.? Do you want to be able to store some graphs in one format and others in different formats? How should the script do that? Should it graph usage for just this log file, or should it maintain a cache for X period of time? How much room are you willing to give up for data points over X time frame to be saved? Do you want to be able to build dynamic graphs or only static graphs? What level of granularity should the graphs provide? How many graphs do you want to save? How much room are you willing to give up for those graph files?
No offense, as I realize I am coming across harshly. You should pick a system that is relatively close to what you want and start coding from there. The reqs are kinda vague; they don't seem to show an understanding of everything that could happen in a log file, and simply wave it off as technological magic. You want something simple, but you're asking for a simple *complex* application tailored exactly to your needs. If there is any code out there, it's going to be more general purpose, i.e. able to handle, say, Apache logs and everything that could happen in them, so that it only needs to be written once to deal with any data mining from that type of log.
If you want to see how to graph data on the fly, look into the excellent GD module family. Wonderful set of modules, but if you want graphs without drawing all the lines yourself, look at the GD::*Graph family. Also, if you want it to be 3D, look at the GD::3D* family of modules.
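A minimal GD::Graph bar chart of hits per day might look like this (GD::Graph is a CPAN module, not core, so install it first; the data here is invented):

```perl
use strict;
use warnings;
use GD::Graph::bars;    # CPAN module, not core

# Invented data: x-axis labels and matching y values.
my @data = (
    [ '10/Oct', '11/Oct', '12/Oct' ],    # days
    [ 120,      87,       203      ],    # unique hits per day
);

my $graph = GD::Graph::bars->new(400, 300);
$graph->set(
    x_label => 'Day',
    y_label => 'Unique hits',
    title   => 'Unique hits per day',
) or die $graph->error;

my $gd = $graph->plot(\@data) or die $graph->error;

open my $out, '>', 'hits.png' or die "open: $!";
binmode $out;
print {$out} $gd->png;
close $out;
```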
I'll stop now because I can simply feel the XP draining away, and I'm sorry if I'm coming across harshly; I probably shouldn't be posting when in this mood.
A simple short answer is: more than likely someone somewhere has written exactly what you want; finding it may or may not be possible, but more than likely it will *not* be just off the beaten e-tracks.
/* And the Creator, against his better judgement, wrote man.c */
I could answer all your questions (and I could have answered them in the first post), but that'd be such a long story that nobody would read it. It's a K.I.S.S. thing.
I do know the complexities of creating a Log Analyzer, which is why I'm not doing it myself (yet).
To parse the logfile, you might have a look at regexp-log, HTTPD-Log-Filter, or Log-Detect. Even if you can't use these modules directly, they will certainly give you some good ideas on how to tackle your task!
How nice, I have used the above text twice today!
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
I don't understand this "just one script" requirement.
I use swatch as a log analyzer, but I think you mean one of another kind. You want to consolidate your HTTP log data, right?
Zenn
Why just one script?
I just fail to see why I need five .pl files, a bunch of images, and a couple of HTML docs for some simple log analysis.
Although awstats, for example, is very good, it does much more than I want it to.
Although awstats, for example, is very good, it does much more than I want it to.
Very good? Well, it works and makes pretty reports. Now have a look at its code. Not at all clean. I think the program could have been twice as fast if coded by someone who knows Perl.
Juerd
- http://juerd.nl/
- spamcollector_perlmonks@juerd.nl (do not use).
These days I think that if you want to use other people's clean code, you must accept modules. This will require more than one file.
My log file analyzer makes very nice graphs using R. In addition, having the data in R makes it easy to do statistical analysis. If R is anything, though, it is not simple!
One of the hardest things to do well in logfile analysis is to properly parse the default Apache log file. It is easy for users to type in weird URLs that break most parsers for common log formats. It is much easier to modify the log file format to eliminate this possibility than it is to write a parser to fix this problem.
If the web server stores the log data that is actually needed in a bullet-proof way, it is easy to meet your requirements in just a few lines of code. Who wants to parse dates? Just store seconds since the epoch. How about some decent delimiters? Whitespace is not always the best choice!
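As a sketch of that idea: with a custom format logging epoch seconds and tab-separated fields (in Apache, something along the lines of a LogFormat using %{%s}t, %h, %U, and %{Referer}i joined by tabs; check your server's docs for the exact directives), each record splits cleanly and the date never needs parsing:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Tab-delimited records: epoch seconds, host, path, referrer (invented data).
my @records = (
    "1696946136\t1.2.3.4\t/index.html\thttp://example.com/",
    "1696946200\t5.6.7.8\t/faq.html\t-",
);

my @days;
for my $rec (@records) {
    my ($epoch, $host, $path, $referrer) = split /\t/, $rec;
    # No date parsing needed: strftime turns epoch seconds into any display format.
    push @days, strftime('%Y-%m-%d', gmtime $epoch);
}
print "$_\n" for @days;
```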
It should work perfectly the first time! - toma