Are the files really rotated at one-minute intervals? Or do you just mean that you want to download the
big file at one-minute intervals, but it gets rotated less frequently (perhaps once per day)?
Assuming the latter, you could remember the first line of the file and the number of the line you last processed. When you fetch the file, examine its first line. If it is the same as the line you saved, the file has not been rotated yet, so you know how many lines you can skip without parsing (that's the second value you remembered). If it is not the same, the file has been rotated, and you reset the "lines already seen" counter to zero.
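Roughly, that bookkeeping could look like the sketch below; the state hash and the already-downloaded log text are assumptions for illustration, not something you already have:

    use strict;
    use warnings;

    # Sketch only: $log_text holds the freshly downloaded log, %$state is
    # whatever you persist between runs (first line seen, lines processed).
    sub new_lines {
        my ($log_text, $state) = @_;
        my @lines = split /\n/, $log_text;
        return () unless @lines;

        if (!defined $state->{first_line} or $lines[0] ne $state->{first_line}) {
            # First line changed: the file was rotated, so start over.
            $state->{first_line} = $lines[0];
            $state->{lines_seen} = 0;
        }

        # Only the lines past the ones already processed are new.
        my @new = @lines[ $state->{lines_seen} .. $#lines ];
        $state->{lines_seen} = scalar @lines;
        return @new;
    }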
Having said that, fetching the webserver logs every minute seems extraordinarily wasteful. The traffic from transferring the logs would soon become a significant chunk of the total site traffic. If you need up-to-the-minute information, you are much better off talking with whoever controls the server and arranging for a small monitoring program to read the log file as it is written (perhaps using File::Tail), process as much information as possible, and forward only that information to the script on your server.
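Such a monitoring program could be as small as the sketch below; the log path, the filter pattern, and the URL it posts to are all assumptions for illustration:

    use strict;
    use warnings;
    use File::Tail;
    use LWP::UserAgent;

    # Runs on the web server: follow the log as it grows and forward only
    # the lines you actually need, keeping the extra traffic tiny.
    my $tail = File::Tail->new(name => '/var/log/apache/access.log',
                               maxinterval => 60);
    my $ua   = LWP::UserAgent->new;

    while (defined(my $line = $tail->read)) {
        next unless $line =~ /pattern-you-care-about/;
        $ua->post('http://your.server.example/collect', { line => $line });
    }
| [reply] |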
OK, I didn't express myself very well.
The files are rotated daily.
So when I fetch the URL, and a minute later fetch the same URL again, it has a few new log entries, and those are the ones I want to parse. Once a day the file is truncated, so all the previous log entries are lost (but I don't need them).
As I said, I can't control the server; it's definitely not in my hands.
| [reply] |
Let me put it this way:
If I were the admin of that server, and you presented your case well enough, possibly with a simple program that would monitor the logfile and send only the essential data out, I would probably be persuaded to let you run it. Or at least suggest some kind of an alternative.
If, on the other hand, I found that the server which has "a few new log entries" per minute is spending most of its time sending its own daily log to one user, I would make that user very, very sorry. As in bastard-operator-from-hell sorry.
Trust me on this: talk to the people managing the server first. If you try talking to them after they've shut you down for abusing their services, you won't get anywhere.
Yes, I admin a lot of servers. No, I'm not mean. Usually.
| [reply] |
If I understand what you are asking, you could run the risk of downloading and parsing the same log twice. To avoid it I suggest this:
- Each time you download a log, calculate its MD5 checksum and save it.
- When you download a new log a minute later, calculate its checksum and compare it with the last one: if they are the same, wait a few seconds and repeat.
- When you find them different, discard the old checksum, save the new one, and then parse the log.
To calculate the checksum you could use the md5sum command that you may have on your system (if you are running a Unix), or use Digest::MD5 directly.
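With Digest::MD5 the comparison could look something like this (a minimal sketch; $log_text and the saved checksum are assumptions for illustration):

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Returns true if the freshly downloaded log differs from the previous
    # one, and remembers the new checksum for the next comparison.
    sub log_has_changed {
        my ($log_text, $old_checksum_ref) = @_;
        my $checksum = md5_hex($log_text);
        return 0 if defined $$old_checksum_ref
                    and $checksum eq $$old_checksum_ref;
        $$old_checksum_ref = $checksum;
        return 1;
    }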
I hope this helps
Update: matija is right, go his way if you have a chance!
Ciao! --bronto
The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz
| [reply] |
As matija points out, getting along with the system administrator is good for your health.
If you have no choice but to download the file, consider rsync. rsync is good at fetching only the changed portions of files, so it should be suitable for minimizing how much of the file needs to be fetched each minute. However, you might need to do some pre-processing to detect rollover, as I'm not sure how rsync would handle that.
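Called from Perl, that could be as simple as the sketch below; the host, the path, and the --append option (which only transfers data appended since the last run) are assumptions for illustration, and it does nothing clever about rollover:

    use strict;
    use warnings;

    # Pull the remote log, transferring only what was appended since the
    # previous run. On rollover the local copy would need to be reset.
    my @cmd = ('rsync', '-z', '--append',
               'loguser@remote.example.com:/var/log/apache/access.log',
               'access.log');
    system(@cmd) == 0
        or warn "rsync exited with status ", $? >> 8, "\n";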
--
TTTATCGGTCGTTATATAGATGTTTGCA
| [reply] |