in reply to Downloading and parsing apache logs

Are the files really rotated at one-minute intervals? Or do you just mean that you want to download the big file at one-minute intervals, but it gets rotated less frequently (perhaps once per day)?

Assuming the latter, you could remember two things: the first line of the file, and the number of the last line you processed. When you fetch the file again, you examine its first line. If it's the same as the line you saved, the file has not been rotated yet, so you know how many lines to skip without parsing (that's the second value you remembered). If it's not the same, the file has been rotated and you reset the "lines already seen" counter to zero.
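The bookkeeping described above can be sketched roughly as follows. This is an illustrative Python sketch, not code from the original question; the function name `new_lines` and the `(first_line, seen_count)` state tuple are made up for the example, and it assumes the downloaded file has already been split into a list of lines.

```python
def new_lines(lines, state):
    """Return (unprocessed_lines, new_state) for one fetched copy of the log.

    `state` is a hypothetical (first_line, seen_count) tuple saved from the
    previous fetch, or None on the very first run.
    """
    if not lines:
        # Empty file: nothing to parse, and nothing worth remembering.
        return [], None

    first = lines[0]
    if state is not None and state[0] == first and state[1] <= len(lines):
        # Same first line as last time: not rotated, skip what we have seen.
        skip = state[1]
    else:
        # Different first line: the file was rotated, start from zero.
        # (The length check is an extra guard added in this sketch, for a
        # file that has the same first line but is shorter than before.)
        skip = 0

    return lines[skip:], (first, len(lines))
```

You would keep the returned state between runs (in a small file, for instance) and pass it back in on the next fetch. Note that this only saves parsing work; the whole file is still downloaded every time.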

Having said that, fetching the webserver logs every minute seems extraordinarily wasteful. The traffic from transferring the logs would soon become a significant chunk of the total site traffic. If you need up-to-the-minute information, you are much better off talking with whoever controls the server, and arranging for a small monitoring program to read the log file as it is written (perhaps using File::Tail), process as much information as possible, and forward that information to the script on your server.
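To give an idea of how small such a monitoring program can be, here is a rough Python sketch of the "read only what has been appended" step. It is a stand-in for illustration only and does not use or imitate the File::Tail API; the function name `read_appended` and the byte-offset bookkeeping are assumptions of this sketch, and the forwarding step is left out entirely.

```python
import os


def read_appended(path, offset):
    """Return (new_lines, new_offset) for data appended to `path` since `offset`.

    `offset` is the byte position reached by the previous call (0 to start).
    """
    if os.path.getsize(path) < offset:
        # The file is shorter than before: assume it was truncated or rotated.
        offset = 0

    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()

    # Consume complete lines only; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Called once a minute on the server itself, this reads only the handful of new entries, so only the processed results need to cross the network.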

Re: Downloading and parsing apache logs
by juanmarcosmoren (Initiate) on Apr 29, 2004 at 11:57 UTC
    OK, I didn't express myself very well.

    The files are rotated daily.
    So when I fetch the URL, and then fetch the same URL again a minute later, it has a few new log entries, and those are the ones I want to parse. Once a day the file is truncated, so all the previous log entries are lost (but I don't need them).
    As I said, I can't control the server; it's definitely not in my hands.
      Let me put it this way:

      If I were the admin of that server, and you presented your case well enough, possibly with a simple program that would monitor the logfile and send only the essential data out, I would probably be persuaded to let you run it. Or at least suggest some kind of an alternative.

      If, on the other hand, I found that the server which has "a few new log entries" per minute is spending most of its time sending its own daily log to one user, I would make that user very, very sorry. As in Bastard Operator From Hell sorry.

      Trust me on this: talk to the people managing the server first. If you try talking to them after they've shut you down for abusing their services, you won't get anywhere.

      Yes, I admin a lot of servers. No, I'm not mean. Usually.