in reply to Re: How to quickly parse a huge web log file?
in thread How to quickly parse a huge web log file?
For some log files the time stamps aren't exactly in order, and this process might lose a few records.
You should be able to find out whether your log files suffer from this problem in a unix shell: use 'cut' to extract the date from each record, pipe it through 'uniq', and check that each date shows up only once. If a date appears in more than one run, timestamps went backwards somewhere in the file.
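A minimal sketch of that check, assuming an Apache common-log format where the fourth whitespace-delimited field is the timestamp (e.g. "[01/Oct/2025:23:59:58") -- the field numbers are an assumption, so adjust them for your log format:

```shell
# Build a tiny sample log whose last record is timestamped
# *before* the record above it (a long-running CGI finishing late).
printf '%s\n' \
  'host - - [01/Oct/2025:23:59:58 +0000] "GET /a HTTP/1.0" 200 10' \
  'host - - [02/Oct/2025:00:00:01 +0000] "GET /b HTTP/1.0" 200 10' \
  'host - - [01/Oct/2025:23:59:59 +0000] "GET /slow-cgi HTTP/1.0" 200 10' \
  > access.log

# Extract the date part of the timestamp, collapse consecutive
# duplicates with uniq, then sort and print any date that still
# appears more than once. Non-empty output means some date occurs
# in two separate runs, i.e. records are out of order.
dupes=$(cut -d' ' -f4 access.log | cut -d: -f1 | uniq | sort | uniq -d)
echo "$dupes"
```

If the log were strictly ordered, each date would form a single consecutive run, `uniq` would leave one line per date, and `uniq -d` would print nothing.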
The issue is this -- some webservers record the time the request was received, but don't write the log entry until after the response has been sent. If you have long-running CGIs and small static documents being served from the same server, and the server is very busy, you can end up with records from one day being written out before records timestamped on the previous day.
...
In general, though, this is a great suggestion, and even if you do suffer from this issue, you'll likely lose only a handful of records.