thezip has asked for the wisdom of the Perl Monks concerning the following question:
Hello all,
I have a log parsing problem, and I seek suggestions as to a reasonable Perlish solution -- I'm not really looking for any code per se, just algorithmic "advice".
First, I'll address the things that are given and cannot be changed.
Each day, I collect a dump from a logfile generator, which is the accumulation of all log entries since the beginning of that month. Each day, a new file is collected, and is theoretically at least as big as the previous day's file. I do not have the ability to directly control this "logfile source", so I must deal with the cumulative nature of the resulting files.
Occasionally, through magic processes that I also have no control over, there may be a purging of the "logfile source", which causes the next day's cumulative file to restart from 0 bytes and then contain only what was collected after the purge.
My program must "reconstruct" all of the unique log entries for the given month for a given server.
Assumptions:
In summary, there will be around 310 files, each having size somewhat over 1.2 MB -- nothing major. Each server will have its logs unique-ified into its own file.
Certainly, in Unix, I could do something like:
1) Concatenate the files into a single file
2) Then do: `sort -u <concatfile> > <sortedfile>`
... but I suspect this will eventually live on a Windows box.
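A pure-Perl stand-in for that pipeline would sidestep the portability question. Something along these lines is what I'm picturing (untested sketch; the daily dump files for one server/month are just passed as arguments, and the filenames are only illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cross-platform equivalent of `cat *.log | sort -u`:
# read every daily dump, remember each distinct line once,
# then print the unique lines in sorted order.
my %seen;
while ( my $line = <> ) {    # <> iterates over all files in @ARGV
    $seen{$line} = 1;
}
print sort keys %seen;
```

Run as, say, `perl dedupe.pl server1_day*.log > server1_unique.log`. Because the later cumulative files are supersets of the earlier ones (except after a purge), the hash simply collapses all the repetition, and a purge just means some entries arrive from the earlier files only.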
I thought that maybe I could compute an MD5 digest for each log entry and use that as a hash key for subsequent collision checks (i.e., ignore all subsequent redundancies).
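Roughly like this (untested sketch, using Digest::MD5; output order is first-seen rather than sorted):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Key the "seen" hash on the 16-byte binary MD5 of each entry instead
# of the entry itself, so memory per unique entry stays small no matter
# how long the log lines are.
my %seen;
while ( my $line = <> ) {
    my $key = md5($line);     # 16-byte binary digest of the raw line
    next if $seen{$key}++;    # already emitted this entry, skip it
    print $line;              # first occurrence: pass it through
}
```

At ~370 MB of total input (most of it duplicated), keying on the lines themselves would probably also fit in memory, so the MD5 step mainly trades a little CPU for a smaller hash; collisions on MD5 keys are vanishingly unlikely at this scale.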
Thoughts?