in reply to Log parsing by timestamp dilema

  1. As tall_man said, this is a typical case for a merge sort. You can try the module tall_man suggested.

  2. The other approach is to do the merge sort yourself.
    1. First sort each file that needs to be merged. It sounds like you already have a program for this.
    2. Then have your own merge program do the following (say you are merging n files):
      1. Open all n files you want to merge.
      2. Open the output file for the merged result.
      3. Read one line from each of the n files.
      4. Compare the lines currently held and write out the smallest one.
      5. Read the next line from the file whose line was just written in step 4. If that file is at EOF, close it.
      6. Go back to step 4 until you are finished. There is a subtlety here: you are not finished just because every file is closed. Make sure you still write out, in order, any lines that were already read in.

      This algorithm holds at most n lines in memory at any time, so memory usage stays small no matter how large the files are.
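
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the code from the thread: the name `merge_sorted_files` is made up, and it assumes each line starts with a timestamp in a format that sorts correctly as plain text (for example `YYYY-MM-DD HH:MM:SS`), so that comparing whole lines is the same as comparing timestamps. A heap holds the one pending line per file, which makes "find the smallest" cheap when n is large.

```python
import heapq


def merge_sorted_files(in_paths, out_path):
    """Merge already-sorted text files into out_path, one line at a time.

    Assumption: lines compare correctly as plain strings, i.e. each line
    begins with a text-sortable timestamp.
    """
    def read_line(fh):
        # Return the next line (always newline-terminated), or None at EOF.
        line = fh.readline()
        if not line:
            return None
        return line if line.endswith("\n") else line + "\n"

    files = [open(p) for p in in_paths]            # step 1: open all inputs
    try:
        with open(out_path, "w") as out:           # step 2: open the output
            heap = []
            for idx, fh in enumerate(files):       # step 3: one line per file
                line = read_line(fh)
                if line is not None:
                    heap.append((line, idx))
            heapq.heapify(heap)

            # The loop ends when no lines are pending, not when the files
            # are closed, so lines already read in are never dropped.
            while heap:
                line, idx = heapq.heappop(heap)    # step 4: smallest line
                out.write(line)
                nxt = read_line(files[idx])        # step 5: refill from that file
                if nxt is not None:
                    heapq.heappush(heap, (nxt, idx))
                else:
                    files[idx].close()             # EOF: close that file
    finally:
        for fh in files:
            fh.close()                             # closing twice is harmless
```

Because each heap entry carries the index of the file it came from, the source file name is available at the point where the line is written out, should the merged output need to record it.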


  3. Another approach is to run a single script as a log server and have all your monitors send their log messages to that server over UDP. That way the log messages are written in order from the beginning, and no merge is needed afterwards.

    I used this approach in a real telecommunication system, and speed was not a problem.

    But you do have to spend some time on the log server and carefully work out things like block writing.
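
A very small sketch of the log server idea, again in Python and only under assumptions: the names `run_log_server` and `send_log`, the 64 KB write buffer standing in for "block writing", and the `max_messages` stop condition are all invented for illustration. Note that UDP itself guarantees neither delivery nor ordering, so what the server records is arrival order.

```python
import socket


def run_log_server(sock, log_path, max_messages=None):
    """Append each datagram received on sock to log_path, one per line.

    sock is an already-bound UDP socket. max_messages is a hypothetical
    stop condition so the loop can end; a real server would run forever.
    """
    count = 0
    # A large buffer means the file is written in blocks, not per message.
    with open(log_path, "a", buffering=64 * 1024) as log:
        while max_messages is None or count < max_messages:
            data, _addr = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            log.write(text + "\n")
            count += 1


def send_log(host, port, message):
    """What a monitor would call: fire one log message at the server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(message.encode("utf-8"), (host, port))
```

The server side would create its socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, `bind` it to an address and port of your choosing, and pass it to `run_log_server`.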

Re: Re: Log parsing by timestamp dilema
by Limbic~Region (Chancellor) on Feb 01, 2003 at 20:45 UTC
    pg,
    Thanks!
    I am going to see if I can't get tall_man's suggestion to work. The only obvious problem is figuring out how to hack File::MergeSort to give me the file name/path. I will compare this to the speed of DaveH's integration of adrianh's approach. That integration added a lot of overhead to the original program, though it isn't really that big of a deal since the files are read after they have been written.

    Your option 2 looks identical to my reply to adrianh - I am also going to try my own version of this to see if I can make it any faster.

    Your option 3 is interesting, but isn't viable because I am looking for a quick turnaround. I may consider it in a future revision (I really need to start working on other projects).

    Cheers - L~R