in reply to Re: Re: Efficiently parsing a large file
in thread Efficiently parsing a large file

My impression from the original post was that lines were sorted within each serial number, but that many lines belonging to other serial numbers could be interspersed between them. If this is wrong, then you are correct.

Re^4: Efficiently parsing a large file
by pelagic (Priest) on Apr 08, 2004 at 21:59 UTC
    I said nearly sorted. Sorted / unsorted is not binary. This is not a randomised file; it's a logfile. All entries created by one specific transaction are sorted within themselves.
    That means for us that the number of not-yet-completed series is relatively low at any given point.

    pelagic
      Yes, it's probably overkill for most situations, which is why I started off by saying that a hash is best for a limited number of elements.

      As specified in the original post, however, entries for a given serial number could have any number of lines between them. I didn't see anything there about events not spanning hundreds of thousands of lines, so I didn't assume that an event would complete in a timely manner. In the worst case this means you must remember an arbitrarily large number of past events until almost the entire file has been read (sorted, nearly sorted, or not). See the sketch below for what that hash approach looks like.
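
      A minimal sketch of the hash approach being discussed, assuming a hypothetical line format "SERIAL<tab>TEXT" where a transaction's final line contains the word END; adjust the split and the completion test to the real log format:

          #!/usr/bin/perl
          use strict;
          use warnings;

          my %pending;    # serial number => lines seen so far, still incomplete

          while ( my $line = <> ) {
              chomp $line;
              my ( $serial, $text ) = split /\t/, $line, 2;
              next unless defined $serial and defined $text;

              push @{ $pending{$serial} }, $text;

              # Assumed completion marker: once an event finishes, process it
              # and drop it, so %pending only ever holds the not-yet-completed
              # series.
              if ( $text =~ /\bEND\b/ ) {
                  process_event( $serial, delete $pending{$serial} );
              }
          }

          sub process_event {
              my ( $serial, $lines ) = @_;
              print "serial $serial: ", scalar(@$lines), " lines\n";
          }

      On a nearly sorted logfile %pending stays small; on a worst-case file it can grow until nearly every serial number in the file is in memory at once, which is exactly the trade-off discussed above.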