in reply to Re: Re: Re: Efficiently parsing a large file
in thread Efficiently parsing a large file

I said nearly sorted. Sorted / unsorted is not binary. This is not a randomised file; it's a logfile, and all entries created by one specific transaction are sorted within themselves.
That means for us that the number of not-yet-completed series at any given moment is relatively low.
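For illustration, here is a minimal Perl sketch of the idea; the "<txn_id> <START|END> <payload>" line format and the process() sub are made up for this example, not taken from the original post:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %open;    # txn_id => lines accumulated so far for that transaction

    while (my $line = <>) {
        my ($id, $marker) = split ' ', $line;
        next unless defined $marker;
        if ($marker eq 'START') {
            $open{$id} = [$line];                 # a new series begins
        }
        elsif ($marker eq 'END') {
            push @{ $open{$id} }, $line;
            process(delete $open{$id});           # series complete; hash shrinks
        }
        else {
            push @{ $open{$id} }, $line;          # intermediate entry of an open series
        }
    }

    sub process {
        my ($lines) = @_;
        print @$lines;                            # placeholder for the real work
    }

Because a completed series is deleted the moment its END line shows up, %open only ever holds the few series that are currently in flight.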

pelagic

Re: Re^4: Efficiently parsing a large file
by bluto (Curate) on Apr 08, 2004 at 23:35 UTC
    Yes, it's probably overkill for most situations, which is why I started off by saying that a hash is best for a limited number of elements.

    As specified in the original post, however, the matching lines of an event could have "any" number of lines in between them. I didn't see anything there about events not spanning 100,000s of lines, so I didn't assume that an event would complete in a timely manner. In the worst case this means you must remember an arbitrarily large number of past events until almost the entire file has been read (sorted, nearly sorted, or not).
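    For that worst case, one hedge, sketched here (the file name is illustrative, not from the thread), is to tie the tracking hash to an on-disk DB_File so it can grow beyond available RAM:

        use strict;
        use warnings;
        use DB_File;
        use Fcntl qw(O_CREAT O_RDWR);

        # Tie the hash to a disk-backed DB; lookups, stores, and deletes
        # work as usual, but the data lives on disk instead of in memory.
        my %open;
        tie %open, 'DB_File', 'open_events.db', O_CREAT | O_RDWR, 0666, $DB_HASH
            or die "cannot tie open_events.db: $!";

        # Note: DB_File stores flat strings, so structured per-event data
        # must be serialized (e.g. joined with a separator) before storing.
        $open{'txn42'} = join "\n", 'first line', 'second line';
        delete $open{'txn42'};

        untie %open;

    That trades memory for disk I/O, of course, so it only pays off when the number of open events really can get huge.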