in reply to Big hairy ugly log sorting merging problem
The individual log files by themselves are sorted, right? That's a classic merge sort situation.
You only need enough memory to hold one line per input file.
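A minimal sketch of that merge, assuming the lines compare correctly as plain strings (true for e.g. ISO-8601 timestamps at the start of each line); the file names are just placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# placeholder file names for illustration
my @files = @ARGV ? @ARGV : qw( app1.log app2.log app3.log );

my ( @fh, @buf );
for my $file ( @files ) {
    open my $h, '<', $file or die "Can't open $file: $!";
    push @fh,  $h;
    push @buf, scalar <$h>;   # prime each buffer with the first line
}

# keep going while any buffer still holds a line
while ( grep defined, @buf ) {
    # find the buffer holding the smallest line
    my $min;
    for my $i ( 0 .. $#buf ) {
        next unless defined $buf[ $i ];
        $min = $i if !defined $min or $buf[ $i ] lt $buf[ $min ];
    }
    print $buf[ $min ];
    $buf[ $min ] = readline $fh[ $min ];   # refill the flushed buffer
}
```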
Update: forgot to answer the point about dupes, d'oh. If you're careful about which buffer to pick when there are ties in step 4 (i.e. when several buffers hold the same smallest line), you can cluster dupes at that point. In your case, since the individual files will not contain dupes, but entries might be duplicated across files, you want to favour the buffer that was flushed the longest ago. That way, you will step through the files in sync while you're in sections containing identical data.
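Here is the sketch above extended with that tie-break, plus suppression of the now-adjacent dupes. The @last_flush counter is only one way to realise "flushed the longest ago"; nothing in the thread prescribes it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# placeholder file names again
my @files = @ARGV ? @ARGV : qw( app1.log app2.log app3.log );

my ( @fh, @buf, @last_flush );
for my $file ( @files ) {
    open my $h, '<', $file or die "Can't open $file: $!";
    push @fh,         $h;
    push @buf,        scalar <$h>;
    push @last_flush, 0;   # per-buffer "when was I last flushed" marker
}

my ( $tick, $prev ) = ( 0, undef );
while ( grep defined, @buf ) {
    my $min;
    for my $i ( 0 .. $#buf ) {
        next unless defined $buf[ $i ];
        # smallest line wins; on a tie, the stalest buffer wins,
        # so files step through identical sections in sync
        $min = $i if !defined $min
            or $buf[ $i ] lt $buf[ $min ]
            or $buf[ $i ] eq $buf[ $min ]
               && $last_flush[ $i ] < $last_flush[ $min ];
    }
    # dupes now come out adjacent, so skipping them is a one-line check
    print $buf[ $min ] unless defined $prev and $buf[ $min ] eq $prev;
    $prev               = $buf[ $min ];
    $buf[ $min ]        = readline $fh[ $min ];
    $last_flush[ $min ] = ++$tick;
}
```

Any monotonically increasing value would do for the flush order; a plain counter avoids clock-resolution worries.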
Makeshifts last the longest.
Re^2: Big hairy ugly log sorting merging problem
by mr. jaggers (Sexton) on Aug 07, 2004 at 01:39 UTC
by Aristotle (Chancellor) on Aug 07, 2004 at 01:44 UTC
by mr. jaggers (Sexton) on Aug 07, 2004 at 02:02 UTC