in reply to Re^2: Super fast file creation needed
in thread Super fast file creation needed

I understood that the logfiles were huge, which, IMHO, makes storing entire lines as hash keys impractical because of the memory that would take.

Sure, computing checksums/digests might slow things down some, but it is one way to identify whether a line has been seen or not. With the proper digest length, hash key collisions could be virtually eliminated.
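To make the idea concrete, here is a minimal sketch, written in Python purely for illustration. The function name `unique_lines`, the choice of BLAKE2b, and the 16-byte digest length are my own assumptions, not anything from the original problem. It remembers one fixed-size digest per distinct line instead of the line itself:

```python
import hashlib

def unique_lines(lines, digest_size=16):
    """Yield each distinct line once, in first-seen order.

    Only a fixed-size digest of each line is kept in memory, not the
    line itself. digest_size=16 (128 bits) is an assumed value; a longer
    digest makes accidental collisions even less likely.
    """
    seen = set()
    for line in lines:
        digest = hashlib.blake2b(line.encode("utf-8"),
                                 digest_size=digest_size).digest()
        if digest not in seen:
            seen.add(digest)
            yield line
```

Because it takes any iterable of strings, it can be fed an open file handle directly, so the input never has to be read into memory in one go. The trade-off is the one already mentioned: every line pays for a digest computation, and a collision, however unlikely, would silently drop a line.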

In this case, I think the memory considerations outweigh the speed considerations, but it would certainly be prudent to benchmark both ways to see which one works better.
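A rough way to run that comparison might look like the sketch below, again in Python for illustration only. All the names (`count_unique_by_line`, `count_unique_by_digest`, `measure`, `synthetic_log`) and the made-up log format are assumptions; real numbers would have to come from the actual logfiles.

```python
import hashlib
import time
import tracemalloc

def count_unique_by_line(lines):
    """Store every distinct line in full."""
    seen = set()
    for line in lines:
        seen.add(line)
    return len(seen)

def count_unique_by_digest(lines, digest_size=16):
    """Store only a fixed-size digest of every distinct line."""
    seen = set()
    for line in lines:
        seen.add(hashlib.blake2b(line.encode("utf-8"),
                                 digest_size=digest_size).digest())
    return len(seen)

def synthetic_log(n=20000):
    """Generate made-up, fairly long log lines containing duplicates."""
    for i in range(n):
        yield f"host{i % 5000} GET /some/path " + "x" * 200 + "\n"

def measure(func, make_lines):
    """Return (unique_count, seconds, peak_bytes) for one strategy.

    tracemalloc slows execution down, so treat the timings as relative
    only, or time and trace in separate runs.
    """
    tracemalloc.start()
    start = time.perf_counter()
    unique = func(make_lines())
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return unique, elapsed, peak

if __name__ == "__main__":
    for func in (count_unique_by_line, count_unique_by_digest):
        unique, elapsed, peak = measure(func, synthetic_log)
        print(f"{func.__name__}: {unique} unique, "
              f"{elapsed:.3f}s, peak {peak} bytes")
```

Both strategies should report the same number of unique lines; what differs is how much time and memory each one spends getting there, and that depends heavily on how long the lines are.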


Where do you want *them* to go today?

Re^4: Super fast file creation needed
by dsheroh (Monsignor) on Oct 19, 2007 at 14:50 UTC
    Ah, OK. Fair enough. Thanks for clearing that up!