in reply to Re: Super fast file creation needed
in thread Super fast file creation needed

For #2, I don't really see the point of hashing the lines (with a checksum) and then storing those checksums in a hash (which will re-hash the already-hashed values). The only reason I can think of would be an attempt to speed up lookups by using the checksum as a shorter hash key, but I would expect the extra time spent computing the checksums to overshadow any gains in lookup time. And then there's the question of possible collisions among the checksums, which means more time wasted handling those yourself, since Perl hashes already handle collisions on their keys internally.

Just using the log lines as your hash keys directly seems simpler, faster, and more reliable, unless I'm missing something here. Am I?
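To make the comparison concrete, here's a minimal sketch of the direct approach I have in mind (the filename is hypothetical; output goes to STDOUT):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Deduplicate by using each log line itself as the hash key.
    my %seen;
    open my $in, '<', 'huge.log' or die "open huge.log: $!";
    while ( my $line = <$in> ) {
        print $line unless $seen{$line}++;   # print only the first occurrence
    }
    close $in;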

Re^3: Super fast file creation needed
by thezip (Vicar) on Oct 19, 2007 at 06:26 UTC

    I understood that the logfiles were huge, which, IMHO, makes storing the entire lines as hash keys impractical due to memory considerations.

    Sure, computing checksums/digests might slow things down somewhat, but it is one way to identify whether a line has already been seen. With a suitable digest length, hash key collisions can be virtually eliminated.

    In this case, I think the memory considerations outweigh the speed considerations, but it would certainly be prudent to benchmark both ways to see which one works better.
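    A rough sketch of the digest-keyed variant, using the core Digest::MD5 module (the filename is again hypothetical):

        use strict;
        use warnings;
        use Digest::MD5 qw(md5);

        # Store a fixed 16-byte MD5 digest of each line as the hash key instead
        # of the line itself, trading digest CPU time for a much smaller hash
        # when the lines are long.
        my %seen;
        open my $in, '<', 'huge.log' or die "open huge.log: $!";
        while ( my $line = <$in> ) {
            my $key = md5($line);            # raw binary digest, 16 bytes per key
            print $line unless $seen{$key}++;
        }
        close $in;

    Running both versions over a representative chunk of the logs with the core Benchmark module (e.g. cmpthese) would settle the speed question either way.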


    Where do you want *them* to go today?
      Ah, OK. Fair enough. Thanks for clearing that up!