in reply to Re^2: need for speed - how fast is a hash
in thread need for speed - how fast is a hash
It all has to do with the data volumes we are talking about, as I referred-to in my original response.
A hash-table, or a list, is of course an “in-memory data structure.” And that means virtual memory, of course, which is really not quite “memory” at all, but rather a very souped-up, hardware-accelerated, over-glorified disk file. This distinction is not particularly important until data volumes become large ... at which time the issue of “page faults” begins to rear its ugly head.
Virtual memory subsystems have one rather-strange characteristic: “they’re good, then they’re not-quite so good, then (boom!) they’re awful.” There is absolutely nothing in-between. If you plot a performance curve, you see a gradual mostly-linear progression, then (boom!) it “hits the wall” (the so-called thrash point) at which time the graph becomes exponential...
We are talkin’ Old Testament. I mean, real wrath of God stuff! 40 years of torment, earthquakes, tidal waves! The dead rising from the grave! World destruction, human sacrifices, Dogs and Cats Living Together....Mass Hysteria!!!
(Ahem...) I digress...
Disk-based sorting, on the other hand, with any current algorithm, is basically O(logxn). Which basically means, “it’s never gonna be exponential, and that’s really all that matters.” No matter how many millions of records you have, it’s going to take a predictable (and reasonable, considering...) amount of time to sort them all. And, once you have finished sorting, it will take one, linear pass through the sorted file(s) to finish the entire job.
If you have a “modest” amount of data to deal with, this technique might make no difference, and it might even be slower. But if you have large amounts of data, it will make all the difference in the universe.
Well, that might be a bit of an exaggeration... but at least we won’t have Dogs and Cats Living Together. And that, surely, is a good thing ...
Memo to File: From now on, do not undertake to make comments on PerlMonks after having watched both installments of Ghostbusters back-to-back in a single evening ...
| Replies are listed 'Best First'. | |
|---|---|
|
Re^4: need for speed - how fast is a hash
by oko1 (Deacon) on Jan 07, 2011 at 04:54 UTC |