in reply to Re^2: Find duplicate elements from an array
in thread Find duplicate elements from an array
When data sizes become large, the fact that “memory is virtual” becomes extremely critical. When a system begins to “thrash,” actual running-times rise exponentially ... a phenomenon with an obvious “elbow” in the time-curve which is not-so-affectionately known as “hitting the wall.” (Or, if you prefer, “the windshield.” As in, “one of those big ugly bugs with very colorful and juicy guts that are specially evolved to direct themselves, at the moment of their demise, directly and precisely at the middle of the driver’s side of the glass, about halfway up.” Or maybe a migrating swarm of them, and yours is the lucky automobile. Ugh.)
An algorithm based on external disk-sorting, then, will avoid that issue. The sort-program can be relied upon to use available memory efficiently, without overwhelming that resource when the file becomes very large. The programs that are subsequently written to use these known-to-be identically-sorted files are, as I said, “trivial.”
It’s so important that there are companies (http://www.syncsort.com) that do nothing else but “sort huge things faster than anyone else knows how to do.” “When you need them, you will pay them.”
The same strategy also shines when you are otherwise dealing with very large random-access disk files. Each access to such a file might entail one or more “seeks,” and this purely-physical consideration can drive completion times into the dungeon. An algorithm based on sorting can operate hundreds, or even thousands(!) of times faster.
It won’t shine ... it may well be more-costly ... when one of these resource-driven constraints do not exist; as is often the case. Is this, then, “a silver-bullet strategy?” No, it is somewhat of a pain in the asterisk. Memory is so plentiful and cheap these days that it is often a non-issue (although this is also why a program that seems to be very successful in system-acceptance testing, falls flat on its asterisk in production). This is a strategy that you need to be familiar with. “When you need it, you will know.”
| Replies are listed 'Best First'. | |
|---|---|
|
Re^4: Find duplicate elements from an array
by JavaFan (Canon) on Nov 22, 2010 at 15:02 UTC | |
by locked_user sundialsvc4 (Abbot) on Nov 22, 2010 at 17:14 UTC |