in reply to Re^5: In-place sort with order assignment
in thread In-place sort with order assignment
I guess I'm being thick here, but at the end of your second pass you have 2000 items in 2 arrays, and the original number in the hash. Then what?
And does each "pass" mean a pass over 1000 (or 2000) elements of the array(s)? Or a full (millions) pass over the keys of the hash?
Re^7: In-place sort with order assignment
by Limbic~Region (Chancellor) on Sep 19, 2010 at 16:52 UTC
Ok, I must be the one being thick here, so I will give you a real solution as soon as I finish lunch.

Update: As you can see, this works, but it could be greatly improved with a data structure other than an array to maintain order. I converted the 2nd array to a hash to avoid doing a binary search, since American football is about to come on and Sunday is family day :-)

Update 2: This requires N / M passes through the hash, where N represents the total number of keys and M represents the maximum number of items to sort at once. Using an array and a hash to keep track of the current unknown bottom M is extremely inefficient (from a runtime perspective), but data structures that support a faster run time also consume more memory. Finding a balance of speed and memory is left as an exercise for the reader :-)

Cheers - L~R
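The code itself is behind the [d/l] link and isn't reproduced here. As a minimal sketch of the approach described (illustrative names, not L~R's actual code): each pass scans the remaining keys, keeps the M smallest seen so far in an array held sorted via binary-searched splice, then ranks that batch and removes it from the hash.

```perl
use strict;
use warnings;

# Sketch only: rank the values of %data 1..N ascending,
# handling at most $M items per pass as described above.
my %data = map { "key$_" => int rand 1_000_000 } 1 .. 50;
my $M    = 10;

my %rank;
my $next = 0;
while (%data) {
    my @batch;    # up to $M [key, value] pairs, sorted ascending by value
    for my $key (keys %data) {
        my $val = $data{$key};
        # A full batch whose largest value $val cannot beat? Skip.
        next if @batch == $M && $val >= $batch[-1][1];

        # Binary search for the insertion point ...
        my ($lo, $hi) = (0, scalar @batch);
        while ($lo < $hi) {
            my $mid = ($lo + $hi) >> 1;
            $batch[$mid][1] < $val ? ($lo = $mid + 1) : ($hi = $mid);
        }
        # ... splice the pair in, and trim back to $M items
        splice @batch, $lo, 0, [$key, $val];
        pop @batch if @batch > $M;
    }
    # The batch now holds the smallest remaining items, already in order
    for my $pair (@batch) {
        $rank{ $pair->[0] } = ++$next;
        delete $data{ $pair->[0] };
    }
}
print "$_ => $rank{$_}\n" for sort { $rank{$a} <=> $rank{$b} } keys %rank;
```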
by BrowserUk (Patriarch) on Sep 19, 2010 at 19:19 UTC
Many thanks for the code, because I had drawn a complete blank on the description. The problem with it is that (as currently implemented) it is quadratic in time:

That's using M as 10% of N, which I project means over 24 hours for a million items and 4 days for 2 million. I appreciate that an insertion sort using splice can be improved upon with (say) a heap, but most of the modules implementing alternatives to perl's built-in data structures tend to be implemented as objects wrapped over hashes or arrays, so what you gain from a somewhat more intelligent DS, you lose to the calling overheads :(

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
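The benchmark behind the [d/l] link isn't reproduced above, but the quadratic shape is easy to demonstrate with a rough stand-in (not BrowserUk's actual benchmark): with M fixed at 10% of N, doubling N roughly quadruples the time of even a single pass, because each of the N inserts can pay an O(M) splice.

```perl
use strict;
use warnings;
use Time::HiRes 'time';

# Rough stand-in for the missing benchmark: time one bottom-M pass
# over N random values, M = N/10, maintained by binary search + splice.
for my $N (10_000, 20_000, 40_000) {
    my $M   = $N / 10;
    my @val = map { rand } 1 .. $N;

    my $t0 = time;
    my @batch;
    for my $v (@val) {
        next if @batch == $M && $v >= $batch[-1];
        my ($lo, $hi) = (0, scalar @batch);
        while ($lo < $hi) {
            my $mid = ($lo + $hi) >> 1;
            $batch[$mid] < $v ? ($lo = $mid + 1) : ($hi = $mid);
        }
        splice @batch, $lo, 0, $v;    # O(M) element shift on every insert
        pop @batch if @batch > $M;
    }
    printf "N=%6d M=%5d  %.3fs\n", $N, $M, time() - $t0;
}
```

The full job repeats such passes over a shrinking hash, which is where the projected quadratic growth comes from.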
by Limbic~Region (Chancellor) on Sep 20, 2010 at 01:16 UTC
I have no idea how much additional memory Heap::Simple::XS uses under the covers, but it is dramatically faster than using splice with a binary search (above).
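His code is behind the [d/l] link; the following is only a sketch of the same idea, assuming the Heap::Simple interface (new with order/elements, insert, top, extract_top, count) that Heap::Simple::XS implements: keep the bottom-M candidates in a max-heap keyed on value, so each of the N items costs O(log M) instead of an O(M) splice.

```perl
use strict;
use warnings;
use Heap::Simple::XS;

my %data = map { "key$_" => int rand 1_000_000 } 1 .. 50;
my $M    = 10;

my %rank;
my $next = 0;
while (%data) {
    # Max-heap of [key, value] pairs ordered on the value (index 1),
    # so ->top is always the largest of the kept bottom-M candidates.
    my $heap = Heap::Simple::XS->new(order => ">", elements => [Array => 1]);
    for my $key (keys %data) {
        my $val = $data{$key};
        if ($heap->count < $M) {
            $heap->insert([$key, $val]);
        }
        elsif ($val < $heap->top->[1]) {
            $heap->extract_top;               # evict the largest candidate
            $heap->insert([$key, $val]);
        }
    }
    # Drain largest-first, so ranks for this batch are assigned downward
    my $hi = $next += $heap->count;
    while ($heap->count) {
        my ($key) = @{ $heap->extract_top };
        $rank{$key} = $hi--;
        delete $data{$key};
    }
}
print "$_ => $rank{$_}\n" for sort { $rank{$a} <=> $rank{$b} } keys %rank;
```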
Cheers - L~R
by BrowserUk (Patriarch) on Sep 20, 2010 at 07:40 UTC
by Limbic~Region (Chancellor) on Sep 20, 2010 at 13:18 UTC
by Limbic~Region (Chancellor) on Sep 19, 2010 at 23:17 UTC
Perhaps the following minor modifications are more to your liking:
Cheers - L~R
by Limbic~Region (Chancellor) on Sep 23, 2010 at 18:48 UTC
I have already shared this with you on my scratchpad, but I would like to leave it here as well for posterity. It turns out that even without the overhead of function calls, a pure Perl heap still doesn't compete with the binary search and splice. As tye pointed out in the CB while I was lamenting this: count opcodes. My guess is that perl has very optimized C under the covers, using memcpy and the like, which makes splice superior to a home-grown heap.

I do want to point out that some of the complexity in the example below is because the heap has been special-cased to this specific problem (limiting size). A general purpose implementation would be more powerful, though less complex due to abstraction. This means it would also be slower.
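The heap itself is behind the [d/l] link; as a stand-in (not L~R's actual code), here is what such a special-cased, size-limited pure Perl max-heap looks like with the sift operations written inline rather than as function calls:

```perl
use strict;
use warnings;

# Size-limited pure Perl max-heap: retains the $M smallest of a
# stream of values, sift-up/sift-down inlined to avoid call overhead.
my $M = 1_000;
my @heap;    # $heap[0] is always the largest retained value

for my $val (map { rand } 1 .. 100_000) {
    if (@heap < $M) {
        my $i = @heap;
        push @heap, $val;
        while ($i) {                    # sift the new value up
            my $p = ($i - 1) >> 1;
            last if $heap[$p] >= $heap[$i];
            @heap[$i, $p] = @heap[$p, $i];
            $i = $p;
        }
    }
    elsif ($val < $heap[0]) {
        $heap[0] = $val;                # replace the root, sift it down
        my $i = 0;
        while (1) {
            my $c = 2 * $i + 1;
            last if $c > $#heap;
            $c++ if $c < $#heap && $heap[$c + 1] > $heap[$c];
            last if $heap[$i] >= $heap[$c];
            @heap[$i, $c] = @heap[$c, $i];
            $i = $c;
        }
    }
}
# @heap now holds the 1_000 smallest of the 100_000 values, in heap order
```

Even with everything inlined, each insert costs dozens of Perl opcodes, while splice does its element shuffling in a single C-level memmove, which is the opcode-counting point above.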
Cheers - L~R