in reply to Re^6: Re-orderable keyed access structure?
in thread Re-orderable keyed access structure?
You could call it a bubble sort in a sorted array (O( n )), I guess, except you're only looking at log n items. The splice is not more efficient: you need to move the entire array down one position to put a new element at the front. With a heap, you need to swap at most log n elements.
Google found me a good explanation of heaps. Look at the illustrations. It also explains storage of complete binary trees in an array.
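To make the array storage concrete, here is a minimal sketch of a max-heap kept in a 0-based Perl array. The helper names (`parent_of`, `children_of`, `sift_up`) and the sample weights are made up for illustration; they are not taken from the linked explanation.

```perl
use strict;
use warnings;

# Index arithmetic for a complete binary tree stored in a 0-based array.
sub parent_of   { my ($i) = @_; return int( ( $i - 1 ) / 2 ) }
sub children_of { my ($i) = @_; return ( 2 * $i + 1, 2 * $i + 2 ) }

# Move the element at index $i up until its parent is at least as large
# (max-heap: the largest weight sits at index 0). Returns the swap count.
sub sift_up {
    my ( $heap, $i ) = @_;
    my $swaps = 0;
    while ( $i > 0 ) {
        my $p = parent_of($i);
        last if $heap->[$p] >= $heap->[$i];
        @$heap[ $p, $i ] = @$heap[ $i, $p ];    # swap child with parent
        $i = $p;
        $swaps++;
    }
    return $swaps;
}

my @heap = ( 9, 7, 8, 3, 5, 6, 4 );    # a valid max-heap of 7 weights
$heap[6] = 10;                         # raise the weight of the last leaf
my $swaps = sift_up( \@heap, 6 );
print "swaps: $swaps, heap: @heap\n";  # swaps: 2, heap: 10 7 9 3 5 6 8
```

Only the elements on the path from the leaf to the root are touched; the other four stay where they are.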
Please do pick up a book or two on algorithms and data structures; this is stuff anyone who is serious about programming should know.
Makeshifts last the longest.
Re^8: Re-orderable keyed access structure?
by tye (Sage) on Aug 15, 2004 at 05:08 UTC
Note, however, that this is Perl. It is true that one splice is O($#array) in chunks of memory to be moved, while a heap moves fewer chunks of memory, O(log($#array)). But it is also true that splice is a single Perl opcode, while the heap will be O(log($#array)) in Perl opcodes. And I wouldn't be surprised if O(1 opcode) + O($#array moves) is quite often a win over O(log($#array) opcodes and moves).
- tye
by Aristotle (Chancellor) on Aug 15, 2004 at 06:34 UTC
Sure, if k1 is much smaller than k2, O( k1 n ) will be smaller for small values of n than O( k2 log n ). Using builtins is a good way of getting very small values for k1, and I've asserted many times that this is a sensible optimization goal in Perl, even recently. But with n growing, the constants eventually become irrelevant. Since BrowserUk claims to be unable to hold all of his data in memory, I would assume this is such a situation. Even Perl's builtin splice won't move 100,000 elements down one position faster than spelled-out Perl code would swap 17 (≅ log2 100_000) elements.
Makeshifts last the longest.
by tye (Sage) on Aug 16, 2004 at 04:34 UTC
Interesting that you pick 100,000. That was about the break-even point in some quick benchmarks. On one system the heap was a little faster at that point, while on another they were neck-and-neck (or else splice was twice as fast, depending on the test). Moving one more chunk of memory when already moving a bunch of them is extremely fast (it is usually one assembly language instruction to move the entire region). Dispatching Perl opcodes is surprisingly slow. So the constants involved here are very large. But you are correct, the heap will eventually win when the dataset gets large enough. I'll try to include the code I used for testing when I get access to it again...
- tye
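The code used for those timings does not appear in the thread. As a rough stand-in (not tye's test), a comparison along these lines could be set up with the core Benchmark module. The sub names, the choice of promoting from the middle of the list, and the iteration count are all assumptions, and no particular result is implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 100_000;    # assumed size, taken from the figure discussed above

# Sorted-list approach: pull one element out of the middle and put it
# at the front, leaving the memory moves to splice.
sub promote_splice {
    my ($list) = @_;
    my $mid = int( @$list / 2 );
    splice( @$list, 0, 0, splice( @$list, $mid, 1 ) );
    return;
}

# Heap approach: give the last leaf the largest weight and sift it up
# to the root with spelled-out Perl swaps.
sub promote_heap {
    my ($heap) = @_;
    my $i = $#$heap;
    $heap->[$i] = $heap->[0] + 1;
    while ( $i > 0 ) {
        my $p = ( $i - 1 ) >> 1;
        last if $heap->[$p] >= $heap->[$i];
        @$heap[ $p, $i ] = @$heap[ $i, $p ];
        $i = $p;
    }
    return;
}

my @list = reverse 1 .. $n;    # descending order is also a valid max-heap
my @heap = @list;

cmpthese( 5_000, {
    splice => sub { promote_splice( \@list ) },
    heap   => sub { promote_heap( \@heap ) },
} );
```

Benchmark may warn about too few iterations for the faster case; passing a negative count (a minimum number of CPU seconds) gives steadier numbers. Varying `$n` is the quickest way to look for a break-even point on a given machine.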
Re^8: Re-orderable keyed access structure?
by BrowserUk (Patriarch) on Aug 15, 2004 at 13:35 UTC
Yes, you inspect log N items and move one (steps 1, 2 and 3 below). But then you are not finished. You still need to swap items 1 and 2.
Now try making that a 7 item array and moving the middle item to the top. Count the number of comparisons and swaps required. In the end, you have had to move the middle item to the top and all the intervening items down. Splice does this directly. A heap algorithm does it one at a time. Splice does this in O(N). A heap algorithm does it using O(N log N).
I have several good data structure & algorithm books; a couple of them are almost as old as you. Unlike you apparently, I haven't just read the headlines. I've also implemented many of the algorithms myself and understood the ramifications. I was simply waiting for you to catch up with the fact that the use of heaps has no benefit here.
The likely size of the cache is a few hundred, maybe 1000 elements. More than this and I run out of file handles or memory. splice is way more efficient at moving 1 item in an array of this size than any implementation of a (binary search + swap) * (old_position - new_position) in Perl.
by Aristotle (Chancellor) on Aug 15, 2004 at 19:05 UTC
Sorry, I'm not the one who seems to only have read headlines. A heap does not somehow entail a bubble sort. But let's leave the ad hominem out and look at facts.
A single swap requires inspecting exactly two elements, not log n. You need at most log n swaps total at any time.
Why? The heap condition is not violated at any point after your step 3 (which is really step 2, and swapping step 1). $a[0] > $a[1] and $a[0] > $a[2] is fulfilled, so the root and its children satisfy the condition. Likewise $a[1] > $a[3] and $a[1] > $a[4], so the left child of the root and its children satisfy the condition as well. $a[2] has no children, so it automatically satisfies the condition as well. Your step 4 is not required in a heap. Want me to demonstrate on a larger heap? Sure.
That's it. 3 swaps among a segment of 12 elements. In a heap with 100 elements, you need at most 7 swaps to get an item from the bottom of the heap to the top without violating the heap condition. I doubt that splice would win. In a heap with 1,000 elements, you need at most 10 swaps. How much money will you bet on splice?
Makeshifts last the longest.
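The swap counts can be checked mechanically. The sketch below assumes a 0-based array layout and uses two illustrative helpers, `is_max_heap` and `max_swaps`, that are not from the thread: one verifies the heap condition for every parent/child pair, the other counts the levels between the last element and the root.

```perl
use strict;
use warnings;

# True if no element is larger than its parent (max-heap condition).
sub is_max_heap {
    my ($heap) = @_;
    for my $i ( 1 .. $#$heap ) {
        my $p = ( $i - 1 ) >> 1;
        return 0 if $heap->[$p] < $heap->[$i];
    }
    return 1;
}

# Number of swaps needed to carry the last element of an $n-element heap
# all the way to the root: one swap per level on the path upward.
sub max_swaps {
    my ($n) = @_;
    my $swaps = 0;
    for ( my $i = $n - 1 ; $i > 0 ; $i = ( $i - 1 ) >> 1 ) {
        $swaps++;
    }
    return $swaps;
}

print is_max_heap( [ 9, 7, 8, 3, 5 ] ) ? "heap ok\n" : "heap violated\n";

printf "%7d elements: at most %2d swaps\n", $_, max_swaps($_)
    for 12, 100, 1_000, 100_000;
```

This reports 3, 6, 9 and 16 swaps respectively, so the figures of 7, 10 and 17 quoted in the thread are safe upper bounds.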
by BrowserUk (Patriarch) on Aug 15, 2004 at 20:47 UTC
"But let's leave the ad hominem out..."
"Please do pick up a book or two on algorithms and data structures; this is stuff anyone who is serious about programming should know."
Yes. Let's do that.
"In a heap with 1,000 elements, you need at most 10 swaps. How much money will you bet on splice?"
Quite a lot, were I a betting man! :)
From where you left off. A new item not currently in cache is called for, it is read from disk, the lowest item* (currently index 12) is replaced by the new item in the array** and the new item given a weight of 17.
Now, another new item is called for, so I need to locate the lowest weighted item in the array. *How do I do this? And another problem: when I need to locate, via its key, one of these items that are moving around in this heap, **how do I locate it? Actually, it's just the original problem: that of maintaining the linkage between the items in the array (heap) and their keys. No matter how long I "look at the pictures"--or read the text--at heaps, I do not see the mechanism by which the lowest weighted item in the heap is located (other than a linear search). To re-state the requirements. I need to be able to:
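One common way to handle both questions, offered here as a sketch rather than as anything proposed in the thread, is to order the heap so that the lowest weight sits at index 0 (a min-heap) and to keep a hash from key to current array index that is updated on every swap. All names below (`new_cache`, `insert`, `reweight`, `evict_lowest`) and the sample keys and weights are hypothetical.

```perl
use strict;
use warnings;

# heap: array of [key, weight] pairs ordered as a min-heap on weight.
# pos:  hash mapping each key to its current index in that array.
sub new_cache { return { heap => [], pos => {} } }

# Swap two slots and keep the key => index hash in step.
sub _swap {
    my ( $c, $i, $j ) = @_;
    my $h = $c->{heap};
    @$h[ $i, $j ] = @$h[ $j, $i ];
    $c->{pos}{ $h->[$i][0] } = $i;
    $c->{pos}{ $h->[$j][0] } = $j;
}

sub _sift_up {
    my ( $c, $i ) = @_;
    my $h = $c->{heap};
    while ( $i > 0 ) {
        my $p = ( $i - 1 ) >> 1;
        last if $h->[$p][1] <= $h->[$i][1];
        _swap( $c, $p, $i );
        $i = $p;
    }
}

sub _sift_down {
    my ( $c, $i ) = @_;
    my $h = $c->{heap};
    while (1) {
        my $min = $i;
        for my $k ( 2 * $i + 1, 2 * $i + 2 ) {
            $min = $k if $k < @$h && $h->[$k][1] < $h->[$min][1];
        }
        last if $min == $i;
        _swap( $c, $i, $min );
        $i = $min;
    }
}

# Add a new key with the given weight.
sub insert {
    my ( $c, $key, $weight ) = @_;
    push @{ $c->{heap} }, [ $key, $weight ];
    $c->{pos}{$key} = $#{ $c->{heap} };
    _sift_up( $c, $#{ $c->{heap} } );
}

# Change the weight of an existing key, located through the index hash.
sub reweight {
    my ( $c, $key, $weight ) = @_;
    my $i = $c->{pos}{$key};
    $c->{heap}[$i][1] = $weight;
    _sift_up( $c, $i );
    _sift_down( $c, $c->{pos}{$key} );
}

# Remove and return the key with the lowest weight (always at index 0).
sub evict_lowest {
    my ($c) = @_;
    my $h = $c->{heap};
    return unless @$h;
    _swap( $c, 0, $#$h );
    my $gone = pop @$h;
    delete $c->{pos}{ $gone->[0] };
    _sift_down( $c, 0 ) if @$h;
    return $gone->[0];
}

my $cache = new_cache();
insert( $cache, @$_ ) for [ a => 5 ], [ b => 2 ], [ c => 9 ], [ d => 1 ];
reweight( $cache, 'd', 17 );         # promote 'd' to the highest weight
print evict_lowest($cache), "\n";    # prints "b", now the lowest weight
```

The trade-off is extra bookkeeping on every swap, which adds to the per-operation constant that the thread is arguing about.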
by Aristotle (Chancellor) on Aug 25, 2004 at 21:07 UTC
by BrowserUk (Patriarch) on Aug 25, 2004 at 21:35 UTC
by BrowserUk (Patriarch) on Aug 16, 2004 at 05:00 UTC
I wanted to see how heaps could be made to work for this. As well as O(N) vs. O(log N) for any given part of an algorithm not telling the whole story, you also have to consider the cost of all parts of the algorithm. This benchmarks not only the promotion, but also building the list, promoting from any given position and removing (lowest weighted) items, until empty. Feel free to show me where I am using a bad implementation.