Re^2: RFC - Tie::Hash::Ranked
by stvn (Monsignor) on Oct 12, 2004 at 15:54 UTC
|
When I see something involving sorts I hope for, "Let me specify a pairwise comparison function." Tie::Hash::Sorted fails to satisfy either hope, and so I'd be unlikely to reach for it for serious work. Sorry.
While it is not a "pairwise comparison function", you can specify the sort routine for Tie::Hash::Sorted, just look here.
That said, if you would require it to use the module for serious work, why don't you send in a patch? I am sure Limbic~Region would not object :)
| [reply] |
|
|
Yes, I saw that I could specify the sort routine, but not in a way that I'd want to specify it. Furthermore the way that it is specified directly contradicts my desire to have operations be reasonably efficient.
As for sending a patch, making this module not have the algorithmic mistake doesn't require a patch, it requires a complete rewrite. If I did that rewrite, it would be easier to do it and not support an API that I am not interested in.
Since I don't currently have the need that this module was designed to address, I see no point in doing that work. And if I did have that need, I would still have no incentive to integrate the unwanted API.
| [reply] |
|
|
Yes, I saw that I could specify the sort routine, but not in a way that I'd want to specify it.
I agree it is not my prefered means either, however, sorting a hash can be done in many ways. By keys, by value, sort keys by value, sort values by keys etc etc. A simple pairwise comparison would not cover all these possibilities.
Take a look at how Tie::SortHash does it's pairwise sort function. It asks for a string which is then evaled into the sort block, and it always sorts on keys too. I am not sure that is a better solution.
So while I agree, its not my prefered style, I think it's a very flexible solution to the problem none the less.
As for sending a patch, making this module not have the algorithmic mistake doesn't require a patch, it requires a complete rewrite.
Well if you take issue with the underlying algorithim then yes, but as I pointed out above, I am not sure using a simple pairwise function would really make it
all that much better. I disagree too that it would take a complete rewrite. If you looked a little closer, just adding a new function Pairwise_Sort_Routine ($func) would accomplish this by creating a new function/closure which fits the Sort_Routine criteria but uses the $func passed to Pairwise_Sort_Routine as the comparison. Of course, your pairwise sort would again be very restricted, as I pointed out above.
Since I don't currently have the need that this module was designed to address, I see no point in doing that work. And if I did have that need, I would still have no incentive to integrate the unwanted API.
Sounds to me that your issue is more with the API (although you have supplied no hard details). As for your points about efficiency, I think it is somewhat lost since just using tie is a big enough hit already.
| [reply] [d/l] [select] |
|
|
|
|
|
Re^2: RFC - Tie::Hash::Ranked
by Limbic~Region (Chancellor) on Oct 12, 2004 at 16:01 UTC
|
tilly,
I understand the need for speed. I added as many optimizations as possible to only run the sort routine when needed. The user is able to specify their own sort routine which can lead to more efficient sorts. I also traded space for time. It can obviously be improved by adding a whole lot of complexity to the underlying code. That is more effort than I am willing to invest on a module I have never used myself and doesn't appear to be very popular.
I am not sure what you mean WRT a pairwise comparison function. Are you wanting to assign points to a key based off how it matches up against every other key and then sort them based off the results? That is certainly doable and I can give an example if you want. Just let me know if I am in left field.
| [reply] |
|
|
I am not sure what you mean WRT a pairwise comparison function.
The problem is not being able to sort using complicated orderings. That you have accomplished. But without a pairwise comparison function, you have to re-sort the entire collection every time an item is inserted. A better implementation would either do binary search on the sorted array to find the insertion point, or (if you're worried about splice not being constant time) use your favorite balanced tree structure instead of an underlying array. Both operate using pairwise comparisons, and take O(log n) for insertion/deletion, instead of O(n log n) as the module currently does.
That's a huge difference -- huge enough that it's not just a "need-for-speed" optimization. In fact, it's generally accepted that common (non-catastrophic) hash operations be no more than O(log n), as I think tilly was alluding to. And think about it -- even just naively looping through an unsorted array for every operation could accomplish all the ranking stuff you need in O(n) time! So sorting for every insertion is definitely a step backwards.
Update: I've looked at the code of the module. In fact it does not necessarily sort after every insertion, but you can easily construct a sequence of operations on the hash so that it does. Alternate STORE and FIRSTKEY operations k times and the total cost is O(kn log n). Using either approach mentioned above, the same sequence of operations costs O(k log n). So while your optimizations are nice, they don't actually help asymptotically, unless you perform O(n log n) insert/delete/fetch operations in between every call to keys. For many uses of a hash, this condition holds, but for a general tool I think I'd rather have all operations logarithmic ;)
| [reply] |
|
|
blokhead,
So it boils back down to efficiency. It does NOT resort for every insertion though. That is why I keep saying there are several optimizations that prevent it from resorting until necessary. It works like this:
- When FIRSTKEY is called, it will reorder the hash if the optimization flag is set to none or if the changed flag is on
- The changed flag doesn't get set if the change will not affect the ordering (a value change when the sort routine is only looking at keys)
This means that unless you invoke keys, values, or each between every insertion - there will only be 1 resorting by adding 50 new entries into the hash.
I had considered making changes "smart" but it was not clear to me that it would be more efficient. It all depends on how it is used. If the hash is asked for many times with only a few number of changes in between it makes sense to use "smart" inserts. If on the other hand the hash is asked for only intermittently with potentially many changes it makes sense to use my current approach.
I guess if I make any modifications, which I am currently not inclined to do - I would provide the option of selecting which "mode" to use since the efficiency of either approach depends on usage.
Update: (in response to your update):
they don't actually help asymptotically, unless you perform O(n log n) insert/delete/fetch operations in between every call to keys.
Not exactly. Unless there are changes and those changes effect the order and those changes accumulate to O(n log n).
| [reply] |