in reply to Re^2: RFC - Tie::Hash::Ranked
in thread RFC - Tie::Hash::Ranked
> I am not sure what you mean WRT a pairwise comparison function.

The problem is not being able to sort using complicated orderings -- that you have accomplished. The problem is that without a pairwise comparison function, you have to re-sort the entire collection every time an item is inserted. A better implementation would either binary-search the already-sorted array for the insertion point, or (if you're worried about splice not being constant time) use your favorite balanced tree structure instead of an underlying array. Both work from pairwise comparisons and take O(log n) per insertion/deletion, instead of the O(n log n) the module currently spends; a sketch of the binary-search approach is below.
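Here is a minimal sketch of what I mean, not code from the module: a lower-bound binary search driven by a pairwise comparator. The names `insertion_point` and `$cmp` are hypothetical, and the comparator here is a plain numeric one standing in for whatever ordering the tie uses.

```perl
use strict;
use warnings;

# Find where $new belongs in the already-sorted @$sorted, using only
# pairwise comparisons; O(log n) comparisons per insertion.
sub insertion_point {
    my ($sorted, $new, $cmp) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($cmp->($sorted->[$mid], $new) < 0) {
            $lo = $mid + 1;     # $new goes somewhere after $mid
        }
        else {
            $hi = $mid;         # $new goes at or before $mid
        }
    }
    return $lo;
}

my @sorted = (1, 3, 5, 9);
my $cmp    = sub { $_[0] <=> $_[1] };

# Insert 4 without ever re-sorting the whole array:
splice @sorted, insertion_point(\@sorted, 4, $cmp), 0, 4;
# @sorted is now (1, 3, 4, 5, 9)
```

The splice itself is O(n) in the worst case, which is exactly why I mention a balanced tree as the alternative if that bothers you.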
That's a huge difference -- huge enough that it's not just a "need-for-speed" optimization. In fact, it's generally accepted that common (non-catastrophic) hash operations should cost no more than O(log n), as I think tilly was alluding to. And think about it -- even naively looping through an unsorted array on every operation could accomplish all the ranking you need in O(n) time (see the scan sketched below), so sorting on every insertion is definitely a step backwards.
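To make that concrete, here's what I mean by the naive approach -- again a hypothetical sketch, not the module's code, and it assumes "rank" means "how many stored items compare less than this one":

```perl
# Compute the rank of $target against a plain *unsorted* array with one
# O(n) scan: count how many items the comparator orders before it.
sub rank_of {
    my ($items, $target, $cmp) = @_;
    my $rank = 0;
    for my $item (@$items) {
        $rank++ if $cmp->($item, $target) < 0;
    }
    return $rank;    # number of items ordered before $target
}
```

Even this zero-cleverness version is O(n) per operation, already asymptotically better than an O(n log n) re-sort per operation.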
Update: I've looked at the code of the module. In fact it does not necessarily sort after every insertion, but you can easily construct a sequence of operations on the hash so that it does. Alternate STORE and FIRSTKEY operations k times and the total cost is O(kn log n). Using either approach mentioned above, the same sequence of operations costs O(k log n). So while your optimizations are nice, they don't actually help asymptotically, unless you perform O(n log n) insert/delete/fetch operations in between every call to keys. For many uses of a hash, this condition holds, but for a general tool I think I'd rather have all operations logarithmic ;)
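For the record, this is the kind of sequence I mean. It assumes the module follows the usual tie interface and that a fetch of the first key triggers FIRSTKEY, which re-sorts; constructor arguments are omitted since I'm only illustrating the access pattern:

```perl
use Tie::Hash::Ranked;

tie my %rank, 'Tie::Hash::Ranked';   # constructor args omitted

for my $i (1 .. 1_000) {
    $rank{"key$i"} = $i;             # STORE invalidates the cached order
    my ($first) = each %rank;        # FIRSTKEY forces a full O(n log n) re-sort
    keys %rank;                      # reset the iterator for the next pass
}
# Total: O(k * n log n) for k iterations, versus O(k log n) with binary
# search on a sorted array or with a balanced tree.
```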
blokhead
Re^4: RFC - Tie::Hash::Ranked
by Limbic~Region (Chancellor) on Oct 12, 2004 at 16:38 UTC