in reply to Re: RFC - Tie::Hash::Ranked
in thread RFC - Tie::Hash::Ranked

tilly,
I understand the need for speed. I added as many optimizations as I could so that the sort routine only runs when needed. The user can also specify their own sort routine, which can lead to more efficient sorts, and I traded space for time. The module could obviously be improved by adding a great deal of complexity to the underlying code, but that is more effort than I am willing to invest in a module I have never used myself and that doesn't appear to be very popular.

I am not sure what you mean WRT a pairwise comparison function. Are you wanting to assign points to a key based on how it matches up against every other key and then sort the keys based on the results? That is certainly doable, and I can give an example if you want (see the sketch below). Just let me know if I am in left field.
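Something along these lines is what I have in mind (a toy sketch only; the length() matchup is purely a stand-in for a real scoring rule):

    # Every key earns a point for each other key it "beats"; keys are
    # then ranked by total points.
    my %hash  = (ant => 1, bee => 2, beetle => 3);
    my %score = map { $_ => 0 } keys %hash;
    for my $x (keys %hash) {
        for my $y (keys %hash) {
            $score{$x}++ if $x ne $y and length($x) > length($y);
        }
    }
    my @ranked = sort { $score{$b} <=> $score{$a} } keys %hash;
    # @ranked: beetle first, then ant and bee in some order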

Cheers - L~R

Re^3: RFC - Tie::Hash::Ranked
by blokhead (Monsignor) on Oct 12, 2004 at 16:16 UTC
    I am not sure what you mean WRT a pairwise comparison function.
    The problem is not being able to sort using complicated orderings. That you have accomplished. But without a pairwise comparison function, you have to re-sort the entire collection every time an item is inserted. A better implementation would either do a binary search on the sorted array to find the insertion point, or (if you're worried about splice not being constant time) use your favorite balanced tree structure instead of an underlying array. Both operate using pairwise comparisons and take O(log n) per insertion/deletion, instead of the O(n log n) the module currently takes.
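    For example, a minimal binary-search insert might look something like this (a sketch only; insert_sorted and $cmp are illustrative names, with $cmp being whatever pairwise comparator the user supplies):

        # Find the insertion point in O(log n) comparisons, then splice
        # the new key into the sorted array.
        sub insert_sorted {
            my ($sorted, $key, $cmp) = @_;
            my ($lo, $hi) = (0, scalar @$sorted);
            while ($lo < $hi) {
                my $mid = int(($lo + $hi) / 2);
                if ($cmp->($sorted->[$mid], $key) < 0) { $lo = $mid + 1 }
                else                                   { $hi = $mid }
            }
            # splice itself is O(n); a balanced tree avoids even that
            splice @$sorted, $lo, 0, $key;
        }

        my @rank = qw(apple cherry plum);
        insert_sorted(\@rank, 'banana', sub { $_[0] cmp $_[1] });
        # @rank is now: apple banana cherry plum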

    That's a huge difference -- huge enough that it's not just a "need-for-speed" optimization. In fact, it's generally accepted that common (non-catastrophic) hash operations should take no more than O(log n), as I think tilly was alluding to. And think about it -- even naively looping through an unsorted array for every operation could accomplish all the ranking you need in O(n) time! So sorting on every insertion is definitely a step backwards.
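    To make that concrete, even this naive FIRSTKEY is only O(n) per call ($cmp again being an assumed pairwise comparator):

        # FIRSTKEY without any sorting at all: one O(n) scan for the
        # minimum key under the pairwise comparator.
        sub first_key {
            my ($keys, $cmp) = @_;
            my $first = $keys->[0];
            for my $k (@$keys[1 .. $#$keys]) {
                $first = $k if $cmp->($k, $first) < 0;
            }
            return $first;
        }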

    Update: I've looked at the code of the module. In fact it does not necessarily sort after every insertion, but you can easily construct a sequence of operations on the hash so that it does. Alternate STORE and FIRSTKEY operations k times and the total cost is O(kn log n). Using either approach mentioned above, the same sequence of operations costs O(k log n). So while your optimizations are nice, they don't actually help asymptotically, unless you perform O(n log n) insert/delete/fetch operations in between every call to keys. For many uses of a hash, this condition holds, but for a general tool I think I'd rather have all operations logarithmic ;)
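    The pathological sequence, spelled out (assuming a tie interface along the lines of the RFC):

        use Tie::Hash::Ranked;

        tie my %h, 'Tie::Hash::Ranked';
        for my $i (1 .. 1_000) {
            $h{"key$i"} = $i;      # STORE marks the ordering dirty
            my ($top) = keys %h;   # FIRSTKEY pays a full O(n log n) re-sort
        }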

    blokhead

      blokhead,
      So it boils back down to efficiency. It does NOT re-sort on every insertion, though. That is why I keep saying there are several optimizations that prevent it from re-sorting until necessary. It works like this:
      • When FIRSTKEY is called, it will reorder the hash if the optimization flag is set to none or if the changed flag is on
      • The changed flag doesn't get set if the change will not affect the ordering (for example, a value change when the sort routine only looks at keys)
      This means that unless you invoke keys, values, or each between insertions, adding 50 new entries to the hash results in only one re-sort (see the sketch below).
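      Roughly, the flag logic looks like this (a simplified sketch with illustrative names, not the actual module internals):

          sub STORE {
              my ($self, $key, $value) = @_;
              # a value change for an existing key cannot affect a
              # keys-only ordering, so don't set the changed flag
              $self->{changed} = 1
                  unless $self->{optimize} eq 'keys'
                     and exists $self->{hash}{$key};
              $self->{hash}{$key} = $value;
          }

          sub FIRSTKEY {
              my ($self) = @_;
              if ($self->{changed} or $self->{optimize} eq 'none') {
                  $self->{sorted} = [ sort { $self->{cmp}->($a, $b) }
                                      keys %{ $self->{hash} } ];
                  $self->{changed} = 0;
              }
              return $self->{sorted}[0];   # iterator bookkeeping elided
          }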

      I had considered making changes "smart", but it was not clear to me that it would be more efficient. It all depends on how the hash is used. If the hash is read many times with only a few changes in between, "smart" inserts make sense. If, on the other hand, the hash is read only intermittently with potentially many changes in between, my current approach makes sense.

      I guess if I make any modifications, which I am currently not inclined to do, I would provide the option of selecting which "mode" to use, since the efficiency of either approach depends on usage.

      Cheers - L~R

      Update (in response to your update):
      they don't actually help asymptotically, unless you perform O(n log n) insert/delete/fetch operations in between every call to keys.

      Not exactly. Only if there are changes, those changes affect the order, and those changes accumulate to O(n log n).