All,
First I would like to give a sales pitch on my one and only publicly available module - Tie::Hash::Sorted. It will be the basis for the new module and it doesn't appear to be well known or well used. You can skip down to the Proposal section without losing anything if you are already familiar with Tie::Hash::Sorted.

Why use Tie::Hash::Sorted

It seems that on a daily basis, some one asks how to sort a hash. You don't need a module to do that for you, so why use one? I have also considered adding several new features, but the response wasn't very overwhelming.

Proposal for Tie::Hash::Ranked

The idea was inspired by this question as well as not wanting to give Tie::Hash::Sorted code bloat by adding functionality. The following methods would be added:

Feedback Request

Here are the things I would appreciate feedback on:

Cheers - L~R

Replies are listed 'Best First'.
Re: RFC - Tie::Hash::Ranked
by kvale (Monsignor) on Oct 12, 2004 at 15:34 UTC
    With the exception of Get_Rank, your new functionality is just implementing order statistics on top of Tie::Hash::Sorted. If one has access to the sorted array, it would seem easier to me to extract the elements directly than download another module, install it, and figure out the API.

    But having the functions within the module already would tip the convienence factor the other way. So I would be in favor of including them in your current API for Tie::Hash::Sorted. I expect the methods would be simple to implement and would result in little code bloat.

    If you do choose to breakout into a new module, I would recommend a more expressive name like Tie::Hash::RankStatistics. <update> In the area of new methods, I think a median method would be useful: return the middle element, with a well-defined convention for even numbers of hash elements.

    -Mark

      kvale,
      is just implementing order statistics on top of

      I am not sure I have described the methods properly nor am I opposed to just adding the functionality to Tie::Hash::Sorted. The rationale for breaking them out is because they would dramatically change the way the hash behaved. Basically it would only be giving you a window of the hash. So if the hash had 50 keys but you had set Top_X to 10, looping over the keys of the hash would only loop 10 times. Additionally, having direct access to the underlying array will give wrong unexpected results. This is due to the way the optimizations have been added in. Resorting (and hence array creation) is only done when necessary.

      Thanks for the idea about a median method. I am not sure about the name suggestion since I think we may be talking past each other.

      Cheers - L~R

Re: RFC - Tie::Hash::Ranked
by tilly (Archbishop) on Oct 12, 2004 at 15:41 UTC
    First the unimportant item. About a third of the way through the documentation you left "c" out of "function".

    That said, when I see a hash I hope for, "Operations are O(something much less than n)." When I see something involving sorts I hope for, "Let me specify a pairwise comparison function." Tie::Hash::Sorted fails to satisfy either hope, and so I'd be unlikely to reach for it for serious work. Sorry.

      When I see something involving sorts I hope for, "Let me specify a pairwise comparison function." Tie::Hash::Sorted fails to satisfy either hope, and so I'd be unlikely to reach for it for serious work. Sorry.

      While it is not a "pairwise comparison function", you can specify the sort routine for Tie::Hash::Sorted, just look here.

      That said, if you would require it to use the module for serious work, why don't you send in a patch? I am sure Limbic~Region would not object :)

      -stvn
        Yes, I saw that I could specify the sort routine, but not in a way that I'd want to specify it. Furthermore the way that it is specified directly contradicts my desire to have operations be reasonably efficient.

        As for sending a patch, making this module not have the algorithmic mistake doesn't require a patch, it requires a complete rewrite. If I did that rewrite, it would be easier to do it and not support an API that I am not interested in.

        Since I don't currently have the need that this module was designed to address, I see no point in doing that work. And if I did have that need, I would still have no incentive to integrate the unwanted API.

      tilly,
      I understand the need for speed. I added as many optimizations as possible to only run the sort routine when needed. The user is able to specify their own sort routine which can lead to more efficient sorts. I also traded space for time. It can obviously be improved by adding a whole lot of complexity to the underlying code. That is more effort than I am willing to invest on a module I have never used myself and doesn't appear to be very popular.

      I am not sure what you mean WRT a pairwise comparison function. Are you wanting to assign points to a key based off how it matches up against every other key and then sort them based off the results? That is certainly doable and I can give an example if you want. Just let me know if I am in left field.

      Cheers - L~R

        I am not sure what you mean WRT a pairwise comparison function.
        The problem is not being able to sort using complicated orderings. That you have accomplished. But without a pairwise comparison function, you have to re-sort the entire collection every time an item is inserted. A better implementation would either do binary search on the sorted array to find the insertion point, or (if you're worried about splice not being constant time) use your favorite balanced tree structure instead of an underlying array. Both operate using pairwise comparisons, and take O(log n) for insertion/deletion, instead of O(n log n) as the module currently does.

        That's a huge difference -- huge enough that it's not just a "need-for-speed" optimization. In fact, it's generally accepted that common (non-catastrophic) hash operations be no more than O(log n), as I think tilly was alluding to. And think about it -- even just naively looping through an unsorted array for every operation could accomplish all the ranking stuff you need in O(n) time! So sorting for every insertion is definitely a step backwards.

        Update: I've looked at the code of the module. In fact it does not necessarily sort after every insertion, but you can easily construct a sequence of operations on the hash so that it does. Alternate STORE and FIRSTKEY operations k times and the total cost is O(kn log n). Using either approach mentioned above, the same sequence of operations costs O(k log n). So while your optimizations are nice, they don't actually help asymptotically, unless you perform O(n log n) insert/delete/fetch operations in between every call to keys. For many uses of a hash, this condition holds, but for a general tool I think I'd rather have all operations logarithmic ;)

        blokhead

Re: RFC - Tie::Hash::Ranked
by SpanishInquisition (Pilgrim) on Oct 12, 2004 at 14:48 UTC
    Marketing question -- with a use case that won't come up very often, how do you expect people to find your module when they want a solution for that use case? I don't have the answer to this question, but if anything I would add another mode to your existing module, since it may be easier for someone to stumble upon that way.
      SpanishInquisition,
      how do you expect people to find your module when they want a solution for that use case?

      By using http://search.cpan.org.

      I had considered that originally when I requested feedback on adding functionality to Tie::Hash::Sorted. There wasn't any "I would certainly use your module if only it had feature x". The new idea of top/bottom x fell into the same ranking theme as a few of the other ideas I had so it makes logical sense to group them together.

      I am not sure extending it will help. I get the feeling people don't find it useful. I have Super Searched here for people suggesting Tie::Hash::Sorted as a solution. Other than myself and diotalevi, who helped me considerably on the module, there is not a single instance.* I originally wrote the module not for my own use, but in response to a SoPW question here. That is also the only reason I would write the new module or extend the current one. I don't want to waste my time if no one feels it will be useful.

      Cheers - L~R

      * There is one instance, but it was prompted by myself in a /msg
        I don't want to waste my time if no one feels it will be useful.

        In his preface to his book "Portrait of Dorian Gray", Oscar Wilde once wrote, "All Art is Quite Useless". I have always seen this as an argument for the concept of "Art for Art's Sake" (popularized by Charles Baudelaire and the impressionists (Monet, Renoir, etc)).

        My point, if you would find enjoyment in creating Tie::Hash::Ranked, then create it. Don't worry if anyone finds it useful, code for code's sake is reason enough.

        -stvn
Re: RFC - Tie::Hash::Ranked
by radiantmatrix (Parson) on Oct 12, 2004 at 16:37 UTC

    Hm, well I would have a use for something like this. I'd love to be able to manipulate hashes like this through a standardized API. In fact, I already use Tie::Hash::Sorted for some things.

    In terms of suggestions... why not make your module a wrapper-replacement for Tie::Hash::Sorted? Import all of its methods, and extend it with your ranking items (and some of the stuff you suggested in your new features list. Call the new module Tie::Hash::Sorted::Rank or something.

    Most importantly, keep the interface consistent with Tie::Hash::Sorted so that new users of your module will not need to relearn anything.

    radiantmatrix
    require General::Disclaimer;
      radiantmatrix,
      Provided a did a good job (it was my first real OO project), I intend to subclass Tie::Hash::Sorted. I am not sure if this will be possible without doing some re-writing. Given that it is only my intention and that I follow the path of least resistence, I figured I was safe saying it would be based on Tie::Hash::Sorted without promising it as a subclass.

      Cheers - L~R

Re: RFC - Tie::Hash::Ranked
by DrHyde (Prior) on Oct 13, 2004 at 09:06 UTC
    Name clash! There's already Tie::Hash::Rank (written by me) so having Tie::Hash::Ranked do something different will confuse the hell out of people. Why not just implement the extra methods in Tie::Hash::Sorted? I'm not swayed by the bloat argument in this case.
      DrHyde,
      Thank you for letting me know about Tie::Hash::Rank. I am seriously leaning towards a new namespace not only because of code bloat, but also because the methods I am talking about will dramatically change the way the hash behaves. If I change to a tree structure under the hood, I will need to make a new module or break backwards compatability. That is poor OO design - but as I said, it was my first real OO project.

      So when and if I am ready, I will be sure to be considerate of pre-existing name spaces.

      Cheers - L~R

Re: RFC - Tie::Hash::Ranked
by gmpassos (Priest) on Oct 12, 2004 at 21:17 UTC
    We also need Tie::Hash::Ordened, where the keys stay in the order that they are created in the HASH. This is important to have a hash of hashes/arrays tree with ordened nodes, like XML strcuture.

    Graciliano M. P.
    "Creativity is the expression of the liberty".