in reply to Re^3: RFC - Tie::Hash::Ranked
in thread RFC - Tie::Hash::Ranked

Yes, I saw that I could specify the sort routine, but not in a way that I'd want to specify it.

I agree it is not my preferred means either; however, a hash can be sorted in many ways: by keys, by values, keys by value, values by key, and so on. A simple pairwise comparison would not cover all these possibilities.
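
For illustration, here are a few of those orderings written as plain sort expressions over an ordinary (untied) hash %h, just as a sketch:

    # by keys
    my @by_key        = sort keys %h;
    # by values
    my @by_value      = sort values %h;
    # keys, ordered by their values
    my @keys_by_value = sort { $h{$a} cmp $h{$b} } keys %h;
    # values, in key order
    my @values_by_key = map { $h{$_} } sort keys %h;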

Take a look at how Tie::SortHash does its pairwise sort: it asks for a string which is then evaled into the sort block, and it always sorts on keys. I am not sure that is a better solution.
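
Roughly, that eval-a-string style looks like this (a sketch of the general technique only, not Tie::SortHash's actual code or interface):

    # Sketch of "eval a user-supplied string into the sort block";
    # note the sort is always over keys.
    my %hash      = ( b => 2, c => 3, a => 1 );
    my $sortorder = '$hash{$a} cmp $hash{$b}';             # caller-supplied string
    my @ordered   = eval "sort { $sortorder } keys %hash"; # string becomes the sort block
    die $@ if $@;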

So while I agree it's not my preferred style, I think it's a very flexible solution to the problem nonetheless.

As for sending a patch, making this module not have the algorithmic mistake doesn't require a patch; it requires a complete rewrite.

Well, if you take issue with the underlying algorithm then yes, but as I pointed out above, I am not sure a simple pairwise function would really make it all that much better. I also disagree that it would take a complete rewrite. If you look a little closer, adding a new function Pairwise_Sort_Routine($func) would accomplish this: it would create a closure that fits the Sort_Routine criteria but uses the $func passed to Pairwise_Sort_Routine as the comparison (see the sketch below). Of course, your pairwise sort would again be quite restricted, as I pointed out above.
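
Something along these lines, purely as a sketch; the Sort_Routine shape assumed here (a callback that gets the underlying hash ref and returns an ordered list of keys) is for illustration and may not match the real interface exactly:

    # Sketch only: the assumed Sort_Routine shape (hash ref in, ordered
    # key list out) is illustrative, not the module's documented API.
    sub Pairwise_Sort_Routine {
        my ($func) = @_;
        return sub {
            my ($hash) = @_;
            return sort { $func->($a, $hash->{$a}, $b, $hash->{$b}) } keys %$hash;
        };
    }

The returned closure is just another Sort_Routine as far as the rest of the module is concerned, so nothing else would need to change.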

Since I don't currently have the need that this module was designed to address, I see no point in doing that work. And if I did have that need, I would still have no incentive to integrate the unwanted API.

It sounds to me like your issue is more with the API (although you have supplied no hard details). As for your points about efficiency, I think they are somewhat moot, since just using tie is a big enough performance hit already.

-stvn

Re^5: RFC - Tie::Hash::Ranked
by tilly (Archbishop) on Oct 12, 2004 at 17:12 UTC
    I agree that tie is inefficient. But it is a constant factor overhead - your code is slower but still scales reasonably. I draw a big distinction between that and algorithmic inefficiency which can cause code to fall over surprisingly fast.

    As for the API, pass me the information any way you want. All of the options that you mention are supported if I get the following information:

    my $cmp = sub {
        my ($key_a, $value_a, $key_b, $value_b) = @_;
        # do some comparison
    };
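
    For example, every ordering you listed above is just a different body for that one sub (illustrative comparators only):

    # by key
    my $by_key   = sub { my ($ka, $va, $kb, $vb) = @_; $ka cmp $kb };
    # by value
    my $by_value = sub { my ($ka, $va, $kb, $vb) = @_; $va cmp $vb };
    # by value, then by key
    my $by_value_then_key = sub {
        my ($ka, $va, $kb, $vb) = @_;
        $va cmp $vb or $ka cmp $kb;
    };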
      I draw a big distinction between that and algorithmic inefficiency which can cause code to fall over surprisingly fast.

      Not that I doubt you, but could you please be more specific as to why you think this code will "fall over surprisingly fast"? Maybe you are correct, but I would think one would need to really analyze the code and run some tests/benchmarks before making such a strong declaration.

      As for the API, pass me the information any way you want. All of the options that you mention are supported if I get the following information:

      I will only say that this somewhat contradicts the efficiency argument, since it requires two (possibly unnecessary) hash lookups for every comparison the sort makes.
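
      Concretely, inside the tie something has to feed the comparator both values on every comparison, roughly like this (sketch over an untied %data, for illustration only):

      my @ordered = sort {
          $cmp->($a, $data{$a}, $b, $data{$b})    # two value lookups per comparison
      } keys %data;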

      To be honest, I am not really looking to argue with you about this, but only to point out that your comment, which was very short, made some assumptions which, while not wholly wrong, were not really all that right either, and which certainly were not founded on use or study of the module in question. You being a high-level saint and a long-standing member of this community means that many people listen to and place value on what you say. Making quick, ambiguous, and somewhat disparaging comments about a module you have never used and never plan to use is (IMO) not really useful to the discussion.

      -stvn
        More specific? Sure.

        It's all in the big O. Suppose your code runs fine with 1,000 data points; how long do you expect it to take on a sample of 50,000 data points? 50 times as long, or 2,500 times as long (that is, roughly O(n) scaling versus O(n^2))? The latter can cause very nasty surprises. For an extreme example, if this code is used somewhere that constant requests come in (e.g. a website), then as soon as the time to respond exceeds the time between requests, your system will fall over. Been there, done that, not fun.
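
        If you want numbers rather than my word for it, a quick scaling check is easy to write (a sketch; the sort here is just a stand-in, so swap in whatever operation the module actually performs):

        use Time::HiRes qw(time);

        # Time the same operation at 1_000 and 50_000 elements and compare
        # the ratio against ~50x (linear-ish) versus ~2500x (quadratic).
        sub time_it {
            my ($n) = @_;
            my %h     = map { $_ => $_ } 1 .. $n;
            my $start = time();
            my @keys  = sort { $h{$a} <=> $h{$b} } keys %h;   # stand-in operation
            return time() - $start;
        }

        my $small = time_it(1_000);
        my $big   = time_it(50_000);
        printf "1,000: %.4fs   50,000: %.4fs   ratio: %.0fx\n", $small, $big, $big / $small;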

        Therefore I'm careless of micro-optimizations until there is a proven problem, but I tend to stay much more aware of algorithmic efficiency issues.

        As for whether it is useful to make quick comments about a module, what I'm really trying to communicate is that this is how I evaluate unknown modules: I sanity-check them, and if a significant red flag comes up, I won't use them in serious work. I do this because I think it is a good practice, and because I think it is a good practice, I think there is value in encouraging other people to think about how they handle this. Certainly there is much more value in that than in the common refrain of "It's on CPAN, use it!"