This is what a "heap sort" is great for (finding and sorting the top N items of a very large list). I keep wanting to implement this in Perl. I don't think the Heap modules include this functionality (which I find a bit strange, but *shrug*, maybe I'm just not seeing it -- the modules are a bit more complicated than I expect so I either didn't dig very deep into them or I just don't at the moment remember having done so).

In any case, the technique isn't horribly difficult and I'm sure google will find you several good tutorials on it.

Update: So far, all of the other responses are waving their hands vaguely around "keep track of the top N items". That sounds simple enough but some ways are certainly more efficient for doing this than others.

The first naive method would be to keep the top N items unsorted in a list and just compare new items as you get them against every single item in the list until you find one that you can replace with the new one, or finish comparing and reject the new one. That is a ton of compares, perhaps even more than just sorting the full list.

The second naive method would be to keep the top N items in sorted order so that you can do a single compare to tell whether to reject the new item. But, if you decide to keep it, then you have to insert it into the list in sorted order. That will take you log(N) compares (if you use a binary search) and then you'll have to shift about 1/2 of the list over to fit it in (or use something like a linked list which isn't very Perlish and so are a bit slower).

The heap sort keeps the top N items in an array in a partially sorted order. You can still tell whether a new item should be insert or not with a single comparison. However, the algorithm tells you how to only use log(N) comparisons and log(N) swappings of items to insert a new item while preserving the partial ordering. Then it tells you how to pull the items out of the "heap" so that they all come out in fully sorted order.

                - tye

In reply to Re: *Fastest* way to print a hash sorted by value (heap) by tye
in thread *Fastest* way to print a hash sorted by value by smellysocks

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.