In Perl, the fastest watermark algorithm is probably this:

sub top_x {
    my $n   = shift;
    my @top = splice @_, 0, $n;    # seed with the first $n values
    # add each remaining value, then drop the smallest of the $n + 1
    @top = ( sort { $a <=> $b } $_, @top )[ 1 .. $n ] for @_;
    return @top;
}

With the mergesort that Perl has used by default since 5.8, and with @top being nearly sorted in every iteration but the first, sort will do its work rapidly. That will almost certainly beat any explicitly spelled-out algorithm, except for truly large values of $n and even longer lists (like maybe selecting the top 10,000 out of 1,000,000 elements; maybe not even that). I'm not sure it even beats a straight sort+slice, though; achieving that probably requires a list of at least a few thousand elements.
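For comparison, the straight sort+slice I have in mind could look roughly like this. It is only a sketch: the name top_x_sorted is made up for illustration, and the clamping of $n for short lists is my own addition.

```perl
use strict;
use warnings;

# Hypothetical baseline: sort the whole list once, then slice off the top $n.
sub top_x_sorted {
    my $n = shift;
    return unless @_ and $n > 0;
    $n = @_ if $n > @_;               # clamp when the list is shorter than $n
    my @sorted = sort { $a <=> $b } @_;
    return @sorted[ -$n .. -1 ];      # the $n largest, in ascending order
}

my @top = top_x_sorted( 3, 5, 1, 9, 3, 7, 2 );
print "@top\n";    # 5 7 9
```

This does O(N log N) work no matter how small $n is, but all of it happens inside a single call to the builtin sort.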

I had a similar wake-up call a while ago when I tried to use a heap to compete against a splice algorithm. (I can't be bothered to Super Search it right now.)

If someone cares to benchmark this, I'd be very interested to see how the numbers look in practice.
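A minimal harness for such a benchmark, using the core Benchmark module, might look like the following. The list size, the value of $n, and the top_x_sorted baseline are all assumptions of mine, so treat the setup as a starting point rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The watermark routine from this post.
sub top_x {
    my $n   = shift;
    my @top = splice @_, 0, $n;
    @top = ( sort { $a <=> $b } $_, @top )[ 1 .. $n ] for @_;
    return @top;
}

# Straight sort+slice baseline (hypothetical name).
sub top_x_sorted {
    my $n      = shift;
    my @sorted = sort { $a <=> $b } @_;
    return @sorted[ -$n .. -1 ];
}

# Assumed sizes: top 10 out of 10,000 random integers. Vary both.
my $n    = 10;
my @data = map { int rand 1_000_000 } 1 .. 10_000;

# Sanity check: both routines must agree before timing them.
my @got_watermark = top_x( $n, @data );
my @got_sorted    = top_x_sorted( $n, @data );
die "results differ" unless "@got_watermark" eq "@got_sorted";

# Run each candidate for at least one CPU second and print a comparison.
cmpthese( -1, {
    watermark  => sub { my @r = top_x( $n, @data ) },
    sort_slice => sub { my @r = top_x_sorted( $n, @data ) },
} );
```

Changing $n and the length of @data should show where, if anywhere, the two approaches cross over.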

It is sometimes frustrating, but clever algorithms written in Perl can very rarely beat builtins. If you want algorithmic elegance that is also competitive, you'll have to drop down to XS. An XS call has a certain fixed overhead, though, so for small lists you might still lose.

Makeshifts last the longest.


In reply to Re: Better mousetrap (getting top N values from list X) by Aristotle
in thread Better mousetrap (getting top N values from list X) by Limbic~Region
