Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re^2: Better mousetrap (getting top N values from list X)

by Anonymous Monk
on Feb 03, 2005 at 10:12 UTC ( [id://427569]=note: print w/replies, xml ) Need Help??


in reply to Re: Better mousetrap (getting top N values from list X)
in thread Better mousetrap (getting top N values from list X)

Yeah, but if you want to extract the top 26, do you really want to write:
my @top26 = my($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n, + $o, $p, $q, $r, $s, $t, $u, $v, $w, $x, $y, $z) = sort @foo
or would you prefer:
my @top26 = top (26, @foo);
I'm willing to use up to three variables - but if I only want to extract the top3, I could easily write a single pass algorithm, keeping track of the top3.

And you'd need to write tricky evals if you don't know which top N to take during compile time, but only at run time (for instance, because N is user input).

Replies are listed 'Best First'.
Re^3: Better mousetrap (getting top N values from list X)
by BrowserUk (Patriarch) on Feb 03, 2005 at 10:33 UTC

    Having tracked down (part of) the p5p thread demerphq mentioned, the suggestion was that you would also be able to employ the short-cicuited sort by using a slice:

    @most[ 0 .. 6 ] = sort @ary; # only sort 7 entries

    I'm not familiar with how the RHS hints are derived, but that would seem to address both your concerns.

    That said, I think topNbs() as posted by tall_man could be added to List::Util quite easily and would probably be easier to get accepted because of the lower risk.

    Then again, it doesn't really use a List, so maybe it is time for an Array::Util package.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      The code would need more work before it could go into List::Util, since it only works for integer values, positive only, and only for the N highest (not lowest, or any arbitrary sorting function). It's a good idea, though.

      Since the basic algorithm needs only a size-N array, there ought to be some way to wrap this as a package that takes an arbitrary input set of unlimited size, not just an array ref as we have here.

      my $acc = new NGreatest($n); $acc->insert(\@arr); while <> { $acc->insert(chomp $_); } my @top = $acc->result();
        The code would need more work before it could go into List::Util

        Agreed, but I did have a go at making topN() accept a comparator function, but the cost off calling back into Perl code for each comparison exacts a heavy performance penalty--to the point where it looses all benefit. Though it is possible that my fledgling XS/internals skills bore some of that cost.

        The only way I see to avoid that, would be to have four separate routines, (say) highNn(); highNs(); lowNn(); lowNs();.

        I think this is the route taken by sort, albeit that the "naming convention" (for backwards compatibiity?) is sort{$a<=>$b}(); thru sort{$bcmp$a}();

        Since the basic algorithm needs only a size-N array, there ought to be some way to wrap this as a package that takes an arbitrary input set of unlimited size,...

        Personally, I'm not very keen on that type of pseudo-OO wrapper around what is essentially a functional algorithm. Would the wrapper code accumulate a copy of everything passed or discard values that didn't make the cut?

        If the former, unnecessary duplication of storage results. If the latter, I think I would prefer not to do a function call to reject individual values that don't make the cut. I'd code the accumlator scenario something like:

        my @topN = topN( $n, \@data ); @topN = topN( $n, [ @topN, grep{ chomp; $_ > $topN[ -1 ] } <> ] );

        (BTW. I really wish that chomp returned the chomped value. It would make some constructs so much easier.)

        or maybe

        my @topN( $n, \@data ); my @batch; while( not eof ARGV ) { for( my $i=0; $_ = <> and $i < $BATCHSIZE; $i++ ) { chomp; push @batch, $_; } @topN( $n, [ @topN, @batch ] ); }

        Basically, I prefer to retain programmer choice in stuff like this--but there would be nothing to stop someone adding a wrapper around it, though four (type/direction) versions would probably be needed again.


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://427569]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (3)
As of 2024-04-25 17:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found