Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re: Better mousetrap (getting top N values from list X)

by demerphq (Chancellor)
on Feb 03, 2005 at 07:32 UTC ( [id://427544]=note: print w/replies, xml ) Need Help??


in reply to Better mousetrap (getting top N values from list X)

Just thought i should mention that I recall discussion on P5P about optimizing sort when only N values will be used of the return. Assuming that work actually has been done I should think that

my ($x,$y,$z)=sort @foo;

Would be the most efficient way to do this. Problem is I really dont remember what the result of that thread was. :-(

---
demerphq

Replies are listed 'Best First'.
Re^2: Better mousetrap (getting top N values from list X)
by Anonymous Monk on Feb 03, 2005 at 10:12 UTC
    Yeah, but if you want to extract the top 26, do you really want to write:
    my @top26 = my($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n, + $o, $p, $q, $r, $s, $t, $u, $v, $w, $x, $y, $z) = sort @foo
    or would you prefer:
    my @top26 = top (26, @foo);
    I'm willing to use up to three variables - but if I only want to extract the top3, I could easily write a single pass algorithm, keeping track of the top3.

    And you'd need to write tricky evals if you don't know which top N to take during compile time, but only at run time (for instance, because N is user input).

      Having tracked down (part of) the p5p thread demerphq mentioned, the suggestion was that you would also be able to employ the short-cicuited sort by using a slice:

      @most[ 0 .. 6 ] = sort @ary; # only sort 7 entries

      I'm not familiar with how the RHS hints are derived, but that would seem to address both your concerns.

      That said, I think topNbs() as posted by tall_man could be added to List::Util quite easily and would probably be easier to get accepted because of the lower risk.

      Then again, it doesn't really use a List, so maybe it is time for an Array::Util package.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        The code would need more work before it could go into List::Util, since it only works for integer values, positive only, and only for the N highest (not lowest, or any arbitrary sorting function). It's a good idea, though.

        Since the basic algorithm needs only a size-N array, there ought to be some way to wrap this as a package that takes an arbitrary input set of unlimited size, not just an array ref as we have here.

        my $acc = new NGreatest($n); $acc->insert(\@arr); while <> { $acc->insert(chomp $_); } my @top = $acc->result();
Re^2: Better mousetrap (getting top N values from list X)
by BrowserUk (Patriarch) on Feb 03, 2005 at 09:18 UTC

    I don't suppose you recall how they were going to determine how many values to produce?

    Is that information --ie. the number of values required on the right-hand side of a list assignment--generally available to XS code, or was the intention to make a special case for the construct?


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.

      IIRC the idea was to apply much the same type of logic as occurs with split. Ie the logic that implicitly sets the third argument of the split to be N+1 where N is the number of scalar slots on the LHS of the assignment. I hazzily recall discussion on whether using an explicit slice would also do the same. I think if you trawl the p5p archives for sort and optimize youll find it. I think the basic idea was that

      my ($x,$y,$z)=sort @foo; my @top=(sort @foo)[1..$n];

      would be special cased somehow.

      As for your question about XS, I really have no idea right now. Sorry.

      ---
      demerphq

        Note that your first construct works for split but your second would give no hints to split.

        As to how this is done, I'd look at Want if I were curious.

        - tye        

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://427544]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (5)
As of 2024-04-19 02:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found