PerlMonks  

Re^5: Benchmark, -s versus schwartzian

by ambrus (Abbot)
on Aug 23, 2004 at 19:20 UTC ( [id://385188]=note )


in reply to •Re^4: Benchmark, -s versus schwartzian
in thread Benchmark, -s versus schwartzian

Why? You can always do the GRT by using only the key (by which you sort) and an index in the string. I think of something like:

@files = glob("/bin/*");
@sorted = @files[ map { /(\d+)$/g } sort map { sprintf "%012d %d", -s $files[$_], $_ } 0..@files-1 ];
print "@sorted$/";

This is of course efficient only because Perl optimizes the default sort subroutine. Someone who understands the perl source well might be able to change sort so that it also optimizes sort blocks like {$h[$a]<=>$h[$b]}, which would let us sort even faster, like

my @h = map -s, @files;
@results = @files[sort {$h[$a]<=>$h[$b]} 0..@h-1];
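As a minimal sketch of the two approaches above, the following runs both on the same input and prints the results side by side. The `%size` table is a hypothetical stand-in for `-s` on real files, so the example does not depend on the filesystem. The index is zero-padded here (an addition to the snippet above) so that equal keys fall back to input order under the plain string sort.

```perl
use strict;
use warnings;

# Hypothetical stand-in data (name => size in bytes) instead of -s on real files.
my %size  = (cat => 35000, ls => 140000, true => 26000, bash => 1200000, echo => 35000);
my @files = sort keys %size;            # fixed input order: bash cat echo ls true
my @h     = map { $size{$_} } @files;   # precomputed keys, like map -s, @files

# GRT with index: pack "key index" into fixed-width strings,
# sort them with the default (string) sort, then pull the index back out.
my @by_grt = @files[
    map  { /(\d+)$/ }
    sort
    map  { sprintf "%012d %06d", $h[$_], $_ }
    0 .. $#files
];

# Order-by-sorted-indices: sort the indices, comparing the precomputed keys.
my @by_index = @files[ sort { $h[$a] <=> $h[$b] or $a <=> $b } 0 .. $#h ];

print "@by_grt\n";
print "@by_index\n";
```

Both lines should list the names in ascending size order. Which variant is faster on real data is exactly what the thread is arguing about, so it is worth measuring with Benchmark rather than assuming.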

Replies are listed 'Best First'.
•Re^6: Benchmark, -s versus schwartzian
by merlyn (Sage) on Aug 23, 2004 at 19:28 UTC
    You can always do the GRT by using only the key...
    Uh, that's sometimes the hard part. Suppose you wanted to sort by a floating point number (say, age in days), then descending by a string of varying length that might have a NUL in it, and then descending by another integer (byte size). Quick, construct the GRT for that. Not easy, huh. Trivial in the ST.

    Yes, you can always get a single GRT string for a multilevel sort. But sometimes, as I said, it takes a frickin' genius.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
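For illustration, here is roughly what the three-level sort described above looks like as a Schwartzian transform. This is a sketch under stated assumptions: the record layout and sample values are hypothetical, and the key extraction is trivial here, whereas in practice the ST only pays off when computing the keys is expensive.

```perl
use strict;
use warnings;

# Hypothetical records: [ name, age in days (float), tag (may contain NUL), size in bytes ]
my @records = (
    [ 'a', 1.5,  "x\0y", 100 ],
    [ 'b', 0.25, "zz",   300 ],
    [ 'c', 1.5,  "x\0y", 200 ],
    [ 'd', 1.5,  "x",    999 ],
);

my @sorted =
    map  { $_->[0] }                              # unwrap the original record
    sort {
           $a->[1] <=> $b->[1]                    # age, ascending
        or $b->[2] cmp $a->[2]                    # tag, descending
        or $b->[3] <=> $a->[3]                    # size, descending
    }
    map  { [ $_, $_->[1], $_->[2], $_->[3] ] }    # wrap: [ record, key1, key2, key3 ]
    @records;

print join(' ', map { $_->[0] } @sorted), "\n";
```

Each sort level is one line in the comparison block. A GRT for the same ordering would have to encode the float, the inverted variable-length string, and the inverted integer into a single string that compares correctly, which is the hard part being pointed out.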

      You know, all this fancy fast sorting would be much, much easier if perl's sort function compared references to arrays the way python's sort function does: that is, lexicographically (aka "dictionary order"). Then, the standard Schwartzian transform could omit the { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] } bit, and the performance differences between the ST and GRT would almost vanish. This would allow us to stop trying to construct a GRT for squeezing that extra 2% out of a long sort time and get on with our lives.

      Actually, is there a good reason why the builtin sort compare doesn't compare references to arrays in this fashion?

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
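To make the proposed ordering concrete, here is a small sketch of a lexicographic comparison of array references written in plain Perl. The helper `cmp_arrays` is hypothetical and compares every element as a string with `cmp`; Python instead compares elements according to their own types. Because it is an ordinary Perl-level comparator, it shows only the ordering and gives none of the speed a built-in comparison would.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two array refs element by element with cmp.
# If one array is a prefix of the other, the shorter one sorts first.
sub cmp_arrays {
    my ($x, $y) = @_;
    my $n = @$x < @$y ? @$x : @$y;      # length of the shorter array
    for my $i (0 .. $n - 1) {
        my $c = $x->[$i] cmp $y->[$i];
        return $c if $c;                # first differing element decides
    }
    return @$x <=> @$y;                 # all shared elements equal: compare lengths
}

my @rows   = ( [ 'b', 'x' ], [ 'a', 'z' ], [ 'a' ], [ 'a', 'y' ] );
my @sorted = sort { cmp_arrays($a, $b) } @rows;

print join(' ', map { join '', @$_ } @sorted), "\n";
```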

        s/2%/50%/ (roughly, depending). And it isn't just the comparison routine that is the problem. Constructing a huge number of tiny arrays takes some time (and memory).

        I think a fast, flexible, stable sort can be rolled into a module that makes key construction easy and natural. If the module's overhead is low and stays outside the sort itself, using it would give performance (in speed and memory use) on par with a very efficient hand-rolled solution, and much faster than any general-purpose Perl sorting module I've looked at.

        Having sort default to sorting array references as you've specified certainly makes sense. The boring answer is that backward compatibility prevents it from happening in Perl5. I won't pretend to remember how this type of operation is likely to behave in Perl6.

        - tye        

      If sort {$a<=>$b} is indeed optimized (it seems so), you can just use it for a floating-point key: you only have to say no warnings "numeric"; and put the float first in the string.

      You are, however, right in that if the key is more complicated, this method is not applicable. (You can however gain with the ST only if calculating the key is much slower than comparing them.)
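A minimal sketch of the floating-point variant described above, assuming hypothetical in-memory data. Each string is "float index"; the numeric sort reads only the leading number, and the trailing index is recovered afterwards with a regex.

```perl
use strict;
use warnings;

# Hypothetical data: names and a floating-point key (say, age in days).
my @names = qw(alpha beta gamma delta);
my @age   = (2.5, 0.125, 10.75, 2.25);

my @sorted = do {
    # "2.5 0" is not a clean number, so silence the numeric warning in this block only.
    no warnings 'numeric';
    @names[
        map  { /(\d+)$/ }           # recover the trailing index
        sort { $a <=> $b }          # numeric sort uses only the leading float
        map  { "$age[$_] $_" }      # build "float index" strings
        0 .. $#names
    ];
};

print "@sorted\n";
```

The numeric comparison ignores the index, so records with equal keys are ordered only by whatever the sort does with ties.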

Re^6: Benchmark, -s versus schwartzian
by Aristotle (Chancellor) on Aug 23, 2004 at 20:43 UTC

    I rarely find the GRT worth the bother. It is ridiculously hard to maintain GRT code, and it's not unlikely for the transformation step in a GRT to be so computationally complex that the order-by-sorted-indices method as per your second snippet beats it anyway, as seen at Re: To use a module...or not.. Order-by-sorted-indices is generally straightforward, easy to maintain, and easy to get consistent results from; yet very fast and memory efficient. I doubt I'll ever need anything else (though it's not completely impossible, of course).

    I appreciate the ST as an exercise in functional thinking, but it's been a long time since I last used it in practice.

    Makeshifts last the longest.
