Re: Re: Advanced Sorting - GRT - Guttman Rosler Transform

by demerphq (Chancellor)
on Aug 10, 2003 at 13:07 UTC ( [id://282607]=note )


in reply to Re: Advanced Sorting - GRT - Guttman Rosler Transform
in thread Advanced Sorting - GRT - Guttman Rosler Transform

Your benchmark is utterly bogus.

First, you use the q{} form of benchmark and yet you try to access a lexical variable that will not be in scope when the code is evaled. So your code benchmarks sorting an empty list! (The counts in the millions per second range should have tipped you off that something was wrong.)
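To make the scoping problem concrete, here is a minimal sketch. My::Runner is a hypothetical stand-in for a benchmarking module (it is not Benchmark's real code); the only thing it shares with Benchmark is that string code gets eval'd somewhere the caller's lexicals cannot be seen, while a subref closes over them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a benchmarking module: string code is
# eval'd here, outside the scope where the caller's "my" variables live.
package My::Runner;

sub run_string {
    my ($code) = @_;
    no strict 'vars';    # let an unqualified @words compile as a package variable
    no warnings;
    my @result = eval $code;
    die $@ if $@;
    return scalar @result;
}

sub run_sub {
    my ($code) = @_;
    my @result = $code->();
    return scalar @result;
}

package main;

my @words = qw(EMBRACE EMERALD EMPANEL);

# The string form resolves @words to an empty package variable.
my $seen_by_string = My::Runner::run_string(q{ my @s = sort @words; @s });

# The subref form is a closure, so it sees the lexical @words.
my $seen_by_sub = My::Runner::run_sub(sub { my @s = sort @words; @s });

print "string form sorted $seen_by_string elements\n";
print "subref form sorted $seen_by_sub elements\n";
```

If the string form reports zero elements, the benchmark is timing a sort of nothing.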

Next, the 'st' implementation has a bug in it, so it doesn't actually sort correctly: $a->[0] <=> $a->[0] should read $a->[0] <=> $b->[0]. The 'bare' implementation doesn't do the full sort either, as it doesn't sort based on the word as well as the 'E' count.
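A small sketch of why that typo matters, using a made-up four-word list: comparing an element's key with itself always yields 0, so the count never influences the order, whereas the corrected comparison sorts by count and then by word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data only; the 'E' counts are 3, 1, 3 and 2.
my @words = qw(EMERGED EMBOSOM EMETINE EMBRACE);

# Decorate each word with its 'E' count.
my @decorated = map { [ tr/eE/eE/, $_ ] } @words;

# The buggy expression compares a key with itself, so it is always 0.
my @buggy_results = map { $_->[0] <=> $_->[0] } @decorated;

# The corrected comparison: count first, then the word itself.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] }
    @decorated;

print "buggy comparison values: @buggy_results\n";
print "fixed ST order: @sorted\n";
```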

Once the benchmark is fixed up to actually test similar things against similar things, and to use a better, non-empty data set, we see that the GRT crushes the competition, and that the ST greatly outperforms the uncached code.

    Benchmark: running bare, grt, st, each for at least 3 CPU seconds...
          bare:  3 wclk secs ( 3.02 usr +  0.00 sys =  3.02 CPU) @  4.63/s (n=14)
           grt:  4 wclk secs ( 3.15 usr +  0.00 sys =  3.15 CPU) @ 15.54/s (n=49)
            st:  3 wclk secs ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 10.21/s (n=32)
           Rate bare   st  grt
    bare 4.63/s   -- -55% -70%
    st   10.2/s 120%   -- -34%
    grt  15.5/s 235%  52%   --

And using the exact same data set as you did we see that the GRT still wins, even if 'bare' outperforms 'st'.

            Rate   st bare  grt
    st   18009/s   -- -21% -49%
    bare 22661/s  26%   -- -36%
    grt  35616/s  98%  57%   --

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Wrap qw(wrap);
    use Benchmark qw(cmpthese);

    my @words = qw(
        EMBOSOM EMBOWED EMBOWEL EMBOWER EMBRACE EMBROIL EMBROWN EMBRUED
        EMBRUES EMBRUTE EMBRYON EMBRYOS EMENDED EMENDER EMERALD EMERGED
        EMERGES EMERIES EMERITA EMERITI EMERODS EMEROID EMERSED EMETICS
        EMETINE EMETINS EMEUTES EMIGRES EMINENT EMIRATE EMITTED EMITTER
        EMODINS EMOTERS EMOTING EMOTION EMOTIVE EMPALED EMPALER EMPALES
        EMPANEL
        EXFOLIATIONS EXHAUSTIVELY EXHAUSTIVITY EXHIBITIONER EXHILARATING
        EXHILARATION EXHILARATIVE EXHORTATIONS EXIGUOUSNESS EXOBIOLOGIES
        EXOBIOLOGIST EXONERATIONS EXONUCLEASES EXOPEPTIDASE EXOPHTHALMIC
    );

    my $test = {
        # Schwartzian Transform: cache the 'E' count in an array ref.
        st => sub {
            my @sorted = map  { $_->[1] }
                         sort { $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] }
                         map  { [ tr/eE/eE/, $_ ] } @words;
        },
        # GRT: pack the count in front of the word and sort as plain strings.
        # "N" packs the count big-endian, so the string order matches the
        # numeric order on any machine.
        grt => sub {
            my @sorted = map { substr($_, 4) }
                         sort
                         map { pack("NA*", tr/eE/eE/, $_) } @words;
        },
        # Uncached: recompute both counts on every comparison.
        bare => sub {
            my @sorted = sort {
                ($a =~ tr/eE/eE/) <=> ($b =~ tr/eE/eE/) || $a cmp $b
            } @words;
        },
    };

    # Check that every variant produces sensible output before timing it.
    foreach my $t (keys %$test) {
        print "Test '$t'\n", wrap("\t", "\t", join(", ", $test->{$t}->())), "\n";
    }

    # Make the data set bigger: each word 100 times longer, 100 copies of each.
    @words = map { ( $_ x 100 ) x 100 } @words;
    cmpthese(-3, $test);

This is perl, v5.6.1 built for MSWin32-x86-multi-thread
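As a hedged illustration of the three GRT steps (prefix each word with a packed key, sort as plain strings, strip the key), here is a minimal sketch. The word list is made up, with two entries repeated so the count goes past 255. The "N" (big-endian) and "V" (little-endian) templates are my choice for showing why byte order matters once a count needs more than one byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data: 'E' counts are 1, 300, 200 and 3.
my @words = ( 'EMBOSOM', 'EMERGED' x 100, 'EMBRACE' x 100, 'EMETINE' );

my @sorted =
    map  { substr($_, 4) }                 # 3. strip the 4-byte key
    sort                                   # 2. plain string sort, no comparison block
    map  { pack("NA*", tr/eE/eE/, $_) }    # 1. prefix each word with its packed count
    @words;

# Big-endian keys compare as strings in numeric order ...
my $big_ok = pack("N", 200) lt pack("N", 300);

# ... little-endian keys do not, because the low byte comes first.
my $little_ok = pack("V", 200) lt pack("V", 300);

print "shortest first: ", join(", ", map { substr($_, 0, 7) } @sorted), "\n";
print "N keeps order: ", ($big_ok    ? "yes" : "no"), "\n";
print "V keeps order: ", ($little_ok ? "yes" : "no"), "\n";
```

With small counts (under 256) the byte order makes no visible difference, which is why a key packed in native order can look fine on a short test list.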

The moral of the story is a common one: Always test your benchmarks. Always double check that what you are comparing is equivalent. Always be careful when selecting the data set you benchmark against. If you benchmark a code snippet and not a subref then you should exercise even more caution. (Generally I don't think it's a good idea, actually.)


---
demerphq

<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

Re: Re: Re: Advanced Sorting - GRT - Guttman Rosler Transform
by Juerd (Abbot) on Aug 10, 2003 at 14:11 UTC

    So your code benchmarks sorting an empty list!

    Oops.

    Next, the 'st' implementation has a bug in it, so it doesn't actually sort correctly: $a->[0] <=> $a->[0] should read $a->[0] <=> $b->[0]

    Copied that from the root node. Wrong indeed.

    grt 15.5/s 235% 52% --

    That's very comforting. Thanks.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Re: Re: Advanced Sorting - GRT - Guttman Rosler Transform
by Anonymous Monk on Nov 17, 2003 at 02:49 UTC
    Be careful which Perl version you are using. I ran the (corrected) benchmark under perl v5.6.1 and got this:
               Rate bare   st  grt
        bare 4.55/s   -- -51% -66%
        st   9.32/s 105%   -- -31%
        grt  13.5/s 196%  44%   --
    But under perl v5.8.0 I got this:
               Rate   st  grt bare
        st   9.43/s   -- -15% -37%
        grt  11.1/s  18%   -- -26%
        bare 15.0/s  59%  35%   --
    which seems to negate the use of GRT or ST sorting.

      which seems to negate the use of GRT or ST sorting.

      Seems being the key word here. :-)

      It's true that significant work was done on Perl's sorting code between 5.6.1 and 5.8.0. It's true that many special cases have now been optimized. In fact it turns out that some of the optimisations that have occurred in this period would cause the benchmark I originally posted to show bad results for ST and GRT. The switch from quicksort to mergesort means that on average fewer comparisons are performed per sort, and as the "bare" variant does the tr/// per comparison this has a direct effect on the results. It also appears that optimisations have occurred that make the ST _much_ more competitive with the GRT (GRT still wins in the benchmarks I have done, however). Also it appears optimisations have been done on tr/// in count mode, making it a most unsuitable benchmark candidate. Even worse (for the benchmark that is, everybody else gets a win :-) is that mergesort behaves particularly well on almost ordered data. As my test set is relatively ordered (due to the repetitive elements) this has a particularly significant effect. Simply shuffling the records before the sort (after the replication) causes a dramatic change in the performance.
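One way to see the effect of input order is to count comparisons rather than time them. This is only a sketch under assumptions: the data is a made-up, already sorted list of 1000 keys, shuffle comes from List::Util, and the actual counts depend on the perl version and its sort implementation, so no numbers are promised here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical data: 1000 keys that are already in sorted order.
my @ordered  = map { sprintf "W%04d", $_ } 1 .. 1000;
my @shuffled = shuffle @ordered;

# Sort a copy of the data and report how often the comparison block ran.
sub count_comparisons {
    my @data = @_;
    my $n = 0;
    my @sorted = sort { $n++; $a cmp $b } @data;
    return ( $n, \@sorted );
}

my ( $cmp_ordered,  $sorted_a ) = count_comparisons(@ordered);
my ( $cmp_shuffled, $sorted_b ) = count_comparisons(@shuffled);

printf "ordered input:  %d comparisons\n", $cmp_ordered;
printf "shuffled input: %d comparisons\n", $cmp_shuffled;
```

Every comparison in the "bare" variant pays for two key computations, so a data set that needs few comparisons flatters it.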

      What all of this means is not that the ST and GRT are "negated" but rather that the circumstances under which they are useful are reduced. This is a good thing. However, the fact still remains that given a relatively expensive comparison function the ST and GRT still win, and the GRT still beats the ST. This can be clearly seen by replacing the calls to tr/// with a subroutine that does the same thing.
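A rough sketch of that substitution, assuming a made-up word list and a hypothetical count_e() helper standing in for any expensive key computation. Instead of timing anything, it counts how often the key function runs in the uncached sort versus the ST.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data only.
my @words = qw(EMERGED EMBOSOM EMETINE EMBRACE EMPANEL EMITTER EMOTION EMERALD);

my $calls = 0;

# Hypothetical stand-in for an expensive key computation:
# counts the letters 'e' and 'E' and records that it was called.
sub count_e {
    my ($word) = @_;
    $calls++;
    my $count = () = $word =~ /e/gi;
    return $count;
}

# Uncached: the key is recomputed twice per comparison.
$calls = 0;
my @bare = sort { count_e($a) <=> count_e($b) || $a cmp $b } @words;
my $bare_calls = $calls;

# ST: the key is computed once per element, before sorting.
$calls = 0;
my @st = map  { $_->[1] }
         sort { $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] }
         map  { [ count_e($_), $_ ] } @words;
my $st_calls = $calls;

print "bare: $bare_calls key computations\n";
print "st:   $st_calls key computations\n";
```

The more each call to the key function costs, the more that difference in call counts dominates the run time.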

      Yes, perl has gotten "better" at sorting. No, the ST and GRT are not redundant now. However, given the test results I've seen so far I would probably not bother with the GRT. The ST would appear to be nearly the same performance, and a lot easier to handle. When you play around with things, using different comparison functions, different data sets and distributions, etc., you see that the GRT and ST still beat the "bare" sort. And that's the point here. If the comparison function is expensive, precalculate it. Optimisations in perl may make a given example not behave as expected (which just reinforces the point that benchmarking should happen after the code is complete and not before) but overall, reducing the cost of the comparison still wins you some time. Thus IMO the ST and its derivatives (GRT) will be useful tools for a long time to come.

      Cheers,


      ---
      demerphq

        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi

