When I see such a huge disparity in benchmarks, the first thing I think is "oops, implemented one of them wrong".

I think the problem here is that myocom's sort is the only one where the return value from the subroutine is also the return value from sort, which means that when that function is called in a scalar context, sort ends up in a scalar context which causes it to just return undef and do no work.

I added @_= to the subs passed to Benchmark and got these results:

Comparing for 10 elements Rate myo st ah mc tye myo 2765/s -- -14% -15% -33% -33% st 3225/s 17% -- -1% -22% -22% ah 3264/s 18% 1% -- -21% -21% mc 4129/s 49% 28% 27% -- -0% tye 4147/s 50% 29% 27% 0% -- Comparing for 100 elements Rate myo st ah mc tye myo 148/s -- -49% -58% -61% -65% st 287/s 94% -- -19% -23% -31% ah 353/s 139% 23% -- -6% -15% mc 375/s 154% 31% 6% -- -10% tye 418/s 183% 46% 18% 11% -- Comparing for 1000 elements Rate myo mc st tye ah myo 8.45/s -- -49% -55% -70% -74% mc 16.4/s 94% -- -13% -41% -49% st 19.0/s 124% 15% -- -32% -41% tye 27.8/s 229% 69% 47% -- -14% ah 32.2/s 281% 96% 70% 16% -- Comparing for 10000 elements Rate mc myo st tye ah mc 0.332/s -- -48% -77% -82% -87% myo 0.632/s 91% -- -56% -66% -76% st 1.43/s 331% 126% -- -24% -45% tye 1.88/s 466% 197% 31% -- -28% ah 2.62/s 690% 314% 83% 39% --
Which confirms my suspicions that arhuman's would scale the best and myocom's the worst.

How well the sorts scale depends mostly on how much work can be moved out of the comparison function (since you do O(n*log(n)) comparison but only O(n) translations). arhuman's does the bare minimum in the comparison routine. Mine does array lookups. The Schwartzian Transform dereferences array references. myocom's does real computations. So that is the order I expect them to scale in.

MeowChow's is interesting. You'd think it would scale as well as arhuman's. I think it probably does but that we'd have to get to much larger lists (and make sure we had tons of RAM so swapping doesn't complete skew the results). Why? Because for each key of the hash that you insert or look up, Perl has to traverse the entire length of the string in order to compute the "hash". But comparing only requires you to traverse the string until you find a difference. So most compares only go one or two characters in.

So MeowChow's just has a big constant, K, in the O(K*N+N*log(N)) which requires a really large N for it to "win".

Also note that I find such "predictions" on the performance of different Perl code are very often wrong. There are a lot of moving parts and surprising costs, so YMMV a lot.

Update: Sorry for including the wrong version, MeowChow. I misread your comment, thinking that only the second version had problems with duplicate keys. BTW, you could try appending the (packed) record number to the end of each key.

        - tye (but my friends call me "Tye")

In reply to (tye)Re3: Substring Sort by tye
in thread Substring Sort by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.