When I see such a huge disparity in benchmarks, the first thing I think is "oops, implemented one of them wrong".
I think the problem here is that myocom's sort is the only one where the return value of sort is also the return value of the subroutine. When that subroutine is called in scalar (or void) context, sort inherits that context, so Perl skips the sorting and returns nothing useful.
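A minimal sketch of the context problem (the sub names and data here are my own illustration, not myocom's code): when sort is the last expression in a sub, its context comes from the caller; assigning the result to @_ forces list context, so the sort actually runs no matter how the sub is called:

```perl
use strict;
use warnings;

my @data = qw( pear apple cherry );

# sort as the sub's return value inherits the caller's context;
# in void context Perl can skip the sorting entirely.
sub returns_sort { sort @data }

# Assigning to @_ forces list context, so the sort really happens.
sub forced_sort  { @_ = sort @data; return @_ }

my @sorted = forced_sort();
print "@sorted\n";   # apple cherry pear
```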
I added @_= to the subs passed to Benchmark and got these results:
Comparing for 10 elements
        Rate  myo   st   ah   mc  tye
myo   2765/s   -- -14% -15% -33% -33%
st    3225/s  17%   --  -1% -22% -22%
ah    3264/s  18%   1%   -- -21% -21%
mc    4129/s  49%  28%  27%   --  -0%
tye   4147/s  50%  29%  27%   0%   --

Comparing for 100 elements
        Rate  myo   st   ah   mc  tye
myo    148/s   -- -49% -58% -61% -65%
st     287/s  94%   -- -19% -23% -31%
ah     353/s 139%  23%   --  -6% -15%
mc     375/s 154%  31%   6%   -- -10%
tye    418/s 183%  46%  18%  11%   --

Comparing for 1000 elements
        Rate  myo   mc   st  tye   ah
myo   8.45/s   -- -49% -55% -70% -74%
mc    16.4/s  94%   -- -13% -41% -49%
st    19.0/s 124%  15%   -- -32% -41%
tye   27.8/s 229%  69%  47%   -- -14%
ah    32.2/s 281%  96%  70%  16%   --

Comparing for 10000 elements
         Rate   mc  myo   st  tye   ah
mc    0.332/s   -- -48% -77% -82% -87%
myo   0.632/s  91%   -- -56% -66% -76%
st     1.43/s 331% 126%   -- -24% -45%
tye    1.88/s 466% 197%  31%   -- -28%
ah     2.62/s 690% 314%  83%  39%   --

Which confirms my suspicion that arhuman's would scale the best and myocom's the worst.
How well the sorts scale depends mostly on how much work can be moved out of the comparison function (since you do O(n*log(n)) comparisons but only O(n) translations). arhuman's does the bare minimum in the comparison routine. Mine does array lookups. The Schwartzian Transform dereferences array references. myocom's does real computation. So that is the order I expect them to scale in.
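For concreteness, here is a sketch of two of those approaches on a toy case-insensitive sort (my own example data, not the benchmark code): the Schwartzian Transform pays two array dereferences per compare, while the index sort pays two plain array lookups:

```perl
use strict;
use warnings;

my @words = qw( Banana apple Cherry );

# Schwartzian Transform: compute the key once per element, but each
# comparison still dereferences two array refs.
my @st = map  { $_->[1] }
         sort { $a->[0] cmp $b->[0] }
         map  { [ lc($_), $_ ] } @words;

# Index sort: the comparator does plain lookups into a parallel
# array of precomputed keys.
my @key = map { lc } @words;
my @by_index = @words[ sort { $key[$a] cmp $key[$b] } 0 .. $#words ];

print "@st\n";         # apple Banana Cherry
print "@by_index\n";   # apple Banana Cherry
```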
MeowChow's is interesting. You'd think it would scale as well as arhuman's. I think it probably does, but we'd have to get to much larger lists to see it (and make sure we had tons of RAM so swapping doesn't completely skew the results). Why? Because for each hash key that you insert or look up, Perl has to traverse the entire length of the string in order to compute the "hash". But comparing two strings only requires traversing them until you find a difference, so most compares only go one or two characters in.
So MeowChow's just has a big constant, K, in the O(K*N + N*log(N)), which requires a really large N before it "wins".
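A sketch of a hash-keyed sort in that spirit (my own toy version, not MeowChow's actual code): every %key lookup inside the comparator has to hash the full string before the comparison even starts, which is where that constant comes from:

```perl
use strict;
use warnings;

my @words = qw( Banana apple Cherry );

# Precompute one key per value; each compare then does two hash
# lookups, and each lookup hashes the whole string first.
my %key = map { $_ => lc $_ } @words;
my @sorted = sort { $key{$a} cmp $key{$b} } @words;

print "@sorted\n";   # apple Banana Cherry
```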
Also note that I find such "predictions" about the performance of different Perl code are very often wrong. There are a lot of moving parts and surprising costs, so YMMV a lot.
Update: Sorry for including the wrong version, MeowChow. I misread your comment, thinking that only the second version had problems with duplicate keys. BTW, you could try appending the (packed) record number to the end of each key.
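A sketch of that packed-record-number idea (my own illustration; it assumes the keys are fixed-width, or at least never a prefix of one another, so the packed suffix can't disturb the key ordering): duplicates stay distinct and sort stably, and the original record is recovered without a hash:

```perl
use strict;
use warnings;

my @recs = qw( bb aa aa cc );

# Append a packed index to each key, sort the plain strings, then
# peel the index back off to recover the original records.
my @sorted = map  { $recs[ unpack 'N', substr $_, -4 ] }
             sort
             map  { $recs[$_] . pack 'N', $_ } 0 .. $#recs;

print "@sorted\n";   # aa aa bb cc
```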
- tye (but my friends call me "Tye")

In reply to (tye)Re3: Substring Sort by tye
in thread Substring Sort by Anonymous Monk