comment on

~~Curious.~~ On my machine (Debian Etch, Perl 5.8.8) the naive run ~~wins every time~~ looses in all cases. Here are the results:

             Rate naive splitwise inet_orc split_int split_pack inet_g
+rt inet_grt2
naive      12.9/s    --      -79%     -82%      -86%       -86%     -9
+3%      -93%
splitwise  62.1/s  382%        --     -11%      -32%       -33%     -6
+5%      -65%
inet_orc   69.9/s  443%       13%       --      -23%       -24%     -6
+1%      -61%
split_int  91.2/s  608%       47%      30%        --        -1%     -4
+9%      -49%
split_pack 92.2/s  617%       49%      32%        1%         --     -4
+8%      -48%
inet_grt    178/s 1284%      187%     155%       95%        93%       
+--       -0%
inet_grt2   179/s 1287%      188%     155%       96%        94%       
+0%        --
[download]

Note:The version of the test I used includes split_pack, but not Sort::Key::IPv4 or Sort::Key::Radix and fixes a small typo in naive_sort (my @b = split /\./, $a; should be my @b = split /\./, $b;)

Still, this makes me wonder about the Schwartian transform. The argument for it is that during a naive sort we are likely to need to extract the sort key several times. The cumulative time for the repeat derivation of the sort key is more than the extra work of reconstructing the original value from the sort key and the extra copying and memory allocation needed by the three step process. Hence the Schwartzian transform is faster. But is this always true?

It would seem to depend on four factors:

the time needed by the rule for extracting the sort key from the original value
the time needed by the rule for reconstructing the original value from the sort key
how badly out of order the initial array is.
the size of your data set

Perl sort uses quicksort. If the array is completely ordered,then each sort key (other than the first and last) will be derived twice - once to compare N-1 to N and once to compare N to N+1. In the average case 2N becomes 2N*logN and only in the worst case it is 2*N^2.

Putting this all together, the Schwartzian transform is only faster if N*(extract_sortkey + reconstruct_value + 2*memcopy) < extract_sortKey*N*X where X is 2 in the best case, log N on average and N in the worst case. This simplifies to reconstruct_value + 2*memcopy < extract_sortkey*(X - 1). Thus we have:

Best case: reconstruct_key+2*memcopy < extract_key
On average: reconstruct_key+2*memcopy < extract_key*log N
Worst case: reconstruct_key+2*memcopy < extract_key*N

Note that the Schwartian side is a constant and the naive side scales with respect to N. Surely a process that scales by N will be slower than a constant process? Not necessarily. The problem here is that we are comparing apples and oranges. On one side of the equation is "reconstruct_key + 2*memcopy" and on the other is "extract key". These are very different operations and we may be comparing very slow operations to very fast operations. For instance, if the process of reconstructing the original value is 10X slower than extracting the key, we would need to be sorting a fairly large array of N=1024=2^10 values before the Schwartzian transfom would give us any speed advantage.

The moral of this story is that there is no automatically right way to sort - it all depends on the size of your data set, the likelihood that it will be badly ordered, and the difference between the time involved in extracting sort keys and the time needed to reconstruct original values.

Best, beth

Update:Fixed who "wins" - misunderstood output! Thanks to bv, ikegami below.

In reply to Re: RFC: Sorting IPv4 addresses by ELISHEVA
in thread RFC: Sorting IPv4 addresses by bv

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.