
If I remember correctly, the name was coined by Joseph Hall, who co-wrote Effective Perl Programming with Randal Schwartz. As has been mentioned here, the primary reason for the transform is efficiency. Computing each sort term once, up front, and using Perl's list-processing features to avoid assignments to temporary variables turns out to yield substantial savings.

Here's an example from something I just worked on. I needed to write some code to group the sale prices of recently sold homes into bands ($100k-199k, $200k-299k, etc.) and then sort them, with prices stored in thousands. To group the prices, instead of using a range or a chain like if ($x->{SP} >= 100 and $x->{SP} < 200) {...} elsif ($x->{SP} >= 200 and $x->{SP} < 300) {...} etc., I just computed int($x->{SP}/100)*100. Now 128 and 192 become 100, 202 and 246 become 200, etc. There was more to it, but this is an example.
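To make the grouping arithmetic concrete, here is a minimal sketch. The @prices list is made-up sample data (in thousands), not the real listings.

```perl
use strict;
use warnings;

# Hypothetical sale prices, in thousands of dollars.
my @prices = (128, 192, 202, 246, 310);

# Truncate each price down to the nearest multiple of 100.
my @buckets = map { int($_ / 100) * 100 } @prices;

print "@buckets\n";    # prints "100 100 200 200 300"
```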

Now, on the face of it, the sorting would look something like

@sorted = 
    sort { int($a->{SP}/100)*100 <=> int($b->{SP}/100)*100 }
    @unsorted;

The problem is that the comparison block runs once for every comparison the sort makes, and sorting N items takes on the order of N log N comparisons. Sorting 1000 items therefore takes roughly 10,000 comparisons, and each one computes the sort term twice (once for $a, once for $b). That is around 20,000 rounds of dereferencing, doing some math, lopping off the decimals, etc., when there are only 1000 distinct terms to compute. With more complicated sort terms, it can get quite hairy.
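One way to see the difference is to count how often the sort term gets computed. This is only a sketch: the 1000 records are randomly generated, and sort_term and $calls are names I made up for the illustration. The exact count for the plain sort depends on the data and the Perl version, so none is shown.

```perl
use strict;
use warnings;

# 1000 hypothetical records with pseudo-random prices (in thousands).
srand(42);
my @unsorted = map { +{ SP => 100 + int(rand(900)) } } 1 .. 1000;

# Illustrative helper: computes the sort term and counts each call.
my $calls = 0;
sub sort_term {
    my ($home) = @_;
    $calls++;
    return int($home->{SP} / 100) * 100;
}

# Plain sort: the term is recomputed for both sides of every comparison.
my @naive = sort { sort_term($a) <=> sort_term($b) } @unsorted;
my $naive_calls = $calls;

# Transform: the term is computed exactly once per record.
$calls = 0;
my @st =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ sort_term($_), $_ ] }
    @unsorted;
my $st_calls = $calls;

printf "plain sort: %d term computations, transform: %d\n",
    $naive_calls, $st_calls;
```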

So for efficiency, the ST builds a list of two-element anonymous arrays of the form

( [$sort_term, $ref_to_orig_data], [$sort_term, $ref_to_orig_data], ... )

Then you sort that list on the precomputed term, so each term is calculated only once, and pull the original references back out. The trick is to do it without temporary variables. This is where map can be so useful. Read it from the bottom up.

@sorted_refs =
    map { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map { [ int($_->{SP}/100)*100, $_] }
    @unsorted_refs;
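
For anyone who wants to run it, here is the same transform as a self-contained sketch. The records and the addr field are invented sample data; only SP comes from the example above.

```perl
use strict;
use warnings;

# Hypothetical records; SP is the sale price in thousands.
my @unsorted_refs = (
    { addr => '12 Oak St',  SP => 246 },
    { addr => '9 Elm Ave',  SP => 128 },
    { addr => '3 Pine Rd',  SP => 310 },
    { addr => '7 Birch Ln', SP => 192 },
    { addr => '5 Cedar Ct', SP => 202 },
);

my @sorted_refs =
    map  { $_->[1] }                             # 3. unwrap the original ref
    sort { $a->[0] <=> $b->[0] }                 # 2. sort on the precomputed term
    map  { [ int($_->{SP} / 100) * 100, $_ ] }   # 1. wrap: [ term, original ref ]
    @unsorted_refs;

# Show the price band of each record in sorted order.
my @bands = map { int($_->{SP} / 100) * 100 } @sorted_refs;
print "@bands\n";    # prints "100 100 200 200 300"
```

The order of records inside one band is whatever the sort leaves it as, since the only sort term here is the band itself.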

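You can also do this with additional terms, such as sorting by sale date within each price grouping. Each extra term gets its own slot in the anonymous array, and the original reference moves to the last slot.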

@sorted_refs =
    map { $_->[2] }
    sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
    map { [ int($_->{SP}/100)*100, DateToTmStr($_->{SaleDate}), $_] }
    @unsorted_refs;
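
Here is a runnable sketch of the two-term version. DateToTmStr is not shown in this post, so the one below is a stand-in I wrote for the example: it assumes dates arrive as 'MM/DD/YYYY' and returns YYYYMMDD so that numeric order matches date order. The records are made up as well.

```perl
use strict;
use warnings;

# Stand-in for DateToTmStr (assumption: 'MM/DD/YYYY' in, YYYYMMDD out).
sub DateToTmStr {
    my ($date) = @_;
    my ($m, $d, $y) = split m{/}, $date;
    return sprintf('%04d%02d%02d', $y, $m, $d);
}

# Hypothetical records: sale price in thousands plus a sale date.
my @unsorted_refs = (
    { SP => 246, SaleDate => '03/15/2008' },
    { SP => 128, SaleDate => '11/02/2007' },
    { SP => 202, SaleDate => '01/20/2008' },
    { SP => 192, SaleDate => '06/30/2007' },
);

my @sorted_refs =
    map  { $_->[2] }                                        # original ref is last
    sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }     # band, then date
    map  { [ int($_->{SP} / 100) * 100, DateToTmStr($_->{SaleDate}), $_ ] }
    @unsorted_refs;

print "$_->{SP} $_->{SaleDate}\n" for @sorted_refs;
# prints:
# 192 06/30/2007
# 128 11/02/2007
# 202 01/20/2008
# 246 03/15/2008
```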

I read an article a few years ago which takes this concept further and recommends, for certain data, concatenating the sort term, a separator of some kind, and the original data into a single string. The strings can then be sorted with a plain sort and no comparison block, which eliminates the dereferencing and can save quite a bit of time, though this only works if your data can be serialized without adding more work than you save.
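
A rough sketch of that idea as I understand it, not the article's own code. It assumes the data is already plain strings of the form "price address" (made up for this example). The term is zero-padded to a fixed width so that string order matches numeric order, and a NUL byte serves as the separator.

```perl
use strict;
use warnings;

# Hypothetical data: "price address", price in thousands.
my @unsorted = (
    '246 12 Oak St',
    '128 9 Elm Ave',
    '310 3 Pine Rd',
    '1150 1 Hill Dr',
);

my @sorted =
    map  { substr($_, 11) }    # drop the 10-digit term and the separator
    sort                       # default string sort, no comparison block
    map  {
        my ($sp) = /^(\d+)/;   # leading price
        sprintf('%010d', int($sp / 100) * 100) . "\0" . $_;
    }
    @unsorted;

print "$_\n" for @sorted;
# prints:
# 128 9 Elm Ave
# 246 12 Oak St
# 310 3 Pine Rd
# 1150 1 Hill Dr
```

Without the padding, 1100 would sort before 200 as a string. Records that share a band end up ordered by the rest of the string, which may or may not be what you want.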

In reply to Re: What is "Schwartzian Transform" by furry_marmot
in thread What is "Schwarzian Transform" (aka Schwartzian) by GrandFather
