PerlMonks  

Re: What is "Schwartzian Transform"

by furry_marmot (Pilgrim)
on Jul 21, 2005 at 17:16 UTC ( [id://476918]=note )


in reply to What is "Schwarzian Transform" (aka Schwartzian)

If I remember correctly, the name was coined by Joseph Hall, who co-wrote Effective Perl Programming with Randal Schwartz. As has been mentioned here, the primary reason for the transform is efficiency. Computing the sort term once per item, up front, and using Perl's list-processing features to avoid temporary variables turns out to yield substantial savings.

Here's an example from something I just worked on. I needed to write some code to group the sale prices of recently sold homes (in thousands of dollars) into bands of $100k-199k, $200k-299k, etc. and then sort them. To group the prices, instead of using a range or if ($x->{SP} >= 100 and $x->{SP} < 200) {...} elsif ($x->{SP} >= 200 and $x->{SP} < 300) {...} etc., I just computed int($home->{SP}/100)*100. Now 128 and 192 become 100, 202 and 246 become 200, etc. There was more to it, but this is an example.
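To make the banding arithmetic concrete, here is a minimal sketch. The price list is made up for illustration, and prices are assumed to be in thousands of dollars as in the example above.

```perl
use strict;
use warnings;

# Hypothetical sale prices, in thousands of dollars.
my @prices = (128, 192, 202, 246, 305);

# int() truncates toward zero, so dividing by 100, truncating, and
# multiplying back snaps each price down to the start of its band.
my @bands = map { int($_ / 100) * 100 } @prices;

print "@bands\n";    # 100 100 200 200 300
```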

Now, on the face of it, the sorting would look something like

@sorted = sort { int($a->{SP}/100)*100 <=> int($b->{SP}/100)*100 } @unsorted;

The problem is that sorting N items takes on the order of N*log2(N) comparisons (Perl's built-in sort uses a mergesort as of 5.8), and the block above computes the sort term twice per comparison, once for each side. Sorting 1000 items therefore takes roughly 10,000 comparisons, which means about 20,000 instances of dereferencing, doing some math, lopping off the decimals, etc., when each item's sort term only needs to be computed once. With more complicated sort terms, it can get quite hairy.
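One way to see the repeated work is to count how often the sort term is actually computed. The sketch below is only an illustration: the band() helper, the counter, and the 1000 generated records are all hypothetical, and the exact count for the naive sort depends on the input order and the Perl version, so no figure is claimed for it.

```perl
use strict;
use warnings;

# Hypothetical records: 1000 hash refs with a deterministic spread of prices.
my @unsorted;
push @unsorted, { SP => ($_ * 37) % 1000 } for 1 .. 1000;

# Hypothetical helper that computes the sort term and counts its own calls.
my $calls = 0;
sub band { $calls++; return int($_[0]{SP} / 100) * 100 }

# Naive sort: the term is recomputed for both operands of every comparison.
my @naive = sort { band($a) <=> band($b) } @unsorted;
my $naive_calls = $calls;

# Computing the term up front costs exactly one call per record.
$calls = 0;
my @terms = map { band($_) } @unsorted;
my $precomputed_calls = $calls;

print "naive sort: $naive_calls computations\n";
print "computed up front: $precomputed_calls computations\n";
```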

So for efficiency, the ST creates a list of two-element array refs of the form

( [$sort_term, $ref_to_orig_data], [$sort_term, $ref_to_orig_data], ... )

Then you just sort the whole thing once. The trick is to do it without temporary variables. This is where map can be so useful. Read it from the bottom up.

@sorted_refs =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ int($_->{SP}/100)*100, $_ ] }
    @unsorted_refs;
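Here is the same transform as a self-contained script you can run. The records and the addr field are invented for illustration; only SP comes from the example above. Records that land in the same band may come out in either order, since nothing here breaks ties.

```perl
use strict;
use warnings;

# Hypothetical records; SP is the sale price in thousands.
my @unsorted_refs = (
    { addr => '12 Elm St',   SP => 246 },
    { addr => '9 Oak Ave',   SP => 128 },
    { addr => '3 Pine Rd',   SP => 305 },
    { addr => '40 Birch Ln', SP => 192 },
);

my @sorted_refs =
    map  { $_->[1] }                            # 3. drop the term, keep the original ref
    sort { $a->[0] <=> $b->[0] }                # 2. compare the precomputed terms
    map  { [ int($_->{SP} / 100) * 100, $_ ] }  # 1. pair each ref with its sort term
    @unsorted_refs;

# Print each record with the band it was sorted under.
for my $home (@sorted_refs) {
    printf "%3d  %s (SP %d)\n",
        int($home->{SP} / 100) * 100, $home->{addr}, $home->{SP};
}
```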

You can also do this with additional terms, such as sorting within groupings.

@sorted_refs =
    map  { $_->[2] }
    sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
    map  { [ int($_->{SP}/100)*100, DateToTmStr($_->{SaleDate}), $_ ] }
    @unsorted_refs;
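A runnable sketch of the two-term version follows. DateToTmStr isn't shown in this post, so the one below is a hypothetical stand-in that assumes dates arrive as 'MM/DD/YYYY' and returns a sortable YYYYMMDD number. The records are made up as well.

```perl
use strict;
use warnings;

# Hypothetical stand-in for DateToTmStr(); the real routine is not shown.
# Assumes 'MM/DD/YYYY' input and returns a sortable YYYYMMDD number.
sub DateToTmStr {
    my ($m, $d, $y) = split m{/}, shift;
    return sprintf('%04d%02d%02d', $y, $m, $d);
}

# Hypothetical records.
my @unsorted_refs = (
    { SP => 246, SaleDate => '06/30/2005' },
    { SP => 128, SaleDate => '07/04/2005' },
    { SP => 202, SaleDate => '05/15/2005' },
    { SP => 192, SaleDate => '03/01/2005' },
);

my @sorted_refs =
    map  { $_->[2] }                                     # original ref is now slot 2
    sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }  # band first, then sale date
    map  { [ int($_->{SP} / 100) * 100, DateToTmStr($_->{SaleDate}), $_ ] }
    @unsorted_refs;

print join(' ', map { $_->{SP} . ':' . $_->{SaleDate} } @sorted_refs), "\n";
# 192:03/01/2005 128:07/04/2005 202:05/15/2005 246:06/30/2005
```

Within the 100 band, 192 comes before 128 because it sold earlier; the second term only matters when the first one ties.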

I read an article a few years ago which takes this concept further and recommends, for certain data, concatenating the sort term, a connector of some kind, and the original data into a single string. The strings can then be sorted with a plain string sort and the original data pulled back out afterwards. By eliminating the dereferencing, you can save quite a bit of time, though this only works if you have data that can be serialized without adding more work than you save.
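A rough sketch of that packed-string idea, which is often called the Guttman-Rosler Transform. To keep serialization trivial, the data here is assumed to be plain price strings rather than hash refs, and instead of a connector character the sort term is zero-padded to a fixed five characters so it can be sliced off with substr. Both choices are assumptions made for this illustration, not details from the article.

```perl
use strict;
use warnings;

# Hypothetical data: plain price strings, so the payload needs no serializing.
my @unsorted = (246, 128, 305, 192, 1250);

my @sorted =
    map  { substr($_, 5) }                              # 3. slice off the 5-char term
    sort                                                # 2. default string sort, no block
    map  { sprintf('%05d', int($_ / 100) * 100) . $_ }  # 1. fixed-width term + payload
    @unsorted;

print "@sorted\n";    # 128 192 246 305 1250
```

The zero padding matters: without it, "1200" would sort before "300" as a string. Note that items in the same band end up ordered by their payload string, a side effect of comparing the whole packed string.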

Replies are listed 'Best First'.
Re^2: What is "Schwartzian Transform"
by merlyn (Sage) on Jul 21, 2005 at 17:25 UTC
      Oops! Apologies to Mr. Christiansen.
