Re: Helping Beginners (continued)

by japhy (Canon)
on Oct 30, 2001 at 07:54 UTC

in reply to Helping Beginners

Ok. Here's how I present any ST-involving response. Notice how explicit I make the functions to begin with, and how I then change the code to a more idiomatic style. I'll take the given request, and for example's sake, I'll assume the data was:
@data = (
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 10/25/01 100 12.625 50 25.25',
);
The first thing you need to do to be able to sort the data is isolate the field you want to sort by, extracting it from each line of data:
sub get_date {
    my $string = shift;
    my $date = (split ' ', $string)[1];    # second field
    return $date;
}
Now we can extract the dates from each line of data:
for $line (@data) {
    push @dates, get_date($line);
}
We now have two parallel arrays, @data which holds the original data, and @dates which holds the date for each line, respectively. Now we need to sort @dates to get it in the correct order. The problem is that sorting ONE array will not help -- both arrays need to be sorted. We could sort the indices of one array, but we'll use a different approach, one that involves references. Instead of keeping track of the dates only, let's also include the other information as well, as elements of an anonymous array:
@dates = ();    # start over: this loop replaces the previous one
for $line (@data) {
    push @dates, [ get_date($line), $line ];
}
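As an aside, the index-sorting alternative mentioned above might look something like the sketch below. It is not part of the approach used in this node. The ymd_key() helper is hypothetical and assumes the dates are month/day/year, and the sample lines are deliberately shuffled so the sort has something to do.

```perl
use strict;
use warnings;

# Sample lines from this node, shuffled on purpose.
my @data = (
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# Hypothetical helper: turn the M/D/YY field into a YYMMDD string
# that compares correctly as a number.
sub ymd_key {
    my $line = shift;
    my $date = (split ' ', $line)[1];
    my ($m, $d, $y) = split '/', $date;
    return sprintf "%02d%02d%02d", $y, $m, $d;
}

# @keys is parallel to @data: $keys[$i] belongs to $data[$i].
my @keys = map { ymd_key($_) } @data;

# Sort the indices by their keys, then use them to slice @data.
my @order  = sort { $keys[$a] <=> $keys[$b] } 0 .. $#data;
my @sorted = @data[@order];

print "$_\n" for @sorted;
```

This works, but it means keeping a second array around and carrying the index list through the sort, which is the bookkeeping the array-reference approach avoids.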
For each element $e in the @dates array, $e->[0] is the date, and $e->[1] is the original line. Now we can move on to the actual problem of sorting the array. We need to sort the dates. What's the best format to sort dates in? Seconds? Well, maybe. But we don't have or need that granularity -- we have year, month, and day. Instead of using the form "MM/DD/YY" that the data comes in, let's use the form "YYMMDD". This will be of great use to us, because dates in the latter form can be sorted as regular numbers. So we need to change our get_date() function a bit, to extract the date and fix it:
sub get_date {
    my $string = shift;
    my $date = (split ' ', $string)[1];    # second field, e.g. "1/23/01"
    my ($m, $d, $y) = split '/', $date;    # month, day, year
    return sprintf "%02d%02d%02d", $y, $m, $d;
}
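A quick standalone check of what such a key function produces can be handy before wiring it into a sort. The sketch below assumes the sample dates are month/day/year and carries its own copy of the function so it runs by itself.

```perl
use strict;
use warnings;

# Standalone copy of the key extractor, assuming month/day/year input.
sub get_date {
    my $string = shift;
    my $date = (split ' ', $string)[1];    # second field, e.g. "1/23/01"
    my ($m, $d, $y) = split '/', $date;    # month, day, two-digit year
    return sprintf "%02d%02d%02d", $y, $m, $d;
}

# Print each sample line next to its key.
# Keys produced here: 010123 and 010901.
for my $line ('buy 1/23/01 100 12.625 50 25.25',
              'buy 09/1/01 100 12.625 50 25.25') {
    printf "%s => %s\n", $line, get_date($line);
}
```

One caveat: with a two-digit year, keys of this shape only compare correctly while all the dates fall in the same century.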
Now our function returns "YYMMDD", with each number zero-padded (that's what the "%02d" format means). Now we can sort the dates natively:
@dates = sort { $a->[0] <=> $b->[0] } @dates;
Before you panic, remember that the elements of @dates are array references, so $a->[0] is accessing the date portion of the element. If you've never used sort() before, $a and $b are the two elements being compared, and the <=> operator returns a value of -1, 0, or 1, depending on the relationship (less than, equal to, or greater than) between the two operands. Now, our last job is to extract the original data from the array. For this, we will use map(), which acts like a for-loop on a list:
@data = map $_->[1], @dates;
This extracts the second element from each array reference, and stores them in @data. Now we have working code. But let's make it more idiomatic. First, notice that we have three distinct stages in our code:
  1. date extraction
  2. sorting
  3. data restoration
We do these one after the other, so we can try to combine them into one larger process:
@data = restore(
    sort { $a->[0] <=> $b->[0] }
    extract(@data)
);
Notice how the stages now read from the bottom up? This is the standard appearance of Lisp-like code (and this code is indeed Lisp-like). Instead of creating two more functions, restore() and extract(), let's see what we can do with the existing function get_date(), and Perl's built-in map() function:
# extract(@data)
# becomes
map [ get_date($_), $_ ], @data
Notice how the extraction (which involves the creation of the array of array references) is really just an iteration of the get_date() function on each element of the array? Then, for restore(), we simply do:
# restore(...)
# becomes
map $_->[1], ...
Our code now looks like this:
@data =
    map $_->[1],
    sort { $a->[0] <=> $b->[0] }
    map [ get_date($_), $_ ],
    @data;
Lisp-ish code, indeed! (Who ever said Perl wasn't functional?) What you've just watched being built is called a Schwartzian Transform (find its history elsewhere on the internet). It takes the form:
@data =
    map  { restore($_) }
    sort { ... }
    map  { extract($_) }
    @data;
which is (more or less) what our code now looks like. (Insert documentation references and what-not here.)
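For anyone who wants to run the whole thing, here is one way the pieces might be assembled into a single script under strict and warnings. Treat it as a sketch: it assumes the dates are month/day/year, shuffles the sample lines so the result is visibly different from the input, and stores the result in a separate @sorted array (my own naming) instead of overwriting @data.

```perl
use strict;
use warnings;

# Sample lines from this node, shuffled on purpose.
my @data = (
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# Extract the second field and rewrite M/D/YY as a YYMMDD sort key.
sub get_date {
    my $string = shift;
    my $date = (split ' ', $string)[1];
    my ($m, $d, $y) = split '/', $date;
    return sprintf "%02d%02d%02d", $y, $m, $d;
}

# Read from the bottom up: extract, sort, restore.
my @sorted =
    map  { $_->[1] }                  # 3. keep only the original line
    sort { $a->[0] <=> $b->[0] }      # 2. compare the precomputed keys
    map  { [ get_date($_), $_ ] }     # 1. pair each line with its key
    @data;

print "$_\n" for @sorted;
```

The point of the pairing step is that get_date() runs once per line, not once per comparison inside sort().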

Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Replies are listed 'Best First'.
(Ovid) Re(2): Helping Beginners (continued)
by Ovid (Cardinal) on Oct 30, 2001 at 21:21 UTC

    japhy, that was great. Not only was it a beautiful breakdown of how you accomplished your end goal, but I now understand why my version was incredibly sloppy next to yours. I think that's a mistake I won't make again :)


    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re(2): Helping Beginners (continued)
by FoxtrotUniform (Prior) on Nov 30, 2001 at 23:02 UTC

    Great explanation, japhy! Between this node and actually having a good reason to write a Schwartzian Transform, I now "grasp" what it means. I haven't felt this clever since I worked through Duff's device. Domo arigato!

