You asked: If somebody can walk me through the above lines.....
You seem interested and I'm gonna give you some hints, but first I think you would be well advised to learn more about basic sorting. The following code is a "simple" sort and will be slower than merlyn's code as the number of lines get larger but it is functionally the same. I'll walk you through this and then the more advanced code will make more sense.
First we want to get all the data lines. This can be done like above or with a while{} loop and pushing input lines onto the @inputlines array.#!/usr/bin/perl -w use strict; my $column_number =3; my $prev = ""; my @input_lines = <DATA>; #slurps DATA in all at once @input_lines = sort {my ($a_col) = (split(/\s+/,$a))[$column_number]; my ($b_col) = (split(/\s+/,$b))[$column_number]; $a_col cmp $b_col }@input_lines; foreach (@input_lines) { print unless $_ eq $prev; $prev = $_; } __DATA__ FT9mWp d4fgMB gvZRJU XRRu0N X 4ewejk pFnjd4 ie0hZF pPipQJ A 4ewejk 4sqprx ie0hZF cqtexi 3 Fo1OKn qhZPvb qWZPrt ruBObs 2
Now we come to the sort. An important thing is to understand that what goes into a sort{} is what comes out! We can sort all kinds of complex structures. But what goes in, is what comes out is true. Here I assigned the output back to the input, which is completely legal and common.
Sort works on some "magic variables", $a and $b. The sort function compares pairs of "things", whatever they are that are coming in. And $a and $b are assigned to those "thing pairs". Here we have just lines of text input. But this could be something like rows of an Array of rows (an ArrayofArray) or even other "things".
@inputlines is what comes in (and is what is going to come out of the sort), but we have a problem because we only want to sort the input lines based on a subset of the line, the thing in the specified column. To get the right column, we use split() which generates a list of tokens that were separated by whitespace. Now only one of these columns is relevant. So this (split(/\s+/,$a))[$column_number] stuff is a "slice" and says that I only want the one of the things from the split() operation - the stuff in a particular $column_number. That is done and assigned twice once for the $a input and once for $b input. Then we do a "cmp" which is an alpha comparison yielding a "less than, equal, greater than" value. The sort{} function uses this decision to order the inputlines. Remember what I said before, what goes into a sort comes out of a sort, so this decision affects the inputlines, not just this particular column!!
Then results are printed. Lines that are the same as previous lines are suppressed.
Now as you see if you're following me so far, every time I want to compare 2 input lines, I have to go through this split() and then slice stuff. What merlyn's code does is precompute these comparison values once and then use these precomputed results in the sort. The sort doesn't do any fewer compares, but it spends less time figuring out exactly what to compare! The "cost" is building an intermediate data structure with those values (which takes time and memory).
The Schwartzian Transform creates a new ArrayofArray, with each "row" containing the original thing and then the things that are to be used in the sort comparison function. That array is the "thing" that gets inputted into the sort. After the sort is done, we don't need these precomputed values any more, so the original input is re-constructed, ie the precomputed things are thrown away.
Note that these map{},sort{}grep,map{} things process lists. Because the "line" could get way long and it is easier to read, they are usually "stacked", read this stack from the bottom up, actually right to left.
Sorry post got long, but I hope this helps you understand at least the general idea of what is going on. I would recommend practicing more along the lines of my example before launching off into more complex techniques.
In reply to Re^3: Emulating 'sort' command through a Perl
by Marshall
in thread Emulating 'sort' command through a Perl
by paragkalra
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |