Sorting help

raj123 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Sorting help by shmem (Chancellor) on Jun 20, 2009 at 08:07 UTC
`open my $fh, '<', $file or die "Cannot open $file: $!";
my @ary = <$fh>;
print                              # print the list consisting of
    map  { $_->[1] }               # all second elements
    sort { $a->[0] <=> $b->[0] }   # of a sorted list of anon arrays
    map  {                         # constructed via
        my ($d) = />(\d+)/;        # extracting the numbers to be sorted
        [ $d, $_ ]                 # as an anon array of number and line
    } @ary;                        # for all lines of the file`

See A brief tutorial on Perl's native sorting facilities. See also: open, print, map, sort. For anonymous arrays see References quick reference and the perldata, perlref and perlreftut manual pages.
Re: Sorting help by Marshall (Canon) on Jun 20, 2009 at 15:18 UTC
The reply by shmem is right, but perhaps a bit advanced for what you need. I would suggest mastering basic sorting before moving on to advanced techniques, so I'll back up and explain sort a bit for you.

If we just had @data = sort @data, that would be a plain line-by-line string sort of the @data list. You don't want that; you need a special order. sort allows you to specify a subroutine that does the comparison, returning a positive number (a > b), 0 (a == b) or a negative number (a < b). The default comparison behaves like $a cmp $b. Note the difference between $a <=> $b (numeric) and $a cmp $b (string). Perl "automagically" sets these $a and $b values for you. The job of the comparison subroutine is to figure out what to do with them. See the code below:

`#!/usr/bin/perl -w
use strict;

my @data = (<DATA>);
print "Unsorted Data\n";
print @data;

@data = sort {
    my ($tag_id_A, $tag_A) = $a =~ m/(\d+)/g;
    my ($tag_id_B, $tag_B) = $b =~ m/(\d+)/g;

    $tag_A <=> $tag_B
        or
    $tag_id_A <=> $tag_id_B
} @data;

print "Sorted Data\n";
print @data;

#prints:
#Unsorted Data
#<tag id="12">125</tag>
#<tag id="9">125</tag>
#<tag id="17">15</tag>
#<tag id="6">179</tag>
#<tag id="7">2</tag>
#Sorted Data
#<tag id="7">2</tag>
#<tag id="17">15</tag>
#<tag id="9">125</tag>
#<tag id="12">125</tag>
#<tag id="6">179</tag>

__DATA__
<tag id="12">125</tag>
<tag id="9">125</tag>
<tag id="17">15</tag>
<tag id="6">179</tag>
<tag id="7">2</tag>`

What happens above is that Perl hands the sort comparison function pairs of lines, which it calls $a and $b. I use a global match (m//g) in list context to get the two numbers on each of the $a and $b lines. Then comes the comparison of those values, which is just one logic expression formatted across several lines. It uses the second number on the line as the primary sort key. If the two primary keys are equal (the numeric comparison returns 0), the second comparison is executed, so "ties" are broken by the first number on the line. I added a case for that to your data.
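To make the numeric-versus-string distinction and the `or` tie-break concrete, here is a minimal sketch. The values and the `[id, value]` pairs are made up for illustration and are not taken from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative values only (an assumption, not from the original post).
my @values = (2, 179, 15, 125);

# With no block, sort compares as strings, as if { $a cmp $b } were given.
my @as_strings = sort @values;

# The spaceship operator compares numerically.
my @as_numbers = sort { $a <=> $b } @values;

print "string order:  @as_strings\n";   # 125 15 179 2
print "numeric order: @as_numbers\n";   # 2 15 125 179

# Hypothetical [id, value] pairs: sort by value, break ties by id.
# <=> returns 0 on a tie, which is false, so "or" falls through
# to the second comparison.
my @pairs  = ([12, 125], [9, 125], [7, 2]);
my @sorted = sort { $a->[1] <=> $b->[1] or $a->[0] <=> $b->[0] } @pairs;

my $pair_order = join ' ', map { "$_->[0]:$_->[1]" } @sorted;
print "pair order:    $pair_order\n";   # 7:2 9:125 12:125
```

The string sort puts 125 before 15 because it compares character by character, which is why the numeric operator is needed for data like this.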
You might say, "Hey! This is a subroutine, so where is the return statement?" By default, Perl returns the value of the last expression evaluated in a sub. Normally you would write an explicit return, but this is an exception to the "rule": here it would look messy, so as a matter of style it is not done.

The post by shmem uses a technique called a "Schwartzian transform". The idea is to extract the numbers from each line once, in advance, so that we don't have to do it every time sort wants to compare a couple of lines. For lists beyond some size, maybe around a couple of dozen items, this can speed things up. However, the code above sorts your data into the same order and is, I hope, easier for you to understand. The performance difference typically won't matter (Perl is very good at regular expressions). Good luck and happy sorting.

Update: To make it clearer that we are supplying a sub to sort, you can write it as below. This is the way to do it when you have to sort a bunch of different lists by the same criteria. The code above uses an anonymous block (a sub which has no name). This version uses a named comparison sub.

`@data = sort by_tags @data;

sub by_tags {
    my ($tag_id_A, $tag_A) = $a =~ m/(\d+)/g;
    my ($tag_id_B, $tag_B) = $b =~ m/(\d+)/g;

    $tag_A <=> $tag_B
        or
    $tag_id_A <=> $tag_id_B
}`
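For comparison, here is a hedged sketch of how the same two-key ordering (tag value first, then id) might look as a Schwartzian transform. It is not shmem's exact code, which sorts on one key only. The three-element layout `[id, value, line]` is an assumption chosen for this illustration, and the sample lines are copied from the `__DATA__` section above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines copied from the __DATA__ section of the earlier script.
my @data = (
    qq{<tag id="12">125</tag>\n},
    qq{<tag id="9">125</tag>\n},
    qq{<tag id="17">15</tag>\n},
    qq{<tag id="6">179</tag>\n},
    qq{<tag id="7">2</tag>\n},
);

# Read the pipeline bottom-up:
my @sorted =
    map  { $_->[2] }                    # 3. keep only the original line
    sort {
        $a->[1] <=> $b->[1]             # 2. primary key: tag value
            or
        $a->[0] <=> $b->[0]             #    tie-break: tag id
    }
    map  {
        my ($id, $value) = /(\d+)/g;    # 1. run the regex once per line
        [ $id, $value, $_ ]             #    and remember [id, value, line]
    } @data;

print @sorted;
```

The regex now runs once per line instead of twice per comparison, which is the whole point of the transform. The cost is one small anonymous array per line.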