Re: Sort a 2D array based on 2 columns
by shmem (Chancellor) on Apr 26, 2009 at 07:25 UTC
This is a FAQ. See perlfaq4 or perldoc -q sort:
Found in /usr/local/lib/perl5/5.10.0/pod/perlfaq4.pod
How do I sort an array by (anything)?
If you need to sort on several fields, the following paradigm is
useful.
@sorted = sort {
    field1($a) <=> field1($b) ||
    field2($a) cmp field2($b) ||
    field3($a) cmp field3($b)
} @data;
Change that to use references and you're done.
While this complicated sort function does work, if there is a lot of data, it is faster to precompute a sort key for each row, and sort using that. The Schwartzian Transform is the standard name for this, and you should read up on it. It is also in perlfaq4.
The general form of the ST is
@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [$_, foo($_)] }
          @unsorted;
where foo() does whatever is necessary to make a sortable key. Obviously it is very specific to the data that you are working with. In your case the second column is an integer, and the fourth is either a 0 or a 1, so I'd go with something like
sub foo($) { $_[0]->[1] . "." . $_[0]->[3]; }
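which will produce a single real number (3.1 for the row with 3 in the second column and 1 in the fourth), so the sort block in the general form above has to compare the keys with the <=> operator instead of cmp.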
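A minimal sketch of the transform applied to the sample data from this thread, assuming the second column is an integer and the fourth is always 0 or 1. The helper name sort_key is made up for illustration, and the comparison is numeric because the key is a number:

```perl
use strict;
use warnings;

my @unsorted = (
    [ "AGCT", "0", "370", "1" ],
    [ "AGGT", "3", "52",  "1" ],
    [ "TGAA", "2", "233", "0" ],
    [ "AGAG", "0", "32",  "0" ],
);

# Hypothetical key builder: "<column 2>.<column 4>", e.g. "3.1".
# This only orders correctly while column 4 is a single digit.
sub sort_key {
    my ($row) = @_;
    return "$row->[1].$row->[3]";
}

my @sorted =
    map  { $_->[0] }                  # 3. unwrap the original row
    sort { $a->[1] <=> $b->[1] }      # 2. compare the precomputed keys numerically
    map  { [ $_, sort_key($_) ] }     # 1. pair each row with its key
    @unsorted;

print join(" ", @$_), "\n" for @sorted;
```

The key is computed once per row rather than once per comparison, which is where the speed-up on large inputs comes from.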
Re: Sort a 2D array based on 2 columns
by CountZero (Bishop) on Apr 26, 2009 at 07:28 UTC
This will do the trick in Perl:
use strict;
use warnings;
my @megaMotif = (
[ "AGCT", "0", "370", "1" ],
[ "AGGT", "3", "52", "1" ],
[ "TGAA", "2", "233", "0" ],
[ "AGAG", "0", "32", "0" ]
);
my @megaMotifSorted =
sort { $a->[1] <=> $b->[1] || $a->[3] <=> $b->[3] }
@megaMotif;
Please note the following two things:
- The name of the language is "Perl", not "PERL".
- When making your array, the outermost brackets should be parentheses "( ... )" and not square brackets "[ ... ]". Square brackets turn your list into an anonymous array rather than a normal array. In your version @megaMotif contains only one element: a reference to that anonymous array, which prints as something like "ARRAY(some_hex_value)". Sorting an array with only one element will not get you very far.
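A small sketch of the difference, using cut-down rows and made-up variable names; it only counts the elements each form produces:

```perl
use strict;
use warnings;

# Parentheses build a list: the array gets one element per row.
my @with_parens = ( [ "AGCT", "0" ], [ "AGGT", "3" ] );

# Square brackets build ONE anonymous array; the named array then
# holds a single element, a reference to that anonymous array.
my @with_brackets = [ [ "AGCT", "0" ], [ "AGGT", "3" ] ];

print scalar(@with_parens),   "\n";    # number of rows
print scalar(@with_brackets), "\n";    # just the one reference
print ref($with_brackets[0]), "\n";    # type of that single element
```

With the bracketed form the rows are still reachable through @{ $with_brackets[0] }, but sort only ever sees the one outer element.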
Did you just change the brackets in your posting or am I going mad? I could have sworn I just copied-and-pasted your code into my editor and the outermost brackets were square and not round. If you did change them afterwards, make a little note of it, otherwise it gets difficult to follow the discussion.
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Thank You.
Sometimes I get badly stuck and don't see the easiest thing that would let me move ahead, like this. A little push in the right direction is invaluable.
Does this happen to anyone else?
Thank You again!
Sure does! Happens a lot to me. Here are some links to posts to remind you that you are in good company - the best really:
Posting questions, even ones that turn out to be silly mistakes, can be very, very helpful to the next person down the road who makes the same silly mistake. Perl Monks is the great place it is because people have had the courage to make mistakes and let others see them.
Best, beth
Re: Sort a 2D array based on 2 columns
by perliff (Monk) on Apr 26, 2009 at 08:14 UTC
Use the CPAN module Sort::Fields. However, instead of sorting a 2D array, Sort::Fields sorts delimited text: here the data is tab-delimited, with each row in a separate array element.
So if you have a tab-delimited file, say data.txt, with your data like this...
AGCT 0 370 1
AGAG 0 32 0
TGAA 2 233 0
AGGT 3 52 1
you can read in your data directly and sort it. No need to make a 2D array...
use strict; # before anything else
use warnings;
use Sort::Fields;
use Data::Dumper;
# three-argument open with a lexical filehandle; $! says why the open failed
open(my $inp, '<', 'data.txt') or die "where's the file eh? $!";
my @data = <$inp>;
close $inp;
chomp(@data); # remove newline characters
print "before sorting...\n";
print Dumper @data;
# sort the data on the 2nd and the 4th field
# this is a numeric sort, as indicated by the "n"
my @sorted = fieldsort '\t', [ '2n', '4n' ], @data;
print "after sorting...\n";
print Dumper @sorted;
This produces the following output...
before sorting...
$VAR1 = 'AGCT 0 370 1';
$VAR2 = 'AGAG 0 32 0';
$VAR3 = 'TGAA 2 233 0';
$VAR4 = 'AGGT 3 52 1';
after sorting...
$VAR1 = 'AGAG 0 32 0';
$VAR2 = 'AGCT 0 370 1';
$VAR3 = 'TGAA 2 233 0';
$VAR4 = 'AGGT 3 52 1';
If you want to reverse sort numerically, use "-2n" for
reverse numeric sorting by the second column. Leaving out the
"n" gives an alphanumeric sort. There are a number of examples in the
module doc. I particularly like this module as it is
very flexible and provides the same kind of sorting power that you
get from the Unix sort, which has also been suggested in this thread.
perliff
----------------------
"with perl on my side"
Note that the loop
foreach my $ele (@data) {chomp $ele}
is the same as the statement
chomp(@data);
thanks for the tip! saves me a lot of typing. I changed the code.
perliff
----------------------
"with perl on my side"
Re: Sort a 2D array based on 2 columns
by Anonymous Monk on Apr 26, 2009 at 07:06 UTC
Don't re-invent the wheel. Use the "sort" command like this:
$ cat junk.txt
AGCT:0:370:1
AGGT:3:52:1
TGAA:2:233:0
AGAG:0:32:0
$ sort -t: -k2,2n -k4,4n junk.txt
AGAG:0:32:0
AGCT:0:370:1
TGAA:2:233:0
AGGT:3:52:1
Hello,
I forgot to mention I am coding this thing in PERL. I wish to have a 2D array of all those values so that I may process the array further.
I believe that to do it in the elegant way you suggested, I would first have to write that array to a file, say junk.txt.
The "system" command would then let me run the "sort" command.
I would then have to read the sorted file back and slurp it into an array once more.
I really do not wish to have that overhead, but if most would agree that this is the easiest way to go about it, I will go ahead and use it.
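#!/usr/bin/perl -w
use strict;
my @mega = (
    ["AGCT", "0", "370", "1"],
    ["AGGT", "3", "52", "1"],
    ["TGAA", "2", "233", "0"],
    ["AGAG", "0", "32", "0"]
);
# lexical filehandle, three-argument open, and an error check
open(my $fh, '>', 'junk.txt') or die "Cannot write junk.txt: $!";
for my $row (@mega) {
    # pass the newline as a list item; "print fh (...) . "\n"" would drop it
    print {$fh} join(":", @{$row}), "\n";
}
close $fh or die "Cannot close junk.txt: $!";
# sort numerically on field 2, then on field 4
my $shellout = <<`SHELL`;
sort -t: -k2,2n -k4,4n junk.txt
SHELL
print "$shellout\n";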
... if you expend some effort you can find a way of avoiding writing the array to a file, as sort also accepts input from STDIN; read the man page.
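One possible way to do that, sketched with the core module IPC::Open2 and assuming a Unix-like sort is on the PATH. The variable names are made up for illustration, and the sort keys (field 2, then field 4, both numeric) are my reading of what the original question asked for:

```perl
use strict;
use warnings;
use IPC::Open2;

my @mega = (
    [ "AGCT", "0", "370", "1" ],
    [ "AGGT", "3", "52",  "1" ],
    [ "TGAA", "2", "233", "0" ],
    [ "AGAG", "0", "32",  "0" ],
);

# Start sort with its STDIN and STDOUT connected to two pipes.
my $pid = open2( my $from_sort, my $to_sort,
                 'sort', '-t:', '-k2,2n', '-k4,4n' );

# Feed the rows to sort, one colon-joined line per row.
print {$to_sort} join( ":", @$_ ), "\n" for @mega;
close $to_sort;    # signals end of input so sort can start writing

# Read the sorted lines back and split them into rows again.
my @sorted = map { chomp; [ split /:/ ] } <$from_sort>;
close $from_sort;
waitpid $pid, 0;

print join( " ", @$_ ), "\n" for @sorted;
```

This still costs a fork and two pipes, so for data that is already in a Perl array the built-in sort shown earlier in the thread is likely the simpler choice.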