Sorting a two-dimensional array by two columns

a11 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to sort a two dimensional array (@twodarray) that looks something like:

    [0]  [1]
[0] chr1 10 
[1] chr3 20 
[2] chr1 30 
[3] chr3 5     
.
.
.
[n] chr5 5
[download]

first by chr number (all chr1 go first, then all chr3, then all chr5, etc.) and within that by column 1 in ascending order. that is, the resulting array should look like that:

    [0]  [1]
[0] chr1 10 
[1] chr1 30 
[2] chr3 5 
[3] chr3 20     
.
.
.
[n] chr5 5
[download]

the code I'm trying to use is:

my @sortedtda;
my @chrnums;
my @coords;
$i=0;

for (@twodarray) {
    push @chrnums, $twodarray[$i][0]=~ (/(\d+)/);
    push @coords,  $twodarray[$i][1]=~ (/(\d+)/);
    ++$i;
    }

@sortedtda = @twodarray[ sort {
            $coords[$a] <=> $coords[$b]
                 ||
            $chrnums[$a] <=> $chrnums[$b]
               } 0..$#twodarray
           ];
[download]

For a small dataset, it seems to work okay, but for real data the result is somewhat random - perhaps my test dataset is just biased. Is there some kind of mistake or the whole thing doesn't make any sense at all? I would really appreciate any help. Thanks in advance!

Comment on Sorting a two-dimensional array by two columns Select or Download Code

Replies are listed 'Best First'.
Re: Sorting a two-dimensional array by two columns by derby (Abbot) on Jul 05, 2006 at 22:29 UTC
You need the cmp for strings and the spaceship for numbers: `#!/usr/bin/perl use strict; use warnings; my @twod = ( [ 'chr1', 10 ], [ 'chr3', 20 ], [ 'chr1', 30 ], [ 'chr3', 5] ); my @sorted = sort { $a->[0] cmp $b->[0] \|\| $a->[1] <=> $b->[1] } @twod +; foreach my $row ( @sorted ) { print $row->[0], " ", $row->[1], "\n"; }` [download] -derby	[reply] [d/l]
Re: Sorting a two-dimensional array by two columns by GrandFather (Saint) on Jul 05, 2006 at 22:47 UTC
The Schwartzian transform is the classic way to do this. First transform your multi-dimensional array to a list of key/reference pairs where the key is some hash of the values you want to sort on (a unique hash is best :) and the reference is a reference to the row the key pertains to. Then sort the pairs by the key. Finally pull the rows out and you're done. `use strict; use warnings; my @chrnums = ( [qw(chr1 10)], [qw(chr3 20)], [qw(chr1 30)], [qw(chr3 5)], [qw(chr5 5)], ); @chrnums = map { #Extract sorted rows $_->[1] } sort {$a->[0] cmp $b->[0]} map { #Transform to keys and refs [(sprintf "%5d%5d", $_->[0] =~ /(\d+)/, $_->[1]), $_] } @chrnums; print "$_->[0] $_->[1]\n" for @chrnums;` [download] Prints: `chr1 10 chr1 30 chr3 5 chr3 20 chr5 5` [download] DWIM is Perl's answer to Gödel	[reply] [d/l] [select]
Re^2: Sorting a two-dimensional array by two columns by johngg (Canon) on Jul 05, 2006 at 23:14 UTC
Maybe I'm wrong but I think you've done a Guttman-Rosler Transform here, not a Schwartzian Transform. You have combined the two values to be sorted into a single string which can be lexically sorted rather than keeping them separate and doing a `sort {$a->[0] <=> $b->[0] \|\| $a->[1] <=> $b->[1]}`. From what I've read the GRT usually uses `pack` to combine the sort keys but `sprintf` looks just as effective here. Cheers, JohnGG	[reply] [d/l] [select]
Re^3: Sorting a two-dimensional array by two columns by GrandFather (Saint) on Jul 05, 2006 at 23:33 UTC
Heh, looks like you are right. I didn't know about GRT, but it seemed like an obvious clean up of the code and possibly faster too. I was pleased with the reply because it's the first time I've "clean room" coded a ST (or possibly a GRT) - I didn't look at an example. I progress. :) DWIM is Perl's answer to Gödel	[reply]
Re^4: Sorting a two-dimensional array by two columns by johngg (Canon) on Jul 09, 2006 at 16:46 UTC
Re^2: Sorting a two-dimensional array by two columns by Hue-Bond (Priest) on Jul 05, 2006 at 23:21 UTC
The Schwartzian transform is the classic way to do this To build upon your good reply, and as a personal exercise, I wanted to implement the same using the Guttman-Rosler transform (), which is a optimization of the Schwartzian Transform based on the fact that the internal sort used by sort is faster than the Perl code you can supply to it. This is my first GRT ever :^). I also replaced the sprintf and pattern matching with a substr: `my @chrnums = ( [qw(chr1 10)], [qw(chr3 20)], [qw(chr1 30)], [qw(chr3 5)], [qw(chr5 5)], ); my @s = map { $chrnums[(substr $_,8)] ## 8 on architectures where a long is 4 byt +es. See Config } sort map { pack "LLA", (substr $chrnums[$_][0], 3), $chrnums[$_][1], $_ } 0..$#chrnums;` [download] () Update*: As pointed out, it seems your code is already GRT (or almost). I didn't notice since I didn't read it line by line. -- David Serrano	[reply] [d/l]
Re^3: Sorting a two-dimensional array by two columns by a11 (Novice) on Jul 06, 2006 at 08:33 UTC
Thanks, guys! The reason why I built up @chrnums and @coords first is because there are also such things as chrX and chrY, so the next step would be to include a small "if" within the code that would push, say, 100 to @chrnums in place of "X" and 101 in place of "Y" for chrX and chrY to be put in the list last. So I just thought it'd be faster and more readable to do it in two steps. From what you are suggesting I understand that it is more efficient to build just one sorting criterion instead of two and then ask Perl to sort the thing as a string - am I right? But then, will Perl understand that, say, chr10, should go after chr5? Thanks again!	[reply]
Re: Sorting a two-dimensional array by two columns by duckyd (Hermit) on Jul 05, 2006 at 22:36 UTC
I'm not sure why you're building up @chrnums and @coords and then taking a slice of @twodarray. You should be able to do everything in 1 step with something like: `my @data = ( [ 'chr1', 10 ], [ 'chr1', 30 ], [ 'chr1', 20 ], [ 'chr3', 11 ], [ 'chr3', 40 ], [ 'chr2', 10 ], [ 'chr3', 10 ], [ 'chr2', 40 ], [ 'chr3', 20 ], [ 'chr2', 20 ], ); my @sorted = sort { $a->[0] cmp $b->[0] \|\| $a->[1] <=> $b->[1] } @data ;` [download]	[reply] [d/l]