a11 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to sort a two dimensional array (@twodarray) that looks something like:
[0] [1] [0] chr1 10 [1] chr3 20 [2] chr1 30 [3] chr3 5 . . . [n] chr5 5
first by chr number (all chr1 go first, then all chr3, then all chr5, etc.) and within that by column 1 in ascending order. that is, the resulting array should look like that:
[0] [1] [0] chr1 10 [1] chr1 30 [2] chr3 5 [3] chr3 20 . . . [n] chr5 5
the code I'm trying to use is:
my @sortedtda; my @chrnums; my @coords; $i=0; for (@twodarray) { push @chrnums, $twodarray[$i][0]=~ (/(\d+)/); push @coords, $twodarray[$i][1]=~ (/(\d+)/); ++$i; } @sortedtda = @twodarray[ sort { $coords[$a] <=> $coords[$b] || $chrnums[$a] <=> $chrnums[$b] } 0..$#twodarray ];
For a small dataset, it seems to work okay, but for real data the result is somewhat random - perhaps my test dataset is just biased. Is there some kind of mistake or the whole thing doesn't make any sense at all? I would really appreciate any help. Thanks in advance!

Replies are listed 'Best First'.
Re: Sorting a two-dimensional array by two columns
by derby (Abbot) on Jul 05, 2006 at 22:29 UTC

    You need the cmp for strings and the spaceship for numbers:

    #!/usr/bin/perl use strict; use warnings; my @twod = ( [ 'chr1', 10 ], [ 'chr3', 20 ], [ 'chr1', 30 ], [ 'chr3', 5] ); my @sorted = sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] } @twod +; foreach my $row ( @sorted ) { print $row->[0], " ", $row->[1], "\n"; }

    -derby
Re: Sorting a two-dimensional array by two columns
by GrandFather (Saint) on Jul 05, 2006 at 22:47 UTC

    The Schwartzian transform is the classic way to do this. First transform your multi-dimensional array to a list of key/reference pairs where the key is some hash of the values you want to sort on (a unique hash is best :) and the reference is a reference to the row the key pertains to. Then sort the pairs by the key. Finally pull the rows out and you're done.

    use strict; use warnings; my @chrnums = ( [qw(chr1 10)], [qw(chr3 20)], [qw(chr1 30)], [qw(chr3 5)], [qw(chr5 5)], ); @chrnums = map { #Extract sorted rows $_->[1] } sort {$a->[0] cmp $b->[0]} map { #Transform to keys and refs [(sprintf "%5d%5d", $_->[0] =~ /(\d+)/, $_->[1]), $_] } @chrnums; print "$_->[0] $_->[1]\n" for @chrnums;

    Prints:

    chr1 10 chr1 30 chr3 5 chr3 20 chr5 5

    DWIM is Perl's answer to Gödel
      Maybe I'm wrong but I think you've done a Guttman-Rosler Transform here, not a Schwartzian Transform. You have combined the two values to be sorted into a single string which can be lexically sorted rather than keeping them separate and doing a sort {$a->[0] <=> $b->[0] || $a->[1] <=> $b->[1]}. From what I've read the GRT usually uses pack to combine the sort keys but sprintf looks just as effective here.

      Cheers,

      JohnGG

        Heh, looks like you are right. I didn't know about GRT, but it seemed like an obvious clean up of the code and possibly faster too.

        I was pleased with the reply because it's the first time I've "clean room" coded a ST (or possibly a GRT) - I didn't look at an example. I progress. :)


        DWIM is Perl's answer to Gödel
      The Schwartzian transform is the classic way to do this

      To build upon your good reply, and as a personal exercise, I wanted to implement the same using the Guttman-Rosler transform (*), which is a optimization of the Schwartzian Transform based on the fact that the internal sort used by sort is faster than the Perl code you can supply to it. This is my first GRT ever :^). I also replaced the sprintf and pattern matching with a substr:

      my @chrnums = ( [qw(chr1 10)], [qw(chr3 20)], [qw(chr1 30)], [qw(chr3 5)], [qw(chr5 5)], ); my @s = map { $chrnums[(substr $_,8)] ## 8 on architectures where a long is 4 byt +es. See Config } sort map { pack "LLA*", (substr $chrnums[$_][0], 3), $chrnums[$_][1], $_ } 0..$#chrnums;

      (*) Update: As pointed out, it seems your code is already GRT (or almost). I didn't notice since I didn't read it line by line.

      --
      David Serrano

        Thanks, guys!

        The reason why I built up @chrnums and @coords first is because there are also such things as chrX and chrY, so the next step would be to include a small "if" within the code that would push, say, 100 to @chrnums in place of "X" and 101 in place of "Y" for chrX and chrY to be put in the list last.

        So I just thought it'd be faster and more readable to do it in two steps.

        From what you are suggesting I understand that it is more efficient to build just one sorting criterion instead of two and then ask Perl to sort the thing as a string - am I right? But then, will Perl understand that, say, chr10, should go after chr5?

        Thanks again!

Re: Sorting a two-dimensional array by two columns
by duckyd (Hermit) on Jul 05, 2006 at 22:36 UTC
    I'm not sure why you're building up @chrnums and @coords and then taking a slice of @twodarray. You should be able to do everything in 1 step with something like:
    my @data = ( [ 'chr1', 10 ], [ 'chr1', 30 ], [ 'chr1', 20 ], [ 'chr3', 11 ], [ 'chr3', 40 ], [ 'chr2', 10 ], [ 'chr3', 10 ], [ 'chr2', 40 ], [ 'chr3', 20 ], [ 'chr2', 20 ], ); my @sorted = sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] } @data ;