in reply to Sorting a two-dimensional array by two columns

The Schwartzian transform is the classic way to do this. First transform your multi-dimensional array to a list of key/reference pairs where the key is some hash of the values you want to sort on (a unique hash is best :) and the reference is a reference to the row the key pertains to. Then sort the pairs by the key. Finally pull the rows out and you're done.

use strict; use warnings; my @chrnums = ( [qw(chr1 10)], [qw(chr3 20)], [qw(chr1 30)], [qw(chr3 5)], [qw(chr5 5)], ); @chrnums = map { #Extract sorted rows $_->[1] } sort {$a->[0] cmp $b->[0]} map { #Transform to keys and refs [(sprintf "%5d%5d", $_->[0] =~ /(\d+)/, $_->[1]), $_] } @chrnums; print "$_->[0] $_->[1]\n" for @chrnums;

Prints:

chr1 10 chr1 30 chr3 5 chr3 20 chr5 5

DWIM is Perl's answer to Gödel

Replies are listed 'Best First'.
Re^2: Sorting a two-dimensional array by two columns
by johngg (Canon) on Jul 05, 2006 at 23:14 UTC
    Maybe I'm wrong but I think you've done a Guttman-Rosler Transform here, not a Schwartzian Transform. You have combined the two values to be sorted into a single string which can be lexically sorted rather than keeping them separate and doing a sort {$a->[0] <=> $b->[0] || $a->[1] <=> $b->[1]}. From what I've read the GRT usually uses pack to combine the sort keys but sprintf looks just as effective here.

    Cheers,

    JohnGG

      Heh, looks like you are right. I didn't know about GRT, but it seemed like an obvious clean up of the code and possibly faster too.

      I was pleased with the reply because it's the first time I've "clean room" coded a ST (or possibly a GRT) - I didn't look at an example. I progress. :)


      DWIM is Perl's answer to Gödel
        It looks like you've re-invented the GRT without any reference to the original authors. Perhaps GRT could now be for GRandfaTher method :)
Re^2: Sorting a two-dimensional array by two columns
by Hue-Bond (Priest) on Jul 05, 2006 at 23:21 UTC
    The Schwartzian transform is the classic way to do this

    To build upon your good reply, and as a personal exercise, I wanted to implement the same using the Guttman-Rosler transform (*), which is a optimization of the Schwartzian Transform based on the fact that the internal sort used by sort is faster than the Perl code you can supply to it. This is my first GRT ever :^). I also replaced the sprintf and pattern matching with a substr:

    my @chrnums = ( [qw(chr1 10)], [qw(chr3 20)], [qw(chr1 30)], [qw(chr3 5)], [qw(chr5 5)], ); my @s = map { $chrnums[(substr $_,8)] ## 8 on architectures where a long is 4 byt +es. See Config } sort map { pack "LLA*", (substr $chrnums[$_][0], 3), $chrnums[$_][1], $_ } 0..$#chrnums;

    (*) Update: As pointed out, it seems your code is already GRT (or almost). I didn't notice since I didn't read it line by line.

    --
    David Serrano

      Thanks, guys!

      The reason why I built up @chrnums and @coords first is because there are also such things as chrX and chrY, so the next step would be to include a small "if" within the code that would push, say, 100 to @chrnums in place of "X" and 101 in place of "Y" for chrX and chrY to be put in the list last.

      So I just thought it'd be faster and more readable to do it in two steps.

      From what you are suggesting I understand that it is more efficient to build just one sorting criterion instead of two and then ask Perl to sort the thing as a string - am I right? But then, will Perl understand that, say, chr10, should go after chr5?

      Thanks again!