soblanc has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone, I have created a matrix as follows:

    my @matrix = ( [@transcripts], [@alleles], [@effects] );

Let's imagine a case where we have 2 transcripts and 3 alleles for each transcript. So my matrix looks like this:

    t1  t1  t2  t2  t1  t2
    a1  a2  a1  a2  a3  a3
    mis mis mis mis del del

I would like to :

The goal in the end is to retrieve a vector of effects in the correct order (in my example: mis mis del mis mis del). How should I proceed? Thank you very much!

Re: Sort a matrix by row
by haukex (Archbishop) on Aug 30, 2022 at 10:21 UTC

    Please see How do I change/delete my post? and use <code> tags to format code, sample input, and expected output. Fixed by GrandFather.

    Anyway, here's a pure-Perl solution. It doesn't actually sort the matrix, it just gives you the sorted column indices, but that's enough to get your expected output, and hopefully you can see how to use the output to reorder the matrix if you like*.

        #!/usr/bin/env perl
        use warnings;
        use strict;
        use Data::Dump;

        my @matrix = (
            [qw/ t1 t1 t2 t2 t1 t2 /],
            [qw/ a1 a2 a1 a2 a3 a3 /],
            [qw/ mis mis mis mis del del /],
        );

        my @idx = sort {
            $matrix[0][$a] cmp $matrix[0][$b]
                or
            $matrix[1][$a] cmp $matrix[1][$b]
        } 0..$#{$matrix[0]};
        dd @idx;  # (0, 1, 4, 2, 3, 5)

        my @out = map { $matrix[2][$_] } @idx;
        dd @out;  # ("mis", "mis", "del", "mis", "mis", "del")
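    As a minimal sketch of that last step (not part of the original reply), the sorted indices can be applied to every row with an array slice. The name @sorted_matrix is made up for this illustration.

```perl
use strict;
use warnings;

my @matrix = (
    [qw/ t1 t1 t2 t2 t1 t2 /],
    [qw/ a1 a2 a1 a2 a3 a3 /],
    [qw/ mis mis mis mis del del /],
);

# Column indices ordered by row 0 (transcript), then row 1 (allele).
my @idx = sort {
    $matrix[0][$a] cmp $matrix[0][$b]
        or
    $matrix[1][$a] cmp $matrix[1][$b]
} 0 .. $#{ $matrix[0] };

# Apply the same column order to every row with an array slice.
# @sorted_matrix is a hypothetical name used only in this sketch.
my @sorted_matrix = map { [ @{$_}[@idx] ] } @matrix;

print "@$_\n" for @sorted_matrix;
```

    Because every row is sliced with the same index list, the columns stay aligned across transcripts, alleles and effects.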

    * Update:

      Thanks a lot! It works fine for sorting by transcript (t1, then t2). In my example it also happens to put the alleles in order, but that won't always be the case in my file...

      Sorting according to transcripts is the first step I want to achieve. But then I want to sort alleles according to another array, let's say :

          my @alleles_origin = qw(a3 a2 a1);

      The number of members in @alleles_origin is the same for each transcript (t1 and t2). So I could generate an array of the same length as the others to integrate it to the matrix, so that my new matrix would be :

          my @new_matrix = (
              [qw/ t1 t1 t1 t2 t2 t2 /],
              [qw/ a1 a2 a3 a1 a2 a3 /],        # what we have
              [qw/ a3 a2 a1 a3 a2 a1 /],        # what we want for each transcript
              [qw/ mis mis del mis mis syn /],
          );

      But then, how could I reorder my effects by comparing the two rows of alleles, for each transcript (because the effects can differ between t1 and t2)?

      Thank you so much in advance

        But then I want to sort alleles according to another array, let's say : my @alleles_origin = (a3,a2,a1);

        See for example the replies to the thread How to Order an Array's Elements to Match Another Array's Element Order.

            use warnings;
            use strict;
            use Data::Dump;

            my @allel_order = qw/ a3 a2 a1 /;
            my @matrix = (
                [qw/ t1 t1 t2 t2 t1 t2 /],
                [qw/ a1 a2 a1 a2 a3 a3 /],
                [qw/ mis mis mis mis del del /],
            );

            my %allel_order = map { $allel_order[$_] => $_ } 0..$#allel_order;

            my @idx = sort {
                $matrix[0][$a] cmp $matrix[0][$b]
                    or
                $allel_order{$matrix[1][$a]} <=> $allel_order{$matrix[1][$b]}
            } 0..$#{$matrix[0]};
            dd @idx;  # (4, 1, 0, 5, 3, 2)

            my @out = map { $matrix[2][$_] } @idx;
            dd @out;  # ("del", "mis", "mis", "del", "mis", "mis")
Re: Sort a matrix by row
by kcott (Archbishop) on Aug 30, 2022 at 11:09 UTC

    G'day soblanc,

    The general solution for this is to do a primary sort and then, when primary elements are the same, do a secondary sort. The primary and secondary sorts are separated by a || operator. A typical application would be to sort on "lastname" then, for those with the same "lastname", sort on "firstname".
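    To make the "lastname" then "firstname" case concrete, here is a small sketch. The @people records are invented sample data, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample records, invented for illustration.
my @people = (
    { lastname => 'Smith', firstname => 'Zoe'   },
    { lastname => 'Jones', firstname => 'Mark'  },
    { lastname => 'Smith', firstname => 'Adam'  },
    { lastname => 'Jones', firstname => 'Alice' },
);

# cmp returns 0 when the last names are equal, so || falls through
# to the secondary comparison on first names.
my @sorted = sort {
    $a->{lastname}  cmp $b->{lastname}
        ||
    $a->{firstname} cmp $b->{firstname}
} @people;

print "$_->{lastname}, $_->{firstname}\n" for @sorted;
```

    This prints the two Jones entries (Alice, then Mark) before the two Smith entries (Adam, then Zoe). The same pattern extends to a third key by chaining another || comparison.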

    With the data you've presented, the aN values are already sorted within tN values; there's no way to use that to show how this works.

    I ran three tests:

    • using your OP data
    • swapping t2-a1 with t2-a3
    • further swapping t1-a1 with t1-a2

    I manually changed @matrix for each run. Here's the final code.

        #!/usr/bin/env perl

        use strict;
        use warnings;

        my @matrix = (
            [qw{t1 t1 t2 t2 t1 t2}],
            [qw{a2 a1 a3 a2 a3 a1}],
            [qw{mis mis mis mis del del}],
        );

        print "Original\n";
        print_matrix(\@matrix);

        # Primary sort on row 0 (tN); secondary sort on row 1 (aN).
        my @sorted_indices = sort {
            $matrix[0][$a] cmp $matrix[0][$b]
                ||
            $matrix[1][$a] cmp $matrix[1][$b]
        } 0 .. $#{$matrix[0]};

        print "Sorted indices\n";
        print "@sorted_indices\n";

        sub print_matrix {
            my ($matrix) = @_;

            # Dereference the argument rather than relying on the file-scoped @matrix.
            for my $row (0 .. $#$matrix) {
                print join(' ', @{$matrix->[$row]}), "\n";
            }
        }

    Here's the output for the three runs:

        ken@titan ~/tmp
        $ ./pm_11146491_sort_matrix.pl
        Original
        t1 t1 t2 t2 t1 t2
        a1 a2 a1 a2 a3 a3
        mis mis mis mis del del
        Sorted indices
        0 1 4 2 3 5

        ken@titan ~/tmp
        $ ./pm_11146491_sort_matrix.pl
        Original
        t1 t1 t2 t2 t1 t2
        a1 a2 a3 a2 a3 a1
        mis mis mis mis del del
        Sorted indices
        0 1 4 5 3 2

        ken@titan ~/tmp
        $ ./pm_11146491_sort_matrix.pl
        Original
        t1 t1 t2 t2 t1 t2
        a2 a1 a3 a2 a3 a1
        mis mis mis mis del del
        Sorted indices
        1 0 4 5 3 2

    Armed with the sorted indices, I'll assume you can create sorted matrices. If you encounter problems with this, show us what you tried and where you encountered difficulties — we can provide further help when we know what problem you're having.

    — Ken

      Thank you for your answers!

      Indeed, working with indexes does the job, because in the end I just want to retrieve the array "effects" in the right order, and not necessarily reconstruct the matrix.

      So with my example, this code works fine :

          my @idx = sort { $matrix[0][$a] cmp $matrix[0][$b] } 0..$#{$matrix[0]};
          my @effets_ord = map { $matrix[2][$_] } @idx;

      If I understood correctly, this sorts the column indexes by the transcript row, and mapping those indexes over the effects row gives me the effects in the right order.

      This is the first step of what I aim to do.

      BUT for the second step (sorting the alleles), the subtlety I omitted to mention is that I want to order the alleles according to another array, let's say for example:

          my @alleles_origin = qw(a2 a1 a3);

      Knowing the number of members in @alleles is the same for each transcript (t1 and t2).

      So finally, the effects in my array @effects_sorted would be in the order of : t1 then t2 (this step is ok now thanks to you guys), and also alleles in the order of @alleles_origin for t1 and t2.
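      One possible way to approach this second step, as a sketch only: turn @alleles_origin into a rank lookup and use it as the secondary sort key. The names %rank and $last are invented here, and sending alleles that are missing from @alleles_origin to the end is an assumed policy, not something stated in the thread. The // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my @matrix = (
    [qw/ t1 t1 t2 t2 t1 t2 /],
    [qw/ a1 a2 a1 a2 a3 a3 /],
    [qw/ mis mis mis mis del del /],
);
my @alleles_origin = qw( a2 a1 a3 );

# Rank of each allele in the wanted order: a2 => 0, a1 => 1, a3 => 2.
# %rank is a hypothetical helper name.
my %rank = map { $alleles_origin[$_] => $_ } 0 .. $#alleles_origin;

# Assumed policy: alleles not listed in @alleles_origin sort last.
my $last = @alleles_origin;

my @idx = sort {
    $matrix[0][$a] cmp $matrix[0][$b]
        or
    ( $rank{ $matrix[1][$a] } // $last ) <=> ( $rank{ $matrix[1][$b] } // $last )
} 0 .. $#{ $matrix[0] };

# Pull the effects row out in the sorted column order.
my @effects_sorted = @{ $matrix[2] }[@idx];
print "@effects_sorted\n";
```

      With the example data this orders the columns t1/a2, t1/a1, t1/a3, t2/a2, t2/a1, t2/a3.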

Re: Sort a matrix by row
by BillKSmith (Monsignor) on Aug 30, 2022 at 17:28 UTC
    The following is a variation on the 'Schwartzian' transform (Ref: FAQ How do I sort an array by (anything)?.) Map your array into one that can be sorted easily, sort it, and extract the results.
        use strict;
        use warnings;
        use Test::More tests=>1;

        my @transcripts = qw( t1 t1 t2 t2 t1 t2 );
        my @alles       = qw( a1 a2 a1 a2 a3 a3 );
        my @effects     = qw( mis mis mis mis del del );

        my @transpose_matrix
            = map { [shift(@transcripts), shift(@alles), shift(@effects)] } 0..5;

        my @sorted_transpose
            = sort{ $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] } @transpose_matrix;

        my @sorted_effects = map {$_->[2]} @sorted_transpose;

        is_deeply( \@sorted_effects, [qw(mis mis del mis mis del)], 'sort by row');
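    Note that building the transpose with shift empties the three input arrays. If they are still needed afterwards, a variation that builds each triple by index leaves them intact. This is a sketch under that assumption rather than a change to the approach above.

```perl
use strict;
use warnings;

my @transcripts = qw( t1 t1 t2 t2 t1 t2 );
my @alleles     = qw( a1 a2 a1 a2 a3 a3 );
my @effects     = qw( mis mis mis mis del del );

# Read bottom to top: build one [transcript, allele, effect] triple per
# column by index, sort the triples, then keep only the effect.
my @sorted_effects =
    map  { $_->[2] }
    sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
    map  { [ $transcripts[$_], $alleles[$_], $effects[$_] ] }
    0 .. $#transcripts;

print "@sorted_effects\n";
print scalar(@effects), " effects still in the input array\n";
```

    Using 0 .. $#transcripts instead of a hard-coded 0..5 also keeps the code correct when the number of columns changes.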

    UPDATE: Use of zip_by of List::UtilsBy allows the Schwartzian Transform to be coded in the usual way without any explicit loop.

        use strict;
        use warnings;
        use Test::More tests=>1;
        use List::UtilsBy qw(zip_by);

        my @transcripts = qw( t1 t1 t2 t2 t1 t2 );
        my @alles       = qw( a1 a2 a1 a2 a3 a3 );
        my @effects     = qw( mis mis mis mis del del );
        my @matrix = (\@transcripts, \@alles, \@effects);

        my @sorted_effects =
            map {$_->[2]}
            sort{ $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
            zip_by {[@_]} \@transcripts, \@alles, \@effects;

        is_deeply( \@sorted_effects, [qw(mis mis del mis mis del)], 'sort by row');

    Result:

        1..1
        ok 1 - sort by row

    Bill
Re: Sort a matrix by row
by tybalt89 (Monsignor) on Aug 30, 2022 at 17:28 UTC
        #!/usr/bin/perl

        use strict; # https://perlmonks.org/?node_id=11146491;showspoiler=11146492-1
        use warnings;
        use List::AllUtils qw( zip_by sort_by nsort_by );

        my @matrix = (
            [qw/ t1 t1 t2 t2 t1 t2 /],
            [qw/ a1 a2 a1 a2 a3 a3 /],
            [qw/ mis mis mis mis del del /],
        );
        my @alleles_origin = qw( a2 a1 a3 );
        my %alleles_sort_order = zip_by { @_ } \@alleles_origin, [ 1 .. @alleles_origin ];

        my @effects = map $_->[2],
          my @sorted_transpose =
          sort_by { $_->[0] }                           # the tN
          nsort_by { $alleles_sort_order{ $_->[1] } }   # the aN
          zip_by { [ @_ ] } @matrix;

        use Data::Dump 'dd';
        dd {
            'sorted_transpose' => \@sorted_transpose,
            'wanted effects'   => \@effects,
            'matrix'           => \@matrix
        };

    Outputs:

        {
          "matrix" => [
            ["t1", "t1", "t2", "t2", "t1", "t2"],
            ["a1", "a2", "a1", "a2", "a3", "a3"],
            ["mis", "mis", "mis", "mis", "del", "del"],
          ],
          "sorted_transpose" => [
            ["t1", "a2", "mis"],
            ["t1", "a1", "mis"],
            ["t1", "a3", "del"],
            ["t2", "a2", "mis"],
            ["t2", "a1", "mis"],
            ["t2", "a3", "del"],
          ],
          "wanted effects" => ["mis", "mis", "del", "mis", "mis", "del"],
        }