soblanc has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I can't manage to remove particular columns from my multidimensional data frame (matrix). The columns are not strict duplicates.

Consider the matrix below:

my @matrix = (
    [t1 t1 t2 t2 t2],                 <- transcripts
    [a1 a2 a1 a1 a2],                 <- alleles
    [intron intron UTR_CG UTR UTR],   <- locations
);

I would like to delete the columns where the location is plain "UTR" for a given transcript and allele, but ONLY when there is already a column with "UTR_CG" for the same transcript and allele. For instance, the resulting matrix would be:

my @matrix = (
    [t1 t1 t2 t2],
    [a1 a2 a1 a2],
    [intron intron UTR_CG UTR],
);

Thank you so much in advance!

Re: Remove particular columns of a matrix
by tybalt89 (Monsignor) on Sep 08, 2022 at 20:34 UTC

    Transpose is your friend...

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11146766
    use warnings;

    use List::AllUtils qw( zip_by );

    my @matrix = (
      [qw(t1 t1 t2 t2 t2)],               # <- transcripts
      [qw(a1 a2 a1 a1 a2)],               # <- alleles
      [qw(intron intron UTR_CG UTR UTR)], # <- locations
      );
    use Data::Dump 'dd'; dd 'original matrix', \@matrix;

    my @t = zip_by { [ @_ ] } @matrix;
    use Data::Dump 'dd'; dd '@t', \@t;

    my %cg = map { $_->[0] . "\n" . $_->[1] => 1 }
      grep { $_->[2] eq 'UTR_CG' } @t;
    use Data::Dump 'dd'; dd '%cg', \%cg;

    @t = grep { $_->[2] ne 'UTR' or not $cg{ $_->[0] . "\n" . $_->[1] } } @t;
    use Data::Dump 'dd'; dd 'modified @t', \@t;

    my @finalmatrix = zip_by { [ @_ ] } @t;
    use Data::Dump 'dd'; dd 'final matrix', \@finalmatrix;

    Outputs:

    (
      "original matrix",
      [
        ["t1", "t1", "t2", "t2", "t2"],
        ["a1", "a2", "a1", "a1", "a2"],
        ["intron", "intron", "UTR_CG", "UTR", "UTR"],
      ],
    )
    (
      "\@t",
      [
        ["t1", "a1", "intron"],
        ["t1", "a2", "intron"],
        ["t2", "a1", "UTR_CG"],
        ["t2", "a1", "UTR"],
        ["t2", "a2", "UTR"],
      ],
    )
    ("%cg", { "t2\na1" => 1 })
    (
      "modified \@t",
      [
        ["t1", "a1", "intron"],
        ["t1", "a2", "intron"],
        ["t2", "a1", "UTR_CG"],
        ["t2", "a2", "UTR"],
      ],
    )
    (
      "final matrix",
      [
        ["t1", "t1", "t2", "t2"],
        ["a1", "a2", "a1", "a2"],
        ["intron", "intron", "UTR_CG", "UTR"],
      ],
    )
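For readers without List::AllUtils installed, the same transpose-filter-transpose idea can be sketched with core Perl only. This is a minimal sketch, not the code posted above: the `transpose` helper and the `%has_cg`, `@cols`, `@kept`, and `@result` names are hypothetical, and the matrix is assumed to be rectangular.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @matrix = (
    [qw(t1 t1 t2 t2 t2)],                 # transcripts
    [qw(a1 a2 a1 a1 a2)],                 # alleles
    [qw(intron intron UTR_CG UTR UTR)],   # locations
);

# Hypothetical helper: swap rows and columns of a rectangular matrix.
sub transpose {
    my @rows = @_;
    return () unless @rows;
    my $last = $#{ $rows[0] };
    return map { my $i = $_; [ map { $_->[$i] } @rows ] } 0 .. $last;
}

# One array ref per column: [ transcript, allele, location ].
my @cols = transpose(@matrix);

# Record every (transcript, allele) pair that has a UTR_CG column.
my %has_cg = map  { ( "$_->[0]\n$_->[1]" => 1 ) }
             grep { $_->[2] eq 'UTR_CG' } @cols;

# Keep a column unless it is a plain UTR whose pair also has UTR_CG.
my @kept = grep { !( $_->[2] eq 'UTR' && $has_cg{"$_->[0]\n$_->[1]"} ) } @cols;

# Transpose back to the original row-oriented layout.
my @result = transpose(@kept);
print join( ' ', @$_ ), "\n" for @result;
```

With the sample matrix this drops only the fourth column (t2, a1, UTR), which is the result the original question asks for.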
Re: Remove particular columns of a matrix
by kcott (Archbishop) on Sep 08, 2022 at 17:29 UTC
Re: Remove particular columns of a matrix
by Marshall (Canon) on Sep 09, 2022 at 03:07 UTC
    Some index utilities in List::MoreUtils are useful:

    use strict;
    use warnings;
    use List::MoreUtils qw(firstidx indexes);
    use Data::Dump qw(pp);

    my @matrix = (
        [qw(t1 t1 t2 t2 t2)],
        [qw(a1 a2 a1 a1 a2)],
        [qw(intron intron UTR_CG UTR UTR)],
    );

    die "No UTR_CG column!"
        unless ( (firstidx {$_ eq 'UTR_CG'} @{$matrix[-1]}) > -1 );

    my @utr_cols = indexes { $_ eq 'UTR' } @{$matrix[-1]};
    die "Must have 2 or more UTR columns!" unless (@utr_cols >= 2);

    shift @utr_cols;   # save first seen UTR column
    delete @{$matrix[$_]}[@utr_cols] for (0..@matrix-1);
    pp \@matrix;

    __END__
    [
      ["t1", "t1", "t2", "t2"],
      ["a1", "a2", "a1", "a1"],
      ["intron", "intron", "UTR_CG", "UTR"],
    ]
Re: Remove particular columns of a matrix
by LanX (Saint) on Sep 08, 2022 at 18:41 UTC
    My advice:
    • use a grep to find all indices matching your fuzzy UTR criteria (?) and assign them to an array @indices
    • apply a delete to an array slice, something like delete @{$matrix[$_]}[@indices] for 0..2
    Like kcott, I'd prefer that you show some more effort.

    The pseudocode you posted doesn't even compile.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
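The grep-on-indices advice above can be sketched as follows. This is a hedged illustration rather than anyone's posted solution, and the names `%has_cg` and `@keep` are hypothetical. One deliberate swap: `delete` on array elements only shrinks an array when the removed elements are at its end, and otherwise leaves holes, so this sketch collects the indices to keep and rebuilds each row with a slice instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @matrix = (
    [qw(t1 t1 t2 t2 t2)],                 # transcripts
    [qw(a1 a2 a1 a1 a2)],                 # alleles
    [qw(intron intron UTR_CG UTR UTR)],   # locations
);

my ( $tr, $al, $loc ) = @matrix;

# (transcript, allele) pairs that have at least one UTR_CG column.
my %has_cg;
for my $i ( 0 .. $#$loc ) {
    $has_cg{"$tr->[$i]\n$al->[$i]"} = 1 if $loc->[$i] eq 'UTR_CG';
}

# Indices of the columns to keep: everything except a plain UTR
# whose (transcript, allele) pair also has a UTR_CG column.
my @keep = grep {
    !( $loc->[$_] eq 'UTR' && $has_cg{"$tr->[$_]\n$al->[$_]"} )
} 0 .. $#$loc;

# Rebuild every row from the kept indices (no holes are left behind).
$_ = [ @{$_}[@keep] ] for @matrix;

print join( ' ', @$_ ), "\n" for @matrix;
```

Working on indices avoids the two transpositions, at the cost of referring to rows by position (row 0 for transcripts, row 1 for alleles, row 2 for locations).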