perl_user123 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have a file in the following format:

ENSG00000088992  TESC    1105894  Prot_Ente   0.31  0.038
ENSG00000105374  NKG7    1105894  Prot_Ente   0.37  0.01
ENSG00000005810  MYCBP2  4322986  Bact_Bact   0.29  0.044
ENSG00000088992  TESC    4322986  Bact_Bact   0.27  0.044
ENSG00000109016  DHRS7B  4322986  Bact_Bact  -0.37  0.008
ENSG00000069248  NUP133  364926   Bact_Bact   0.32  0.024
ENSG00000005810  MYCBP2  363400   Firm_Lach  -0.29  0.036
ENSG00000105374  NKG7    363400   Firm_Lach  -0.27  0.047
ENSG00000105374  NKG7    364736   Firm_Lach  -0.27  0.039
ENSG00000105374  NKG7    186735   Firm_Lach  -0.30  0.037
ENSG00000133107  TRPC4   4322986  Bact_Bact   0.35  0.01

From this table I want to create a matrix in which the 1st column supplies the row names and the 4th column supplies the column names. The values in the 5th column fill in the matrix, one value per row. Something like this:

Gene             Prot_Ente  Bact_Bact  Firm_Lach
ENSG00000088992  0.31
ENSG00000105374  0.37
ENSG00000005810             0.29
ENSG00000088992             0.27
ENSG00000109016             -0.37
ENSG00000069248             0.32
ENSG00000005810                        -0.29
ENSG00000105374                        -0.27
ENSG00000105374                        -0.27
ENSG00000105374                        -0.30
ENSG00000133107             0.35

Below is the code I am using. It does not work correctly: not all the values of the 5th column are printed, because the same keys occur more than once in the nested hashes that I am using.

$file=$ARGV[0];
open(FH,$file);
open OUT1,">./$file\_rho_temp";
while(<FH>){
    chomp;
    next if($_=~/^Gene/);
    @arr=split(/\s+/,$_);
    $rhash{$arr[0]}{$arr[3]}=$arr[4];
}
@keys = keys %{$rhash{(keys %rhash)[0]}};
$format = "%1s " . ("%2s " x @keys) . "\n";
printf OUT1 $format, "Genes", @keys;
foreach $key (keys %rhash) {
    printf OUT1 $format, $key, @{$rhash{$key}}{@keys};
}

Can someone suggest some modification in this code or another method to make this work? Thanks a lot in advance.

Replies are listed 'Best First'.
Re: creating a matrix like format
by choroba (Cardinal) on May 19, 2016 at 15:19 UTC
    It seems you want to keep the original ordering of the genes, so storing the data in a hash won't help much.
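
    A minimal sketch of why the nested hash loses rows, using three made-up sample rows in the shape of the question's data. The names load_rows, %by_key and @in_order are only illustrative and appear nowhere in the original code. A gene/type pair that occurs twice ends up as a single hash cell, while an array keeps every row in input order:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three sample rows: gene ID, column type, value.
# The pair ENSG00000105374 + Firm_Lach occurs twice.
my @rows = (
    [ 'ENSG00000105374', 'Prot_Ente', 0.37  ],
    [ 'ENSG00000105374', 'Firm_Lach', -0.27 ],
    [ 'ENSG00000105374', 'Firm_Lach', -0.30 ],
);

# Store the rows both in a nested hash keyed by gene and type
# (as in the question) and in a plain array, for comparison.
sub load_rows {
    my @input = @_;
    my (%by_key, @in_order);
    for my $row (@input) {
        my ($id, $type, $value) = @$row;
        $by_key{$id}{$type} = $value;            # a repeated pair overwrites the earlier value
        push @in_order, [ $id, $type, $value ];  # every row survives, in input order
    }
    return (\%by_key, \@in_order);
}

my ($by_key, $in_order) = load_rows(@rows);

# Count the cells that survived in the nested hash.
my $hash_cells = 0;
$hash_cells += scalar keys %$_ for values %$by_key;

printf "rows read: %d, cells kept in hash: %d, rows kept in array: %d\n",
    scalar @rows, $hash_cells, scalar @$in_order;
```

    On top of that, hash keys come back in no particular order, so the input order of the genes would be lost as well.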

    I stored the data in an array instead:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use List::Util qw{ uniq };

    my @genes;
    while (<>) {
        my ($id, $type, $value) = (split)[ 0, 3, 4 ];
        push @genes, [ $id, $type, $value ];
    }

    my @types = sort +uniq(map $_->[1], @genes);
    say join "\t", 'Gene', @types;
    for my $gene (@genes) {
        say join "\t", $gene->[0], map $_ eq $gene->[1] ? $gene->[2] : q(), @types;
    }
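
    The columns above come out in sorted order (Bact_Bact, Firm_Lach, Prot_Ente), whereas the header in the question lists them in the order they first appear in the file. If that order matters, a variant along these lines might do. It is only a sketch: matrix_lines is a made-up helper name, and the sample input is a handful of lines copied from the question rather than a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one tab-separated output line per input line. Columns appear in the
# order their type (4th field) is first seen, not in sorted order.
sub matrix_lines {
    my @lines = @_;
    my (@rows, @types, %seen);
    for my $line (@lines) {
        my ($id, $type, $value) = (split ' ', $line)[0, 3, 4];
        next unless defined $value;               # skip blank or short lines
        push @types, $type unless $seen{$type}++; # remember first-seen order
        push @rows, [ $id, $type, $value ];
    }
    my @out = join "\t", 'Gene', @types;          # header line
    for my $row (@rows) {
        my ($id, $type, $value) = @$row;
        # Only the cell under the row's own type is filled.
        push @out, join "\t", $id, map { $_ eq $type ? $value : '' } @types;
    }
    return @out;
}

# A few lines from the question's table, used as sample input.
my @sample = (
    'ENSG00000088992 TESC 1105894 Prot_Ente 0.31 0.038',
    'ENSG00000005810 MYCBP2 4322986 Bact_Bact 0.29 0.044',
    'ENSG00000005810 MYCBP2 363400 Firm_Lach -0.29 0.036',
    'ENSG00000088992 TESC 4322986 Bact_Bact 0.27 0.044',
);
print "$_\n" for matrix_lines(@sample);
```

    Because the column list is only complete after the whole input has been read, this still needs two passes over the data, just like the script above.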

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: creating a matrix like format
by NetWallah (Canon) on May 19, 2016 at 17:47 UTC
    Here is a one-liner option:
    >perl -anE "$h{$F[3]}//=$col++; push @x,[$F[0],$h{$F[3]},$F[4]]}{say qq|Gene\t\t|,join(qq|\t|,keys %h);for my $r(@x){say qq|$r->[0]\t|,map{ $h{$_}==$r->[1]? $r->[2]:qq|\t|} keys %h}" data.2
    Yes - it is a little ugly, but it works.

            This is not an optical illusion, it just looks like one.

Re: creating a matrix like format -- oneliner
by Discipulus (Canon) on May 19, 2016 at 20:17 UTC
    A oneliner sounds like an invitation..
    # warning: Windows double quotes
    perl -lanE "print $F[0],qq(\t)x($F[3]=~/Pro/?1:$F[3]=~/Bact/?2:3),$F[4]" genetable.txt
    ENSG00000088992  0.31
    ENSG00000105374  0.37
    ENSG00000005810             0.29
    ENSG00000088992             0.27
    ENSG00000109016             -0.37
    ENSG00000069248             0.32
    ENSG00000005810                        -0.29
    ENSG00000105374                        -0.27
    ENSG00000105374                        -0.27
    ENSG00000105374                        -0.30
    ENSG00000133107             0.35

    L*

    Update: if you also want the headers printed, you can use them for the matching too:

    perl -lanE "BEGIN{@h=qw(Gene Prot_Ente Bact_Bact Firm_Lach);print join qq(\t),@h} print $F[0],qq(\t)x($F[3]=~/$h[1]/?1:$F[3]=~/$h[2]/?2:3),$F[4]" genetable.txt
    Gene             Prot_Ente  Bact_Bact  Firm_Lach
    ENSG00000088992  0.31
    ENSG00000105374  0.37
    ENSG00000005810             0.29
    ENSG00000088992             0.27
    ENSG00000109016             -0.37
    ENSG00000069248             0.32
    ENSG00000005810                        -0.29
    ENSG00000105374                        -0.27
    ENSG00000105374                        -0.27
    ENSG00000105374                        -0.30
    ENSG00000133107             0.35
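
    The chained ?: works for three columns but gets unwieldy with more. Written out as a script, the same idea could use a lookup hash from column name to position. This is a rough sketch, not a drop-in replacement: %col_of and format_row are made-up names, the header list is assumed to be known up front, and (unlike the one-liner, which puts anything unmatched into the last column) lines with an unknown type are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names in the desired output order (taken from the question).
my @headers = qw(Prot_Ente Bact_Bact Firm_Lach);

# Map each column name to its position: Prot_Ente => 0, Bact_Bact => 1, ...
my %col_of = map { $headers[$_] => $_ } 0 .. $#headers;

# Turn one input line into one tab-separated output row.
# Returns undef (in scalar context) for short lines and unknown types.
sub format_row {
    my ($line) = @_;
    my ($id, $type, $value) = (split ' ', $line)[0, 3, 4];
    return unless defined $value && exists $col_of{$type};
    my @cells = ('') x @headers;        # one empty cell per column
    $cells[ $col_of{$type} ] = $value;  # fill only the matching column
    return join "\t", $id, @cells;
}

print join("\t", 'Gene', @headers), "\n";
for my $line (
    'ENSG00000088992 TESC 1105894 Prot_Ente 0.31 0.038',
    'ENSG00000109016 DHRS7B 4322986 Bact_Bact -0.37 0.008',
    'ENSG00000105374 NKG7 363400 Firm_Lach -0.27 0.047',
) {
    my $row = format_row($line);
    print "$row\n" if defined $row;
}
```

    Each line is handled on its own here, so the file can be streamed in a single pass without holding it in memory.
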
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.