transforming a table

v15 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I have a table in two-column format, like this:

geneA    T1
geneA    T1
geneA    T2
geneB    T8
geneC    T10
geneC    T1

I want to transform it into a table like this

NAMES    T1    T2    T8    T10
geneA    +    +    -    -
geneB    -    -    +    -
geneC    +    -    -    +

So in this case T1 and T2 are present for geneA, so we put a + sign, but T8 and T10 are absent, so we put a - sign. Similarly for the others. How can I do this? I tried something like this, but I am stuck on what to do next:

#!/usr/bin/perl -w
use strict;
use warnings;
use List::MoreUtils qw(uniq);

my %gene2TF2val = ();
my @TF = ();
while(<>){
    chomp;
    my @s = split /\s+/,$_;
    push @TF , $s[1]; # pushing every TF into array @TF but this is still not unique list of transcription factors.
    $gene2TF2val{$s[1]}->{$s[0]} = "-";    
}

@TF = uniq @TF;

Any help would be appreciated. Thanks

Re: transforming a table
by Athanasius (Archbishop) on Apr 04, 2016 at 06:32 UTC

Hello v15, and welcome to the Monastery!

I would suggest that you structure the main hash so that each gene name is keyed to an anonymous array of TF values. Then you can use the any function from List::Util to determine whether a given TF corresponds to a given gene:

#! perl
use strict;
use warnings;
use List::Util qw( any );

my (%gene2TF2val, %TF);

while (<DATA>)
{
    my ($gene, $tf) = split;

    push @{ $gene2TF2val{ $gene } }, $tf;
    ++$TF{ $tf };
}

# Print table header

print "\t$_" for sort tf_sort keys %TF;
print "\n";

# Print table contents

for my $gene (sort keys %gene2TF2val)
{
    # Print one line

    print $gene;

    for my $tf (sort tf_sort keys %TF)
    {
        print "\t",
              (any { $_ eq $tf } @{ $gene2TF2val{$gene} }) ? '+' : '-';
    }

    print "\n";
}

sub tf_sort
{
    my ($pre_a, $num_a) = $a =~ /^(\D+)(\d+)/;
    my ($pre_b, $num_b) = $b =~ /^(\D+)(\d+)/;

    return $pre_a cmp $pre_b ||
           $num_a <=> $num_b;
}

__DATA__
geneA    T1
geneA    T1
geneA    T2
geneB    T8
geneC    T10
geneC    T1

Output:

16:28 >perl 1585_SoPW.pl
        T1      T2      T8      T10
geneA   +       +       -       -
geneB   -       -       +       -
geneC   +       -       -       +

16:30 >

(The trickiest part is writing the custom sort routine tf_sort to ensure that “T10” comes after “T8” — see sort.)
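To make the sorting point concrete, here is a minimal sketch comparing Perl's default string sort with a comparator that splits each label into a non-digit prefix and a number. The label list and the name by_prefix_then_number are made up for illustration. The comparator body follows tf_sort above and assumes every label looks like letters followed by digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical comparator, modelled on tf_sort above. It assumes every
# label is a non-digit prefix followed by digits, e.g. "T10".
sub by_prefix_then_number {
    my ($pre_a, $num_a) = $a =~ /^(\D+)(\d+)/;
    my ($pre_b, $num_b) = $b =~ /^(\D+)(\d+)/;

    # Compare prefixes as strings first, then the numbers numerically.
    return $pre_a cmp $pre_b ||
           $num_a <=> $num_b;
}

my @labels = qw( T10 T1 T8 T2 );

my @plain   = sort @labels;                        # default string comparison
my @natural = sort by_prefix_then_number @labels;  # prefix, then number

print "plain:   @plain\n";    # T1 T10 T2 T8
print "natural: @natural\n";  # T1 T2 T8 T10
```

The default sort compares character by character, so "T10" lands before "T2" and "T8". Comparing the numeric part with <=> gives the column order shown in the output above.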

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re: transforming a table
by kennethk (Abbot) on Apr 04, 2016 at 13:42 UTC

Athanasius

#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use 5.10.0;

my %gene2TF2val;
while (<DATA>) {
    my ($gene, $tf) = split;
    $gene2TF2val{$gene}{$tf}++;
}

# ID distinct transcription factors, sorted by value
my @tf = sort { ($a =~ /(\d+)/)[0] <=>  ($b =~ /(\d+)/)[0]} 
         uniq map keys %$_, values %gene2TF2val;

# Print table header
say join "\t", "NAMES", @tf;


# Print table contents
for my $gene (sort keys %gene2TF2val) {
    say join "\t", 
        $gene, 
        map $_ ? '+' : '-', 
            @{$gene2TF2val{$gene}}{@tf};
}


__DATA__
geneA    T1
geneA    T1
geneA    T2
geneB    T8
geneC    T10
geneC    T1

#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use 5.10.0;

my %gene2TF2val;
while (<DATA>) {
    my ($gene, $tf) = split;
    $gene2TF2val{$gene}{$tf}++;
}

# ID distinct transcription factors, sorted by value
my @tf;
for my $tf (values %gene2TF2val) {
    push @tf, keys %$tf;
}
@tf = sort { ($a =~ /(\d+)/)[0] <=>  ($b =~ /(\d+)/)[0]} 
      uniq @tf;

# Print table header
say join "\t", "NAMES", @tf;

# Print table contents
for my $gene (sort keys %gene2TF2val) {
    print $gene;
    for my $tf (@tf) {
        my $has = $gene2TF2val{$gene}{$tf} ? '+' : '-';
        print "\t$has";
    }
    print "\n";
}


__DATA__
geneA    T1
geneA    T1
geneA    T2
geneB    T8
geneC    T10
geneC    T1

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: transforming a table
by woland99 (Beadle) on Apr 04, 2016 at 17:42 UTC
