in reply to Re^5: Position Weight Matrix of Set of Strings
in thread Position Weight Matrix of Set of Strings

BrowserUk,
Given this set of sequences, in which base A does not occur (i.e., only C, G and T appear):
my @inst = ( 'GGTGTGCC', 'GCTGTGGG', 'GGTGGGCC' );
The code gives this matrix only:
T => 0 0 1 0 0.666666666666667 0 0 0
C => 0 0.333333333333333 0 0 0 0 0.666666666666667 0.666666666666667
G => 1 0.666666666666667 0 1 0.333333333333333 1 0.333333333333333 0.333333333333333
Note that it does not show the row for A, which should be:
A => 0 0 0 0 0 0 0 0
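For reference, each entry is the fraction of the input sequences that carry base $b$ at position $j$. Writing $N$ for the number of sequences and $s_i[j]$ for the base at position $j$ of sequence $i$ (notation introduced here only for illustration), with $[\cdot]$ equal to 1 when the condition holds and 0 otherwise:

$$
p_{b,j} = \frac{1}{N} \sum_{i=1}^{N} \left[\, s_i[j] = b \,\right], \qquad b \in \{A, C, G, T\}
$$

For example, position 5 reads T, T and G across the three sequences, so $p_{T,5} = 2/3 \approx 0.667$ and $p_{G,5} = 1/3 \approx 0.333$. Since A never occurs, $p_{A,j} = 0$ for every $j$, which is exactly the all-zero row above.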
My question is: how can I initialize your code above so that the matrix also includes rows for bases that do not occur?
After all, only the four letters [ATCG] are needed to construct the matrix.

Regards,
Edward

Re^7: Position Weight Matrix of Set of Strings
by BrowserUk (Patriarch) on Sep 19, 2006 at 01:23 UTC

    Better?

    #! perl -slw
    use strict;
    use Data::Dumper;

    sub get_pwm {
        my @data = @_;
        my $l = length( $data[0] );
        # Seed all four bases with a row of zeros, so bases that
        # never occur (such as A here) still appear in the result.
        my %pwm = map{ $_ => [ (0) x $l ] } qw[ A C G T ];
        # Count the base found at each position of each sequence.
        foreach my $line (@data) {
            ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. $l - 1;
        }
        # Turn the counts into frequencies.
        my $n = @data;
        @$_ = map { $_ ? $_ / $n : 0 } @{ $_ }[ 0 .. $l - 1 ] for values %pwm;
        return \%pwm;
    }

    my @inst = ( 'GGTGTGCC', 'GCTGTGGG', 'GGTGGGCC' );
    my $res = get_pwm(@inst);
    printf "%1s => [ %s ]\n", $_ => join ' ', map{ sprintf '%5.2f', $_ } @{ $res->{ $_ } }
        for keys %{ $res };

    __END__
    c:\test>530623.pl
    A => [ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ]
    T => [ 0.00 0.00 1.00 0.00 0.67 0.00 0.00 0.00 ]
    C => [ 0.00 0.33 0.00 0.00 0.00 0.00 0.67 0.67 ]
    G => [ 1.00 0.67 0.00 1.00 0.33 1.00 0.33 0.33 ]
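The rows in the output above come out in hash-key order (A, T, C, G in that run), which Perl does not guarantee to be stable. A minimal variant, assuming you want the rows printed in a fixed A, C, G, T order, iterates over the alphabet instead of `keys`. The helper name `format_pwm` is hypothetical, and `get_pwm` here is a restatement of the counting approach above with the normalisation written as an explicit loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same approach as get_pwm above: seed A, C, G and T with zero rows,
# count the base at each position, then divide by the number of sequences.
sub get_pwm {
    my @data = @_;
    my $l    = length $data[0];
    my %pwm  = map { $_ => [ (0) x $l ] } qw[ A C G T ];
    for my $line (@data) {
        ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. $l - 1;
    }
    my $n = @data;
    for my $row ( values %pwm ) {
        $_ /= $n for @$row;    # counts -> frequencies
    }
    return \%pwm;
}

# Hypothetical helper: format one line per base in a fixed A, C, G, T order
# rather than relying on Perl's unordered hash keys.
sub format_pwm {
    my ($pwm) = @_;
    return map {
        sprintf '%s => [ %s ]', $_, join ' ', map { sprintf '%.2f', $_ } @{ $pwm->{$_} }
    } qw[ A C G T ];
}

my @inst = ( 'GGTGTGCC', 'GCTGTGGG', 'GGTGGGCC' );
print "$_\n" for format_pwm( get_pwm(@inst) );
```

With the sample data this should print the A row first and the T row last, whatever order the hash happens to store them in.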

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.