in reply to Re^3: Position Weight Matrix of Set of Strings
in thread Position Weight Matrix of Set of Strings

Dear BrowserUK,
I was trying your code above with this set of strings:
my @inst = ( 'CAGGTG', 'CAGGTG' ); my $res1 = get_pwm(@inst);
But why does it return this answer:
$VAR1 = { 'A' => [ 0, '1' ], 'T' => [ 0, 0, 0, 0, '1' ], 'C' => [ '1' ], 'G' => [ 0, 0, '1', '1', 0, '1' ] };
Instead of the correct
$VAR1 = { 'A' => [ '0', '1', '0', '0', '0', '0' ], 'T' => [ '0', '0', '0', '0', '1', '0' ], 'C' => [ '1', '0', '0', '0', '0', '0' ], 'G' => [ '0', '0', '1', '1', '0', '1' ] };
It seems that the code doesn't supply the 0s it should for columns where a base is absent after the last column in which that base was found.
I thought this line of the code would do that job, but it seems not:
@$_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm;
Please advise.

---
neversaint and everlastingly indebted.......

Re^5: Position Weight Matrix of Set of Strings
by BrowserUk (Patriarch) on Sep 07, 2006 at 03:55 UTC

    Try this

    #! perl -slw
    use strict;
    use Data::Dumper;

    sub get_pwm {
        my @data = @_;
        my $l = length( $data[0] );
        my %pwm;
        foreach my $line (@data) {
            ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. $l - 1;
        }
        my $n = @data;
        @$_ = map { $_ ? $_ / $n : 0 } @{ $_ }[ 0 .. $l - 1 ] for values %pwm;
        return \%pwm;
    }

    my @inst = ( 'CAGGTG', 'CAGGTG' );
    my $res = get_pwm(@inst);
    print "$_ => @{ $res->{ $_ } }" for keys %{ $res };

    __END__
    c:\test>530623
    A => 0 1 0 0 0 0
    T => 0 0 0 0 1 0
    C => 1 0 0 0 0 0
    G => 0 0 1 1 0 1

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
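
A note on why the earlier version returned short arrays, as a minimal sketch rather than anything taken from the thread: incrementing an element autovivifies the array only up to the highest index touched, so the row for 'A' in 'CAGGTG' stops at index 1. Mapping over the array itself can turn the undefined gaps into 0 but cannot add the missing trailing elements. Slicing 0 .. length-1 first appears to be what fixes it, because a slice past the end yields undef for each missing index. The variable names below ($len, $before, $after) are illustrative assumptions.

```perl
use strict;
use warnings;

my %pwm;
my $seq = 'CAGGTG';
my $len = length $seq;

# Counting autovivifies each array only up to the highest index touched.
++$pwm{ substr $seq, $_, 1 }[$_] for 0 .. $len - 1;

# 'A' was last seen at index 1, so its array holds just two elements.
my $before = scalar @{ $pwm{A} };

# A slice over 0 .. $len-1 yields undef for indexes past the end;
# the map turns every undef into 0, padding the row to full length.
@$_ = map { $_ ? $_ : 0 } @{$_}[ 0 .. $len - 1 ] for values %pwm;
my $after = scalar @{ $pwm{A} };

print "A before: $before elements, after: $after elements\n";
```

The division by the number of sequences is left out here so that only the padding step is visible.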
      BrowserUk,
      Given this set of sequences, in which base A doesn't occur (i.e., only C, G and T are present):
      my @inst = ( 'GGTGTGCC', 'GCTGTGGG', 'GGTGGGCC' );
      The code gives only this matrix:
      T => 0 0 1 0 0.666666666666667 0 0 0
      C => 0 0.333333333333333 0 0 0 0 0.666666666666667 0.666666666666667
      G => 1 0.666666666666667 0 1 0.333333333333333 1 0.333333333333333 0.333333333333333
      Note that it doesn't show the matrix row for A:
      A => 0 0 0 0 0 0 0 0
      My question is: how can I initialize your code above so that it also includes rows for bases that don't occur in the input?
      Only the four letters [ATCG] are needed to construct the matrix.

      Regards,
      Edward

        Better?

        #! perl -slw
        use strict;
        use Data::Dumper;

        sub get_pwm {
            my @data = @_;
            my $l = length( $data[0] );
            my %pwm = map{ $_ => [ (0) x length( $_[ 0 ] ) ] } qw[ A C G T ];
            foreach my $line (@data) {
                ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. $l - 1;
            }
            my $n = @data;
            @$_ = map { $_ ? $_ / $n : 0 } @{ $_ }[ 0 .. $l - 1 ] for values %pwm;
            return \%pwm;
        }

        my @inst = ( 'GGTGTGCC', 'GCTGTGGG', 'GGTGGGCC' );
        my $res = get_pwm(@inst);
        printf "%1s => [ %s ]\n", $_ => join ' ',
            map{ sprintf '%5.2f', $_ } @{ $res->{ $_ } } for keys %{ $res };

        __END__
        c:\test>530623.pl
        A => [  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 ]
        T => [  0.00  0.00  1.00  0.00  0.67  0.00  0.00  0.00 ]
        C => [  0.00  0.33  0.00  0.00  0.00  0.00  0.67  0.67 ]
        G => [  1.00  0.67  0.00  1.00  0.33  1.00  0.33  0.33 ]

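
Once every base starts with a full row of zeros, no cell is ever undefined, so the slice and the `? :` test in the normalisation line may no longer be strictly needed. The following is a sketch under that assumption, not the code posted in the thread; get_pwm_simple is a hypothetical name, and the input is assumed to contain only A, C, G and T in sequences of equal length.

```perl
use strict;
use warnings;

# Hypothetical variant of get_pwm: same counting, simpler normalisation.
sub get_pwm_simple {
    my @seqs = @_;
    my $len  = length $seqs[0];

    # Every base starts with a full row of zeros, so no cell is ever undef.
    my %pwm = map { $_ => [ (0) x $len ] } qw( A C G T );

    for my $seq (@seqs) {
        ++$pwm{ substr $seq, $_, 1 }[$_] for 0 .. $len - 1;
    }

    # foreach aliases the array elements, so dividing in place updates %pwm.
    my $n = @seqs;
    for my $row ( values %pwm ) {
        $_ /= $n for @$row;
    }
    return \%pwm;
}

my $res = get_pwm_simple( 'GGTGTGCC', 'GCTGTGGG', 'GGTGGGCC' );
printf "%s => [ %s ]\n", $_, join ' ', map { sprintf '%5.2f', $_ } @{ $res->{$_} }
    for sort keys %$res;
```

A character outside the four bases (an N, say) would still autovivify a partial row of its own, so the slice-based version in the reply is the more forgiving of the two on unexpected input.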