in reply to Re^2: Position Weight Matrix of Set of Strings
in thread Position Weight Matrix of Set of Strings

The problem arises in this line  my $n = $.;. $. (also known as $INPUT_LINE_NUMBER. See perlvar), is the record or line number of the last line read from a file.

In the original code that line came immediately after the while( my $line = <DATA> ) { loop, so $n was set to the number of lines in the file.

In your code/function, you are not reading from a file, therefore $. is 0, so $n gets set to 0, hence the error in this line:

$_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm; # Line 26

As you are passing the data into the function as a list and assigning it to an array, the equivalent value is the number of elements in the array. Eg. scalar @data;. So, modifying your code as

use Data::Dumper; my @inst = ( 'AAA', 'ATG', 'TTT', 'GTC' ); get_pwm(@inst); sub get_pwm { my @data = @_; my %pwm; foreach my $line (@data) { ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. length($line) - 1; } my $n = @data; ### Modified @$_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm; # Line 26 print Dumper \%pwm; print "$_ => @{ $pwm{ $_ } }" for keys %pwm; return; }

Produces this output (assuming -l):

$VAR1 = { 'A' => [ '0.5', '0.25', '0.25' ], 'T' => [ '0.25', '0.75', '0.25' ], 'C' => [ 0, 0, '0.25' ], 'G' => [ '0.25', 0, '0.25' ] }; A => 0.5 0.25 0.25 T => 0.25 0.75 0.25 C => 0 0 0.25 G => 0.25 0 0.25

Not sure why you are outputting the results two different ways. It's also not typical to output from within a subroutine.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^4: Position Weight Matrix of Set of Strings
by neversaint (Deacon) on Sep 07, 2006 at 02:31 UTC
    Dear BrowserUK,
    I was trying your code above with this set of strings:
    my @inst = ( 'CAGGTG', 'CAGGTG' ); my $res1 = get_pwm(@inst);
    But why it returns this answer:
    $VAR1 = { 'A' => [ 0, '1' ], 'T' => [ 0, 0, 0, 0, '1' ], 'C' => [ '1' ], 'G' => [ 0, 0, '1', '1', 0, '1' ] };
    Instead of the correct
    $VAR1 = { 'A' => [ '0', '1', '0', '0', '0', '0' ], 'T' => [ '0', '0', '0', '0', '1', '0' ], 'C' => [ '1', '0', '0', '0', '0', '0' ], 'G' => [ '0', '0', '1', '1', '0', '1' ] };
    It seems that the code didn't supply 0 when the bases is not present in a particular column as it should after it found 1.
    I thought from this line of the code should do that job, but seems not.
    @$_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm;
    Please advice.

    ---
    neversaint and everlastingly indebted.......

      Try this

      #! perl -slw use strict; use Data::Dumper; sub get_pwm { my @data = @_; my $l = length( $data[0] ); my %pwm; foreach my $line (@data) { ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. $l - 1; } my $n = @data; @$_ = map { $_ ? $_ / $n : 0 } @{ $_ }[ 0 .. $l - 1 ] for values +%pwm; return \%pwm; } my @inst = ( 'CAGGTG', 'CAGGTG' ); my $res = get_pwm(@inst); print "$_ => @{ $res->{ $_ } }" for keys %{ $res }; __END__ c:\test>530623 A => 0 1 0 0 0 0 T => 0 0 0 0 1 0 C => 1 0 0 0 0 0 G => 0 0 1 1 0 1

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        BrowserUk,
        Given this set of sequences where base A doesn't exists (i.e, only CGT).
        my @inst = ( 'GGTGTGCC', 'GCTGTGGG', 'GGTGGGCC' );
        The code gives this matrix only:
        T => 0 0 1 0 0.666666666666667 0 0 0 C => 0 0.333333333333333 0 0 0 0 0.666666666666667 0.666666666666667 G => 1 0.666666666666667 0 1 0.333333333333333 1 0.333333333333333 0.3 +33333333333333
        Note that it doesn't show matrix element of A:
        A => 0 0 0 0 0 0 0 0
        My question is how can I initialize your code above, so it can also includes matrix where the bases doesn't exist?
        Given that there are only 4 alphabets [ATCG] required for the construction of matrix.

        Regards,
        Edward