in reply to Position Weight Matrix of Set of Strings

I think this meets the spec

#! perl -slw use strict; my %pwm; while( my $line = <DATA> ) { chomp $line; ++$pwm{ substr $line, $_, 1 }[ $_ ] for 0 .. length( $line ) -1; } my $n = $.; @$_ = map{ $_ ? $_ / $n : 0 } @$_ for values %pwm; print "$_ => @{ $pwm{ $_ } }" for keys %pwm; __DATA__ AAA ATG TTT GTC

Produces

C:\test>530623 A => 0.5 0.25 0.25 T => 0.25 0.75 0.25 C => 0 0 0.25 G => 0.25 0 0.25

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Position Weight Matrix of Set of Strings
by monkfan (Curate) on Sep 06, 2006 at 08:49 UTC
    Dear BrowserUK,

    First of all I would like to apologize for coming back to you again.

    Your code worked fine as you presented.
    But when I attempted to put your code above as a function:
    use Data::Dumper; my @inst = ( 'AAA', 'ATG', 'TTT', 'GTC' ); get_pwm(@inst); sub get_pwm { my @data = @_; my %pwm; foreach my $line (@data) { ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. length($line) - 1; } my $n = $.; @$_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm; # Line 26 print Dumper \%pwm; print "$_ => @{ $pwm{ $_ } }" for keys %pwm; return; }
    It gives:
    Use of uninitialized value in division (/) at pwm.pl line 26. Illegal division by zero at pwm.pl line 26. Attempt to free temp prematurely: SV 0x8f6a3f8, Perl interpreter: 0x8f +53008. Attempt to free unreferenced scalar: SV 0x8f6a3f8, Perl interpreter: 0 +x8f53008.
    How can I overcome the problem above.

    Regards,
    Edward

      The problem arises in this line  my $n = $.;. $. (also known as $INPUT_LINE_NUMBER. See perlvar), is the record or line number of the last line read from a file.

      In the original code that line came immediately after the while( my $line = <DATA> ) { loop, so $n was set to the number of lines in the file.

      In your code/function, you are not reading from a file, therefore $. is 0, so $n gets set to 0, hence the error in this line:

      $_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm; # Line 26

      As you are passing the data into the function as a list and assigning it to an array, the equivalent value is the number of elements in the array. Eg. scalar @data;. So, modifying your code as

      use Data::Dumper; my @inst = ( 'AAA', 'ATG', 'TTT', 'GTC' ); get_pwm(@inst); sub get_pwm { my @data = @_; my %pwm; foreach my $line (@data) { ++$pwm{ substr $line, $_, 1 }[$_] for 0 .. length($line) - 1; } my $n = @data; ### Modified @$_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm; # Line 26 print Dumper \%pwm; print "$_ => @{ $pwm{ $_ } }" for keys %pwm; return; }

      Produces this output (assuming -l):

      $VAR1 = { 'A' => [ '0.5', '0.25', '0.25' ], 'T' => [ '0.25', '0.75', '0.25' ], 'C' => [ 0, 0, '0.25' ], 'G' => [ '0.25', 0, '0.25' ] }; A => 0.5 0.25 0.25 T => 0.25 0.75 0.25 C => 0 0 0.25 G => 0.25 0 0.25

      Not sure why you are outputting the results two different ways. It's also not typical to output from within a subroutine.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Dear BrowserUK,
        I was trying your code above with this set of strings:
        my @inst = ( 'CAGGTG', 'CAGGTG' ); my $res1 = get_pwm(@inst);
        But why it returns this answer:
        $VAR1 = { 'A' => [ 0, '1' ], 'T' => [ 0, 0, 0, 0, '1' ], 'C' => [ '1' ], 'G' => [ 0, 0, '1', '1', 0, '1' ] };
        Instead of the correct
        $VAR1 = { 'A' => [ '0', '1', '0', '0', '0', '0' ], 'T' => [ '0', '0', '0', '0', '1', '0' ], 'C' => [ '1', '0', '0', '0', '0', '0' ], 'G' => [ '0', '0', '1', '1', '0', '1' ] };
        It seems that the code didn't supply 0 when the bases is not present in a particular column as it should after it found 1.
        I thought from this line of the code should do that job, but seems not.
        @$_ = map { $_ ? $_ / $n : 0 } @$_ for values %pwm;
        Please advice.

        ---
        neversaint and everlastingly indebted.......