shakehands has asked for the wisdom of the Perl Monks concerning the following question:

Hello, everyone. Please allow me to ask a question here. I have a file that contains some DNA sequences, all of the same length (for example: ATTCATCTCTCGG, ATTGTGAGATAGA, AAGATGATCGCTC, AGATAGATCGCTG). I want to construct a PFM (position frequency matrix) from three different sequences in the example data. How can I get this result with Perl? Thank you for your help.

Replies are listed 'Best First'.
Re: control DNA file with perl
by polypompholyx (Chaplain) on Sep 06, 2013 at 08:17 UTC

    Something like the code below works, for some value of 'works'. If there are Us, Ns, spaces, blank lines, or lowercase bases, then it won't, and I would suggest you add some sanity checking.

    However, you've not really explained what a position frequency matrix is (I'm taking an informed guess). It would also have been helpful to see what code you had already attempted, so we could comment on that rather than having the sneaking suspicion we're doing your homework for you. See How (Not) To Ask A Question.

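    use strict;
    use warnings;

    my @pmf;          # one hash of base counts per position
    my $count = 0;    # number of sequences read

    while ( my $seq = <DATA> ) {
        $count++;
        chomp $seq;
        my @bases = split //, $seq;
        for my $i ( 0 .. $#bases ) {
            $pmf[$i]{ $bases[$i] }++;
        }
    }

    # Print the frequency of each base at each position, as a percentage.
    for my $i ( 0 .. $#pmf ) {
        printf "%3u: ", $i;
        for my $base (qw{ A C G T }) {
            # A base never seen at this position counts as zero.
            printf "$base %3.0f, ", 100 * ( $pmf[$i]{$base} // 0 ) / $count;
        }
        print "\n";
    }

    __DATA__
    ATTCATCTCTCGG
    ATTGTGAGATAGA
    AAGATGATCGCTC
    AGATAGATCGCTG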
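The sanity checking suggested in the reply above could look something like the sketch below. It is only one possible reading of the question: `build_pfm` is a hypothetical helper name, the matrix holds raw counts rather than percentages, U is assumed to mean T, and anything else outside A, C, G and T (including N) is rejected rather than skipped. The three sequences passed in are simply the first three from the example data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original reply): build a position
# frequency matrix of raw counts, one hash of base counts per position.
sub build_pfm {
    my @seqs = @_;
    my ( @pfm, $length );

    for my $seq (@seqs) {
        $seq =~ s/\s+//g;      # drop spaces and line endings
        next if $seq eq '';    # skip blank lines
        $seq = uc $seq;        # accept lowercase bases
        $seq =~ tr/U/T/;       # assumption: treat RNA U as T

        # Assumption: N and any other unexpected character is an error.
        die "Unexpected character in '$seq'\n" if $seq =~ /[^ACGT]/;

        # All sequences must match the length of the first one.
        $length //= length($seq);
        die "Sequence '$seq' is not $length bases long\n"
            if length($seq) != $length;

        my @bases = split //, $seq;
        $pfm[$_]{ $bases[$_] }++ for 0 .. $#bases;
    }
    return @pfm;
}

my @pfm = build_pfm(qw( ATTCATCTCTCGG ATTGTGAGATAGA AAGATGATCGCTC ));

# Print the count of each base at each position; missing bases count as 0.
for my $i ( 0 .. $#pfm ) {
    printf "%3u: ", $i;
    printf "%s %d, ", $_, $pfm[$i]{$_} // 0 for qw( A C G T );
    print "\n";
}
```

To read the sequences from a file instead, the lines of the file can be passed straight to `build_pfm`, since whitespace and blank lines are removed inside the helper. The `//` operator needs Perl 5.10 or later.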
Re: control DNA file with perl
by Anonymous Monk on Sep 06, 2013 at 07:49 UTC