control DNA file with perl

shakehands has asked for the wisdom of the Perl Monks concerning the following question:

Hello, everyone. Please allow me to ask a question here. I have a file which contains some DNA sequences, all of the same length (for example: ATTCATCTCTCGG, ATTGTGAGATAGA, AAGATGATCGCTC, AGATAGATCGCTG). I want to construct a PFM (position frequency matrix) from three different sequences in the example data. How can I get the result with Perl? Thank you for your help.


Replies are listed 'Best First'.
Re: control DNA file with perl by polypompholyx (Chaplain) on Sep 06, 2013 at 08:17 UTC
Something like the code below works, for some value of 'works'. If there are Us, Ns, spaces, blank lines, or lowercase bases, then it won't, and I would suggest you add some sanity checking. However, you've not really explained what a position frequency matrix is (I'm taking an informed guess). It would also have been helpful to have seen what code you had already attempted so we could comment on that, rather than having the sneaking suspicion we're doing your homework for you. See How (Not) To Ask A Question

```perl
use strict;
use warnings;

my @pmf;
my $count = 0;

# Tally each base at each position, one sequence per line
while ( my $seq = <DATA> ) {
    $count++;
    chomp $seq;
    my @bases = split //, $seq;
    for my $i ( 0 .. $#bases ) {
        $pmf[$i]{ $bases[$i] }++;
    }
}

# Print the percentage of each base at each position
for my $i ( 0 .. $#pmf ) {
    printf "%3u: ", $i;
    for my $base (qw{ A C G T }) {
        # A base never seen at this position has no entry, so default to 0
        printf "$base %3.0f, ", 100 * ( $pmf[$i]{$base} // 0 ) / $count;
    }
    print "\n";
}

__DATA__
ATTCATCTCTCGG
ATTGTGAGATAGA
AAGATGATCGCTC
AGATAGATCGCTG
```
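The sanity checking suggested above could look something like the sketch below. It is only one possible approach, not part of the original reply: the `build_pfm` helper name is hypothetical, and it assumes that blank lines should be skipped, lowercase bases accepted, and anything other than A, C, G or T (including U and N) treated as a fatal error, as should a sequence whose length differs from the first one. The input list reuses the question's example sequences, with one lowercased and a blank line added purely to exercise the checks.

```perl
use strict;
use warnings;

# Hypothetical helper: build per-position base counts from a list of
# sequences, applying basic sanity checks before counting.
sub build_pfm {
    my @lines = @_;
    my ( @pfm, $length );
    my $count = 0;
    for my $seq (@lines) {
        $seq =~ s/\s+//g;             # drop spaces and line endings
        next unless length($seq);     # skip blank lines
        $seq = uc $seq;               # accept lowercase bases
        die "Unexpected character in '$seq'\n" if $seq =~ /[^ACGT]/;
        $length //= length($seq);     # the first sequence sets the length
        die "Sequence '$seq' is not $length bases long\n"
            if length($seq) != $length;
        $count++;
        my @bases = split //, $seq;
        $pfm[$_]{ $bases[$_] }++ for 0 .. $#bases;
    }
    return ( \@pfm, $count );
}

# Example input (assumption: modified from the question's data to
# include a lowercase sequence and a blank line)
my @input = (
    "attcatctctcgg\n", "\n",
    "ATTGTGAGATAGA\n", "AAGATGATCGCTC\n", "AGATAGATCGCTG\n",
);

my ( $pfm, $count ) = build_pfm(@input);
for my $i ( 0 .. $#$pfm ) {
    printf "%3u: ", $i;
    printf "%s %3.0f, ", $_, 100 * ( $pfm->[$i]{$_} // 0 ) / $count
        for qw( A C G T );
    print "\n";
}
```

Dying on the first bad sequence is a deliberate choice here; skipping it with a `warn` instead would be equally reasonable, depending on how clean the input file is expected to be.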
Re: control DNA file with perl by Anonymous Monk on Sep 06, 2013 at 07:49 UTC
Please allow me to make a SMOP joke: see genetic algorithm for motif finding, get the peak values with perl :)