in reply to how can I speed up this perl??
With a tiny bit of pre-processing, you can automatically process characters in pairs:
my $prev = '';
my %pairs;
map { $pairs{ $prev . $_ }++; $prev = $_; } @genome;
delete @pairs{qw( a c t g )};
Each character of the genome is combined in turn with the previous character to form a pair, and the corresponding entry in %pairs is incremented. Then the current character is saved in $prev to be the previous character for the next time around. Of course, for the first character there is no previous character, so there will be a dummy entry in %pairs with one of the keys 'a', 'c', 't', or 'g'. So, once we're all done, delete those four entries; it doesn't matter that three of them don't exist.
Yes, map may be a little complicated for beginners to understand, so it may deserve a brief comment (say, the previous paragraph), but it's clean and simple, and short code introduces fewer opportunities for mistakes.
Update: It's poor style to use map just for the side effects, tossing aside the return values. So it might be better to expand that line into an actual loop:
my $prev = '';
my %pairs;
for ( @genome ) {
    $pairs{ $prev . $_ }++;
    $prev = $_;
}
delete @pairs{qw( a c t g )};
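If you want to try the loop on a small input, here is a minimal self-contained sketch. The sample string and the name $base are my own assumptions, not from the original question; it also assumes @genome holds one lowercase character per element. Note that delete only takes a single argument, so removing several keys in one statement needs a hash slice.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real @genome comes from the original question.
my @genome = split //, 'ttatatcg';

my $prev = '';    # empty string, so the first pass does not warn about undef
my %pairs;

for my $base (@genome) {
    $pairs{ $prev . $base }++;    # count the pair (previous, current)
    $prev = $base;                # current becomes previous for the next pass
}

# Remove the single-character dummy key left by the first pass.
# The hash slice deletes all four keys at once; absent keys are ignored.
delete @pairs{qw(a c t g)};

print "$_ => $pairs{$_}\n" for sort keys %pairs;
# prints:
# at => 2
# cg => 1
# ta => 2
# tc => 1
# tt => 1
```

Eight characters give seven overlapping pairs, which matches the sum of the counts above.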
--
TTTATCGGTCGTTATATAGATGTTTGCA
|
---|