Hi. I am studying regular expressions and wanted to write a script that searches a DNA string for the longest substrings that consist of repeating letters. For example: CCCCC or GGG or AAAA etc. I managed to do that, but i am not very happy with the end resuslt. I was hoping to get most of the work done with a regex, in that regard i have failed. Furthermore there are statements in the while loop that look doubtful, and the idea of using an array to store the substring along with its length might not be good. Any advice is welcome. Thank you.
use strict; use warnings; my $string = "AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATT +GGGGACTTT"; my @substrings; while($string =~ /([ACTG])(\1+)/g){ my $comb = $1.$2; my $len = length($1) + length($2); push @substrings, [$comb,$len]; } my @sorted = sort {$b->[1] <=> $a->[1]} @substrings; foreach my $substring (@sorted){ foreach my $element (@$substring){ print "$element "; } print "\n"; }
In reply to substrings that consist of repeating characters by Anonymous Monk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |