I am working with a string consisting of the characters A-Z, in which the four characters A, C, G, and T are of particular interest. I am trying to extract the substrings that satisfy the following conditions:
1] The substring starts and ends with A, C, G, or T, and the distance between those two characters is 0 to 8 characters.
for example: string: AGRTGAXWXX
substrings: AG, AGRT, AGRTGA, GRT, GRTG, GRTGA, TG, TGA, GA
2] I want only the longest possible substring. In the above example,
string: AGRTGAXWXX
I would just want the substring: AGRTGA
as all the other substrings are part of this longest substring and this has the maximum distance between A and A within the distance allowed.
I have this so far: can anyone help please?
#!/usr/bin/perl
use strict;
use warnings;

my %uniq = ();
my $string = 'ACRMGAHKMAHGTXX';
substr($string, $_, 10) =~ m[([AGTC].{0,8}[AGTC])] and ++$uniq{ $1 } for 0 .. length( $string )-1;
for my $key (keys %uniq) {
    print $key, "\n";
}

# above code outputs the following:
GAHKMAHG
CRMGAHKMA
AHGT
AHKMAHGT
GT
GAHKMAHGT
ACRMGAHKMA

# and I only want the following:
ACRMGAHKMA
GAHKMAHGT

Does anyone have any suggestions? Thanks!
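One possible approach, offered as a sketch rather than a definitive answer: anchor the pattern at each start position that is itself A, C, G, or T, let the greedy .{0,8} reach as far right as it can, and keep a match only if it ends further right than every match already kept. This assumes "maximum" means "not contained in another qualifying substring", which is what the desired output above suggests. The names maximal_spans, $max_gap, and $last_end are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the substrings that start and end with A, C, G, or T, have at
# most $max_gap characters in between, and are not contained in another
# such substring. (Hypothetical helper, not from the original post.)
sub maximal_spans {
    my ($string, $max_gap) = @_;
    $max_gap = 8 unless defined $max_gap;

    my @found;
    my $last_end = -1;    # rightmost end position kept so far

    for my $start (0 .. length($string) - 1) {
        # Only positions holding one of the four letters can start a span.
        next unless substr($string, $start, 1) =~ /[ACGT]/;

        # Anchored with ^ so the match begins exactly at $start; the
        # greedy quantifier picks the furthest closing letter in range.
        my $window = substr($string, $start, $max_gap + 2);
        next unless $window =~ /^([ACGT].{0,${max_gap}}[ACGT])/;

        my $end = $start + length($1) - 1;
        next if $end <= $last_end;    # lies inside a span already kept

        push @found, $1;
        $last_end = $end;
    }
    return @found;
}

print "$_\n" for maximal_spans('ACRMGAHKMAHGTXX');
```

For 'ACRMGAHKMAHGTXX' this prints ACRMGAHKMA and GAHKMAHGT, and for 'AGRTGAXWXX' it gives only AGRTGA. The original one-liner also returns the shorter pieces because its pattern is not anchored, so a window that begins on a non-ACGT character matches from the first A, C, G, or T found later in that window, and nothing filters out matches that sit inside a longer one.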